1. A simple example
Suppose \(X\) is equally likely to be \(1\), \(2\), or \(4\). Then the envelope pair is \(\{1,2\}\), \(\{2,4\}\), or \(\{4,8\}\), each with probability \(\frac13\).
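Once a model for \(X\) is fixed, the observed value can matter. For example, under this model seeing \(1\) means you must be holding the smaller amount, while seeing \(8\) means you must be holding the larger one.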
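To see how the observed value matters under this model, here is a small sketch that lists every possible observation and the chance that it came from the smaller envelope. It assumes the uniform model on \(\{1,2,4\}\) from the example and a fair choice of envelope; the names `x_values`, `joint`, and `table` are ours, chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed model from the example: X is uniform on {1, 2, 4}.
x_values = [1, 2, 4]
p_x = Fraction(1, len(x_values))
half = Fraction(1, 2)  # chance of being handed either envelope

# joint[a] = [P(A=a and smaller envelope), P(A=a and larger envelope)]
joint = defaultdict(lambda: [Fraction(0), Fraction(0)])
for x in x_values:
    joint[x][0] += p_x * half      # handed the smaller envelope, so A = x
    joint[2 * x][1] += p_x * half  # handed the larger envelope, so A = 2x

# table[a] = probability that you hold the smaller envelope, given A = a
table = {}
for a in sorted(joint):
    p_small, p_large = joint[a]
    table[a] = p_small / (p_small + p_large)
    print(f"A={a}: P(smaller | A={a}) = {table[a]}")
```

Under these assumptions the extreme observations \(1\) and \(8\) settle the question completely, while \(2\) and \(4\) leave it at \(\frac12\).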
2. The same observation can call for different actions
Now compare two deterministic models.
Model A
\(X=1\) every time, so the envelopes are \(\{1,2\}\).
If you observe \(A=2\), then you must be holding the larger amount.
Best action: stay.
Model B
\(X=2\) every time, so the envelopes are \(\{2,4\}\).
If you observe \(A=2\), then you must be holding the smaller amount.
Best action: switch.
The observed value of \(A\) alone does not settle the question. What matters is how likely the different hidden stories are under the model.
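The contrast between Model A and Model B can be sketched in a few lines. The helper `best_action` is a hypothetical name introduced here; it assumes a deterministic model in which the smaller amount `x` is known in advance.

```python
def best_action(x, a):
    """Best action when the smaller amount is known to be x and you observe a."""
    if a == x:
        return "switch"  # you hold the smaller amount; the other envelope has 2x
    if a == 2 * x:
        return "stay"    # you hold the larger amount; the other envelope has x
    raise ValueError(f"observing {a} is impossible when X={x}")

# Same observation A=2, two different models.
print("Model A:", best_action(1, 2))
print("Model B:", best_action(2, 2))
```

The observation is identical in both calls; only the model changes, and the recommended action flips with it.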
3. Bayes' theorem
Let \(S\) be the event that you were handed the smaller envelope, and let \(L\) be the event that you were handed the larger one. Before you open anything,
\[
\mathbb{P}(S)=\mathbb{P}(L)=\frac12.
\]
After you observe \(A=a\), those probabilities may change. Bayes' theorem is the rule that updates them:
\[
\mathbb{P}(S\mid A=a)=
\frac{\mathbb{P}(A=a\mid S)\,\mathbb{P}(S)}
{\mathbb{P}(A=a\mid S)\,\mathbb{P}(S)+\mathbb{P}(A=a\mid L)\,\mathbb{P}(L)}.
\]
The denominator comes from the law of total probability: it adds the contributions from the two hidden stories \(S\) and \(L\).
- \(\mathbb{P}(S\mid A=a)\) is the updated probability that you have the smaller envelope after seeing \(a\).
- \(\mathbb{P}(A=a\mid S)\) is the probability of seeing \(a\) if you were handed the smaller envelope.
- \(\mathbb{P}(S)\) is the probability of being handed the smaller envelope before opening anything.
- \(\mathbb{P}(A=a\mid L)\) and \(\mathbb{P}(L)\) describe the second story, where you were handed the larger envelope.
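As a rough illustration of how the pieces of the formula fit together, the sketch below computes \(\mathbb{P}(S\mid A=a)\) from the two likelihoods and the prior. The function name `bayes_smaller` and the numbers in the example calls are made up for illustration; they do not come from any particular model of \(X\).

```python
from fractions import Fraction

def bayes_smaller(like_s, like_l, prior_s=Fraction(1, 2)):
    """Return P(S | A=a) given P(A=a | S), P(A=a | L), and the prior P(S)."""
    prior_l = 1 - prior_s
    # Law of total probability: add the contributions of both stories.
    evidence = like_s * prior_s + like_l * prior_l
    if evidence == 0:
        raise ValueError("the observation has probability 0 under both stories")
    return like_s * prior_s / evidence

# Hypothetical likelihoods: the observation is three times as likely under L.
print(bayes_smaller(Fraction(1, 4), Fraction(3, 4)))  # 1/4
# If the observation is impossible under L, S becomes certain.
print(bayes_smaller(Fraction(1, 3), Fraction(0)))     # 1
```

Exact fractions are used instead of floats so the results can be compared directly with hand calculations.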
4. Applying Bayes' theorem here
After you observe \(A=a\), there are again two stories to compare.
Story 1: event \(S\)
You were handed the smaller envelope.
Then \(X=a\), and the other envelope contains \(2a\).
Story 2: event \(L\)
You were handed the larger envelope.
Then \(X=\frac a2\), and the other envelope contains \(\frac a2\).
Because which envelope you are handed does not depend on \(X\), these stories tell us that
\[
\mathbb{P}(A=a\mid S)=\mathbb{P}(X=a),
\qquad
\mathbb{P}(A=a\mid L)=\mathbb{P}\!\left(X=\frac a2\right).
\]
Using \(\mathbb{P}(S)=\mathbb{P}(L)=\frac12\) and the law of total probability,
\[
\mathbb{P}(A=a)
=
\mathbb{P}(A=a\mid S)\mathbb{P}(S)
+
\mathbb{P}(A=a\mid L)\mathbb{P}(L)
\]
becomes
\[
\mathbb{P}(A=a)
=
\frac12\mathbb{P}(X=a)
+
\frac12\mathbb{P}\!\left(X=\frac a2\right).
\]
Substituting into Bayes' theorem gives
\[
\mathbb{P}(S\mid A=a)
=
\frac{\mathbb{P}(X=a)}{\mathbb{P}(X=a)+\mathbb{P}\!\left(X=\frac a2\right)},
\]
\[
\mathbb{P}(L\mid A=a)
=
\frac{\mathbb{P}\!\left(X=\frac a2\right)}{\mathbb{P}(X=a)+\mathbb{P}\!\left(X=\frac a2\right)}.
\]
If \(\frac a2\) is not a positive integer, or if it is excluded by the model, then \(\mathbb{P}(X=\frac a2)=0\).
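The two posterior formulas can be evaluated directly once a model for \(X\) is written down. The sketch below assumes \(X\) takes positive integer values and is described by a dictionary `pmf` mapping each value to its probability; the function name `posterior_smaller` and the non-uniform example model are assumptions made for illustration.

```python
from fractions import Fraction

def posterior_smaller(pmf, a):
    """Return P(S | A=a), where pmf maps each value of X to its probability."""
    p_a = pmf.get(a, Fraction(0))
    # a/2 only counts when it is a whole number; otherwise its probability is 0.
    p_half = pmf.get(a // 2, Fraction(0)) if a % 2 == 0 else Fraction(0)
    if p_a + p_half == 0:
        raise ValueError(f"A={a} cannot be observed under this model")
    return p_a / (p_a + p_half)

# Hypothetical model: smaller values of X are more likely.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}

for a in (1, 2, 4, 8):
    print(f"P(S | A={a}) = {posterior_smaller(pmf, a)}")
```

For this assumed model the posteriors are \(1\), \(\frac25\), \(\frac13\), and \(0\). The probability \(\mathbb{P}(L\mid A=a)\) is one minus each of these.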
5. What the formula tells you
Switching wins exactly when event \(S\) occurs, so you should switch when \(\mathbb{P}(S\mid A=a)>\frac12\) and stay when \(\mathbb{P}(S\mid A=a)<\frac12\).
Because of the formula above, this can be checked by comparing two probabilities in the model:
- switch if \(\mathbb{P}(X=a)>\mathbb{P}(X=\frac a2)\),
- stay if \(\mathbb{P}(X=a)<\mathbb{P}(X=\frac a2)\),
- tie if the two probabilities are equal.
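The rule in the list above can be sketched as a short function, together with an exact calculation of how often it ends with the larger envelope. This assumes \(X\) takes positive integer values given by a dictionary `pmf`, and that a tie is broken by a fair coin flip; the names `decide` and `win_probability` are hypothetical.

```python
from fractions import Fraction

def decide(pmf, a):
    """Compare P(X=a) with P(X=a/2) and return 'switch', 'stay', or 'tie'."""
    p_a = pmf.get(a, Fraction(0))
    p_half = pmf.get(a // 2, Fraction(0)) if a % 2 == 0 else Fraction(0)
    if p_a > p_half:
        return "switch"
    if p_a < p_half:
        return "stay"
    return "tie"

def win_probability(pmf):
    """Chance of ending with the larger envelope when following decide()."""
    half = Fraction(1, 2)
    # Win chance for each action, by which envelope you were handed.
    win_if_smaller = {"switch": 1, "stay": 0, "tie": half}
    win_if_larger = {"switch": 0, "stay": 1, "tie": half}
    total = Fraction(0)
    for x, p in pmf.items():
        total += p * half * win_if_smaller[decide(pmf, x)]     # observed A = x
        total += p * half * win_if_larger[decide(pmf, 2 * x)]  # observed A = 2x
    return total

# Assumed model: X uniform on {1, 2, 4}.
uniform = {x: Fraction(1, 3) for x in (1, 2, 4)}
print([decide(uniform, a) for a in (1, 2, 4, 8)])
print(win_probability(uniform))
```

For the uniform model on \(\{1,2,4\}\) the rule switches on \(1\), stays on \(8\), and ties on \(2\) and \(4\), which gives an overall win probability of \(\frac23\).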
This now explains several earlier examples in one framework. Odd observations force switching because \(\mathbb{P}(X=\frac a2)=0\). If the model rules out \(X=a\), then staying is forced. The “more than half the money in the world” example is the same kind of idea: one story is impossible, so it gets probability \(0\).
This page answers what to do when a model for \(X\) is known. The remaining question is what is still possible when that model is unknown.
6. Last question
We have answered most of the problems we started with. The last basic question is whether it is possible to do better than \(50\)-\(50\), no matter how \(X\) is generated.
What do you think? Before moving on, pause and make a prediction.