## 3.1 Conditional probability

A probability is a measure of the degree of uncertainty of an event. Conditional probability concerns revising the measure of degree of uncertainty to reflect any “new” information about the outcome of the underlying random phenomenon.

**Example 2.53 **
The probability^{66} that a randomly selected American adult supports impeachment of President Trump is 0.49.

- Suppose the randomly selected person is a Democrat. Do you think the probability that the randomly selected *Democrat* supports impeachment is 0.49?
- The probability^{67} that a randomly selected American is a Democrat is 0.31. Donny Don’t says that the probability that a randomly selected American both (1) is a Democrat, *and* (2) supports impeachment is equal to \(0.49\times 0.31\). Do you agree?
- Without further information, provide a range of “logically possible” values for the probability in the previous part. (“Logically possible” means they satisfy the rules of probability, even though they might not be realistic in context.)
- Suppose that the probability that a randomly selected American both is a Democrat and supports impeachment is 0.26. Construct an appropriate two-way table of probabilities.
- Construct a corresponding two-way table of hypothetical counts.
- Find the probability^{68} that a randomly selected American *who is a Democrat* supports impeachment.
- How can the probability in the previous part be written in terms of the probabilities provided in the setup?
- Find the probability that a randomly selected American *who supports impeachment* is a Democrat.

*Solution* to Example 2.53

- The probability that the randomly selected *Democrat* supports impeachment is probably a lot larger than 0.49. Knowing that the selected person is a Democrat would change the probability of supporting impeachment.
- No. Consider a hypothetical set of 100 Americans. We would expect about 31 of these 100 Americans to be Democrats. However, we would not expect only about half, \((0.49)(31)\approx 15\), of the 31 Democrats to support impeachment; we’d expect more, say 26 out of the 31. If 26 of the 100 Americans are Democrats who support impeachment, this would be consistent with a value of 0.26 for the probability that a randomly selected American both (1) is a Democrat, *and* (2) supports impeachment.
- We could make a table like in the following part and see what values produce valid tables. If \(A\) is the event that the selected person supports impeachment, \(B\) is the event that the person is a Democrat, and \(\textrm{P}\) corresponds to randomly selecting an American, then \(\textrm{P}(A) = 0.49\) and \(\textrm{P}(B) = 0.31\). By the subset rule \(\textrm{P}(A\cap B)\le \min(\textrm{P}(A), \textrm{P}(B)) = 0.31\). The largest \(\textrm{P}(A\cap B)\) can be is 0.31, which corresponds to all Democrats supporting impeachment. The smallest \(\textrm{P}(A \cap B)\) can be is 0, which corresponds to no Democrats supporting impeachment; this is possible here because \(\textrm{P}(A)+\textrm{P}(B) = 0.80 \le 1\), so the two events could be mutually exclusive. The extremes are not realistic, but without more information, all we know is that \(0\le \textrm{P}(A \cap B) \le 0.31\).

- If the probability that a randomly selected American both is a Democrat and supports impeachment is 0.26, then the two-way table of probabilities is

|  | \(A\) | \(A^c\) | Total |
|---|---|---|---|
| \(B\) | 0.26 | 0.05 | 0.31 |
| \(B^c\) | 0.23 | 0.46 | 0.69 |
| Total | 0.49 | 0.51 | 1.00 |

- It is often much easier to work with counts rather than probabilities. Start with a nice round total count^{69}, like 10000, and then construct a table of hypothetical counts, assuming the counts follow the probabilities in the table above.

|  | Impeach | Not Impeach | Total |
|---|---|---|---|
| Democrat | 2600 | 500 | 3100 |
| Not Democrat | 2300 | 4600 | 6900 |
| Total | 4900 | 5100 | 10000 |

- Working with counts, there are 3100 Democrats, of which 2600 support impeachment, so \(2600/3100=0.839\) is the probability that a randomly selected American *who is a Democrat* supports impeachment.
- In terms of the probabilities provided in the setup: \(\frac{\textrm{P}(A\cap B)}{\textrm{P}(B)} = \frac{0.26}{0.31}=0.839\).
- There are 4900 who support impeachment, of which 2600 are Democrats, so \(\frac{2600}{4900} = \frac{0.26}{0.49}=\frac{\textrm{P}(A\cap B)}{\textrm{P}(A)} =0.531\) is the probability that a randomly selected American *who supports impeachment* is a Democrat. Notice that this part and the previous part have the same numerator, \(\textrm{P}(A\cap B)\), but different *denominators*. Also notice that the two probabilities are quite different.
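The calculations in the last few parts all reduce to “conditional = joint / marginal.” As a quick check, here is a minimal plain-Python sketch (added here as an illustration; it is not part of the original example):

```python
# Given probabilities from the setup
p_A = 0.49        # P(A): supports impeachment
p_B = 0.31        # P(B): Democrat
p_A_and_B = 0.26  # P(A ∩ B): Democrat who supports impeachment

# Conditional probability = joint probability / marginal probability
p_A_given_B = p_A_and_B / p_B  # P(A | B): support given Democrat
p_B_given_A = p_A_and_B / p_A  # P(B | A): Democrat given support

print(round(p_A_given_B, 3))  # 0.839
print(round(p_B_given_A, 3))  # 0.531
```

Same numerator, different denominators: the two conditional probabilities differ substantially.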

**Definition 2.16 **
The **conditional probability of event \(A\) given event \(B\)**, denoted \(\textrm{P}(A|B)\), is defined as^{70}
\[
\textrm{P}(A|B) = \frac{\textrm{P}(A\cap B)}{\textrm{P}(B)}
\]

The conditional probability \(\textrm{P}(A|B)\) represents how the likelihood or degree of uncertainty of event \(A\) should be updated to reflect information that event \(B\) has occurred. The *unconditional* probability \(\textrm{P}(A)\) is often called the *prior probability* (a.k.a., base rate) of \(A\) (prior to observing \(B\)). The *conditional* probability \(\textrm{P}(A|B)\) is the *posterior probability* of \(A\) after observing \(B\).

In general, knowing whether or not event \(B\) occurs influences the probability of event \(A\). That is, \[ \text{In general, } \textrm{P}(A|B) \neq \textrm{P}(A) \] For example, without knowing a person’s political party, the probability of supporting impeachment is 0.49, but after learning the person is a Democrat, the probability of supporting impeachment changed to 0.839.

Be careful: order is essential in conditioning. That is, \[ \text{In general, } \textrm{P}(A|B) \neq \textrm{P}(B|A) \]

**Example 2.54 **
Which of the following probabilities is larger, (1) or (2)?

1. The probability that a randomly selected man who is greater than six feet tall plays in the NBA.
2. The probability that a randomly selected man who plays in the NBA is greater than six feet tall.

*Solution* to Example 2.54

The probability in (2) is much larger. The corresponding fractions would have the same numerator — number of men who are both greater than six feet tall and play in the NBA — but vastly different denominators.

- There are over a billion men in the world who are greater than six feet tall, only a few hundred of whom play in the NBA. The probability that a randomly selected man who is greater than six feet tall plays in the NBA is pretty close to 0.
- There are only a few hundred men who play in the NBA, almost all of whom are greater than six feet tall. The probability that a randomly selected man who plays in the NBA is greater than six feet tall is pretty close to 1.

When dealing with probabilities, especially conditional probabilities, be sure to ask “probability *of what*?” That is, what is the appropriate *sample space*? Thinking in fraction terms, be sure to identify the total/baseline group which corresponds to the *denominator*. Be very careful when translating between numbers and words.

To emphasize, \(\textrm{P}(A|B)\) is not the same as \(\textrm{P}(B|A)\) and they can be vastly different. In particular, the conditional probabilities can be highly influenced by the original unconditional probabilities of the events, \(\textrm{P}(A)\) and \(\textrm{P}(B)\), sometimes called the **base rates**. Don’t neglect the base rates when evaluating probabilities.

For example, the probability that a randomly selected man plays in the NBA is pretty close to 0 (the base rate). Learning that the man is greater than six feet tall is not going to change our probability that he plays in the NBA by much.

### 3.1.1 Simulating conditional probabilities

**Example 2.55 **
Consider simulating a randomly selected American and determining whether or not the person supports impeachment and whether or not the person is a Democrat, as in the scenario in Example 2.53. Remember we are given \(\textrm{P}(A) = 0.49\), \(\textrm{P}(B) = 0.31\), and \(\textrm{P}(A\cap B) = 0.26\) where \(A\) is the event that the selected person supports impeachment and \(B\) is the event that the selected person is a Democrat.

- Donny Don’t says we need two spinners: one spinner with areas of 0.49 and 0.51 to represent Support/Not support, and another spinner with areas of 0.31 and 0.69 to represent Democrat/Not Democrat. Then spin each spinner once to simulate one repetition. Do you agree?
- How could you perform one repetition of the simulation using *one* spinner?
- How could you perform a simulation, using the spinner in the previous part, to estimate \(\textrm{P}(A | B)\)?
- What determines the order of magnitude of the margin of error for your estimate in the previous part?
- What is another method for performing the simulation and estimating \(\textrm{P}(A |B)\) that has a smaller margin of error? What is the disadvantage of this method?

*Solution* to Example 2.55

- No, this assumes there is no relationship between party and support. But we know that Democrats will be much more likely to support impeachment than non-Democrats. In general, you cannot simulate a pair of events simply from the marginal distribution of each.
- You need to construct a spinner for the possible occurrences of the pairs of events and their joint probabilities.
- The following method fixes the total number of spins, say 10000.
  - Spin the joint spinner from the previous part once to simulate a (party, support) pair.
  - Repeat a fixed number of times, say 10000.
  - Discard the repetitions on which the person was not a Democrat, that is, the repetitions on which \(B\) did not occur. You would expect to have around 3100 repetitions left.
  - Among the remaining repetitions (on which \(B\) occurred), count the number of repetitions on which \(A\) also occurred. So for the roughly 3100 repetitions for which the person was a Democrat, count the repetitions on which the person also supported impeachment; you would expect a count of around 2600.
  - Estimate \(\textrm{P}(A|B)\) by dividing the two counts. \[ \textrm{P}(A | B)\approx \frac{\text{Number of repetitions on which both $A$ and $B$ occurred}}{\text{Number of repetitions on which $B$ occurred}} \]

- Only those repetitions in which \(B\) occurred are used to estimate \(\textrm{P}(A|B)\). So the order of magnitude of the margin of error is determined by the number of repetitions on which \(B\) occurs. Roughly this would be around 3100, rather than 10000.
- The previous method simulated a fixed number of repetitions first, and then discarded the ones that did not meet the condition. We could instead discard repetitions that do not meet the condition as we go, and keep performing repetitions until we get a fixed number, say 10000, that do satisfy the condition. In this way, the estimate of \(\textrm{P}(A |B)\) will be based on the fixed number of repetitions, say 10000, that satisfy event \(B\). The disadvantage is increased computational burden; we will need to simulate and discard many repetitions in order to achieve the desired number that satisfy the condition.

There are two basic ways to use simulation to approximate a conditional probability \(\textrm{P}(A|B)\).

- Simulate the random phenomenon for a set number of repetitions (say 10000), *discard those repetitions on which \(B\) does not occur*, and compute the relative frequency of \(A\) among the remaining repetitions (on which \(B\) does occur).
  - Advantage: not computationally intensive.
  - Disadvantage: the margin of error is based on only the number of repetitions used to compute the relative frequency. So if you perform 10000 repetitions but \(B\) occurs on only 2000, then the margin of error for the estimate of \(\textrm{P}(A|B)\) is roughly on the order of \(1/\sqrt{2000}\).
- Simulate the random phenomenon *until obtaining a certain number of repetitions (say 10000) on which \(B\) occurs*, discarding those repetitions on which \(B\) does not occur as you go, and compute the relative frequency of \(A\) among the remaining repetitions (on which \(B\) does occur).
  - Advantage: the margin of error will be based on the set number of repetitions on which \(B\) occurs.
  - Disadvantage: requires more time/computer power. Especially if \(\textrm{P}(B)\) is small, it will require a large number of repetitions of the simulation to achieve the desired number of repetitions on which \(B\) occurs.
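Both strategies can be sketched in a few lines of plain Python. This is a stand-alone illustration assuming the joint probabilities from Example 2.53; the code used elsewhere in this section relies on Symbulate instead.

```python
import random

random.seed(1)  # for reproducibility

# Joint distribution of (party, support) from the two-way table
outcomes = [('Democrat', 'Support'), ('Democrat', 'Not Support'),
            ('Not Democrat', 'Support'), ('Not Democrat', 'Not Support')]
probs = [0.26, 0.05, 0.23, 0.46]

def draw_pair():
    return random.choices(outcomes, weights=probs)[0]

# Method 1: fix the total number of repetitions, then discard non-Democrats
pairs = [draw_pair() for _ in range(10000)]
dems = [pair for pair in pairs if pair[0] == 'Democrat']
est1 = sum(pair[1] == 'Support' for pair in dems) / len(dems)

# Method 2: discard as you go, until 10000 repetitions satisfy the condition
support_count = 0
for _ in range(10000):
    pair = draw_pair()
    while pair[0] != 'Democrat':  # redraw until B occurs
        pair = draw_pair()
    support_count += (pair[1] == 'Support')
est2 = support_count / 10000

print(est1, est2)  # both estimates should be near 0.26/0.31 ≈ 0.839
```

Method 1’s estimate is based on only the roughly 3100 repetitions on which \(B\) occurred, while Method 2’s is based on exactly 10000 such repetitions, at the cost of extra draws.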

In Symbulate, `filter` can be used to extract repetitions that satisfy a condition. The following simulates impeachment support status and party affiliation for 10000 hypothetical Americans and then applies `filter` to retain only the Democrats. The function `is_Democrat` takes as input a (support status, party affiliation) pair and returns `True` if the party is Democrat (and `False` otherwise).

```
def is_Democrat(Support_Party):
    return Support_Party[1] == 'Democrat'

P = BoxModel([('Support', 'Democrat'), ('Support', 'Not Democrat'),
              ('Not Support', 'Democrat'), ('Not Support', 'Not Democrat')],
             probs=[0.26, 0.23, 0.05, 0.46])

P.sim(10000).filter(is_Democrat).tabulate()
```

`## {('Support', 'Democrat'): 2668, ('Not Support', 'Democrat'): 463}`

In Symbulate, the symbol `|`, read “given”, applies the second method: it simulates a fixed number of repetitions that satisfy the event being conditioned on. Be careful when using `|` to condition on an event with small probability. In particular, be careful when conditioning on the value of a continuous random variable.

```
P = BoxModel([('Support', 'Democrat'), ('Support', 'Not Democrat'),
              ('Not Support', 'Democrat'), ('Not Support', 'Not Democrat')],
             probs=[0.26, 0.23, 0.05, 0.46])
Support, Party = RV(P)

( (Support & Party) | (Party == 'Democrat') ).sim(10000).tabulate()
```

`## {(Not Support, Democrat): 1611, (Support, Democrat): 8389}`

### 3.1.2 Joint, conditional, and marginal probabilities

Within the context of two events, we have joint, conditional, and marginal probabilities.

- Joint: unconditional probability involving both events, \(\textrm{P}(A \cap B)\).
- Conditional: conditional probability of one event given the other, \(\textrm{P}(A | B)\), \(\textrm{P}(B | A)\).
- Marginal: unconditional probability of a single event \(\textrm{P}(A)\), \(\textrm{P}(B)\).

The relationship \(\textrm{P}(A|B) = \textrm{P}(A\cap B)/\textrm{P}(B)\) can be stated generically as \[ \text{conditional} = \frac{\text{joint}}{\text{marginal}} \]

In the previous impeachment problem, we were provided the joint and marginal probabilities and we computed conditional probabilities. But in many problems conditional probabilities are provided or can be determined directly.

**Example 2.56 **
Recent polls^{71} suggest that

- 83% of Democrats support impeachment of President Trump
- 44% of Independents support impeachment of President Trump
- 14% of Republicans support impeachment of President Trump

- The average of these three percentages is \((83+44+14)/3 = 47\). Is it necessarily true that 47% of all Americans support impeachment?
- Based on recent polls
^{72}

- 31% of Americans are Democrats
- 40% of Americans are Independent
- 29% of Americans are Republicans

Define the event \(A\) to represent “supports impeachment” and \(D, I, R\) to correspond to affiliation in each of the parties. If the probability measure \(\textrm{P}\) corresponds to randomly selecting an American, write all the percentages above as probabilities using proper notation.

- Find the probability that a randomly selected American is a Democrat who supports impeachment. Is this a joint, conditional, or marginal probability?
- Construct an appropriate two-way table.
- Find the probability that a randomly selected American supports impeachment. How does this differ from the average of the three percentages in part 1? Why?
- Now suppose that the randomly selected American supports impeachment. How does this information change the probability that the selected American belongs to a particular political party? Answer by computing appropriate probabilities (and using proper notation).
- How does each of the probabilities from the previous part compare to the respective prior probability? Does this make sense?

*Solution* to Example 2.56

- No; think of extreme cases as illustrations. If almost all Americans were Democrats, then the overall probability of supporting impeachment would be close to 0.83, while if almost all Americans were Republicans, then the overall probability of supporting impeachment would be close to 0.14. So the overall probability of supporting impeachment depends on the party affiliation breakdown.
- If the probability measure \(\textrm{P}\) corresponds to randomly selecting an American then
- \(\textrm{P}(A|D) = 0.83\)
- \(\textrm{P}(A|I) = 0.44\)
- \(\textrm{P}(A|R) = 0.14\)
- \(\textrm{P}(D) = 0.31\)
- \(\textrm{P}(I) = 0.40\)
- \(\textrm{P}(R) = 0.29\)

- The probability that a randomly selected American is a Democrat who supports impeachment is \(\textrm{P}(A \cap D) = \textrm{P}(A|D)\textrm{P}(D) = (0.83)(0.31) = 0.2573\), a joint probability. In 10000 hypothetical Americans, we would expect 3100 to be Democrats, and of those 3100 Democrats we would expect 2573 (or 83%) to support impeachment. So out of the 10000 Americans, 2573 are Democrats who support impeachment.
Continue in the manner of the previous part to complete a two-way table of counts for 10000 hypothetical Americans.

|  | Impeach | Not Impeach | Total |
|---|---|---|---|
| Democrat | 2573 | 527 | 3100 |
| Independent | 1760 | 2240 | 4000 |
| Republican | 406 | 2494 | 2900 |
| Total | 4739 | 5261 | 10000 |

- Out of the 10000 Americans, 4739 support impeachment, so the probability that a randomly selected American supports impeachment^{73} is \(\textrm{P}(A)=0.4739\). This is actually pretty close to the average of the 3 impeachment percentages, but that’s just a coincidence. The overall probability is actually a *weighted average*; in terms of the probabilities given in the setup, the table calculations show \[\begin{align*} \textrm{P}(A) & = \textrm{P}(A|D)\textrm{P}(D) + \textrm{P}(A|I)\textrm{P}(I) + \textrm{P}(A|R)\textrm{P}(R)\\ & = (0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29) \end{align*}\]
- We want \(\textrm{P}(D|A)\), etc.
- \(\textrm{P}(D|A) = \frac{2573}{4739} = \frac{(0.83)(0.31)}{(0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29)} = 0.543\).
- \(\textrm{P}(I|A) = \frac{1760}{4739} = \frac{(0.44)(0.40)}{(0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29)} = 0.371\).
- \(\textrm{P}(R|A) = \frac{406}{4739} = \frac{(0.14)(0.29)}{(0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29)} = 0.086\).

- Comparing each posterior probability to the respective prior probability:
  - \(\textrm{P}(D|A) = 0.543\), which is greater than the prior probability of Democrat \(\textrm{P}(D) = 0.31\). Knowing the person supports impeachment increases the probability that the person is a Democrat.
  - \(\textrm{P}(I|A) = 0.371\), which is slightly less than the prior probability of Independent \(\textrm{P}(I) = 0.40\). Knowing the person supports impeachment slightly decreases the probability that the person is an Independent.
  - \(\textrm{P}(R|A) = 0.086\), which is much less than the prior probability of Republican \(\textrm{P}(R) = 0.29\). Knowing the person supports impeachment decreases the probability that the person is a Republican.
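The weighted average and the posterior updates above amount to a few lines of arithmetic. A plain-Python check, using the values from the setup of Example 2.56 (a sketch, not part of the original text):

```python
# Conditional probabilities of support given party, and party base rates
p_A_given = {'D': 0.83, 'I': 0.44, 'R': 0.14}  # P(A | party)
p_party = {'D': 0.31, 'I': 0.40, 'R': 0.29}    # prior P(party)

# Law of total probability: P(A) is a weighted average of the conditionals
p_A = sum(p_A_given[b] * p_party[b] for b in p_party)
print(round(p_A, 4))  # 0.4739

# Posterior probability of each party given support: P(party | A)
posterior = {b: p_A_given[b] * p_party[b] / p_A for b in p_party}
print({b: round(p, 3) for b, p in posterior.items()})
# {'D': 0.543, 'I': 0.371, 'R': 0.086}
```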

A **mosaic plot** provides a nice visual of joint, marginal, and one-way conditional probabilities, and can be used to illustrate the law of total probability. The mosaic plot^{74} on the left in Figure 2.21 represents conditioning on political party. The vertical bars represent the probability of supporting/not supporting impeachment for each political party. The widths of the vertical bars are scaled in proportion to the marginal distribution of party; the bar for Independent is a little wider than the others. The area of each sub-rectangle represents a joint probability. The single bar to the right of the plot displays the marginal probability of supporting/not supporting impeachment.

The plot on the right in Figure 2.21 represents conditioning on support of impeachment. Now the widths of the vertical bars represent the distribution of supporting/not supporting impeachment, the heights within the bars represent conditional probabilities for party affiliation given support status, and the single bar to the right represents the marginal distribution of party affiliation.

**Example 2.57 **
Consider simulating a randomly selected American and determining the person’s political party and whether or not the person supports impeachment, as in the scenario in Example 2.56. Remember we are given \(\textrm{P}(A|D) = 0.83\), \(\textrm{P}(A|I) = 0.44\), \(\textrm{P}(A|R) = 0.14\), \(\textrm{P}(D) = 0.31\), \(\textrm{P}(I) = 0.40\), and \(\textrm{P}(R)=0.29\).

How could you perform one repetition of the simulation using spinners based solely on the probabilities provided in the problem, without constructing a two-way table? (Hint: you’ll need a few spinners, but you might not spin them all in a single repetition.)

*Solution* to Example 2.57

There will be 4 spinners, but only 2 will be spun in any single repetition.

- “Party” spinner: Areas of 0.31, 0.40, and 0.29 correspond to, respectively, Democrat, Independent, Republican. Spin this to determine party affiliation.
- “Impeachment” spinners — only one of the following will be spun in a single repetition:
- Impeachment given Democrat: areas of 0.83 and 0.17 corresponding to, respectively, support, not support. If the result of the “party” spinner is Democrat, spin this spinner to determine support for impeachment.
- Impeachment given Independent: areas of 0.44 and 0.56 corresponding to, respectively, support, not support. If the result of the “party” spinner is Independent, spin this spinner to determine support for impeachment.
- Impeachment given Republican: areas of 0.14 and 0.86 corresponding to, respectively, support, not support. If the result of the “party” spinner is Republican, spin this spinner to determine support for impeachment.

We can code the above in Symbulate by defining a custom probability space. An outcome is a (party, impeachment) pair. Each of the 4 spinners corresponds to a `BoxModel`. We define a function that specifies how to simulate one repetition, using the `draw` method, and then use that function to define a custom `ProbabilitySpace`.

```
def party_impeachment_sim():
    party = BoxModel(['D', 'I', 'R'], probs=[0.31, 0.40, 0.29]).draw()
    if party == 'D':
        support = BoxModel(['Imp', 'NotImp'], probs=[0.83, 0.17]).draw()
    elif party == 'I':
        support = BoxModel(['Imp', 'NotImp'], probs=[0.44, 0.56]).draw()
    else:  # party == 'R'
        support = BoxModel(['Imp', 'NotImp'], probs=[0.14, 0.86]).draw()
    return party, support

P = ProbabilitySpace(party_impeachment_sim)
P.sim(10000).tabulate()
```

`## {('I', 'Imp'): 1770, ('D', 'Imp'): 2564, ('D', 'NotImp'): 535, ('R', 'Imp'): 406, ('I', 'NotImp'): 2218, ('R', 'NotImp'): 2507}`

The impeachment problem in this section illustrated a few concepts that we will explore in more detail in the next few sections, namely: the multiplication rule, the law of total probability, and Bayes’ rule.

### 3.1.3 Multiplication rule

**Multiplication rule:** the probability that two events both occur is
\[\begin{aligned}
\textrm{P}(A \cap B) & = \textrm{P}(A|B)\textrm{P}(B)\\
& = \textrm{P}(B|A)\textrm{P}(A)\end{aligned}\]

The above says that you should think “multiply” when you see “and”. However, be careful about *what* you are multiplying: you need an unconditional and an appropriate conditional probability. You can condition either on \(A\) or on \(B\), provided you have the appropriate marginal probability; often, conditioning one way is easier than the other. Be careful: the multiplication rule does *not* say that \(\textrm{P}(A\cap B)\) is the same as \(\textrm{P}(A)\textrm{P}(B)\).

Generically, the multiplication rule says \[ \text{joint} = \text{conditional}\times\text{marginal} \]

For example:

- 31% *of Americans* are Democrats
- 83.9% *of Democrats* support impeachment
- So 26% *of Americans* are Democrats who support impeachment \((0.26 = (0.31)(0.839))\).

\[ \frac{\text{Democrats who support impeachment}}{\text{Americans}} = \left(\frac{\text{Democrats}}{\text{Americans}}\right)\left(\frac{\text{Democrats who support impeachment}}{\text{Democrats}}\right) \]
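As a quick numeric check of both factorizations in plain Python (the conditional probabilities are the rounded values computed earlier, so the two products agree only approximately):

```python
p_B = 0.31           # P(B): Democrat
p_A = 0.49           # P(A): supports impeachment
p_A_given_B = 0.839  # P(A | B), rounded
p_B_given_A = 0.531  # P(B | A), rounded

# Both orderings of the multiplication rule give P(A ∩ B) ≈ 0.26
print(round(p_A_given_B * p_B, 3))  # 0.26
print(round(p_B_given_A * p_A, 3))  # 0.26
```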

The multiplication rule extends naturally to more than two events. \[ \textrm{P}(A_1\cap A_2 \cap \cdots \cap A_{k}) = \textrm{P}(A_1)\textrm{P}(A_2|A_1)\textrm{P}(A_3|A_1\cap A_2) \times \cdots \times \textrm{P}(A_k|A_1 \cap A_2 \cap \cdots \cap A_{k-1}) \]

The multiplication rule is useful for computing probabilities for a random phenomenon that can be broken down into component “stages”.

**Example 2.58 **
The birthday problem concerns the probability that at least two people in a group of \(n\) people have the same birthday^{75}. In particular, how large does \(n\) need to be in order for this probability to be larger than 0.5? Ignore multiple births and February 29 and assume that the other 365 days are all equally likely.

- Explain how, in principle, you could perform a tactile simulation to estimate the probability that at least two people have the same birthday when \(n=30\).

- For \(n=30\), find the probability that none of the people have the same birthday.
- For \(n=30\), find the probability that at least two people have the same birthday.
- Provide an expression of the probability for a general \(n\) and find the smallest value of \(n\) for which the probability is over 0.5.

*Solution* to Example 2.58

- Here is one way.
  - Get 365 cards and label each one with a distinct birthday.
  - Shuffle the cards and select 30 *with replacement*.
  - Record whether or not you selected at least one card more than once. This corresponds to at least two people sharing a birthday.
  - Repeat many times; each repetition consists of selecting a sample of 30 cards with replacement.
  - Find the proportion of repetitions on which at least two people had the same birthday to approximate the probability.

In the simulation below, the random variable \(X\) measures the number of distinct birthdays among the 30 people. So if no one shares a birthday then \(X=30\), if there is exactly one day that is the birthday of at least two people then \(X=29\), and so on.

Imagine lining the 30 people up in some order. Let \(A_2\) be the event that the first two people have different birthdays, \(A_3\) be the event that the first three people have different birthdays, and so on, until \(A_{30}\), the event that all 30 people have different birthdays. Notice \(A_{30}\subseteq A_{29} \subseteq \cdots \subseteq A_3 \subseteq A_2\), so \(\textrm{P}(A_{30}) = \textrm{P}(A_2 \cap A_3 \cap \cdots \cap A_{30})\).

The first person’s birthday can be any one of 365 days. In order for the second person’s birthday to be different, it needs to be on one of the remaining 364 days. So the probability that the second person’s birthday is different from the first is \(\textrm{P}(A_2)=\frac{364}{365}\).

Now if the first two people have different birthdays, in order for the third person’s birthday to be different it must be on one of the remaining 363 days. So \(\textrm{P}(A_3|A_2) = \frac{363}{365}\). Notice that this is a conditional probability. (If the first two people had the same birthday, then the probability that the third person’s birthday is different would be \(\frac{364}{365}\).)

If the first three people have different birthdays, in order for the fourth person’s birthday to be different it must be on one of the remaining 362 days. So \(\textrm{P}(A_4|A_2\cap A_3) = \frac{362}{365}\).

And so on. If the first 29 people have different birthdays, in order for the 30th person’s birthday to be different it must be on one of the remaining 365-29=336 days. Then using the multiplication rule

\[\begin{align*} \textrm{P}(A_{30}) & = \textrm{P}(A_{2}\cap A_3 \cap \cdots \cap A_{30})\\ & = \textrm{P}(A_2)\textrm{P}(A_3|A_2)\textrm{P}(A_4|A_2\cap A_3)\textrm{P}(A_5|A_2\cap A_3 \cap A_4)\cdots \textrm{P}(A_{30}|A_2\cap \cdots \cap A_{29})\\ & = \left(\frac{364}{365}\right)\left(\frac{363}{365}\right)\left(\frac{362}{365}\right)\left(\frac{361}{365}\right)\cdots \left(\frac{365-30 + 1}{365}\right)\approx 0.294 \end{align*}\]

By the complement rule, the probability that at least two people have the same birthday is \(1-0.294=0.706\), since either (1) none of the people have the same birthday, or (2) at least two of the people have the same birthday.

For a general \(n\), the probability that at least two people have the same birthday is \[ 1 - \prod_{k=1}^{n}\left(\frac{365-k+1}{365}\right) \] See the plot below, which plots this probability as a function of \(n\). When \(n=23\) this probability is 0.507.
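The product formula is easy to evaluate directly. The plain-Python sketch below (an illustration, separate from the simulation code that follows) computes the probability for \(n=30\) and searches for the smallest \(n\) with probability over 0.5:

```python
def p_shared_birthday(n):
    """Probability that at least two of n people share a birthday
    (365 equally likely days, ignoring Feb 29)."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

print(round(p_shared_birthday(30), 3))  # 0.706

# Smallest n for which the probability exceeds 0.5
n = 1
while p_shared_birthday(n) <= 0.5:
    n += 1
print(n, round(p_shared_birthday(n), 3))  # 23 0.507
```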

```
def count_distinct_values(outcome):
    return len(set(outcome))

n = 30
P = BoxModel(list(range(365)), size=n, replace=True)
X = RV(P, count_distinct_values)
x = X.sim(10000)

plt.figure()
x.plot()
plt.show()

# Proportion of repetitions with fewer than n distinct birthdays,
# i.e., with at least one shared birthday
x.count_lt(n) / 10000
```

`## 0.7095`

### 3.1.4 Law of total probability

The “overall” unconditional probability \(\textrm{P}(A)\) can be thought of as a weighted average of the “case-by-case” conditional probabilities \(\textrm{P}(A|B)\) and \(\textrm{P}(A|B^c)\), where the weights are determined by the likelihood of each case, \(\textrm{P}(B)\) versus \(\textrm{P}(B^c)\).
\[
\textrm{P}(A) = \textrm{P}(A|B)\textrm{P}(B) + \textrm{P}(A|B^c)\textrm{P}(B^c)
\]
This is an example of the *law of total probability*, which applies even when there are more than two cases.

**Law of total probability.** If \(B_1,\ldots, B_k\) are disjoint with \(B_1\cup \cdots \cup B_k=\Omega\), then
\[\begin{align*}
\textrm{P}(A) & = \sum_{i=1}^k \textrm{P}(A \cap B_i)\\
& = \sum_{i=1}^k \textrm{P}(A|B_i) \textrm{P}(B_i)
\end{align*}\]

The events \(B_1, \ldots, B_k\), which represent the “cases”, form a *partition* of the sample space; each outcome \(\omega\in\Omega\) lies in exactly one of the \(B_i\). The law of total probability says that we can interpret the unconditional probability \(\textrm{P}(A)\) as a probability-weighted average of the case-by-case conditional probabilities \(\textrm{P}(A|B_i)\) where the weights \(\textrm{P}(B_i)\) represent the probability of encountering each case.

For an illustration of the law of total probability, consider the single bar to the right of the plot on the left in Figure 2.21 which displays the marginal probability of supporting/not supporting impeachment. The height within this bar is the weighted average of the heights within the other bars, with the weights given by the widths of the other bars.

In the plot on the right in Figure 2.21, which conditions on support of impeachment, we can likewise see that the marginal probabilities for party affiliation (the single bar to the right) are the weighted averages of the respective conditional probabilities given support status, with weights given by the widths of the support/not support bars.

Conditioning and using the law of total probability is an effective strategy for solving many problems. For example, when a problem involves iterations or steps it is often useful to condition on the result of the first step.

**Example 2.59 **
You and your friend are playing the “lookaway challenge”.

In the first round, you point in one of four directions: up, down, left or right. At the exact same time, your friend also looks in one of those four directions. If your friend looks in the same direction you’re pointing, you win! Otherwise, you switch roles as the game continues to the next round — now your friend points in a direction and you try to look away. As long as no one wins, you keep switching off who points and who looks.

Suppose that each player is equally likely to point/look in each of the four directions, independently from round to round. What is the probability that you win the game?

- Why might you expect the probability to not be equal to 0.5?
- What is the probability that you win in the first round?
- If \(p\) denotes the probability that the player who goes first (you) wins, what is the probability that the other player wins?
- Condition on the result of the first round and set up an equation to solve for \(p\).

*Solution*to Example 2.59

- The player who plays first has the advantage of going first; that player can win the game in the first round, but cannot lose the game in the first round. So we might expect the player who goes first to be more likely to win than the other player.
- 1/4. If we represent an outcome in the first round as a pair (point, look) then there are 16 possible equally likely outcomes, of which 4 represent pointing and looking in the same direction. Alternatively, whichever direction you point, the probability that your friend looks in the same direction is 1/4.
- \(1-p\), since the game keeps going until someone wins.
- Here is where we use conditioning and the law of total probability. Let \(A\) be the event that you win the game, and \(B\) be the event that you win in the first round. By the law of total probability \[ \textrm{P}(A) = \textrm{P}(A|B)\textrm{P}(B) + \textrm{P}(A|B^c)\textrm{P}(B^c) \] Now \(\textrm{P}(A)=p\), \(\textrm{P}(B)=1/4\), \(\textrm{P}(B^c)=3/4\), and \(\textrm{P}(A|B)=1\) since if you win in the first round then you win the game. Now consider \(\textrm{P}(A|B^c)\). If you don’t win in the first round, it is like the game starts over with the other player going first. In this scenario, you play the role of the player who does not go first, and we saw above that the probability that the player who does not go first wins is \(1-p\). That is, \(\textrm{P}(A|B^c) = 1-p\). Therefore \[ p = (1)(1/4)+ (1-p)(3/4) \] Solve to find \(p=4/7\approx 0.57\).

The following is one way to code the lookaway challenge. In each round an outcome is a (point, look) pair, coded with a `BoxModel` with `size=2` (and choices labeled 1, 2, 3, 4). The `** inf` assumes the rounds continue indefinitely, so the outcome of a game is a sequence of (point, look) pairs, one for each round. The random variable \(X\) counts the number of rounds until there is a winner, which occurs in the first round in which point = look. The player who goes first wins the game if the game ends in an odd number of rounds, so to estimate the probability that the player who goes first wins we find the proportion of repetitions in which \(X\) is an odd number.

```
def count_rounds(sequence):
    # number of rounds until the first (point, look) match
    for r, pair in enumerate(sequence):
        if pair[0] == pair[1]:
            return r + 1  # +1 since enumerate starts at 0

def is_odd(x):
    return (x % 2) == 1

P = BoxModel([1, 2, 3, 4], size=2) ** inf
X = RV(P, count_rounds)
x = X.sim(10000)
plt.figure()
x.plot()
plt.show()

# proportion of repetitions in which the number of rounds is odd
sum(is_odd(v) for v in x) / 10000
```

`## 0.5747`
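For readers without Symbulate, here is a rough standard-library sketch of the same simulation; the helper `rounds_until_win` and the seed are ours, not part of the original code:

```
import random

# Plain-Python lookaway simulation, assuming each player picks one of
# 4 directions uniformly and independently each round.
random.seed(1)  # for reproducibility

def rounds_until_win():
    """Number of rounds until point == look."""
    r = 0
    while True:
        r += 1
        if random.randrange(4) == random.randrange(4):
            return r

n = 100000
# the first player wins when the game ends in an odd round
wins = sum(rounds_until_win() % 2 == 1 for _ in range(n))
print(wins / n)  # should be close to 4/7 ≈ 0.5714
```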

### 3.1.5 Bayes rule

**Bayes’ rule** specifies how a prior probability \(\textrm{P}(B)\) is updated in response to the information that event \(A\) has occurred, to obtain the posterior probability \(\textrm{P}(B|A)\).
\[
\textrm{P}(B|A) = \textrm{P}(B)\left(\frac{\textrm{P}(A|B)}{\textrm{P}(A)}\right)
\]

Bayes’ rule is often used in conjunction with the law of total probability. If \(B_1,\ldots, B_k\) are disjoint with \(B_1\cup \cdots \cup B_k=\Omega\), then for any \(j\)

\[\begin{align*} \textrm{P}(B_j|A) & = \frac{\textrm{P}(A|B_j)\textrm{P}(B_j)}{\sum_{i=1}^k \textrm{P}(A|B_i) \textrm{P}(B_i)} \end{align*}\]

Each of the \(B_j\) represents a different “hypothesis” or “case”, while \(A\) represents “evidence”. So Bayes’ rule gives a way of updating \(\textrm{P}(B_j)\), the prior probability of hypothesis \(B_j\), in light of evidence \(A\) to obtain the posterior probability \(\textrm{P}(B_j|A)\).

**Example 2.60 **
A woman’s chances of giving birth to a child with Down syndrome increase
with age. The CDC estimates^{76} that a woman in her mid-to-late 30s has
a risk of conceiving a child with Down syndrome of about 1 in 250. A
nuchal translucency scan, which involves a blood draw from the mother
and an ultrasound, is often performed around the 13th week of pregnancy
to test for the presence of Down syndrome (among other things). If the
baby has Down syndrome, the probability that the test is positive is
about 0.9. However, when the baby does not have Down syndrome, there is
still a probability that the test returns a false positive of about^{77}
0.05. Suppose that the NT test for a pregnant woman in her mid-to-late
30s comes back positive for Down syndrome. What is the probability that
the baby actually has Down syndrome?

- Before proceeding, make a guess for the probability in question. \[ \text{0-20%} \qquad \text{20-40%} \qquad \text{40-60%} \qquad \text{60-80%} \qquad \text{80-100%} \]
- Donny Don’t says: 0.90 and 0.05 should add up to 1, so there must be a typo in the problem. Do you agree?
- Let \(D\) be the event that the baby has Down Syndrome, and let \(T\) be the event that the test is positive. Represent the probabilities provided using proper notation. Also, denote the probability that we are trying to find.
- Considering a hypothetical population of babies (of pregnant women in this demographic), express the probabilities as percents in context.
- Construct a hypothetical two-way table of counts.
- Use the table to find the probability in question.
- Using the probabilities provided in the setup, and without using the two-way table, find the probability that the test is positive.
- Using the probabilities provided in the setup, and without using the two-way table, find the probability that the baby actually has Down syndrome given that the test is positive.
- The probability in the previous part might seem very low to you. Explain why the probability is so low.
- Compare the probability of having Down Syndrome before and after the positive test. How much more likely is a baby who tests positive to have Down Syndrome than a baby for whom no information about the test is available?

*Solution*to Example 2.60

- We don’t know what you guessed, but from experience many people guess 80-100%. After all, the test is correct for most of the babies who have Down Syndrome, and also correct for most of the babies who do not have Down Syndrome, so it seems like the test is correct most of the time. But this argument ignores one important piece of information that has a huge impact on the results: most babies do not have Down Syndrome.
- No, these probabilities apply to different groups: 0.9 to babies with Down Syndrome, and 0.05 to babies without Down Syndrome. Donny is misapplying the complement rule. For example, if 0.9 is the probability that a baby with Down Syndrome tests positive, then 0.1 is the probability that a baby with Down Syndrome *does not test positive*; both probabilities apply to babies with Down Syndrome, and each baby with Down Syndrome either tests positive or not.

- If \(D\) is the event that the baby has Down Syndrome, and \(T\) is the event that the test is positive, then we are given
  - \(\textrm{P}(D) = 1/250 = 0.004\)
  - \(\textrm{P}(T|D) = 0.9\)
  - \(\textrm{P}(T|D^c) = 0.05\)
- We want to find \(\textrm{P}(D|T)\).

- Considering a hypothetical population of babies (of pregnant women in this demographic):
  - 0.4% *of babies* have Down Syndrome
  - 90% *of babies with Down Syndrome* test positive
  - 5% *of babies without Down Syndrome* test positive
  - We want to find the percentage *of babies who test positive* that have Down Syndrome.

- Assuming 10000 babies (of pregnant women in this demographic):

| | Has Down Syndrome | Not Down Syndrome | Total |
|---|---|---|---|
| Test positive | 36 | 498 | 534 |
| Not test positive | 4 | 9462 | 9466 |
| Total | 40 | 9960 | 10000 |

- Among the 534 babies who test positive, 36 have Down Syndrome, so the probability that a baby who tests positive has Down Syndrome is 36/534 = 0.067.
- Use the law of total probability: the probability of testing positive is the weighted average of 0.9 and 0.05; 0.05 gets more weight because there are many more babies without Down Syndrome \[ \textrm{P}(T) = \textrm{P}(T|D)\textrm{P}(D) + \textrm{P}(T|D^c)\textrm{P}(D^c) = 0.9(0.004) + 0.05(0.996) = 0.0534 \]
- Use Bayes’ rule \[ \textrm{P}(D|T) = \frac{\textrm{P}(T|D)\textrm{P}(D)}{\textrm{P}(T)} = \frac{0.9(0.004)}{0.0534} = 0.067. \]
- The result says that only 6.7%
*of babies who test positive*actually have Down Syndrome. It is true that the test is correct for most babies with Down Syndrome (36 out of 40) and incorrect only for a small proportion of babies without Down Syndrome (498 out of 9960). But since so few babies have Down Syndrome, the sheer*number*of false positives (498) swamps the*number*of true positives (36). Prior to observing the test result, the prior probability that a baby has Down Syndrome is 0.004. The posterior probability that a baby has Down Syndrome given a positive test result is 0.067. A baby who tests positive is about 17 times (0.067/0.004) more likely to have Down Syndrome than a baby for whom the test result is not known. So while 0.067 is still small in absolute terms, the posterior probability is much larger relative to the prior probability.
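The law of total probability and Bayes’ rule calculations above are easy to check in a few lines of plain Python (a sketch, using only the probabilities stated in the setup):

```
# Probabilities from the setup of the screening example.
p_D = 1 / 250         # P(D): prior probability of Down syndrome
p_T_given_D = 0.90    # P(T|D)
p_T_given_Dc = 0.05   # P(T|D^c): false positive probability

# Law of total probability, then Bayes' rule.
p_T = p_T_given_D * p_D + p_T_given_Dc * (1 - p_D)
p_D_given_T = p_T_given_D * p_D / p_T
print(round(p_T, 4), round(p_D_given_T, 3))  # 0.0534 0.067
```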

Remember, the conditional probability of \(A\) given \(B\), \(\textrm{P}(A|B)\), is not the same as the conditional probability of \(B\) given \(A\), \(\textrm{P}(B|A)\), and they can be vastly different. Remember to ask “percentage of what”? For example, the percentage of *babies who have Down syndrome* that test positive is a very different quantity than the percentage of *babies who test positive* that have Down syndrome.

Conditional probabilities (\(\textrm{P}(D|T)\)) can be highly influenced by the original unconditional probabilities (\(\textrm{P}(D)\)) of the events, sometimes called the **base rates**. Don’t neglect the base rates when evaluating probabilities. The example illustrates that when the base rate for a condition is very low and the test for the condition is less than perfect there will be a relatively high probability that a positive test is a *false positive.*

Recall Bayes’ rule for multiple cases. \[\begin{align*} \textrm{P}(B_j|A) & = \frac{\textrm{P}(A|B_j)\textrm{P}(B_j)}{\sum_{i=1}^k \textrm{P}(A|B_i) \textrm{P}(B_i)}\\ & = \frac{\textrm{P}(A|B_j)\textrm{P}(B_j)}{\textrm{P}(A)}\\ & \propto \textrm{P}(A|B_j)\textrm{P}(B_j) \end{align*}\]

Recall that \(B_j\) represents a “hypothesis” and \(A\) represents “evidence”. The probability \(\textrm{P}(A |B_j)\) is called the *likelihood*^{78} of observing evidence \(A\) given hypothesis \(B_j\).

The marginal probability of the evidence, \(\textrm{P}(A)\), in the denominator (which can be calculated using the law of total probability) simply normalizes the numerators to ensure that the updated probabilities \(\textrm{P}(B_j|A)\) sum to 1 when summing over all the cases. Thus Bayes rule says that the posterior probability \(\textrm{P}(B_j|A)\) is proportional to the product of the likelihood \(\textrm{P}(A | B_j)\) and the prior probability \(\textrm{P}(B_j)\) \[\begin{align*} \textrm{P}(B_j | A) & \propto \textrm{P}(A|B_j)\textrm{P}(B_j)\\ \text{posterior} & \propto \text{likelihood} \times \text{prior} \end{align*}\]

As an illustration, consider Example 2.56. Suppose we are given that the randomly selected person supports impeachment, and we want to update our probabilities for the person’s party affiliation. The following organizes the calculations in a *Bayes’ table* which illustrates “posterior is proportional to likelihood times prior”.

| | Prior | Likelihood (of supporting impeachment) | Prior \(\times\) Likelihood | Posterior |
|---|---|---|---|---|
| Democrat | 0.31 | 0.83 | (0.31)(0.83) = 0.2573 | 0.2573/0.4739 = 0.543 |
| Independent | 0.40 | 0.44 | (0.40)(0.44) = 0.1760 | 0.1760/0.4739 = 0.371 |
| Republican | 0.29 | 0.14 | (0.29)(0.14) = 0.0406 | 0.0406/0.4739 = 0.086 |
| Sum | 1.00 | NA | 0.4739 | 1.000 |

The product of prior and likelihood for Democrats (0.2573) is 6.34 times (0.2573/0.0406) the product for Republicans (0.0406). Therefore, Bayes’ rule implies that, given support for impeachment, the person is 6.34 times more likely to be a Democrat than a Republican. Similarly, the person is 1.46 times (0.2573/0.1760) more likely to be a Democrat than an Independent, and 4.33 times (0.1760/0.0406) more likely to be an Independent than a Republican. The last column just translates these relative relationships into probabilities that sum to 1.
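The Bayes’ table arithmetic can be sketched in a few lines of Python; the dictionaries below simply encode the Prior and Likelihood columns:

```
# "Posterior is proportional to likelihood times prior": compute
# prior * likelihood for each hypothesis, then normalize so the
# posterior probabilities sum to 1.
priors = {"Democrat": 0.31, "Independent": 0.40, "Republican": 0.29}
likelihoods = {"Democrat": 0.83, "Independent": 0.44, "Republican": 0.14}

products = {b: priors[b] * likelihoods[b] for b in priors}
normalizer = sum(products.values())   # = P(A), by the law of total probability
posteriors = {b: v / normalizer for b, v in products.items()}

for b in posteriors:
    print(b, round(posteriors[b], 3))
# Democrat 0.543, Independent 0.371, Republican 0.086
```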

### 3.1.6 Conditional probabilities are probabilities

The process of conditioning can be thought of as **“slicing and renormalizing”.**

- Extract the “slice” corresponding to the event being conditioned on (and discard the rest). For example, a slice might correspond to a particular row or column of a two-way table.

- “Renormalize” the values in the slice so that corresponding probabilities add up to 1.
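As a minimal sketch of slicing and renormalizing, the following plain-Python snippet conditions the joint table from the Down syndrome example on a positive test; the joint probabilities are computed from that example’s setup (e.g., \(\textrm{P}(D\cap T) = 0.9 \times 0.004 = 0.0036\)):

```
# Joint probabilities: rows are D / D^c, columns are T / T^c.
joint = {
    ("D", "T"): 0.0036, ("D", "Tc"): 0.0004,
    ("Dc", "T"): 0.0498, ("Dc", "Tc"): 0.9462,
}

# Slice: keep only the column where the test is positive.
positive_slice = {d: p for (d, t), p in joint.items() if t == "T"}
# Renormalize: divide by the slice total so the values sum to 1.
total = sum(positive_slice.values())  # P(T) = 0.0534
conditional = {d: p / total for d, p in positive_slice.items()}
print(conditional)  # {'D': ~0.067, 'Dc': ~0.933}
```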

We will see that the “slicing and renormalizing” interpretation also applies when dealing with *conditional distributions* of random variables, and corresponding plots. The following is a Venn diagram type example.

**Example 2.61 **
Each of the three Venn diagrams below represents a sample space with 16 equally likely outcomes. Let \(A\) be the yellow `/` event, \(B\) the blue `\` event, and their intersection \(A\cap B\) the green \(\times\) event. Suppose that areas represent probabilities, so that, for example, \(\textrm{P}(A) = 4/16\).

Find \(\textrm{P}(A|B)\) for each of the three scenarios.

*Solution*to Example 2.61

- Left: \(\textrm{P}(A|B)=0\). After conditioning on \(B\), there are now 4 equally likely outcomes, of which none satisfy \(A\).
- Middle: \(\textrm{P}(A|B) = 2/4\). After conditioning on \(B\), there are now 4 equally likely outcomes, of which 2 satisfy \(A\).
- Right: \(\textrm{P}(A|B) = 1/4\). After conditioning on \(B\), there are now 4 equally likely outcomes, of which 1 satisfies \(A\).
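The three answers are just ratios of counts, which can be sketched with exact fractions (the counts below are read off the diagrams as described in the solution):

```
from fractions import Fraction

# With equally likely outcomes, P(A|B) = |A ∩ B| / |B|.
# Pairs are (|A ∩ B|, |B|) for the left, middle, and right diagrams.
scenarios = {"left": (0, 4), "middle": (2, 4), "right": (1, 4)}
for name, (n_both, n_B) in scenarios.items():
    print(name, Fraction(n_both, n_B))  # left 0, middle 1/2, right 1/4
```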

Conditioning on an event \(E\) can be viewed as a *change in the probability measure*^{79} on \(\Omega\), from \(\textrm{P}(\cdot)\) to \(\textrm{P}(\cdot|E)\). That is, the original probability measure \(\textrm{P}(\cdot)\) assigns probability \(\textrm{P}(A)\), a number, to event \(A\), while the conditional probability measure \(\textrm{P}(\cdot |E)\) assigns probability \(\textrm{P}(A|E)\), a possibly different number, to event \(A\). Switching to \(\textrm{P}(\cdot |E)\) amounts to the following.

- Outcomes^{80} in \(E^c\) are assigned probability 0 under \(\textrm{P}(\cdot|E)\). If \(A\) consists only of outcomes not in \(E\), i.e., if \(A\subseteq E^c\), then \(\textrm{P}(A\cap E)=0\) so \(\textrm{P}(A|E)=0\).
- The probabilities of outcomes in \(E\) are rescaled so that they comprise 100% of the probability conditional on \(E\), i.e., so that \(\textrm{P}(E|E)=1\). This is the effect of dividing by \(\textrm{P}(E)\). For example, if \(A, B\subseteq E\) and \(\textrm{P}(A)=2\textrm{P}(B)\), then also \(\textrm{P}(A|E)=2\textrm{P}(B|E)\): if event \(A\) is twice as likely as event \(B\) according to \(\textrm{P}(\cdot)\), the same is true according to \(\textrm{P}(\cdot|E)\), since none of the outcomes satisfying these events is zeroed out by conditioning on \(E\).

**Conditional probabilities are probabilities.** Given an event \(E\), the function \(\textrm{P}(\cdot|E)\) defines a valid probability measure. Analogous versions of probability rules hold for conditional probabilities.

- \(0 \le \textrm{P}(A|E) \le 1\) for any event \(A\).
- \(\textrm{P}(\Omega|E)=1\). Moreover, \(\textrm{P}(E|E) = 1\).
- If events \(A_1, A_2, \ldots\) are disjoint (i.e. \(A_i \cap A_j = \emptyset, i\neq j\)) then \[ \textrm{P}(A_1 \cup A_2 \cup \cdots |E) = \textrm{P}(A_1|E) + \textrm{P}(A_2|E) + \cdots \]
- \(\textrm{P}(A^c|E) = 1-\textrm{P}(A|E)\). (Be careful! Do not confuse \(\textrm{P}(A^c|E)\) with \(\textrm{P}(A|E^c)\).)

### 3.1.7 Conditional versus unconditional probability

**Example 2.62 **
Consider a group of 5 people: Harry, Bella, Frodo, Anakin, Katniss. Suppose each of their names is written on a slip of paper and the 5 slips of paper are placed into a hat. The papers are mixed up and 2 are pulled out, one after the other *without* replacement.

- What is the probability that Harry is the first name selected?
- What is the probability that Harry is the second name selected?
- If you were asked question (2) before question (1), would your answer change? Should it?
- If Bella is the first name selected, what is the probability that Harry is the second name selected?
- If Harry is the first name selected, what is the probability that Harry is the second name selected?
- How is the probability that Harry is the second name selected related to the probabilities in the two previous parts?
- If Bella is the second name selected, what is the probability that Harry was the first name selected?

*Solution*to Example 2.62

- The probability that Harry is the first name selected is 1/5, which is an answer we think most people would agree with. There are 5 names which are equally likely to be the first one selected, 1 of which is Harry.
- The probability that Harry is the second name selected is also 1/5. Many people might answer this as 1/4, since after selecting the first person there are now 4 names left. But we show and discuss below that the *unconditional* probability is 1/5.
- Your answer to question (2) certainly shouldn’t change depending on whether we ask question (1) first. But perhaps after seeing question (1) you are implicitly assuming that Harry has not been selected first? There is nothing in question (2) that gives you any information about what happened in the first draw.
- If Bella is the first name selected, the probability that Harry is the second name selected is 1/4. We think most people find this intuitive. If Bella is first, there are 4 names remaining, equally likely to be the next one selected, of which 1 is Harry.
- If Harry is the first name selected, the probability that Harry is the second name selected is 0, since the names are drawn *without* replacement.
- The probabilities in the two previous parts are *conditional* probabilities. The probability in (2) is an *unconditional* probability. By the law of total probability, the unconditional probability that Harry is the second name selected is the weighted average of the two conditional probabilities from the previous parts. Let \(A\) be the event that Harry is first, and \(B\) the event that Harry is second. Then \(\textrm{P}(A) = 1/5\), \(\textrm{P}(B|A) = 0\), \(\textrm{P}(B|A^c) = 1/4\), and \[ \textrm{P}(B) = \textrm{P}(B|A)\textrm{P}(A) + \textrm{P}(B|A^c)\textrm{P}(A^c) = (0)(1/5) + (1/4)(4/5) = 1/5 \] Claiming that \(\textrm{P}(B)\) is 1/4 ignores the outcomes in which Harry is the first name selected.
- If Bella is the second name selected, the probability that Harry was the first name selected is 1/4. It doesn’t really matter what is “first” and what is “second”, but rather the information conveyed. In (4), what’s important is that you know that one of the names selected was Bella, so the probability that the other name selected is Harry is 1/4. This part conveys the same information, so the answer is again 1/4.

Here is a two-way table of 1000 hypothetical draws; note that Harry is second in 200 of them.

| | Harry first | Harry not first | Total |
|---|---|---|---|
| Harry second | 0 | 200 | 200 |
| Harry not second | 200 | 600 | 800 |
| Total | 200 | 800 | 1000 |

Be careful to distinguish between conditional and unconditional probabilities. A conditional probability reflects “new” information about the outcome of the random phenomenon. In the absence of such information, we must continue to account for all the possibilities. When computing probabilities, be sure to reflect only the information that is known. In particular, when a phenomenon happens in stages, don’t assume when considering “what happens second” that you know what happened first.

In the example above, imagine shuffling the five cards and putting two on a table face down. Now point to one of the cards and ask “what is the probability that THIS card is Harry?” Well, all you know is that this card is one of the five cards, each of the 5 cards is equally likely to be the one you’re pointing to, and only one of the cards is Harry. Should it matter whether the face down card you’re pointing to was the first or second card you laid on the table? No, the probability that THIS card is Harry should be 1/5, regardless of whether you put it down first or second.

Now turn over the other card that you’re not pointing to, and see what name is on it. The probability that the card you’re pointing to is Harry has now changed, because you have some information about the outcome of the shuffle. If the card you turned over says Harry, you know the probability that the card you’re pointing to is Harry is 0. If the card you turned over is not Harry, then you know that the probability that the card you’re pointing to is Harry is 1/4. It is not “first” or “second” that matters; it is whether or not you have obtained new information by revealing one of the cards.

Another way of asking the question is: Shuffle the five cards; what is the probability that Harry is the second card from the top? Without knowing any information about the result of the shuffle, all you know is that Harry should be equally likely to be in any one of the 5 positions, so the probability that he is the second card from the top should be 1/5. It is only after revealing information about the result of the shuffle, say the top card, that the probability that Harry is in the second position changes.
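The distinction can also be checked by simulation. Here is a plain-Python sketch (our own, not from the text) that estimates both the unconditional probability that Harry is second and the conditional probability given that Bella is first:

```
import random

# Draw 2 of the 5 names without replacement, many times.
names = ["Harry", "Bella", "Frodo", "Anakin", "Katniss"]
random.seed(1)  # for reproducibility

n = 100000
harry_second = 0
bella_first = 0
harry_second_and_bella_first = 0
for _ in range(n):
    first, second = random.sample(names, 2)
    if second == "Harry":
        harry_second += 1
    if first == "Bella":
        bella_first += 1
        if second == "Harry":
            harry_second_and_bella_first += 1

print(harry_second / n)                            # ≈ 1/5, unconditional
print(harry_second_and_bella_first / bella_first)  # ≈ 1/4, given Bella first
```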

These numbers are estimates based on data from polls as of Oct 9.↩

The resulting value is estimated based on data from polls as of Oct 9 and party affiliation as of Sept 2019.↩

For the purposes of constructing a hypothetical table, it doesn’t matter what value you use for the total, as long as you don’t round any of the counts in the interior cells. If interior cells are decimals, either leave them as decimals, or add a few zeros to the total count and redo.↩

Provided \(\textrm{P}(B)>0\). We will assume throughout that all events being conditioned on have non-zero probability. We will discuss some issues related to conditioning on the value of a continuous random variable later.↩

This number differs from the one in the previous impeachment problem because of rounding errors in the probabilities reported in the setups.↩

Unfortunately, mosaic plots are not available in Symbulate yet.↩

You should really click on this birthday problem link.↩

Source: http://www.cdc.gov/ncbddd/birthdefects/downsyndrome/data.html↩

Estimates of these probabilities vary between different sources. The values in the exercise were based on https://www.ncbi.nlm.nih.gov/pubmed/17350315↩

“Likelihood” here is used in the statistical sense, as in “maximum likelihood estimation”, rather than as a loose synonym for the word probability.↩

Conditioning on event \(E\) can also be viewed as a restriction of the sample space from \(\Omega\) to \(E\). However, we prefer to keep the sample space as \(\Omega\) and view conditioning only as a change in probability measure. In this way, we can consider conditioning on various events as representing different probability measures, all defined for the same collection of events corresponding to the same sample space.↩

Remember: probabilities are assigned to events, so we are speaking loosely when we say probabilities of outcomes.↩