7.1 Conditional probability

A probability is a measure of the likelihood or degree of uncertainty of an event. A conditional probability revises this measure to reflect any “new” information about the outcome of the underlying random phenomenon.

Example 7.1 The probability[114] that a randomly selected American adult supports impeachment of President Trump is 0.49.

  1. Suppose the randomly selected person is a Democrat. Do you think the probability that the randomly selected Democrat supports impeachment is 0.49?
  2. The probability[115] that a randomly selected American is a Democrat is 0.31. Donny Don’t says that the probability that a randomly selected American both (1) is a Democrat, and (2) supports impeachment is equal to \(0.49\times 0.31\). Do you agree?
  3. Without further information, provide a range of “logically possible” values for the probability in the previous part. (“Logically possible” means they satisfy the rules of probability, even though they might not be realistic in context.)
  4. Suppose that the probability that a randomly selected American both is a Democrat and supports impeachment is 0.26. Construct an appropriate two-way table of probabilities.
  5. Construct a corresponding two-way table of hypothetical counts.
  6. Find the probability[116] that a randomly selected American who is a Democrat supports impeachment.
  7. How can the probability in the previous part be written in terms of the probabilities provided in the setup?
  8. Find the probability that a randomly selected American who supports impeachment is a Democrat.

Solution to Example 7.1

  1. The probability that the randomly selected Democrat supports impeachment is probably a lot larger than 0.49. Knowing that the selected person is a Democrat would change the probability of supporting impeachment.

  2. No. Consider a hypothetical set of 100 Americans. We would expect about 31 of these 100 Americans to be Democrats. However, we would not expect only about half of the 31 Democrats, that is, \((0.49)(31)\approx 15\), to support impeachment; we’d expect more, say 26 out of the 31. If 26 of the 100 Americans are Democrats who support impeachment, this would be consistent with a value of 0.26 for the probability that a randomly selected American both (1) is a Democrat, and (2) supports impeachment.

  3. We could make a table like in the following part and see what values produce valid tables. If \(A\) is the event that the selected person supports impeachment, and \(B\) is the event that the person is a Democrat, and \(\textrm{P}\) corresponds to randomly selecting an American, then \(\textrm{P}(A) = 0.49\) and \(\textrm{P}(B) = 0.31\). By the subset rule \(\textrm{P}(A\cap B)\le \min(\textrm{P}(A), \textrm{P}(B)) = 0.31\). The largest \(\textrm{P}(A\cap B)\) can be is 0.31, which corresponds to all Democrats supporting impeachment. Because \(\textrm{P}(A) + \textrm{P}(B) = 0.80\) is less than 1, the two events could be disjoint, so in this case the smallest \(\textrm{P}(A \cap B)\) can be is 0, which corresponds to no Democrats supporting impeachment. The extremes are not realistic, but without knowing more information, we do not know where \(\textrm{P}(A\cap B)\) lies in \(0\le \textrm{P}(A \cap B) \le 0.31\).

  4. If the probability that a randomly selected American both is a Democrat and supports impeachment is 0.26, then the two-way table of probabilities is

                \(A\)    \(A^c\)    Total
    \(B\)       0.26     0.05       0.31
    \(B^c\)     0.23     0.46       0.69
    Total       0.49     0.51       1.00
  5. It is often much easier to work with counts rather than probabilities. Start with a nice round total[117] count like 10000 and then construct a table of hypothetical counts, assuming the counts follow the probabilities in the table above.

                    Impeach    Not Impeach    Total
    Democrat        2600       500            3100
    Not Democrat    2300       4600           6900
    Total           4900       5100           10000
  6. Working with counts, there are 3100 Democrats, of which 2600 support impeachment, so \(2600/3100=0.839\) is the probability that a randomly selected American who is a Democrat supports impeachment.

  7. \(\frac{\textrm{P}(A\cap B)}{\textrm{P}(B)} = \frac{0.26}{0.31}=0.839\)

  8. There are 4900 Americans who support impeachment, of which 2600 are Democrats, so \(\frac{2600}{4900} = \frac{0.26}{0.49}=\frac{\textrm{P}(A\cap B)}{\textrm{P}(A)} =0.531\) is the probability that a randomly selected American who supports impeachment is a Democrat. Notice that this part and the previous part have the same numerator, \(\textrm{P}(A\cap B)\), but different denominators. Also notice that the probabilities are quite different in this part and the previous part.
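The calculations in this solution can be checked with a few lines of plain Python. The following is only a sketch: the function and variable names are made up for illustration, and the general lower bound \(\max(0, \textrm{P}(A)+\textrm{P}(B)-1)\) used in `intersection_bounds` is a standard fact that the solution above only needed in the special case where it equals 0.

```python
# Probabilities given in Example 7.1.
p_a = 0.49        # P(A): supports impeachment
p_b = 0.31        # P(B): is a Democrat
p_a_and_b = 0.26  # P(A and B): Democrat who supports impeachment


def intersection_bounds(p_a, p_b):
    """Logically possible range for P(A and B) given only P(A) and P(B)."""
    lower = max(0.0, p_a + p_b - 1.0)  # general lower bound (assumption noted above)
    upper = min(p_a, p_b)              # subset rule
    return lower, upper


def two_way_counts(p_a, p_b, p_a_and_b, total=10000):
    """Hypothetical counts for the four interior cells of the two-way table."""
    both = round(p_a_and_b * total)
    b_only = round(p_b * total) - both   # Democrat, does not support
    a_only = round(p_a * total) - both   # supports, not a Democrat
    neither = total - both - b_only - a_only
    return {"A and B": both, "not A and B": b_only,
            "A and not B": a_only, "not A and not B": neither}


counts = two_way_counts(p_a, p_b, p_a_and_b)
n_b = counts["A and B"] + counts["not A and B"]  # all Democrats
n_a = counts["A and B"] + counts["A and not B"]  # all supporters

p_a_given_b = counts["A and B"] / n_b  # same numerator ...
p_b_given_a = counts["A and B"] / n_a  # ... different denominator

print(intersection_bounds(p_a, p_b))
print(counts)
print(round(p_a_given_b, 3), round(p_b_given_a, 3))
```

The two conditional probabilities share the count of Democrats who support impeachment as their numerator; only the baseline group in the denominator changes.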

Definition 7.1 The conditional probability of event \(A\) given event \(B\), denoted \(\textrm{P}(A|B)\), is defined as[118] \[ \textrm{P}(A|B) = \frac{\textrm{P}(A\cap B)}{\textrm{P}(B)} \]

The conditional probability \(\textrm{P}(A|B)\) represents how the likelihood or degree of uncertainty of event \(A\) should be updated to reflect information that event \(B\) has occurred. The unconditional probability \(\textrm{P}(A)\) is often called the prior probability (a.k.a., base rate) of \(A\) (prior to observing \(B\)). The conditional probability \(\textrm{P}(A|B)\) is the posterior probability of \(A\) after observing \(B\).

In general, knowing whether or not event \(B\) occurs influences the probability of event \(A\). That is, \[ \text{In general, } \textrm{P}(A|B) \neq \textrm{P}(A) \] For example, without knowing a person’s political party, the probability of supporting impeachment is 0.49, but after learning the person is a Democrat, the probability of supporting impeachment changes to 0.839.

Be careful: order is essential in conditioning. That is, \[ \text{In general, } \textrm{P}(A|B) \neq \textrm{P}(B|A) \]

Example 7.2 Which of the following is larger: 1 or 2?

  1. The probability that a randomly selected man who is greater than six feet tall plays in the NBA.
  2. The probability that a randomly selected man who plays in the NBA is greater than six feet tall.

Solution to Example 7.2


The probability in (2) is much larger. The corresponding fractions would have the same numerator — number of men who are both greater than six feet tall and play in the NBA — but vastly different denominators.

  1. There are over a billion men in the world who are greater than six feet tall, only a few hundred of whom play in the NBA. The probability that a randomly selected man who is greater than six feet tall plays in the NBA is pretty close to 0.
  2. There are only a few hundred men who play in the NBA, almost all of whom are greater than six feet tall. The probability that a randomly selected man who plays in the NBA is greater than six feet tall is pretty close to 1.

When dealing with probabilities, especially conditional probabilities, be sure to ask “probability of what?” That is, what is the appropriate sample space? Thinking in fraction terms, be sure to identify the total/baseline group which corresponds to the denominator. Be very careful when translating between numbers and words.

To emphasize, \(\textrm{P}(A|B)\) is not the same as \(\textrm{P}(B|A)\) and they can be vastly different. In particular, the conditional probabilities can be highly influenced by the original unconditional probabilities of the events, \(\textrm{P}(A)\) and \(\textrm{P}(B)\), sometimes called the base rates. Don’t neglect the base rates when evaluating probabilities.

For example, the probability that a randomly selected man plays in the NBA is pretty close to 0 (the base rate). Learning that the man is greater than six feet tall is not going to change our probability that he plays in the NBA by much.
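To see how far apart the two conditional probabilities in Example 7.2 can be, here is a rough sketch with made-up round counts. The three numbers below are illustrative assumptions chosen only to match the orders of magnitude described above; they are not real data.

```python
# Illustrative round numbers only; none of these counts come from real data.
tall_men = 1_000_000_000   # assumed number of men taller than six feet
nba_players = 500          # assumed number of men who play in the NBA
tall_nba_players = 490     # assumed number of NBA players taller than six feet

# Same numerator, vastly different denominators.
p_nba_given_tall = tall_nba_players / tall_men
p_tall_given_nba = tall_nba_players / nba_players

print(f"P(NBA | tall) = {p_nba_given_tall:.2e}")
print(f"P(tall | NBA) = {p_tall_given_nba:.2f}")
```

Whatever reasonable counts are plugged in, the first probability stays close to 0 and the second stays close to 1, because the denominators differ by many orders of magnitude.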

7.1.1 Simulating conditional probabilities

Example 7.3 Consider simulating a randomly selected American and determining whether or not the person supports impeachment and whether or not the person is a Democrat, as in the scenario in Example 7.1. Remember we are given \(\textrm{P}(A) = 0.49\), \(\textrm{P}(B) = 0.31\), and \(\textrm{P}(A\cap B) = 0.26\) where \(A\) is the event that the selected person supports impeachment and \(B\) is the event that the selected person is a Democrat.

  1. Donny Don’t says we need two spinners: One spinner with areas of 0.49 and 0.51 to represent Support/Not support, and another spinner with areas of 0.31 and 0.69 to represent Democrat/Not Democrat. Then spin each spinner once to simulate one repetition. Do you agree?
  2. How could you perform one repetition of the simulation using one spinner?
  3. How could you perform a simulation, using the spinner in the previous part, to estimate \(\textrm{P}(A | B)\)?
  4. What determines the order of magnitude of the margin of error for your estimate in the previous part?
  5. What is another method for performing the simulation and estimating \(\textrm{P}(A |B)\) that has a smaller margin of error? What is the disadvantage of this method?

Solution to Example 7.3

  1. No, this assumes there is no relationship between party and support. But we know that Democrats will be much more likely to support impeachment than non-Democrats. In general, you cannot simulate pairs of events simply from the marginal distribution of each.
  2. You need to construct a spinner for the possible occurrences of the pairs of events and their joint probabilities. See Figure 7.1.
  3. The following method fixes the number of total spins, say 10000.
    • Spin the joint spinner from the previous part once to simulate a (party, support) pair.
    • Repeat a fixed number of times, say 10000.
    • Discard the repetitions on which the person was not a Democrat, that is, the repetitions on which \(B\) did not occur. You would expect to have around 3100 repetitions left.
    • Among the remaining repetitions (on which \(B\) occurred), count the number of repetitions on which \(A\) also occurred. So for the roughly 3100 repetitions for which the person was a Democrat, count the repetitions on which the person also supported impeachment; you would expect a count of around 2600.
    • Estimate \(\textrm{P}(A|B)\) by dividing the two previous counts. \[ \textrm{P}(A | B)\approx \frac{\text{Number of repetitions on which both $A$ and $B$ occurred}}{\text{Number of repetitions on which $B$ occurred}} \]
  4. Only those repetitions in which \(B\) occurred are used to estimate \(\textrm{P}(A|B)\). So the order of magnitude of the margin of error is determined by the number of repetitions on which \(B\) occurs. Roughly this would be around 3100, rather than 10000.
  5. The previous method simulated a fixed number of repetitions first, and then discarded the ones that did not meet the condition. We could instead discard repetitions that do not meet the condition as we go, and keep performing repetitions until we get a fixed number, say 10000, that do satisfy the condition. In this way, the estimate of \(\textrm{P}(A |B)\) will be based on the fixed number of repetitions, say 10000, that satisfy event \(B\). The disadvantage is increased computational burden; we will need to simulate and discard many repetitions in order to achieve the desired number that satisfy the condition.

Figure 7.1: Spinner corresponding to Example 7.3.

There are two basic ways to use simulation to approximate a conditional probability \(\textrm{P}(A|B)\).

  • Simulate the random phenomenon for a set number of repetitions (say 10000), discard those repetitions on which \(B\) does not occur, and compute the relative frequency of \(A\) among the remaining repetitions (on which \(B\) does occur).
    • Disadvantage: the margin of error is based on only the number of repetitions used to compute the relative frequency. So if you perform 10000 repetitions but \(B\) occurs only on 2000, then the margin of error for the estimate of \(\textrm{P}(A|B)\) is roughly on the order of \(1/\sqrt{2000}\).
    • Advantage: not computationally intensive.
  • Simulate the random phenomenon until obtaining a certain number of repetitions (say 10000) on which \(B\) occurs, discarding those repetitions on which \(B\) does not occur as you go, and compute the relative frequency of \(A\) among the remaining repetitions (on which \(B\) does occur).
    • Advantage: the margin of error will be based on the set number of repetitions on which \(B\) occurs.
    • Disadvantage: requires more time/computer power. Especially if \(\textrm{P}(B)\) is small, it will require a large number of repetitions of the simulation to achieve the desired number of repetitions on which \(B\) occurs.
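The two approaches can be compared side by side without any special library. The sketch below uses only Python's standard `random` module and the joint probabilities from Example 7.1; the function names `spin`, `fixed_total`, and `fixed_kept` and the seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt

# Joint distribution from Example 7.1: (support status, party) pairs.
OUTCOMES = [("Support", "Democrat"), ("Support", "Not Democrat"),
            ("Not Support", "Democrat"), ("Not Support", "Not Democrat")]
PROBS = [0.26, 0.23, 0.05, 0.46]


def spin(rng):
    """One spin of the joint spinner: returns a (support, party) pair."""
    return rng.choices(OUTCOMES, weights=PROBS, k=1)[0]


def fixed_total(n_total, rng):
    """Method 1: simulate n_total repetitions, then keep only the Democrats."""
    kept = [pair for pair in (spin(rng) for _ in range(n_total))
            if pair[1] == "Democrat"]
    n_support = sum(1 for pair in kept if pair[0] == "Support")
    return n_support / len(kept), len(kept), n_total


def fixed_kept(n_kept, rng):
    """Method 2: keep simulating until n_kept Democrats have been obtained."""
    n_support = kept = total = 0
    while kept < n_kept:
        total += 1
        support, party = spin(rng)
        if party == "Democrat":
            kept += 1
            n_support += (support == "Support")
    return n_support / n_kept, kept, total


rng = random.Random(2019)  # arbitrary seed so the run is reproducible
for method in (fixed_total, fixed_kept):
    estimate, kept, total = method(10000, rng)
    print(f"{method.__name__}: estimate={estimate:.3f}, kept={kept}, "
          f"total={total}, rough margin of error={1 / sqrt(kept):.3f}")
```

With \(\textrm{P}(B) = 0.31\), the first method keeps roughly 3100 of its 10000 repetitions, while the second needs roughly \(10000/0.31 \approx 32000\) total repetitions to keep 10000. That is the trade-off between margin of error and computational cost described above.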

In Symbulate, filter can be used to extract repetitions that satisfy a condition. The following simulates impeachment support status and party affiliation for 10000 hypothetical Americans and then applies filter to retain only the Democrats. The function is_Democrat takes as an input a (support status, party affiliation) pair and returns True if Democrat (and False otherwise).


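from symbulate import *

# Return True if the party component of a (support, party) pair is 'Democrat'.
def is_Democrat(Support_Party):
    return Support_Party[1] == 'Democrat'


# Joint distribution of (support status, party affiliation) from Example 7.1.
P = BoxModel([('Support', 'Democrat'), ('Support', 'Not Democrat'), ('Not Support', 'Democrat'), ('Not Support', 'Not Democrat')],
             probs = [0.26, 0.23, 0.05, 0.46])

# Simulate 10000 pairs, keep only the Democrats, and tabulate.
P.sim(10000).filter(is_Democrat).tabulate()

Outcome                        Frequency
('Not Support', 'Democrat')    515
('Support', 'Democrat')        2609
Total                          3124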

In Symbulate, the given symbol | applies the second method to simulate a fixed number of repetitions that satisfy the event being conditioned on. Be careful when using | when conditioning on an event with small probability. In particular, be careful when conditioning on the value of a continuous random variable.


# Joint distribution of (support status, party affiliation) from Example 7.1.
P = BoxModel([('Support', 'Democrat'), ('Support', 'Not Democrat'), ('Not Support', 'Democrat'), ('Not Support', 'Not Democrat')],
             probs = [0.26, 0.23, 0.05, 0.46])
Support, Party = RV(P)

# Simulate until 10000 repetitions satisfy Party == 'Democrat', then tabulate.
( (Support & Party) | (Party == 'Democrat') ).sim(10000).tabulate()

Value                      Frequency
(Not Support, Democrat)    1612
(Support, Democrat)        8388
Total                      10000

7.1.2 Joint, conditional, and marginal probabilities

Within the context of two events, we have joint, conditional, and marginal probabilities.

  • Joint: unconditional probability involving both events, \(\textrm{P}(A \cap B)\).
  • Conditional: conditional probability of one event given the other, \(\textrm{P}(A | B)\), \(\textrm{P}(B | A)\).
  • Marginal: unconditional probability of a single event \(\textrm{P}(A)\), \(\textrm{P}(B)\).

The relationship \(\textrm{P}(A|B) = \textrm{P}(A\cap B)/\textrm{P}(B)\) can be stated generically as \[ \text{conditional} = \frac{\text{joint}}{\text{marginal}} \]

In the previous impeachment problem, we were provided the joint and marginal probabilities and we computed conditional probabilities. But in many problems conditional probabilities are provided or can be determined directly.

Example 7.4 Recent polls[119] suggest that

  • 83% of Democrats support impeachment of President Trump
  • 44% of Independents support impeachment of President Trump
  • 14% of Republicans support impeachment of President Trump
  1. The average of these three percentages is \((83+44+14)/3 = 47\). Is it necessarily true that 47% of all Americans support impeachment?

  2. Based on recent polls[120]

    • 31% of Americans are Democrats
    • 40% of Americans are Independent
    • 29% of Americans are Republicans

    Define the event \(A\) to represent “supports impeachment” and \(D, I, R\) to correspond to affiliation in each of the parties. If the probability measure \(\textrm{P}\) corresponds to randomly selecting an American, write all the percentages above as probabilities using proper notation.

  3. Find the probability that a randomly selected American is a Democrat who supports impeachment. Is this a joint, conditional, or marginal probability?

  4. Construct an appropriate two-way table.

  5. Find the probability that a randomly selected American supports impeachment. How does this differ from the average of the three percentages in part 1? Why?

  6. Now suppose that the randomly selected American supports impeachment. How does this information change the probability that the selected American belongs to a particular political party? Answer by computing appropriate probabilities (and using proper notation).

  7. How does each of the probabilities from the previous part compare to the respective prior probability? Does this make sense?

Solution to Example 7.4

  1. No, think of extreme cases as illustrations. If almost all Americans were Democrats, then the overall probability of supporting impeachment would be close to 0.83, while if almost all Americans were Republicans, then the overall probability of supporting impeachment would be close to 0.14. So the overall probability of supporting impeachment depends on the party affiliation breakdown.

  2. If the probability measure \(\textrm{P}\) corresponds to randomly selecting an American then

    • \(\textrm{P}(A|D) = 0.83\)
    • \(\textrm{P}(A|I) = 0.44\)
    • \(\textrm{P}(A|R) = 0.14\)
    • \(\textrm{P}(D) = 0.31\)
    • \(\textrm{P}(I) = 0.40\)
    • \(\textrm{P}(R) = 0.29\)
  3. The probability that a randomly selected American is a Democrat who supports impeachment is \(\textrm{P}(A \cap D) = \textrm{P}(A|D)\textrm{P}(D) = (0.83)(0.31) = 0.2573\), a joint probability. In 10000 hypothetical Americans, we would expect 3100 to be Democrats, and of those 3100 Democrats we would expect 2573 (or 83%) to support impeachment. So out of the 10000 Americans, 2573 are Democrats who support impeachment.

  4. Continue in the manner of the previous part to complete a two-way table of counts for 10000 hypothetical Americans.

                   Impeach    Not Impeach    Total
    Democrat       2573       527            3100
    Independent    1760       2240           4000
    Republican     406        2494           2900
    Total          4739       5261           10000
  5. Out of the 10000 Americans, 4739 support impeachment, so the probability that a randomly selected American supports impeachment[121] is \(\textrm{P}(A)=0.4739\). This is actually pretty close to the average of the three impeachment percentages, but that’s just a coincidence. The overall probability is actually a weighted average, with the conditional probabilities weighted by the party probabilities; in terms of the probabilities given in the setup, the table calculations show \[\begin{align*} \textrm{P}(A) & = \textrm{P}(A \cap D) + \textrm{P}(A \cap I) + \textrm{P}(A \cap R)\\ & = \textrm{P}(A|D)\textrm{P}(D) + \textrm{P}(A|I)\textrm{P}(I) + \textrm{P}(A|R)\textrm{P}(R)\\ & = (0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29)\\ & = 0.4739 \end{align*}\] This is an illustration of the “law of total probability” which we will discuss in more detail soon.

  6. We want \(\textrm{P}(D|A)\), etc.

    • \(\textrm{P}(D|A) = \frac{2573}{4739} = \frac{(0.83)(0.31)}{(0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29)} = 0.543\).
    • \(\textrm{P}(I|A) = \frac{1760}{4739} = \frac{(0.44)(0.40)}{(0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29)} = 0.371\).
    • \(\textrm{P}(R|A) = \frac{406}{4739} = \frac{(0.14)(0.29)}{(0.83)(0.31) + (0.44)(0.40) + (0.14)(0.29)} = 0.086\).
  7. Compare each posterior probability from the previous part with the corresponding prior probability. The changes make sense given how strongly support for impeachment varies by party.

    • \(\textrm{P}(D|A) = 0.543\), which is greater than the prior probability of Democrat \(\textrm{P}(D) = 0.31\). Knowing the person supports impeachment increases the probability that the person is a Democrat.
    • \(\textrm{P}(I|A) = 0.371\), which is slightly less than the prior probability of Independent \(\textrm{P}(I) = 0.40\). Knowing the person supports impeachment slightly decreases the probability that the person is an Independent.
    • \(\textrm{P}(R|A) = 0.086\), which is less than the prior probability of Republican \(\textrm{P}(R) = 0.29\). Knowing the person supports impeachment decreases the probability that the person is a Republican.
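The calculations in this solution follow a fixed recipe: multiply each conditional probability by its marginal to get a joint probability, add the joint probabilities to get \(\textrm{P}(A)\), and divide each joint probability by \(\textrm{P}(A)\) to get a posterior. A minimal Python sketch of that recipe, with dictionary names chosen only for illustration:

```python
# Probabilities from the setup of Example 7.4.
prior = {"D": 0.31, "I": 0.40, "R": 0.29}                # P(party)
support_given_party = {"D": 0.83, "I": 0.44, "R": 0.14}  # P(A | party)

# Joint probabilities: P(A and party) = P(A | party) * P(party).
joint = {party: support_given_party[party] * prior[party] for party in prior}

# Marginal probability of support: the sum of the joint probabilities.
p_support = sum(joint.values())

# Posterior probabilities: P(party | A) = joint / marginal.
posterior = {party: joint[party] / p_support for party in prior}

print(f"P(A) = {p_support:.4f}")
for party in prior:
    print(f"P({party}) = {prior[party]:.2f} -> "
          f"P({party} | A) = {posterior[party]:.3f}")
```

The posterior probabilities sum to 1 because every supporter belongs to exactly one of the three parties.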

A mosaic plot provides a nice visual of joint, marginal, and one-way conditional probabilities, and can be used to illustrate the law of total probability. The mosaic plot[122] on the left in Figure 7.2 represents conditioning on political party. The vertical bars represent the conditional probabilities of supporting/not supporting impeachment for each political party. The widths of the vertical bars are scaled in proportion to the marginal distribution of party; the bar for Independent is a little wider than the others. The area of each sub-rectangle represents a joint probability. The single bar to the right of the plot displays the marginal probability of supporting/not supporting impeachment.

The plot on the right in Figure 7.2 represents conditioning on support of impeachment. Now the widths of the vertical bars represent the distribution of supporting/not supporting impeachment, the heights within the bars represent conditional probabilities for party affiliation given support status, and the single bar to the right represents the marginal distribution of party affiliation.


Figure 7.2: Mosaic plots for Example 7.4. The plot on the left represents conditioning on party affiliation, while the plot on the right represents conditioning on support for impeachment.

Example 7.5 Consider simulating a randomly selected American and determining the person’s party affiliation and whether or not the person supports impeachment, as in the scenario in Example 7.4. Remember we are given \(\textrm{P}(A|D) = 0.83\), \(\textrm{P}(A|I) = 0.44\), \(\textrm{P}(A|R) = 0.14\), \(\textrm{P}(D) = 0.31\), \(\textrm{P}(I) = 0.40\), and \(\textrm{P}(R)=0.29\).

How could you perform one repetition of the simulation using spinners based solely on the probabilities provided in the problem, without constructing a two-way table? (Hint: you’ll need a few spinners, but you might not spin them all in a single repetition.)

Solution to Example 7.5


There will be 4 spinners, but only 2 will be spun in any single repetition.

  • “Party” spinner: Areas of 0.31, 0.40, and 0.29 correspond to, respectively, Democrat, Independent, Republican. Spin this to determine party affiliation.
  • “Impeachment” spinners — only one of the following will be spun in a single repetition:
    • Impeachment given Democrat: areas of 0.83 and 0.17 corresponding to, respectively, support, not support. If the result of the “party” spinner is Democrat, spin this spinner to determine support for impeachment.
    • Impeachment given Independent: areas of 0.44 and 0.56 corresponding to, respectively, support, not support. If the result of the “party” spinner is Independent, spin this spinner to determine support for impeachment.
    • Impeachment given Republican: areas of 0.14 and 0.86 corresponding to, respectively, support, not support. If the result of the “party” spinner is Republican, spin this spinner to determine support for impeachment.

We can code the above in Symbulate by defining a custom probability space. An outcome is a (party, impeachment) pair. Each of the 4 spinners corresponds to a BoxModel. We define a function that defines how to simulate one repetition, using the draw method. Then we use that function to define a custom ProbabilitySpace.


def party_impeachment_sim():
    # Spin the "party" spinner first.
    party = BoxModel(['D', 'I', 'R'], probs = [0.31, 0.40, 0.29]).draw()
    # Then spin only the impeachment spinner that matches the party.
    if party == 'D':
        support = BoxModel(['Imp', 'NotImp'], probs = [0.83, 0.17]).draw()
    elif party == 'I':
        support = BoxModel(['Imp', 'NotImp'], probs = [0.44, 0.56]).draw()
    else:  # party == 'R'
        support = BoxModel(['Imp', 'NotImp'], probs = [0.14, 0.86]).draw()
    return party, support

# Each outcome of the custom probability space is a (party, impeachment) pair.
P = ProbabilitySpace(party_impeachment_sim)
P.sim(10000).tabulate()

Outcome            Frequency
('D', 'Imp')       2530
('D', 'NotImp')    531
('I', 'Imp')       1750
('I', 'NotImp')    2313
('R', 'Imp')       432
('R', 'NotImp')    2444
Total              10000
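The same two-stage simulation can also be used to approximate the posterior probabilities from Example 7.4, such as \(\textrm{P}(D|A)\), by keeping only the simulated supporters. The sketch below is a standard-library analogue of the spinner scheme, not Symbulate code; the names `simulate_person`, `PARTY_PROBS`, and `SUPPORT_PROBS` and the seed are arbitrary choices for illustration.

```python
import random
from collections import Counter

PARTY_PROBS = {"D": 0.31, "I": 0.40, "R": 0.29}    # "party" spinner
SUPPORT_PROBS = {"D": 0.83, "I": 0.44, "R": 0.14}  # P(support | party)


def simulate_person(rng):
    """Spin the party spinner, then the matching impeachment spinner."""
    party = rng.choices(list(PARTY_PROBS),
                        weights=list(PARTY_PROBS.values()), k=1)[0]
    support = "Imp" if rng.random() < SUPPORT_PROBS[party] else "NotImp"
    return party, support


rng = random.Random(7)  # arbitrary seed so the run is reproducible
sims = [simulate_person(rng) for _ in range(10000)]

# Keep only the repetitions on which the person supports impeachment,
# then tabulate party affiliation among those repetitions.
supporters = Counter(party for party, support in sims if support == "Imp")
n_supporters = sum(supporters.values())
for party in PARTY_PROBS:
    print(f"P({party} | A) is approximately "
          f"{supporters[party] / n_supporters:.3f}")
```

The relative frequencies should land near the exact values 0.543, 0.371, and 0.086 computed in Example 7.4, with a margin of error governed by the number of simulated supporters (roughly 4700 here) rather than by the full 10000 repetitions.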

  114. These numbers are estimates based on data from polls as of Oct 9, 2019. I wrote this exercise in Fall 2019. In Fall 2020, I decided not to change it, knowing that would make it outdated. But then Trump was impeached again in January 2021.

  115. Estimate as of Sept 2019.

  116. The resulting value is estimated based on data from polls as of Oct 9 and party affiliation as of Sept 2019.

  117. For the purposes of constructing a hypothetical table, it doesn’t matter what value you use for the total, as long as you don’t round any of the counts in the interior cells. If interior cells are decimals, either leave them as decimals, or add a few zeros to the total count and redo.

  118. Provided \(\textrm{P}(B)>0\). We will assume throughout that all events being conditioned on have non-zero probability. We will discuss some issues related to conditioning on the value of a continuous random variable later.

  119. As of Oct 9, 2019.

  120. Party affiliation as of Sept 2019.

  121. This number differs from the one in the previous impeachment problem because of rounding errors in the probabilities reported in the setups.

  122. Unfortunately, mosaic plots are not available in Symbulate yet.