6 Conditioning

Conditioning concerns how probabilities of events or distributions of random variables are influenced by information about the occurrence of events or the values of random variables.
A probability is a measure of the likelihood or degree of uncertainty of an event. A conditional probability revises this measure to reflect any “new” information about the outcome of the underlying random phenomenon.

Example 6.1

The probability¹ that a randomly selected American adult supports impeachment of President Trump is 0.49.

1. Suppose the randomly selected person is a Democrat. Do you think the probability that the randomly selected Democrat supports impeachment is 0.49?
2. The probability² that a randomly selected American is a Democrat is 0.31. Donny Don’t says that the probability that a randomly selected American both (1) is a Democrat, and (2) supports impeachment is equal to $0.49 \times 0.31$. Do you agree?
3. Without further information, provide a range of “logically possible” values for the probability in the previous part. (“Logically possible” means they satisfy the rules of probability, even though they might not be realistic in context.)
4. Suppose that the probability that a randomly selected American both is a Democrat and supports impeachment is 0.26. Construct an appropriate two-way table of probabilities.
5. Construct a corresponding two-way table of hypothetical counts.
6. Find the probability³ that a randomly selected American who is a Democrat supports impeachment.
7. How can the probability in the previous part be written in terms of the probabilities provided in the setup?
8. Find the probability that a randomly selected American who supports impeachment is a Democrat.
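Once $P(A)$, $P(D)$, and $P(A \cap D)$ are known, the rest of the two-way table in Example 6.1 follows by subtraction. The Python sketch below does this with exact fractions to avoid rounding noise; the variable names, the cell labels such as `"Dc"` for $D^c$, and the choice of 10000 hypothetical people are illustrative assumptions. It also computes the range of logically possible values for $P(A \cap D)$: a joint probability cannot exceed either marginal, and it cannot fall below $P(A) + P(D) - 1$ or below 0.

```python
from fractions import Fraction

# Given in Example 6.1: A = supports impeachment, D = Democrat
p_A = Fraction("0.49")
p_D = Fraction("0.31")
p_A_and_D = Fraction("0.26")

# Logically possible range for P(A and D) before 0.26 is specified:
# at most the smaller marginal, at least max(0, P(A) + P(D) - 1)
lower = max(Fraction(0), p_A + p_D - 1)
upper = min(p_A, p_D)

# Two-way table of probabilities, filled in by subtraction
# ("Ac" and "Dc" stand for the complements A^c and D^c)
table = {
    ("A", "D"): p_A_and_D,
    ("A", "Dc"): p_A - p_A_and_D,
    ("Ac", "D"): p_D - p_A_and_D,
    ("Ac", "Dc"): 1 - p_A - p_D + p_A_and_D,
}

# Corresponding table of hypothetical counts out of N people
N = 10000
counts = {cell: int(p * N) for cell, p in table.items()}

print("logically possible P(A and D):", float(lower), "to", float(upper))
for cell, p in table.items():
    print(cell, float(p), counts[cell])
```

The four cell probabilities add to 1, and the counts scale them to a hypothetical population of 10000.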

The conditional probability of event $A$ given event $B$, denoted $P(A | B)$, is defined, provided $P(B) > 0$, as

$P(A | B) = \frac{P(A \cap B)}{P(B)}$

The conditional probability $P (A | B)$ represents how the likelihood or degree of uncertainty of event $A$ should be updated to reflect information that event $B$ has occurred.
In general, knowing whether or not event $B$ occurs influences the probability of event $A$. That is,

$\text{In general, } P(A | B) \neq P(A)$

Be careful: order is essential in conditioning. That is,

$\text{In general, } P(A | B) \neq P(B | A)$
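As a quick numerical check of the definition and of the warning about order, the sketch below computes both conditional probabilities from the Example 6.1 numbers. The helper `conditional` is a hypothetical name introduced only for illustration, and it refuses to condition on an event with zero probability.

```python
def conditional(p_joint, p_given):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_given <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_given


# Example 6.1: A = supports impeachment, D = Democrat
p_A, p_D, p_A_and_D = 0.49, 0.31, 0.26

p_A_given_D = conditional(p_A_and_D, p_D)  # among Democrats, proportion who support
p_D_given_A = conditional(p_A_and_D, p_A)  # among supporters, proportion who are Democrats

print(f"P(A) = {p_A:.3f}, P(A | D) = {p_A_given_D:.3f}, P(D | A) = {p_D_given_A:.3f}")
```

Here $P(A | D) = 0.26/0.31 \approx 0.839$ differs from both the unconditional $P(A) = 0.49$ and the reversed conditional $P(D | A) = 0.26/0.49 \approx 0.531$.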

6.1 Simulating conditional probabilities

Example 6.2

Consider simulating a randomly selected American and determining whether or not the person supports impeachment and whether or not the person is a Democrat, as in the scenario in Example 6.1. Remember we are given $P(A) = 0.49$, $P(D) = 0.31$, and $P(A \cap D) = 0.26$, where $A$ is the event that the selected person supports impeachment and $D$ is the event that the selected person is a Democrat.

1. Donny Don’t says we need two spinners: One spinner with areas of 0.49 and 0.51 to represent Support/Not support, and another spinner with areas of 0.31 and 0.69 to represent Democrat/Not Democrat. Then spin each spinner once to simulate one repetition. Do you agree?
2. How could you perform one repetition of the simulation using just a single spinner? (Hint: it needs 4 sectors.)
3. How could you perform a simulation, using the spinner in the previous part, to estimate $P(A | D)$?
4. What determines the order of magnitude of the margin of error for your estimate in the previous part?
5. What is another method for performing the simulation and estimating $P(A | D)$ that has a smaller margin of error? What is the disadvantage of this method?

There are two basic ways to use simulation to approximate a conditional probability $P (A | B)$ .

1. Simulate the random phenomenon for a set number of repetitions (say 10000), discard those repetitions on which $B$ does not occur, and compute the relative frequency of $A$ among the remaining repetitions (on which $B$ does occur).
   - Disadvantage: the margin of error is based only on the number of repetitions on which $B$ occurs, roughly $10000 \times P(B)$, rather than on all 10000 repetitions.
   - Advantage: not computationally intensive.
2. Simulate the random phenomenon until obtaining a certain number of repetitions (say 10000) on which $B$ occurs, discarding those repetitions on which $B$ does not occur as you go, and compute the relative frequency of $A$ among the remaining repetitions (on which $B$ does occur).
   - Advantage: the margin of error will be based on the set number of repetitions on which $B$ occurs.
   - Disadvantage: inefficient; requires more time and computing power, especially if $P(B)$ is small.
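The two methods can be compared with a short Python sketch that uses only the standard `random` module. It assumes the four-sector spinner from Example 6.2, with sector areas 0.26, 0.23, 0.05, and 0.46 taken from the two-way table for Example 6.1 (cells filled in by subtraction); the function names `fixed_total` and `fixed_given` and the seed are hypothetical choices for illustration.

```python
import random

# Four-sector spinner for Example 6.2: a single spin decides both
# support (A or Ac) and party (D or Dc). Sector areas are the joint
# probabilities from the two-way table built from Example 6.1.
SECTORS = [("A", "D"), ("A", "Dc"), ("Ac", "D"), ("Ac", "Dc")]
AREAS = [0.26, 0.23, 0.05, 0.46]


def spin(rng):
    """One repetition: spin the four-sector spinner once."""
    return rng.choices(SECTORS, weights=AREAS)[0]


def fixed_total(n, rng):
    """Method 1: run n repetitions, then discard those where D does not occur."""
    kept = [s for s in (spin(rng) for _ in range(n)) if s[1] == "D"]
    estimate = sum(1 for s in kept if s[0] == "A") / len(kept)
    return estimate, len(kept)


def fixed_given(n, rng):
    """Method 2: keep spinning until n repetitions on which D occurs."""
    kept = count_A = total_spins = 0
    while kept < n:
        support, party = spin(rng)
        total_spins += 1
        if party == "D":
            kept += 1
            if support == "A":
                count_A += 1
    return count_A / kept, total_spins


rng = random.Random(2019)  # fixed seed so runs are reproducible
est1, n_used = fixed_total(10000, rng)
est2, n_spins = fixed_given(10000, rng)
print(f"Method 1: P(A | D) ~ {est1:.3f}, based on {n_used} of 10000 repetitions")
print(f"Method 2: P(A | D) ~ {est2:.3f}, after {n_spins} total spins")
```

Both estimates should land near $0.26/0.31 \approx 0.839$. Method 1 bases its estimate on only about $10000 \times 0.31 = 3100$ repetitions, while Method 2 uses 10000 repetitions on which $D$ occurs but needs roughly $10000/0.31 \approx 32000$ spins to collect them.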

6.2 Joint, conditional, and marginal probabilities

Within the context of two events, we have joint, conditional, and marginal probabilities.
- Joint: unconditional probability involving both events, $P(A \cap B)$.
- Conditional: conditional probability of one event given the other, $P(A | B)$, $P(B | A)$.
- Marginal: unconditional probability of a single event, $P(A)$, $P(B)$.
The relationship $P (A | B) = P (A \cap B) / P (B)$ can be stated generically as

$\text{conditional} = \frac{\text{joint}}{\text{marginal}}$

In many problems conditional probabilities are provided or can be determined directly.

Example 6.3

Recent polls⁴ suggest that

- 83% of Democrats support impeachment of President Trump
- 44% of Independents support impeachment of President Trump
- 14% of Republicans support impeachment of President Trump

1. The average of these three percentages is $(83 + 44 + 14) / 3 = 47$. Is it necessarily true that 47% of all Americans support impeachment?
2. Based on recent polls⁵
   - 31% of Americans are Democrats
   - 40% of Americans are Independent
   - 29% of Americans are Republicans

   Define the event $A$ to represent “supports impeachment” and $D, I, R$ to correspond to affiliation in each of the parties. If the probability measure $P$ corresponds to randomly selecting an American, write all the percentages above as probabilities using proper notation.
3. Find the probability that a randomly selected American is a Democrat who supports impeachment. Is this a joint, conditional, or marginal probability?
4. Construct an appropriate two-way table.
5. Find the probability that a randomly selected American supports impeachment. How does this differ from the average of the three percentages in part 1? Why?
6. Now suppose that the randomly selected American supports impeachment. How does this information change the probability that the selected American belongs to a particular political party? Answer by computing appropriate probabilities (and using proper notation).
7. How does each of the probabilities from the previous part compare to the respective prior probability? Does this make sense?
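The computations in Example 6.3 follow a pattern that is easy to see in code: multiply each party's support rate by that party's marginal probability to get joint probabilities, add the joints over the three parties to get $P(A)$, then divide each joint by $P(A)$ to update the party probabilities. The sketch below treats the poll percentages as exact and uses Python's `fractions` module for exact arithmetic; the dictionary names are illustrative.

```python
from fractions import Fraction

# Example 6.3: marginal party probabilities and conditional support rates
p_party = {"D": Fraction("0.31"), "I": Fraction("0.40"), "R": Fraction("0.29")}
p_A_given = {"D": Fraction("0.83"), "I": Fraction("0.44"), "R": Fraction("0.14")}

# Joint probabilities via the multiplication rule: P(A and party) = P(A | party) P(party)
p_A_and = {k: p_A_given[k] * p_party[k] for k in p_party}

# P(A): add the joint probabilities over the three (disjoint) parties
p_A = sum(p_A_and.values())

# Updated party probabilities given support: joint / marginal
p_party_given_A = {k: p_A_and[k] / p_A for k in p_party}

print("unweighted average:", float(sum(p_A_given.values()) / 3))
print("P(A):", float(p_A))
for k in p_party:
    print(f"P({k}) = {float(p_party[k]):.2f}, P({k} | A) = {float(p_party_given_A[k]):.3f}")
```

The overall rate $P(A) = 0.4739$ weights each party's rate by the party's size, which is why it differs from the unweighted average 0.47. Conditioning on support raises the probability of being a Democrat (0.31 to about 0.543) and lowers it for Independents (0.40 to about 0.371) and Republicans (0.29 to about 0.086).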

Example 6.4

Consider simulating a randomly selected American and determining whether or not the person supports impeachment and which party (Democrat, Independent, or Republican) the person belongs to, as in the scenario in Example 6.3. Remember we are given $P(A | D) = 0.83$, $P(A | I) = 0.44$, $P(A | R) = 0.14$, $P(D) = 0.31$, $P(I) = 0.40$, and $P(R) = 0.29$.

How could you perform one repetition of the simulation using spinners based solely on the probabilities provided in the problem, without first constructing a two-way table or finding $P(A \cap D)$, etc.? (Hint: you’ll need a few spinners, but you might not spin them all in a single repetition.)
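One possible approach, sketched in Python below, is a two-stage repetition: first spin a party spinner with areas 0.31, 0.40, and 0.29, then spin only the support spinner belonging to the resulting party. Each two-sector support spinner is represented by checking whether a uniform random number falls below $P(A | \text{party})$; the dictionary names, the seed, and the 100000 repetitions are illustrative assumptions.

```python
import random

# Party spinner areas: P(D), P(I), P(R)
PARTY_AREAS = {"D": 0.31, "I": 0.40, "R": 0.29}
# One support spinner per party: area of the "supports" sector is P(A | party)
SUPPORT_AREA = {"D": 0.83, "I": 0.44, "R": 0.14}


def one_repetition(rng):
    """Spin the party spinner, then only that party's support spinner."""
    parties = list(PARTY_AREAS)
    party = rng.choices(parties, weights=[PARTY_AREAS[p] for p in parties])[0]
    supports = rng.random() < SUPPORT_AREA[party]
    return party, supports


rng = random.Random(2019)
reps = [one_repetition(rng) for _ in range(100_000)]

# Estimate P(A), and P(D | A) by keeping only repetitions where A occurs
p_A_hat = sum(1 for _, s in reps if s) / len(reps)
supporter_parties = [party for party, s in reps if s]
p_D_given_A_hat = supporter_parties.count("D") / len(supporter_parties)
print(f"P(A) ~ {p_A_hat:.3f}, P(D | A) ~ {p_D_given_A_hat:.3f}")
```

Only two of the four spinners are used in any single repetition, and no joint probability is computed beforehand.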

6.3 Multiplication rule

Rearranging the definition of conditional probability, we get the Multiplication rule: the probability that two events both occur is

$\begin{aligned} P(A \cap B) & = P(A | B) P(B) \\ & = P(B | A) P(A) \end{aligned}$

The multiplication rule says that you should think “multiply” when you see “and”. However, be careful about what you are multiplying: to find a joint probability, you need an unconditional and an appropriate conditional probability.
You can condition either on $A$ or on $B$, provided you have the appropriate marginal probability; often, conditioning one way is easier than the other.
Be careful: the multiplication rule does not say that $P(A \cap B)$ is the same as $P(A) P(B)$.
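A numerical check with the Example 6.1 values: conditioning on either event recovers the same joint probability, while simply multiplying the two marginals does not. The variable names are illustrative.

```python
# Example 6.1: A = supports impeachment, D = Democrat
p_A, p_D, p_A_and_D = 0.49, 0.31, 0.26
p_A_given_D = p_A_and_D / p_D
p_D_given_A = p_A_and_D / p_A

# Multiplication rule, conditioning on D and then on A
via_D = p_A_given_D * p_D
via_A = p_D_given_A * p_A

# Product of the marginals alone (not what the multiplication rule says)
product_of_marginals = p_A * p_D

print(f"{via_D:.4f} {via_A:.4f} {product_of_marginals:.4f}")
```

The first two values equal 0.26; the last is $0.49 \times 0.31 = 0.1519$, Donny Don’t's answer from Example 6.1.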

6.4 Conditioning is “slicing and renormalizing”

The process of conditioning can be thought of as “slicing and renormalizing”.
- Extract the “slice” corresponding to the event being conditioned on (and discard the rest). For example, a slice might correspond to a particular row or column of a two-way table.
- “Renormalize” the values in the slice so that corresponding probabilities add up to 1.
Slicing determines the shape; renormalizing determines the scale.
Slicing determines relative probabilities; renormalizing just makes sure they add up to 1.
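The two steps translate directly into a small function. This sketch assumes a joint distribution stored as a Python dictionary from outcomes to probabilities; `slice_and_renormalize` is a hypothetical helper, and the table is the Example 6.1 two-way table with cells filled in by subtraction.

```python
def slice_and_renormalize(joint, in_event):
    """Condition a joint distribution on the event described by in_event.

    joint: dict mapping each outcome to its probability
    in_event: function returning True for outcomes in the conditioning event
    """
    # Slice: keep only the outcomes in the event being conditioned on
    sliced = {outcome: p for outcome, p in joint.items() if in_event(outcome)}
    # Renormalize: divide by the slice total so the values add up to 1
    total = sum(sliced.values())
    return {outcome: p / total for outcome, p in sliced.items()}


# Example 6.1 two-way table: (support, party) -> probability
joint = {("A", "D"): 0.26, ("Ac", "D"): 0.05, ("A", "Dc"): 0.23, ("Ac", "Dc"): 0.46}

given_D = slice_and_renormalize(joint, lambda outcome: outcome[1] == "D")
print(given_D)

# Renormalizing rescales the slice but leaves ratios within it unchanged
print(joint[("A", "D")] / joint[("Ac", "D")], given_D[("A", "D")] / given_D[("Ac", "D")])
```

Both ratios are $0.26/0.05 = 5.2$: the slice fixes the relative probabilities, and renormalizing only changes the scale.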

Example 6.5

Recall Example 6.1. Remember we are given $P(A) = 0.49$, $P(B) = 0.31$, and $P(A \cap B) = 0.26$, where $A$ is the event that the selected person supports impeachment and $B$ is the event that the selected person is a Democrat.

1. How many times more likely is it for an American to be a Democrat who supports impeachment than to be a Democrat who does not support impeachment?
2. How many times more likely is it for a Democrat to support impeachment than to not support impeachment?
3. What do you notice about the answers to the two previous parts?

6.5 Independence

Example 6.6

Consider the following hypothetical data.

| | Democrat ($D$) | Not Democrat ($D^{c}$) | Total |
|---|---:|---:|---:|
| Loves puppies ($L$) | 180 | 270 | 450 |
| Does not love puppies ($L^{c}$) | 20 | 30 | 50 |
| Total | 200 | 300 | 500 |

Suppose a person is randomly selected from this group. Consider the events $\begin{aligned} L & = \{\text{person loves puppies}\} \\ D & = \{\text{person is a Democrat}\} \end{aligned}$

1. Compute and interpret $P(L)$.
2. Compute and interpret $P(L | D)$.
3. Compute and interpret $P(L | D^{c})$.
4. What do you notice about $P(L)$, $P(L | D)$, and $P(L | D^{c})$?
5. Compute and interpret $P(D)$.
6. Compute and interpret $P(D | L)$.
7. Compute and interpret $P(D | L^{c})$.
8. What do you notice about $P(D)$, $P(D | L)$, and $P(D | L^{c})$?
9. Compute and interpret $P(D \cap L)$.
10. What is the relationship between $P(D \cap L)$, $P(D)$, and $P(L)$?
11. When randomly selecting a person from this particular group, would you say that events $D$ and $L$ are independent? Why?

Events $A$ and $B$ are independent if knowing whether or not one occurs does not change the probability of the other.
For events $A$ and $B$ (with $0 < P(A) < 1$ and $0 < P(B) < 1$), the following are equivalent. That is, if one is true, then they all are true; if one is false, then they all are false.

$\begin{aligned} A \text{ and } B & \text{ are independent} \\ P(A \cap B) & = P(A) P(B) \\ P(A^{c} \cap B) & = P(A^{c}) P(B) \\ P(A \cap B^{c}) & = P(A) P(B^{c}) \\ P(A^{c} \cap B^{c}) & = P(A^{c}) P(B^{c}) \\ P(A | B) & = P(A) \\ P(A | B) & = P(A | B^{c}) \\ P(B | A) & = P(B) \\ P(B | A) & = P(B | A^{c}) \end{aligned}$
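For the hypothetical puppy data in Example 6.6, every condition in the list can be checked directly. The sketch below takes $A = D$ (Democrat) and $B = L$ (loves puppies), stores the counts as a dictionary, and uses Python's `fractions` module so that the equality tests are exact rather than subject to floating-point rounding; the helper names `event`, `both`, `P`, and `P_given` are hypothetical.

```python
from fractions import Fraction

# Example 6.6 counts: (puppy attitude, party) -> number of people
counts = {("L", "D"): 180, ("L", "Dc"): 270, ("Lc", "D"): 20, ("Lc", "Dc"): 30}
n = sum(counts.values())


def event(index, value):
    """Event that component `index` of the outcome equals `value`."""
    return lambda outcome: outcome[index] == value


def both(e1, e2):
    """Intersection of two events."""
    return lambda outcome: e1(outcome) and e2(outcome)


def P(e):
    """Exact probability of event e for a randomly selected person."""
    return Fraction(sum(c for outcome, c in counts.items() if e(outcome)), n)


def P_given(e, given):
    """Conditional probability P(e | given) = P(e and given) / P(given)."""
    return P(both(e, given)) / P(given)


L, Lc = event(0, "L"), event(0, "Lc")
D, Dc = event(1, "D"), event(1, "Dc")

# The list of equivalent conditions, with A = D and B = L
checks = {
    "P(D and L) = P(D)P(L)": P(both(D, L)) == P(D) * P(L),
    "P(Dc and L) = P(Dc)P(L)": P(both(Dc, L)) == P(Dc) * P(L),
    "P(D and Lc) = P(D)P(Lc)": P(both(D, Lc)) == P(D) * P(Lc),
    "P(Dc and Lc) = P(Dc)P(Lc)": P(both(Dc, Lc)) == P(Dc) * P(Lc),
    "P(D | L) = P(D)": P_given(D, L) == P(D),
    "P(D | L) = P(D | Lc)": P_given(D, L) == P_given(D, Lc),
    "P(L | D) = P(L)": P_given(L, D) == P(L),
    "P(L | D) = P(L | Dc)": P_given(L, D) == P_given(L, Dc),
}
for condition, holds in checks.items():
    print(condition, holds)
```

All eight conditions hold for this table: $P(D) = 0.4$ and $P(L) = 0.9$ whether or not we condition on the other event, consistent with the equivalence.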

¹ These numbers are estimates based on data from polls as of Oct 9, 2019. I wrote this exercise in Fall 2019. In Fall 2020, I decided not to change it, knowing that would make it outdated. But then Trump was impeached again in January 2021. And now we have the Jan 6 committee hearings.
² Estimate as of Sept 2019.
³ The resulting value is estimated based on data from polls as of Oct 9, 2019 and party affiliation as of Sept 2019.
⁴ As of Oct 9, 2019.
⁵ Party affiliation as of Sept 2019.