3.4 Interpreting conditioning
In this section we discuss how conditioning can be interpreted, how conditional probabilities behave like unconditional (marginal) probabilities, and how the two can differ.
3.4.1 Conditioning is “slicing and renormalizing”
The process of conditioning can be thought of as “slicing and renormalizing”.
- Extract the “slice” corresponding to the event being conditioned on (and discard the rest). For example, a slice might correspond to a particular row or column of a two-way table.
- “Renormalize” the values in the slice so that corresponding probabilities add up to 1.
We will see that the “slicing and renormalizing” interpretation also applies when dealing with conditional distributions of random variables, and corresponding plots. Slicing determines the shape; renormalizing determines the scale. Slicing determines relative probabilities; renormalizing just makes sure they add up to 1.
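As a concrete illustration, here is a minimal Python sketch of the two steps. The table values are hypothetical, chosen only to show the mechanics.

```python
# Hypothetical two-way table of joint probabilities:
# keys are (row event, column event) pairs.
joint = {
    ("B", "A"): 0.30, ("B", "Ac"): 0.10,
    ("Bc", "A"): 0.20, ("Bc", "Ac"): 0.40,
}

# Step 1: "slice" out the row for the event being conditioned on
# (here, B) and discard the rest of the table.
slice_B = {col: p for (row, col), p in joint.items() if row == "B"}

# Step 2: "renormalize" so the slice's probabilities add up to 1.
# The slice's total is P(B); dividing by it rescales the values but
# preserves the relative probabilities within the slice.
p_B = sum(slice_B.values())
conditional = {col: p / p_B for col, p in slice_B.items()}

print(conditional)  # {'A': 0.75, 'Ac': 0.25}
```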
Example 3.10 Recall Example 2.32, in which we are given \(\textrm{P}(A) = 0.49\), \(\textrm{P}(B) = 0.31\), and \(\textrm{P}(A\cap B) = 0.26\), where \(A\) is the event that the selected person supports impeachment and \(B\) is the event that the selected person is a Democrat.
- How many times more likely is it for an American to be a Democrat who supports impeachment than to be a Democrat who does not support impeachment?
- How many times more likely is it for a Democrat to support impeachment than to not support impeachment?
- What do you notice about the answers to the two previous parts?
Solution to Example 3.10
- Note that the probability that an American is a Democrat who does not support impeachment is \(\textrm{P}(A^c \cap B) = \textrm{P}(B) - \textrm{P}(A\cap B) = 0.31 - 0.26 = 0.05\). The ratio in question is \(\frac{\textrm{P}(A \cap B)}{\textrm{P}(A^c \cap B)} = \frac{0.26}{0.05} = 5.2\). An American is 5.2 times more likely to be a Democrat who supports impeachment than to be a Democrat who does not support impeachment.
- Recall that \(\textrm{P}(A|B) = 0.839\) and \(\textrm{P}(A^c|B) = 0.161\). The ratio in question is \(\frac{\textrm{P}(A |B)}{\textrm{P}(A^c | B)} = \frac{0.839}{0.161} = 5.2\). A Democrat is 5.2 times more likely to support impeachment than to not support impeachment.
- The ratios are the same! Conditioning on Democrat just slices out the Americans who are Democrats. The ratios are determined by the overall probabilities for Americans, as in part 1. The conditional probabilities given Democrat in part 2 simply rescale the probabilities for Americans who are Democrats to add up to 1; the numerical check below confirms this.
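The equality of the two ratios can be verified directly from the given probabilities; a quick sketch:

```python
# Probabilities given in Example 3.10.
P_A_and_B = 0.26              # Democrat who supports impeachment
P_B = 0.31                    # Democrat
P_Ac_and_B = P_B - P_A_and_B  # Democrat who does not support: 0.05

# Part 1: ratio of the joint probabilities.
print(P_A_and_B / P_Ac_and_B)  # 5.2

# Part 2: condition on B by renormalizing the slice by P(B).
P_A_given_B = P_A_and_B / P_B    # approximately 0.839
P_Ac_given_B = P_Ac_and_B / P_B  # approximately 0.161
print(P_A_given_B / P_Ac_given_B)  # also 5.2: the common factor
                                   # 1/P(B) cancels in the ratio.
```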
The following example illustrates slicing and renormalizing with Venn diagrams.
Example 3.11 Each of the three Venn diagrams below represents a sample space with 16 equally likely outcomes. Let \(A\) be the event shaded yellow (hatched /), \(B\) the event shaded blue (hatched \), and \(A\cap B\) their intersection, shaded green (hatched \(\times\)). Suppose that areas represent probabilities, so that for example \(\textrm{P}(A) = 4/16\).
Find \(\textrm{P}(A|B)\) for each of the scenarios. Be sure to indicate what represents the “slice” in each scenario.
Solution to Example 3.11
In each case, the “slice” consists of the 4 outcomes in \(B\) (blue); a small counting sketch follows the solution.
- Left: \(\textrm{P}(A|B)=0\). After conditioning on \(B\), there are now 4 equally likely outcomes, of which none satisfy \(A\).
- Middle: \(\textrm{P}(A|B) = 2/4\). After conditioning on \(B\), there are now 4 equally likely outcomes, of which 2 satisfy \(A\).
- Right: \(\textrm{P}(A|B) = 1/4\). After conditioning on \(B\), there are now 4 equally likely outcomes, of which 1 satisfies \(A\).
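When outcomes are equally likely, conditioning amounts to counting within the slice: \(\textrm{P}(A|B) = |A\cap B|/|B|\). A tiny sketch, with outcome labels that are hypothetical stand-ins for the 16 outcomes (here matching the middle diagram):

```python
# 16 equally likely outcomes; the integer labels are arbitrary.
omega = set(range(16))

# Hypothetical memberships matching the middle diagram:
# A and B each contain 4 outcomes and share 2.
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

# Slice: restrict attention to the outcomes in B.
# Renormalize: with equally likely outcomes, each of the 4
# outcomes in B gets conditional probability 1/4.
p_A_given_B = len(A & B) / len(B)
print(p_A_given_B)  # 0.5, as in the middle scenario
```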
3.4.2 Conditional probabilities are probabilities
Conditioning on an event \(E\) can be viewed as a change in the probability measure94 on \(\Omega\), from \(\textrm{P}(\cdot)\) to \(\textrm{P}(\cdot|E)\). That is, the original probability measure \(\textrm{P}(\cdot)\) assigns probability \(\textrm{P}(A)\), a number, to event \(A\), while the conditional probability measure \(\textrm{P}(\cdot |E)\) assigns probability \(\textrm{P}(A|E)\), a possibly different number, to event \(A\). Switching to \(\textrm{P}(\cdot |E)\) amounts to the following.
- Outcomes95 in \(E^c\) are assigned probability 0 under \(\textrm{P}(\cdot|E)\). If \(A\) consists only of outcomes not in \(E\), i.e., if \(A\subseteq E^c\), then \(\textrm{P}(A\cap E)=0\) so \(\textrm{P}(A|E)=0\).
- The probabilities of outcomes in \(E\) are rescaled so that they comprise 100% of the probability conditional on \(E\), i.e., so that \(\textrm{P}(E|E)=1\). This is the effect of dividing by \(\textrm{P}(E)\). For example, if \(A, B\subseteq E\) and \(\textrm{P}(A)=2\textrm{P}(B)\), then also \(\textrm{P}(A|E)=2\textrm{P}(B|E)\). That is, if event \(A\) is twice as likely as event \(B\) according to \(\textrm{P}(\cdot)\), then the same is true according to \(\textrm{P}(\cdot|E)\), provided that conditioning on \(E\) does not zero out the probability of any outcome in \(A\) or \(B\).
Conditional probabilities are probabilities. Given an event \(E\), the function \(\textrm{P}(\cdot|E)\) defines a valid probability measure. Analogous versions of the probability rules hold for conditional probabilities, as the sketch following this list illustrates.
- \(0 \le \textrm{P}(A|E) \le 1\) for any event \(A\).
- \(\textrm{P}(\Omega|E)=1\). Moreover, \(\textrm{P}(E|E) = 1\).
- If events \(A_1, A_2, \ldots\) are disjoint (i.e. \(A_i \cap A_j = \emptyset, i\neq j\)) then \[ \textrm{P}(A_1 \cup A_2 \cup \cdots |E) = \textrm{P}(A_1|E) + \textrm{P}(A_2|E) + \cdots \]
- \(\textrm{P}(A^c|E) = 1-\textrm{P}(A|E)\). (Be careful! Do not confuse \(\textrm{P}(A^c|E)\) with \(\textrm{P}(A|E^c)\).)
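These properties are easy to check numerically on a small finite sample space. A minimal sketch, with a hypothetical probability measure:

```python
# Hypothetical probability measure on a four-outcome sample space.
P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}

def prob(event):
    """P(event), for an event given as a set of outcomes."""
    return sum(P[w] for w in event)

def cond_prob(A, E):
    """P(A|E) = P(A and E) / P(E)."""
    return prob(A & E) / prob(E)

omega = set(P)
E = {"b", "c", "d"}
A = {"a", "b"}

# P(Omega|E) = 1 and P(E|E) = 1.
assert abs(cond_prob(omega, E) - 1) < 1e-9
assert abs(cond_prob(E, E) - 1) < 1e-9

# Complement rule: P(A^c|E) = 1 - P(A|E).
assert abs(cond_prob(omega - A, E) - (1 - cond_prob(A, E))) < 1e-9

# Additivity for disjoint events A1, A2.
A1, A2 = {"b"}, {"c"}
assert abs(cond_prob(A1 | A2, E)
           - (cond_prob(A1, E) + cond_prob(A2, E))) < 1e-9
```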
3.4.3 Conditional versus unconditional probability
Example 3.12 Consider a group of 5 people: Harry, Bella, Frodo, Anakin, Katniss. Suppose each of their names is written on a slip of paper and the 5 slips of paper are placed into a hat. The papers are mixed up and 2 are pulled out, one after the other without replacement.
- What is the probability that Harry is the first name selected?
- What is the probability that Harry is the second name selected?
- If you were asked question (2) before question (1), would your answer change? Should it?
- If Bella is the first name selected, what is the probability that Harry is the second name selected?
- If Harry is the first name selected, what is the probability that Harry is the second name selected?
- How is the probability that Harry is the second name selected related to the probabilities in the two previous parts?
- If Bella is the second name selected, what is the probability that Harry was the first name selected?
Solution to Example 3.12
- The probability that Harry is the first name selected is 1/5, which is an answer we think most people would agree with. There are 5 names which are equally likely to be the first one selected, 1 of which is Harry.
- The probability that Harry is the second name selected is also 1/5. Many people might answer this as 1/4, since after selecting the first person there are now 4 names left. But we show and discuss below that the unconditional probability is 1/5.
- Your answer to question (2) certainly shouldn’t change depending on whether we ask question (1) first. Perhaps after seeing question (1) you are implicitly assuming that Harry was not selected first. But nothing in question (2) gives you any information about what happened on the first draw.
- If Bella is the first name selected, the probability that Harry is the second name selected is 1/4. We think most people find this intuitive. If Bella is first, there are 4 names remaining, each equally likely to be the next one selected, of which 1 is Harry.
- If Harry is the first name selected, the probability that Harry is the second name selected is 0, since the names are drawn without replacement.
- The probabilities in the two previous parts are conditional probabilities. The probability in (2) is an unconditional probability. By the law of total probability, we know that the unconditional probability that Harry is the second name selected is the weighted average of the two conditional probabilities from the previous parts. Let \(A\) be the event that Harry is first, \(B\) be the event that Harry is second. So \(\textrm{P}(A) = 1/5\), \(\textrm{P}(B|A) = 0\), \(\textrm{P}(B|A^c) = 1/4\), and \[ \textrm{P}(B) = \textrm{P}(B|A)\textrm{P}(A) + \textrm{P}(B|A^c)\textrm{P}(A^c) = (0)(1/5) + (1/4)(4/5) = 1/5 \] Claiming that \(\textrm{P}(B)\) is 1/4 ignores the outcomes in which Harry is the first name selected.
- If Bella is the second name selected, the probability that Harry was the first name selected is 1/4. It doesn’t really matter what is “first” and what is “second”; what matters is the information conveyed. In (4), what’s important is that you know that one of the names selected was Bella, so the probability that the other name selected is Harry is 1/4. This part conveys the same information, just with the order of the draws swapped, so the answer is again 1/4.
Here is a two-way table of 1000 hypothetical draws; note that Harry is second in 200 of them.
| | Harry first | Harry not first | Total |
|---|---|---|---|
| Harry second | 0 | 200 | 200 |
| Harry not second | 200 | 600 | 800 |
| Total | 200 | 800 | 1000 |
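A table like this can also be produced by simulation. A minimal sketch using Python’s standard library (simulated counts will vary around the expected values above):

```python
import random
from collections import Counter

names = ["Harry", "Bella", "Frodo", "Anakin", "Katniss"]
n_reps = 10_000
table = Counter()

for _ in range(n_reps):
    # Draw two names, one after the other, without replacement.
    first, second = random.sample(names, 2)
    table[(first == "Harry", second == "Harry")] += 1

# Unconditionally, Harry is second in about 1/5 of the draws.
harry_second = table[(True, True)] + table[(False, True)]
print(harry_second / n_reps)  # about 0.2

# Conditionally on Harry NOT being first, he is second
# in about 1/4 of those draws.
harry_not_first = table[(False, True)] + table[(False, False)]
print(table[(False, True)] / harry_not_first)  # about 0.25
```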
Be careful to distinguish between conditional and unconditional probabilities. A conditional probability reflects “new” information about the outcome of the random phenomenon. In the absence of such information, we must continue to account for all the possibilities. When computing probabilities, be sure to reflect only information that is actually known. Especially when considering a phenomenon that happens in stages, don’t assume, when considering “what happens second,” that you know what happened first.
In the example above, imagine writing the five names on cards, shuffling the cards, and putting two on a table face down. Now point to one of the cards and ask “what is the probability that THIS card is Harry?” All you know is that this card is one of the five, each of the 5 cards is equally likely to be the one you’re pointing to, and only one of them is Harry. Should it matter whether the face down card you’re pointing to was the first or second card you laid on the table? No, the probability that THIS card is Harry should be 1/5, regardless of whether you put it down first or second.
Now turn over the other card that you’re not pointing to, and see what name is on it. The probability that the card you’re pointing to is Harry has now changed, because you have some information about the outcome of the shuffle. If the card you turned over says Harry, you know the probability that the card you’re pointing to is Harry is 0. If the card you turned over is not Harry, then you know that the probability that the card you’re pointing to is Harry is 1/4. It is not “first” or “second” that matters; it is whether or not you have obtained new information by revealing one of the cards.
Another way of asking the question is: Shuffle the five cards; what is the probability that Harry is the second card from the top? Without knowing any information about the result of the shuffle, all you know is that Harry should be equally likely to be in any one of the 5 positions, so the probability that he is the second card from the top should be 1/5. It is only after revealing information about the result of the shuffle, say the top card, that the probability that Harry is in the second position changes.
94. Conditioning on event \(E\) can also be viewed as a restriction of the sample space from \(\Omega\) to \(E\). However, we prefer to keep the sample space as \(\Omega\) and view conditioning only as a change in probability measure. In this way, we can consider conditioning on various events as representing different probability measures, all defined for the same collection of events corresponding to the same sample space.
95. Remember: probabilities are assigned to events, so we are speaking loosely when we say probabilities of outcomes.