9  Conditioning

Example 9.1 Consider the sample space of all RC circuits that students build in their first circuits lab. Let \(E\) be the event that the resistor is the wrong size, and let \(C\) be the event that the capacitor is installed backwards. Randomly select a circuit, and suppose that \(\text{P}(E)=0.14\), \(\text{P}(C) = 0.33\), and \(\text{P}(E \cap C) = 0.05\).

  1. Construct a corresponding two-way table of hypothetical counts.




  2. Compute the probability that a randomly selected circuit has the wrong resistor given that its capacitor is installed backwards.




  3. How can the probability in the previous part be written in terms of the probabilities provided in the setup?




  4. Compute, and denote with proper notation, the probability that a randomly selected circuit has a backwards capacitor given that it has the wrong resistor installed. Is this equal to your previous answer?




  5. Interpret, and denote with proper notation, the probability that results from subtracting the answer to the previous part from 1.




9.1 Simulating conditional probabilities

There are two basic ways to use simulation to approximate a conditional probability \(\text{P}(A|B)\).

  • Simulate the random phenomenon for a set number of repetitions (say 10000), discard those repetitions on which \(B\) does not occur, and compute the relative frequency of \(A\) among the remaining repetitions (on which \(B\) does occur).
    • Disadvantage: the margin of error is based only on the number of repetitions on which \(B\) occurs, which can be much smaller than the set number of repetitions simulated (especially if \(\text{P}(B)\) is small).
    • Advantage: not computationally intensive.
  • Simulate the random phenomenon until obtaining a certain number of repetitions (say 10000) on which \(B\) occurs, discarding those repetitions on which \(B\) does not occur as you go, and compute the relative frequency of \(A\) among the remaining repetitions (on which \(B\) does occur).
    • Advantage: the margin of error will be based on the set number of repetitions on which \(B\) occurs.
    • Disadvantage: inefficient; requires more time/computing power, especially if \(\text{P}(B)\) is small.
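
As a rough sketch of both approaches, the Python code below uses a made-up random phenomenon (rolling two fair dice, with \(A\) = {first die shows 5} and \(B\) = {sum is at least 8}); the scenario, the repetition count, and the function name are assumptions chosen purely for illustration.

```python
import random

N_REP = 10000  # total repetitions (approach 1) or target count of B-occurrences (approach 2)

def simulate_once():
    # One repetition of the (made-up) random phenomenon: roll two fair dice
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    a = (d1 == 5)          # event A: first die shows 5
    b = (d1 + d2 >= 8)     # event B: sum is at least 8
    return a, b

# Approach 1: fixed number of repetitions; discard those on which B does not occur
results = [simulate_once() for _ in range(N_REP)]
kept = [a for a, b in results if b]
print("Approach 1: P(A|B) approx", sum(kept) / len(kept), "from", len(kept), "repetitions")

# Approach 2: keep simulating until B has occurred N_REP times
kept = []
while len(kept) < N_REP:
    a, b = simulate_once()
    if b:
        kept.append(a)
print("Approach 2: P(A|B) approx", sum(kept) / len(kept), "from", len(kept), "repetitions")
```

The first loop typically retains far fewer than 10000 repetitions (roughly \(\text{P}(B)\times 10000\)), while the second always bases its estimate on exactly 10000 repetitions but simulates many more in total.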

9.2 Joint, conditional, and marginal probabilities

  • Within the context of two events, we have joint, conditional, and marginal probabilities.
    • Joint: unconditional probability involving both events, \(\text{P}(A \cap B)\).
    • Conditional: conditional probability of one event given the other, \(\text{P}(A | B)\), \(\text{P}(B | A)\).
    • Marginal: unconditional probability of a single event \(\text{P}(A)\), \(\text{P}(B)\).
  • The relationship \(\text{P}(A|B) = \text{P}(A\cap B)/\text{P}(B)\) can be stated generically as \[ \text{conditional} = \frac{\text{joint}}{\text{marginal}} \]
  • In many problems conditional probabilities are provided or can be determined directly.
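
For instance, with hypothetical values \(\text{P}(A\cap B) = 0.10\) and \(\text{P}(B) = 0.40\) (numbers made up purely for illustration), the generic relationship above gives \[ \text{P}(A|B) = \frac{\text{P}(A\cap B)}{\text{P}(B)} = \frac{0.10}{0.40} = 0.25. \]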

9.3 Multiplication rule

  • Rearranging the definition of conditional probability we get the Multiplication rule: the probability that two events both occur is \[ \begin{aligned} \text{P}(A \cap B) & = \text{P}(A|B)\text{P}(B)\\ & = \text{P}(B|A)\text{P}(A) \end{aligned} \]
  • The multiplication rule can be stated generically as \[ \text{joint} = \text{conditional}\times\text{marginal} \]
  • The multiplication rule says that you should think “multiply” when you see “and”. However, be careful about what you are multiplying: to find a joint probability you need a marginal (unconditional) probability (or distribution) and an appropriate conditional probability (or distribution).
  • You can condition either on \(A\) or on \(B\), provided you have the appropriate marginal probability; often, conditioning one way is easier than the other.
  • Be careful: the multiplication rule does not say that \(\text{P}(A\cap B)\) is the same as \(\text{P}(A)\text{P}(B)\).
  • The multiplication rule extends naturally to more than two events (but the notation gets messy): \[ \text{P}(A_1 \cap A_2 \cap A_3 \cap A_4 \cap \cdots) = \text{P}(A_1)\text{P}(A_2|A_1)\text{P}(A_3|A_1\cap A_2)\text{P}(A_4|A_1\cap A_2 \cap A_3)\cdots \]
  • The multiplication rule is useful for computing probabilities of events that can be broken down into component “stages” where conditional probabilities at each stage are readily available. At each stage, condition on the information about all previous stages.
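
As a quick illustration of working in stages, the Python sketch below applies the multiplication rule to a made-up scenario (a batch of 10 parts containing 3 defective ones, drawn without replacement) and checks the result by simulation; all numbers and names are assumptions for illustration only.

```python
import random

# Hypothetical scenario: a batch of 10 parts contains 3 defective parts.
# Draw two parts without replacement; find P(both draws are defective).

# Multiplication rule, conditioning stage by stage:
# P(D1 and D2) = P(D1) * P(D2 | D1)
p_exact = (3 / 10) * (2 / 9)

# Simulation check
N_REP = 100000
batch = ["defective"] * 3 + ["good"] * 7
count = 0
for _ in range(N_REP):
    draw = random.sample(batch, 2)  # two draws without replacement
    if draw[0] == "defective" and draw[1] == "defective":
        count += 1

print("Multiplication rule:", p_exact)        # about 0.0667
print("Simulation estimate:", count / N_REP)
```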

Example 9.2 A standard deck of playing cards has 52 cards, 13 cards (2 through 10, jack, queen, king, ace) in each of 4 suits (hearts, diamonds, clubs, spades). Shuffle a deck and deal cards one at a time without replacement.

  1. Find the probability that the first card dealt is a heart.




  2. If the first card dealt is a heart, determine the conditional probability that the second card is a heart.




  3. Find the probability that the first two cards dealt are hearts.




  4. Find the probability that the first two cards dealt are hearts and the third card dealt is a diamond.




  5. Shuffle the deck and deal cards one at a time until an ace is dealt, and then stop. Find the probability that more than 4 cards are dealt. (Hint: consider the first 4 cards dealt.)




9.4 Conditioning is “slicing and renormalizing”

Example 9.3 Recall Example 9.1. Let \(E\) be the event that the resistor is the wrong size, and let \(C\) be the event that the capacitor is installed backwards. Randomly select a circuit, and suppose that \(\text{P}(E)=0.14\), \(\text{P}(C) = 0.33\), and \(\text{P}(E \cap C) = 0.05\).

  1. How many times more likely is it for a circuit to have a wrong resistor but not a backwards capacitor than to have a wrong resistor and a backwards capacitor?




  2. How many times more likely is it for a circuit that has a wrong resistor to not have a backwards capacitor than to have a backwards capacitor?




  3. What do you notice about the answers to the two previous parts?




  • The process of conditioning can be thought of as “slicing and renormalizing”.
    • Extract the “slice” corresponding to the event being conditioned on (and discard the rest). For example, a slice might correspond to a particular row or column of a two-way table.
    • “Renormalize” the values in the slice so that corresponding probabilities add up to 1.
  • Slicing determines the shape; renormalizing determines the scale.
  • Slicing determines relative probabilities; renormalizing just makes sure they add up to 1.
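
A minimal sketch of slicing and renormalizing, using a two-way table of hypothetical counts (the counts below are made up for illustration and are not taken from the examples):

```python
# Hypothetical two-way table of counts (rows: A / not A, columns: B / not B)
table = {
    ("A", "B"): 10, ("A", "not B"): 20,
    ("not A", "B"): 30, ("not A", "not B"): 40,
}

# Slice: keep only the column corresponding to the event being conditioned on (B)
slice_B = {row: count for (row, col), count in table.items() if col == "B"}

# Renormalize: divide by the slice total so the conditional probabilities sum to 1
slice_total = sum(slice_B.values())
conditional = {row: count / slice_total for row, count in slice_B.items()}

print(conditional)                 # {'A': 0.25, 'not A': 0.75}, i.e., P(A|B) and P(not A|B)
print(sum(conditional.values()))   # 1.0
```

Note that the slice alone already determines the ratio \(0.25/0.75 = 1/3\); renormalizing only rescales the counts so they sum to 1.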

9.5 Independence

Example 9.4 Consider the following hypothetical data.

                                  Democrat (\(D\))   Not Democrat (\(D^c\))   Total
Loves puppies (\(L\))                   180                  270               450
Does not love puppies (\(L^c\))          20                   30                50
Total                                   200                  300               500

Suppose a person is randomly selected from this group. Consider the events \[\begin{align*} L & = \{\text{person loves puppies}\}\\ D & = \{\text{person is a Democrat}\} \end{align*}\]

  1. Compute and interpret \(\text{P}(L)\).




  2. Compute and interpret \(\text{P}(L|D)\).




  3. Compute and interpret \(\text{P}(L|D^c)\).




  4. What do you notice about \(\text{P}(L)\), \(\text{P}(L|D)\), and \(\text{P}(L|D^c)\)?




  5. Compute and interpret \(\text{P}(D)\).




  6. Compute and interpret \(\text{P}(D|L)\).




  7. Compute and interpret \(\text{P}(D|L^c)\).




  8. What do you notice about \(\text{P}(D)\), \(\text{P}(D|L)\), and \(\text{P}(D|L^c)\)?




  9. Compute and interpret \(\text{P}(D \cap L)\).




  10. What is the relationship between \(\text{P}(D \cap L)\) and \(\text{P}(D)\) and \(\text{P}(L)\)?




  11. When randomly selecting a person from this particular group, would you say that events \(D\) and \(L\) are independent? Why?




  • Events \(A\) and \(B\) are independent if knowing whether or not one occurs does not change the probability of the other.
  • For events \(A\) and \(B\) (with \(0<\text{P}(A)<1\) and \(0<\text{P}(B)<1\)) the following are equivalent. That is, if one is true then they all are true; if one is false, then they all are false.

\[\begin{align*} \text{$A$ and $B$} & \text{ are independent}\\ \text{P}(A \cap B) & = \text{P}(A)\text{P}(B)\\ \text{P}(A^c \cap B) & = \text{P}(A^c)\text{P}(B)\\ \text{P}(A \cap B^c) & = \text{P}(A)\text{P}(B^c)\\ \text{P}(A^c \cap B^c) & = \text{P}(A^c)\text{P}(B^c)\\ \text{P}(A|B) & = \text{P}(A)\\ \text{P}(A|B) & = \text{P}(A|B^c)\\ \text{P}(B|A) & = \text{P}(B)\\ \text{P}(B|A) & = \text{P}(B|A^c) \end{align*}\]
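
As a closing sketch, the Python snippet below checks several of these equivalent conditions for a hypothetical pair of events with \(\text{P}(A) = 0.3\), \(\text{P}(B) = 0.6\), and \(\text{P}(A\cap B) = 0.18\); the numbers are assumptions chosen so that independence holds and are not taken from Example 9.4.

```python
import math

# Hypothetical probabilities for which A and B happen to be independent
p_A, p_B, p_AB = 0.3, 0.6, 0.18

# The other three "cells" follow from the given probabilities
p_AcB  = p_B - p_AB             # P(A^c ∩ B)
p_ABc  = p_A - p_AB             # P(A ∩ B^c)
p_AcBc = 1 - p_A - p_B + p_AB   # P(A^c ∩ B^c)

checks = {
    "P(A∩B)     = P(A)P(B)":     math.isclose(p_AB,   p_A * p_B),
    "P(A^c∩B)   = P(A^c)P(B)":   math.isclose(p_AcB,  (1 - p_A) * p_B),
    "P(A∩B^c)   = P(A)P(B^c)":   math.isclose(p_ABc,  p_A * (1 - p_B)),
    "P(A^c∩B^c) = P(A^c)P(B^c)": math.isclose(p_AcBc, (1 - p_A) * (1 - p_B)),
    "P(A|B)     = P(A)":         math.isclose(p_AB / p_B, p_A),
    "P(B|A)     = P(B)":         math.isclose(p_AB / p_A, p_B),
}
for label, ok in checks.items():
    print(label, "->", ok)   # every condition holds (True), consistent with the equivalence
```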