Chapter 4 Probability Concepts
In previous chapters, we discussed ways to describe variables and the relationships between them. From here, we want to start asking inferential statistics questions like “If my sample mean is 10, how likely is it that the population mean is actually 11?”. Probability is going to start us on this path.
Probability theory is the science of uncertainty and it is really interesting! But it can also be pretty challenging. I try to frame probability around things most of us can do at home: flipping a coin, rolling a die, drawing from a deck of cards. You certainly don’t need any of these things to get through this chapter, but you may find it helpful to have a coin/die/deck of cards on hand as you read through the examples.
Take your time running practice problems and going through the examples, using a tactile approach like sorting through your deck of cards whenever it seems helpful.
Chapter Learning Objectives/Outcomes
- Find and interpret probabilities for equally likely events.
- Find and interpret probabilities for events that are not equally likely.
- Find and interpret joint and marginal probabilities.
- Find and interpret conditional probabilities.
- Use the multiplication rule and independence to calculate probabilities.
R Objectives: none
This chapter’s outcomes correspond to course outcome (3) understand the basic rules of probability.
4.1 Experiments, Sample Spaces, and Events
Probability is the science of uncertainty. When we run an experiment, we are unsure of what the outcome will be. Because of this uncertainty, we say an experiment is a random process.
The probability of an event is the proportion of times it would occur if the experiment were run infinitely many times. When all outcomes are equally likely, this looks like: \[ \text{probability of event} = \frac{\text{number of ways event can occur}}{\text{number of possible (unique) outcomes}} \]
An event is some specified possible outcome (or collection of outcomes) we are interested in observing.
Example: If you want to roll a 6 on a six-sided die, there are six possible outcomes \(\{1,2,3,4,5,6\}\). In general, we assume that each die face is equally likely to appear on a single roll of the die, that is, that the die is fair. So the probability of rolling a 6 is \[\frac{\text{number of ways to roll a 6}}{\text{number of possible rolls}} = \frac{1}{6}\]
Example: We can extend this to a collection of events, say the probability of rolling a 5 or a 6: \[\frac{\text{number of ways to roll a 5 or 6}}{\text{number of possible rolls}} = \frac{2}{6}\]
The collection of all possible outcomes is called a sample space, denoted \(S\). For the six-sided die, \(S=\{1,2,3,4,5,6\}\).
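The counting formula above can be sketched in a few lines of Python. This is only an illustration, assuming a fair die; the helper name `prob` is my own and not part of any library.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event, space):
    """P(event) when every outcome in `space` is equally likely."""
    favorable = event & space  # outcomes in the event that can actually occur
    return Fraction(len(favorable), len(space))

p_six = prob({6}, sample_space)              # rolling a 6
p_five_or_six = prob({5, 6}, sample_space)   # rolling a 5 or a 6

print(p_six, p_five_or_six)
```

Using `Fraction` keeps the answers exact, so \(\frac{2}{6}\) is automatically reduced to \(\frac{1}{3}\).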
To simplify our writing, we use probability notation:
- Events are assigned capital letters.
- \(P(A)\) denotes the probability of event \(A\).
- Sometimes we will also shorten simple events to just a number. For example, \(P(1)\) might represent “the probability of rolling a 1”.
We can estimate probabilities from a sample using a frequency distribution.
Example: Consider the following frequency distribution from Chapter 1
Class | Frequency |
---|---|
freshman | 12 |
sophomore | 10 |
junior | 3 |
senior | 5 |

If a student is selected at random (meaning each student is equally likely to be selected), the probability of selecting a sophomore is \[\text{probability of sophomore} = \frac{\text{number of ways to select a sophomore}}{\text{total number of students}} = \frac{10}{30} \approx 0.3333\]
The probability of selecting a junior or a senior is \[\frac{\text{number of ways to select a junior or senior}}{\text{total number of students}} = \frac{3+5}{30} = \frac{8}{30} \approx 0.2667\]
Using probability notation, we might let \(A\) be the event we selected a junior and \(B\) be the event we selected a senior. Then \[P(A \text{ or } B) = 0.2667\]
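If you would like to check these numbers yourself, here is a minimal Python sketch that turns the frequency distribution into probabilities. The dictionary `counts` and the helper `prob_of` are names I made up for this illustration.

```python
from fractions import Fraction

# Frequency distribution from the example
counts = {"freshman": 12, "sophomore": 10, "junior": 3, "senior": 5}
total = sum(counts.values())  # 30 students in all

def prob_of(classes):
    """Probability that a randomly selected student is in one of `classes`."""
    return Fraction(sum(counts[c] for c in classes), total)

p_sophomore = prob_of(["sophomore"])
p_junior_or_senior = prob_of(["junior", "senior"])

print(round(float(p_sophomore), 4), round(float(p_junior_or_senior), 4))
```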
Section Exercises
- Suppose you’re playing a game and need to roll a 17 or higher on a 20-sided die for your next action to be successful.
- What is the sample space?
- What is the probability of rolling a 17 or higher?
- Consider again the frequency distribution from Chapter 1
Class | Frequency |
---|---|
freshman | 12 |
sophomore | 10 |
junior | 3 |
senior | 5 |
- For a student selected at random from this course, what is the probability they are a senior?
- What is the probability they are not a senior?
4.2 Probability Distributions
Two outcomes are disjoint or mutually exclusive if they cannot both happen (at the same time). Think back to how we developed bins for histograms: the bins need to be nonoverlapping. This is the same idea!
Example: If I roll a six-sided die one time, rolling a 5 and rolling a 6 are disjoint. I can get a 5 or a 6, but not both on the same roll.
Example: If I select a student, they can be a freshman or a sophomore, but that student cannot be both a freshman and a sophomore at the same time.
The outcome must be one event or the other (it cannot be both at the same time).
4.2.1 Venn Diagrams
Venn Diagrams show events as circles. The circles overlap where events share common outcomes.
When a Venn Diagram has no overlap the events are mutually exclusive. This Venn Diagram shows the event “Draw a Diamond” and the event “Draw a Face Card”. There are 13 diamonds and 12 face cards in a deck. In this case, the events are not mutually exclusive: it’s possible to draw both a diamond and a face card at the same time: the Jack of Diamonds, Queen of Diamonds, and King of Diamonds.
The “face cards” are the Jacks, Queens, and Kings. Each row represents a “suit”. From top to bottom, the suits are clubs, spades, hearts, and diamonds. Cards can be either red (hearts and diamonds) or black (spades and clubs).
4.2.2 Probability Axioms
A probability distribution lists all possible disjoint outcomes (think: all possible values of a variable) and their associated probabilities. This can be in the form of a table
Roll of a six-sided die | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Probability | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
(note that we could visualize this with a bar plot!) or an equation, which we will discuss in a later chapter.
The probability axioms are requirements for a valid probability distribution. They are:
- All listed outcomes must be disjoint.
- Each probability must be between 0 and 1.
- The probabilities must sum to 1.
Note that the second axiom (each probability is between 0 and 1) is true for ALL probabilities. If you ever calculate a probability and get a negative number or a number greater than 1, you know something went wrong!
Example: Use the probability axioms to check whether the following tables are probability distributions.
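X | {1 or 2} | {3 or 4} | {5 or 6} |
---|---|---|---|
P(X) | 1/3 | 1/3 | 1/3 |

Each axiom is satisfied, so this is a valid probability distribution.

Y | {1 or 2} | {2 or 3} | {3 or 4} | {5 or 6} |
---|---|---|---|---|
P(Y) | 1/3 | 1/3 | 1/3 | -1/3 |

In this case, the outcomes are not disjoint (a roll of 2 falls in both {1 or 2} and {2 or 3}, and a roll of 3 falls in both {2 or 3} and {3 or 4}), one of the probabilities is negative, and the probabilities sum to 2/3 rather than 1, so this is not a valid probability distribution.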
Probability distributions look a lot like relative frequency distributions. This isn’t a coincidence! In fact, a relative frequency distribution is a good way to use data to approximate a probability distribution.
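The axiom checks from the example above can also be automated. The sketch below is just one way to do it: I represent each outcome as a set of die faces, and the function name `check_axioms` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def check_axioms(dist):
    """dist maps each outcome (a frozenset of die faces) to its probability.

    Returns three booleans: (outcomes disjoint, each probability in [0, 1],
    probabilities sum to 1).
    """
    outcomes = list(dist)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(outcomes, 2))
    in_range = all(0 <= p <= 1 for p in dist.values())
    sums_to_one = sum(dist.values()) == 1
    return disjoint, in_range, sums_to_one

third = Fraction(1, 3)
X = {frozenset({1, 2}): third, frozenset({3, 4}): third, frozenset({5, 6}): third}
Y = {frozenset({1, 2}): third, frozenset({2, 3}): third,
     frozenset({3, 4}): third, frozenset({5, 6}): -third}

print(check_axioms(X))  # every axiom holds for X
print(check_axioms(Y))  # every axiom fails for Y
```

A table is a valid probability distribution only when all three values come back `True`.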
Section Exercises
- Consider events \(A\): “Draw a spade”, \(B\): “Draw a queen”, and \(C\): “Draw a red”. Which of these events are mutually exclusive?
- Use the probability axioms to determine whether each of the following is a valid probability distribution:
x | 0 | 1 | 2 | 3 |
---|---|---|---|---|
P(x) | 0.1 | 0.2 | 0.1 | 0.3 |

x | 0 or 1 | 1 or 2 | 3 or 4 | 5 or 6 |
---|---|---|---|---|
P(x) | 0.1 | 0.2 | 0.4 | 0.3 |

- Determine whether the following events are mutually exclusive (disjoint).
- Your friend studies in the library. You study at home.
- You and your study group all earn As on an exam.
- You stay out until 3 am. You go to bed at 9 pm.
- In a group of 24 people, 13 have cats and 15 have dogs. Four of them have both cats and dogs. Sketch a Venn Diagram for these events.
4.3 Rules of Probability
Consider a six-sided die. \[P(\text{roll a 1 or 2}) = \frac{\text{2 ways}}{\text{6 outcomes}} = \frac{1}{3}.\] Notice that we get the same result by taking \[P(\text{roll a 1})+P(\text{roll a 2}) = \frac{1}{6}+\frac{1}{6} = \frac{1}{3}.\] It turns out this is widely applicable!
4.3.1 Addition Rules
If \(A_1\) and \(A_2\) are disjoint outcomes, then the probability that one of them occurs is \[P(A_1 \text{ or } A_2) = P(A_1)+P(A_2).\] This can also be extended to more than two disjoint outcomes: \[P(A_1 \text{ or } A_2 \text{ or } \dots \text{ or } A_k) = P(A_1)+P(A_2)+\dots + P(A_k)\] for \(k\) disjoint outcomes.
Example:
Class | Frequency |
---|---|
freshman | 12 |
sophomore | 10 |
junior | 3 |
senior | 5 |

In a previous example, we found that \[P(\text{junior or senior}) = \frac{3+5}{30} = \frac{8}{30}\]
Using the Addition Rule for Disjoint Outcomes, we get \[P(\text{junior})+P(\text{senior}) = \frac{3}{30} + \frac{5}{30} = \frac{8}{30}\]
Essentially, the Addition Rule for Disjoint Outcomes is just breaking up that fraction: \(\frac{3+5}{30}\) (3 juniors plus 5 seniors out of 30 students) represents the same thing as \(\frac{3}{30} + \frac{5}{30}\) (3 juniors out of 30 students plus 5 seniors out of 30 students).
Now consider a deck of cards. Let \(A\) be the event that a card drawn is a diamond and let \(B\) be the event it is a face card. (Check back to Section 4.2.1 for the Venn Diagram of these events.)
- \(A\): \(\quad 2\diamondsuit\) \(3\diamondsuit\) \(4\diamondsuit\) \(5\diamondsuit\) \(6\diamondsuit\) \(7\diamondsuit\) \(8\diamondsuit\) \(9\diamondsuit\) \(10\diamondsuit\) \(J\diamondsuit\) \(Q\diamondsuit\) \(K\diamondsuit\) \(A\diamondsuit\)
- \(B\): \(\quad J\heartsuit\) \(Q\heartsuit\) \(K\heartsuit\) \(J\clubsuit\) \(Q\clubsuit\) \(K\clubsuit\) \(J\diamondsuit\) \(Q\diamondsuit\) \(K\diamondsuit\) \(J\spadesuit\) \(Q\spadesuit\) \(K\spadesuit\)
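The collection of cards that are diamonds or face cards (or both) is the 13 diamonds together with the 9 face cards from the other three suits:
- \(A\) or \(B\): \(\quad 2\diamondsuit\) \(3\diamondsuit\) \(4\diamondsuit\) \(5\diamondsuit\) \(6\diamondsuit\) \(7\diamondsuit\) \(8\diamondsuit\) \(9\diamondsuit\) \(10\diamondsuit\) \(J\diamondsuit\) \(Q\diamondsuit\) \(K\diamondsuit\) \(A\diamondsuit\) \(J\heartsuit\) \(Q\heartsuit\) \(K\heartsuit\) \(J\clubsuit\) \(Q\clubsuit\) \(K\clubsuit\) \(J\spadesuit\) \(Q\spadesuit\) \(K\spadesuit\)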
Looking at these cards, I can see that there are 22 of them, so \[P(A \text{ or } B) = \frac{22}{52}\]
However, if I try to apply the addition rule for disjoint outcomes with \(P(A)=\frac{13}{52}\) and \(P(B)=\frac{12}{52}\), I would get \(\frac{13+12}{52} = \frac{25}{52}\), which isn’t what we want!
What happened? When I tried to add these, I double counted the Jack of Diamonds, Queen of Diamonds, and King of Diamonds (the cards that are in both \(A\) and \(B\)). To deal with that, I need to subtract off the double count: \[\frac{13}{52}+\frac{12}{52}-\frac{3}{52} = \frac{22}{52}.\]
For any two events \(A\) and \(B\), the probability that at least one will occur is \[P(A \text{ or } B) = P(A)+P(B)-P(A \text{ and }B).\]
Notice that when we say “or”, we include the situations where A is true, where B is true, and where both A and B are true. This is an inclusive or. Basically, if I said “Do you like cats or dogs?” and you said “Yes.” because you like cats and dogs, that would be a perfectly valid response. I recommend using the inclusive or with your friends any time you want to get out of making a decision.
Also notice that the general addition rule applies to any two events, even disjoint events. This is because, for disjoint events, \(P(A \text{ and } B) = 0\); it’s impossible for both to occur at the same time!
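If you don’t have a deck of cards handy, you can build one in Python and check the general addition rule by counting. This is a rough sketch; the way I encode cards as (rank, suit) pairs and the helper `prob` are my own choices for the illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "spades", "hearts", "diamonds"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

A = {card for card in deck if card[1] == "diamonds"}        # draw a diamond
B = {card for card in deck if card[0] in {"J", "Q", "K"}}   # draw a face card

def prob(event):
    """Probability of an event when each card is equally likely."""
    return Fraction(len(event), len(deck))

direct = prob(A | B)                          # count the cards in "A or B"
by_rule = prob(A) + prob(B) - prob(A & B)     # general addition rule

print(len(A | B), direct, by_rule)
```

Here `A | B` is the set of cards in \(A\) or \(B\) (inclusive or) and `A & B` is the set of cards in both, so the subtraction removes the three cards that would otherwise be counted twice.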
4.3.2 Complements
The complement of an event is all of the outcomes in the sample space that are not in the event. For an event \(A\), we denote its complement by \(A^c\).
Example: For a single roll of a six-sided die, the sample space is all possible rolls: 1, 2, 3, 4, 5, or 6. If the event \(A\) is rolling a 1 or a 2, then the complement of this event, denoted \(A^c\), is rolling a 3, 4, 5, or 6.
We could also write this in probability notation: \(S = \{1, 2, 3, 4, 5, 6\}\) and if \(A=\{1,2\}\), then \(A^c=\{3, 4, 5, 6\}\).
Property: \[P(A \text{ or } A^c)=1\] Using the addition rule, \[P(A \text{ or } A^c) = P(A)+P(A^c) = 1.\] Make sure you can convince yourself that \(A\) and \(A^c\) are always disjoint.
Rearranging gives the complement rule: \[P(A) = 1-P(A^c).\]
Example: Consider rolling 2 six-sided dice and taking their sum. Let \(A\) be the event that the sum is less than 12. Find
- \(A^c\)
- \(P(A^c)\)
- \(P(A)\)
If \(A =\) (sum less than 12), then \(A^c =\) (sum greater than or equal to 12). Take a moment to notice that there is only one way to get a sum greater than or equal to 12: rolling two 6s.
The chart below shows the rolls of Die 1 as columns and the rolls for Die 2 as rows. The numbers in the middle are the sums. Note that there are 36 possible ways to roll 2 dice.
Die 2 / Die 1 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 | 7 |
2 | 3 | 4 | 5 | 6 | 7 | 8 |
3 | 4 | 5 | 6 | 7 | 8 | 9 |
4 | 5 | 6 | 7 | 8 | 9 | 10 |
5 | 6 | 7 | 8 | 9 | 10 | 11 |
6 | 7 | 8 | 9 | 10 | 11 | 12 |

Even without the chart, by noting that there’s only one way to get a sum greater than or equal to 12, we can quickly find \(P(A^c)\): \[ P(A^c) = \frac{1}{36}\] But trying to count all of the ways to get \(A\) would take a long time! Instead, we can use \[P(A) = 1 - P(A^c) = 1-\frac{1}{36} = \frac{35}{36}\]
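Counting all of the ways to get \(A\) is tedious by hand, but a computer doesn’t mind. The following Python sketch lists all 36 rolls and confirms that the complement shortcut and the direct count agree. The variable names are my own.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) pairs
rolls = list(product(range(1, 7), repeat=2))

# A^c: the sum is greater than or equal to 12
A_c = [r for r in rolls if sum(r) >= 12]
p_A_c = Fraction(len(A_c), len(rolls))

# Complement rule
p_A = 1 - p_A_c

# Direct count of A (sum less than 12), for comparison
p_A_direct = Fraction(sum(1 for r in rolls if sum(r) < 12), len(rolls))

print(A_c, p_A_c, p_A, p_A_direct)
```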
Section Exercises
- Consider rolling two four-sided dice and taking their sum.
- Find the probability that the sum is 7 or 8.
- Find the probability that the sum is less than 8.
- Let \(A\) be the event that the sum is 8.
- Find \(P(A)\).
- Describe the event \(A^c\).
- Find \(P(A^c)\).
- Consider a deck of cards (there’s an image in Section 4.2.1). If you randomly draw a single card from the deck, what is the probability that it is a Queen or a Heart?
- Suppose you are playing a game that requires you to roll twenty-sided dice. We are interested in the setting where you roll two of these dice (call them red and blue) and take the highest of the two rolls. We want to find the probability that your highest roll is at least a \(17\).
- Let \(A\) be the event that red is at least a \(17\). Then, find \(P(A)\).
- Let \(B\) be the event that blue is at least a \(17\). What is \(P(B)\)?
- How can we rewrite “the probability that your highest roll is at least a \(17\)” in terms of \(A\) and \(B\)? Hint: What needs to happen with red and blue for your highest roll to be at least a 17?
- Find the probability that your highest roll is at least a 17.
4.4 Conditional Probability
A contingency table is a way to summarize bivariate data, or data from two variables.
Smallpox in Boston (1726)

Result | Inoculated: yes | Inoculated: no | Total |
---|---|---|---|
lived | 238 | 5136 | 5374 |
died | 6 | 844 | 850 |
total | 244 | 5980 | 6224 |

- 5136 is the count of people who lived AND were not inoculated.
- 6224 is the total number of observations.
- 244 is the total number of people who were inoculated.
- 5374 is the total number of people who lived.
This is basically a two-variable frequency distribution. And, like a frequency distribution, we can convert to proportions (relative frequencies) by dividing each count (each number) by the total number of observations:
Result | Inoculated: yes | Inoculated: no | Total |
---|---|---|---|
lived | 0.0382 | 0.8252 | 0.8634 |
died | 0.0010 | 0.1356 | 0.1366 |
total | 0.0392 | 0.9608 | 1.0000 |

- 0.8252 is the proportion of people who lived AND were not inoculated.
- 1.0000 is the proportion of all observations. Think of this as 100% of the observations.
- 0.0392 is the proportion of people who were inoculated.
- 0.8634 is the proportion of people who lived.
The row and column totals are marginal probabilities. The probability of two events together (\(A\) and \(B\)) is a joint probability.
What can we learn about the result of smallpox if we already know something about inoculation status? For example, given that a person is inoculated, what is the probability of death? To figure this out, we restrict our attention to the 244 inoculated cases. Of these, 6 died. So the probability is 6/244.
This is called conditional probability, the probability of some event \(A\) if we know that event \(B\) occurred (or is true): \[P(A|B) = \frac{P(A\text{ and }B)}{P(B)}\] where the symbol | is read as “given”.
Example: For death given inoculation, \[P(\text{death}|\text{inoculation}) = \frac{P(\text{death and inoculation})}{P(\text{inoculation})} = \frac{0.0010}{0.0392} \approx 0.0255.\] Notice that we could also write this as \[P(\text{death}|\text{inoculation}) = \frac{P(\text{death and inoculation})}{P(\text{inoculation})} = \frac{6/6224}{244/6224} = \frac{6}{244} \approx 0.0246,\] which is what we found when using the table to restrict our attention to only the inoculated cases. (The two answers differ slightly only because the proportions in the table were rounded to four decimal places; \(\frac{6}{244}\) is the exact value.)
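To see the conditional probability formula in action on the raw counts, here is a small Python sketch. The dictionary layout and the function names are just my own way of storing the smallpox table for this illustration.

```python
from fractions import Fraction

# Counts from the smallpox table, keyed by (result, inoculated)
counts = {
    ("lived", "yes"): 238, ("lived", "no"): 5136,
    ("died", "yes"): 6,    ("died", "no"): 844,
}
total = sum(counts.values())  # 6224 observations

def p_joint(result, inoculated):
    """Joint probability P(result and inoculated)."""
    return Fraction(counts[(result, inoculated)], total)

def p_inoculated(inoculated):
    """Marginal probability of an inoculation status (a column total)."""
    return sum(p_joint(result, inoculated) for result in ("lived", "died"))

def p_result_given(result, inoculated):
    """Conditional probability P(result | inoculated)."""
    return p_joint(result, inoculated) / p_inoculated(inoculated)

p_death_given_inoc = p_result_given("died", "yes")
print(p_death_given_inoc, round(float(p_death_given_inoc), 4))
```

Because the code works with the exact counts rather than the rounded proportions, it lands on \(\frac{6}{244}\) (which `Fraction` reduces to \(\frac{3}{122}\)).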
If knowing whether event \(B\) occurs tells us nothing about event \(A\), the events are independent. For example, if we know that the first flip of a (fair) coin came up heads, that doesn’t tell us anything about what will happen next time we flip that coin.
We can test for independence by checking if \(P(A|B)=P(A)\).
4.4.1 Multiplication Rules
If \(A\) and \(B\) are independent events, then \[P(A \text{ and }B) = P(A)P(B).\]
We can extend this to more than two events: \[P(A \text{ and }B \text{ and } C \text{ and } \dots) = P(A)P(B)P(C)\dots.\]
Note that if \(P(A \text{ and }B) \ne P(A)P(B)\), then \(A\) and \(B\) are not independent.
Example: Find the probability of rolling a \(6\) on your first roll of a die and a \(6\) on your second roll.
Let \(A=\) (rolling a \(6\) on first roll) and \(B=\) (rolling a \(6\) on second roll). For each roll, the probability of getting a \(6\) is \(1/6\), so \(P(A) = \frac{1}{6}\) and \(P(B) = \frac{1}{6}\).
Then, because each roll is independent of any other rolls, \[P(A \text{ and }B) = P(A)P(B) = \frac{1}{6}\times\frac{1}{6} = \frac{1}{36}\]
If \(A\) and \(B\) are any two events, then \[P(A \text{ and }B) = P(A|B)P(B).\]
Notice that this is just the conditional probability formula, rewritten in terms of \(P(A \text{ and }B)\)!
Example: Suppose we know that 38.4% of US households have dogs and that among those with dogs, 23.1% have cats. Find the probability that a US household has both dogs and cats.
Let \(C=\) (household has cats) and \(D=\) (household has dogs). We know from the problem statement that \(P(D) = 0.384\).
The other piece tells us something about the probability of having cats among those with dogs. This means that we know that these people have dogs. That is, given a household has dogs, the probability of cats is 23.1%. In probability notation, \(P(C|D) = 0.231\). Then \[P(C \text{ and }D) = P(C|D)P(D) = 0.231\times 0.384 \approx 0.0887\] or the probability that a US household has both cats and dogs is about 0.0887.
We can put our idea of independence together with our multiplication rules to come up with three ways to test for independence:
- \(P(A|B)=P(A)\)
- \(P(B|A)=P(B)\)
- \(P(A \text{ and }B) = P(A)P(B)\)
These are all mathematically equivalent, so you only need to check one. If the equation holds (they are equal), then \(A\) and \(B\) are independent. If they are not equal, the events are dependent.
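As an illustration, we can run two of these checks on the smallpox table from earlier in this section, with \(A\) = (died) and \(B\) = (inoculated). This Python sketch is only meant to show the formulas at work. With real data, sample proportions almost never match exactly even when events are independent, so treating “not exactly equal” as “dependent” is a simplifying assumption here.

```python
from fractions import Fraction

# Counts from the smallpox table, keyed by (result, inoculated)
counts = {
    ("lived", "yes"): 238, ("lived", "no"): 5136,
    ("died", "yes"): 6,    ("died", "no"): 844,
}
total = sum(counts.values())  # 6224

n_died = counts[("died", "yes")] + counts[("died", "no")]    # row total: 850
n_inoc = counts[("lived", "yes")] + counts[("died", "yes")]  # column total: 244

p_died = Fraction(n_died, total)                            # P(A)
p_inoc = Fraction(n_inoc, total)                            # P(B)
p_died_and_inoc = Fraction(counts[("died", "yes")], total)  # P(A and B)
p_died_given_inoc = p_died_and_inoc / p_inoc                # P(A|B)

# Test 1: is P(A|B) equal to P(A)?
independent_by_conditional = (p_died_given_inoc == p_died)
# Test 3: is P(A and B) equal to P(A)P(B)?
independent_by_product = (p_died_and_inoc == p_died * p_inoc)

print(round(float(p_died_given_inoc), 4), round(float(p_died), 4))
print(independent_by_conditional, independent_by_product)
```

Both checks give the same verdict, as they must: \(P(\text{died}|\text{inoculated}) \approx 0.0246\) is quite different from \(P(\text{died}) \approx 0.1366\), so death and inoculation are dependent in this table.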
Section Exercises
- For the following conditional probability scenarios, determine the condition and the outcome of interest.
- We want to find the probability that a randomly chosen student is enrolled in Stat 1, given that they are a health sciences major.
- Knowing that a person texts while driving, we will find the probability that they got a speeding ticket in 2023.
- We have data on the type of car people bought and on what additional features they purchased. We will find the probability that a randomly selected sports car buyer also bought bucket seats.
- We have data on sex and paw preference for dogs. We will find the probability that a female dog is left-pawed.
- Using the contingency table, find the indicated probabilities.
 | \(A\) | \(A^c\) | Total |
---|---|---|---|
\(B1\) | 0.31 | 0.05 | 0.36 |
\(B2\) | 0.11 | 0.13 | 0.24 |
\(B3\) | 0.08 | 0.32 | 0.40 |
Total | 0.5 | 0.5 | 1.00 |
- \(P(A \text{ and } B2)\)
- \(P(B2)\)
- \(P(A)\)
- \(P(A|B2)\)
- \(P(B2|A)\)
- The following contingency table represents a sample of households asked about whether they had children and whether they had pets. Use the table to answer the following questions.
 | Children | No children | Total |
---|---|---|---|
Pets | 38 | 47 | 85 |
No pets | 21 | 44 | 65 |
Total | 59 | 91 | 150 |
- Find the probability that a household has pets.
- Find the probability a household has pets and has children.
- Knowing that a household has children, find the probability they have pets.
- Are having children and having pets independent events? Explain.
- If \(A\) and \(B\) are disjoint events, what can we say about whether they are independent? Hint: think about how disjoint events will impact our tests for independence.