Chapter 3 Probability Concepts

3.1 Module Overview

In previous modules, we discussed ways to describe variables and the relationships between them. From here, we want to start asking inferential statistics questions like “If my sample mean is 10, how likely is it that the population mean is actually 11?” Probability is going to start us on this path.

Probability theory is the science of uncertainty and it is really interesting! But it can also be quite challenging. I try to frame probability around things most of us can do at home: flipping a coin, rolling a die, drawing from a deck of cards. You certainly don’t need any of these things to get through this module, but you may find it helpful to have a coin/die/deck of cards on hand as you read through the examples.

Take your time running practice problems (the solutions to odd-numbered problems are in the back of your textbook - these make great practice problems!).

Module Learning Objectives/Outcomes

  1. Find and interpret probabilities for equally likely events.
  2. Find and interpret probabilities for events that are not equally likely.
  3. Find and interpret joint and marginal probabilities.
  4. Find and interpret conditional probabilities.
  5. Use the multiplication rule and independence to calculate probabilities.

This module’s outcomes correspond to course outcome (3) understand the basic rules of probability.

3.2 Experiments, Sample Spaces, and Events

Probability is the science of uncertainty. When we run an experiment, we are unsure of what the outcome will be. Because of this uncertainty, we say an experiment is a random process.

The probability of an event is the proportion of times it would occur if the experiment were run infinitely many times. When all outcomes are equally likely, we can compute it by counting: \[ \text{probability of event} = \frac{\text{number of ways event can occur}}{\text{number of possible (unique) outcomes}} \]

An event is some specified possible outcome (or collection of outcomes) we are interested in observing.

Example: If you want to roll a 6 on a six-sided die, there are six possible outcomes \(\{1,2,3,4,5,6\}\). So the probability of rolling a 6 is \[ \frac{\text{number of ways to roll a 6}}{\text{number of possible rolls}} = \frac{1}{6} \]

Example: We can extend this to a collection of events, say the probability of rolling a 5 or a 6: \[ \frac{\text{number of ways to roll a 5 or 6}}{\text{number of possible rolls}} = \frac{2}{6} \]

The collection of all possible outcomes is called a sample space, denoted \(S\). For the six-sided die, \(S=\{1,2,3,4,5,6\}\).

To simplify our writing, we use probability notation:

  • Events are assigned capital letters.
  • \(P(A)\) denotes the probability of event \(A\).
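The counting definition above is easy to check on a computer. Here is a minimal sketch in Python (the function name `prob` and the set `S` are my own choices, not standard notation) using exact fractions so the answers match the hand calculations:

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die
S = {1, 2, 3, 4, 5, 6}

def prob(event, sample_space):
    """P(event) when all outcomes in the sample space are equally likely:
    (number of outcomes in the event) / (number of possible outcomes)."""
    return Fraction(len(event & sample_space), len(sample_space))

print(prob({6}, S))      # 1/6
print(prob({5, 6}, S))   # 1/3  (2/6 reduced)
```

Note that `Fraction` keeps the probabilities exact instead of rounding them to decimals.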

3.3 Probability Distributions

Two outcomes are disjoint or mutually exclusive if they cannot both happen (at the same time). Think back to how we developed bins for histograms - the bins need to be nonoverlapping - this is the same idea!

Example: If I roll a six-sided die one time, rolling a 5 and rolling a 6 are disjoint. I can get a 5 or a 6, but not both on the same roll.

3.3.1 Venn Diagrams

Venn Diagrams show events as circles. The circles overlap where events share common outcomes.

When a Venn Diagram has no overlap, the events are mutually exclusive. Consider a Venn Diagram with the event “Draw a Diamond” and the event “Draw a Face Card.” There are 13 diamonds and 12 face cards in a deck. In this case, the events are not mutually exclusive: it’s possible to draw both a diamond and a face card at the same time: the Jack of Diamonds, Queen of Diamonds, and King of Diamonds.

For quick reference: a standard deck has 52 cards. The “face cards” are the J, Q, and K of each suit. The four “suits” are clubs, spades, hearts, and diamonds. Cards are either red (hearts and diamonds) or black (spades and clubs).

On your own: Consider events

  • \(A\): “Draw a spade”
  • \(B\): “Draw a queen”
  • \(C\): “Draw a red”

Which of these events are mutually exclusive?

A probability distribution lists all possible disjoint outcomes (think: all possible values of a variable) and their associated probabilities. This can be in the form of a table

Roll of a six-sided die    1     2     3     4     5     6
Probability               1/6   1/6   1/6   1/6   1/6   1/6

(note that we could visualize this with a bar plot!) or an equation, which we will discuss in a later module.

3.3.2 Probability Axioms

  1. All listed outcomes must be disjoint.
  2. Each probability must be between 0 and 1.
  3. The probabilities must sum to 1.

These are the requirements for a valid probability distribution. Note that #2 is true for ALL probabilities. If you ever calculate a probability and get a negative number or a number greater than 1, you know something went wrong!

Example: Use the probability axioms to check whether the following tables are probability distributions.

X {1 or 2} {3 or 4} {5 or 6}
P(X) 1/3 1/3 1/3

Each axiom is satisfied, so this is a valid probability distribution.

Y {1 or 2} {2 or 3} {3 or 4} {5 or 6}
P(Y) 1/3 1/3 1/3 -1/3

In this case, the outcomes are not disjoint ({2 or 3} overlaps both {1 or 2} and {3 or 4}), one of the probabilities is negative, and the probabilities sum to 2/3 rather than 1, so this is not a valid probability distribution.

3.3.3 Exercises

  1. Use the probability axioms to determine whether each of the following is a valid probability distribution:
    1. x 0 1 2 3
      P(x) 0.1 0.2 0.1 0.3
    2. x 0 or 1 1 or 2 3 or 4 5 or 6
      P(x) 0.1 0.2 0.4 0.3
  2. Determine whether the following events are mutually exclusive (disjoint).
    1. Your friend studies in the library. You study at home.
    2. You and your study group all earn As on an exam.
    3. You stay out until 3 am. You go to bed at 9 pm.
  3. In a group of people, 11 have cats and 13 have dogs. 4 people have both cats and dogs. Sketch a Venn Diagram for these events.

3.4 Rules of Probability

Consider a six-sided die. \[P(\text{roll a 1 or 2}) = \frac{\text{2 ways}}{\text{6 outcomes}} = \frac{1}{3}.\] Notice that we get the same result by taking \[P(\text{roll a 1})+P(\text{roll a 2}) = \frac{1}{6}+\frac{1}{6} = \frac{1}{3}.\] It turns out this is widely applicable!

3.4.1 Addition Rules

Addition Rule for Disjoint Outcomes

If \(A_1\) and \(A_2\) are disjoint outcomes, then the probability that one of them occurs is \[P(A_1 \text{ or } A_2) = P(A_1)+P(A_2).\] This can also be extended to more than two disjoint outcomes: \[P(A_1 \text{ or } A_2 \text{ or } \dots \text{ or } A_k) = P(A_1)+P(A_2)+\dots + P(A_k)\] for \(k\) disjoint outcomes.

Now consider a deck of cards. Let \(A\) be the event that a card drawn is a diamond and let \(B\) be the event it is a face card. (Check back to 3.3.1 for the Venn Diagram of these events - they are not disjoint!).

Here \(P(A)=\frac{13}{52}\) and \(P(B)=\frac{12}{52}\). If we add these, we double count the Jack of Diamonds, Queen of Diamonds, and King of Diamonds. So we need to account for that: \(\frac{13}{52}+\frac{12}{52}-\frac{3}{52}=\frac{22}{52}=\frac{11}{26}\).

General Addition Rule

For any two events \(A\) and \(B\), the probability that at least one will occur is \[P(A \text{ or } B) = P(A)+P(B)-P(A \text{ and }B).\]

Notice that when we say “or,” we include the situation where A is true, the situation where B is true, and the situation where both A and B are true. This is an inclusive or. Basically, if I said “Do you like cats or dogs?” and you said “Yes” because you like cats and dogs, that would be a perfectly valid response. I recommend using the inclusive or with your friends any time you want to get out of making a decision.
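The general addition rule can be verified directly by modeling the deck and counting. A quick sketch in Python (the variable names are my own):

```python
from fractions import Fraction

# Model a 52-card deck as (rank, suit) pairs
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['clubs', 'spades', 'hearts', 'diamonds']
deck = [(r, s) for r in ranks for s in suits]

A = {c for c in deck if c[1] == 'diamonds'}        # draw a diamond
B = {c for c in deck if c[0] in ('J', 'Q', 'K')}   # draw a face card

def p(event):
    return Fraction(len(event), len(deck))

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p(A) + p(B) - p(A & B))   # 11/26
print(p(A | B))                 # 11/26 -- same answer by counting directly
```

The set operations mirror the probability notation: `A & B` is “A and B” (intersection) and `A | B` is the inclusive “A or B” (union).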

3.4.2 Complements

The complement of an event is all of the outcomes in the sample space that are not in the event.

Example: For a single roll of a six-sided die, the sample space is all possible rolls: 1, 2, 3, 4, 5, or 6. If the event \(A\) is rolling a 1 or a 2, then the complement of this event, denoted \(A^c\), is rolling a 3, 4, 5, or 6. We could also write this in probability notation: \(S = \{1, 2, 3, 4, 5, 6\}\) and if \(A=\{1,2\}\), then \(A^c=\{3, 4, 5, 6\}\).

Property: \[P(A \text{ or } A^c)=1\] Using the addition rule, \[P(A \text{ or } A^c) = P(A)+P(A^c) = 1.\] (Make sure you can convince yourself that \(A\) and \(A^c\) are always disjoint.) This is especially useful written as \[P(A) = 1-P(A^c).\]

Example: Consider rolling 2 six-sided dice and taking their sum. Let \(A\) be the event that the sum is less than 12. Find

  1. \(A^c\)
  2. \(P(A^c)\)
  3. \(P(A)\)

If \(A =\) (sum less than 12), then \(A^c =\) (sum greater than or equal to 12). Take a moment to notice that there is only one way to get a sum greater than or equal to 12: rolling two 6s. The chart below shows the rolls of Die 1 as columns and the rolls for Die 2 as rows. The numbers in the middle are the sums. Notice that there are 36 possible ways to roll 2 dice.

         Die 1:   1    2    3    4    5    6
Die 2: 1          2    3    4    5    6    7
       2          3    4    5    6    7    8
       3          4    5    6    7    8    9
       4          5    6    7    8    9   10
       5          6    7    8    9   10   11
       6          7    8    9   10   11   12

\[ P(A^c) = \frac{1}{36}\] Then \[P(A) = 1 - P(A^c) = 1-\frac{1}{36} = \frac{35}{36}\] which is a much faster way to calculate this than to count up all the times the sum is less than 12!
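The complement shortcut above can be checked by enumerating all 36 rolls. A small sketch in Python (variable names are my own):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely rolls of two six-sided dice
rolls = list(product(range(1, 7), repeat=2))

# A^c: sum greater than or equal to 12 -- only (6, 6) qualifies
p_comp = Fraction(sum(1 for d1, d2 in rolls if d1 + d2 >= 12), len(rolls))
print(p_comp)       # 1/36
print(1 - p_comp)   # P(A) = 35/36, by the complement rule
```

Counting the one outcome in \(A^c\) is much easier than counting the 35 outcomes in \(A\), which is exactly the point of the complement rule.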

3.5 Conditional Probability

A contingency table is a way to summarize bivariate data, or data from two variables.

Smallpox in Boston (1726)

            Inoculated   Not inoculated   Total
  Lived          238             5136      5374
  Died             6              844       850
  Total          244             5980      6224
  • 5136 is the count of people who lived AND were not inoculated. 
  • 6224 is the total number of observations.
  • 244 is the total number of people who were inoculated.
  • 5374 is the total number of people who lived.

This is like a two-way frequency distribution. Like a frequency distribution, we can convert to proportions by dividing each count by the total number of observations:


            Inoculated   Not inoculated    Total
  Lived         0.0382           0.8252   0.8634
  Died          0.0010           0.1356   0.1366
  Total         0.0392           0.9608   1.0000

  • 0.8252 is the proportion of people who lived AND were not inoculated. 
  • 1.000 is the proportion of total number of observations. Think of this as 100% of the observations.
  • 0.0392 is the proportion of people who were inoculated.
  • 0.8634 is the proportion of people who lived.

The row and column totals are marginal probabilities. The probability of two events together (\(A\) and \(B\)) is a joint probability.
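Joint and marginal probabilities are just cell counts and row/column totals divided by the grand total. A sketch in Python using the smallpox counts (the two cell counts not stated directly in the text are derived from the totals: inoculated survivors \(= 5374 - 5136 = 238\), and deaths among the not inoculated \(= (6224 - 5374) - 6 = 844\)):

```python
total = 6224  # all observations in the Boston smallpox data

counts = {
    ('lived', 'inoculated'): 238,
    ('lived', 'not inoculated'): 5136,
    ('died', 'inoculated'): 6,
    ('died', 'not inoculated'): 844,
}

# Joint probability: P(lived and not inoculated)
p_joint = counts[('lived', 'not inoculated')] / total
print(round(p_joint, 4))   # 0.8252

# Marginal probability: P(inoculated), summing joint counts over the result
p_inoc = sum(n for (result, status), n in counts.items()
             if status == 'inoculated') / total
print(round(p_inoc, 4))    # 0.0392
```

These match the proportions in the text: a joint probability uses one cell, while a marginal probability sums a whole row or column.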

What can we learn about the result of smallpox if we already know something about inoculation status? For example, given that a person is inoculated, what is the probability of death? To figure this out, we restrict our attention to the 244 inoculated cases. Of these, 6 died. So the probability is 6/244.

This is called conditional probability, the probability of some event \(A\) given that event \(B\) occurs: \[P(A|B) = \frac{P(A\text{ and }B)}{P(B)}\] where the symbol | is read as “given.”

For death given inoculation, \[P(\text{death}|\text{inoculation}) = \frac{P(\text{death and inoculation})}{P(\text{inoculation})} = \frac{0.0010}{0.0392} = 0.0255.\] Notice that we could also write this as \[P(\text{death}|\text{inoculation}) = \frac{P(\text{death and inoculation})}{P(\text{inoculation})} = \frac{6/6224}{244/6224} = \frac{6}{244},\] which is what we found when using the table to restrict our attention to only the inoculated cases.

If knowing whether event \(B\) occurs tells us nothing about event \(A\), the events are independent. For example, if we know that the first flip of a (fair) coin came up heads, that doesn’t tell us anything about what will happen next time we flip that coin.

We can test for independence by checking if \(P(A|B)=P(A)\).
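We can carry out both the conditional probability and the independence check numerically. A sketch in Python using the smallpox counts from the text (total deaths, 850, is derived as \(6224 - 5374\)):

```python
total = 6224
died_and_inoc = 6       # deaths among the inoculated
inoculated = 244        # total inoculated
died = 6224 - 5374      # total deaths = 850

# P(death | inoculation) = P(death and inoculation) / P(inoculation)
p_death_given_inoc = (died_and_inoc / total) / (inoculated / total)
print(round(p_death_given_inoc, 4))   # 0.0246, i.e. 6/244

# Independence check: is P(death | inoculation) equal to P(death)?
p_death = died / total
print(round(p_death, 4))              # 0.1366 -- very different from 0.0246,
                                      # so death and inoculation are NOT independent
```

Since \(P(\text{death}|\text{inoculation}) \ne P(\text{death})\), knowing a person's inoculation status changes what we believe about their chance of death.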

3.5.1 Multiplication Rules

Multiplication Rule for Independent Processes

If \(A\) and \(B\) are independent events, then \[P(A \text{ and }B) = P(A)P(B).\]

We can extend this to more than two events: \[P(A \text{ and }B \text{ and } C \text{ and } \dots) = P(A)P(B)P(C)\dots.\]

Note that if \(P(A \text{ and }B) \ne P(A)P(B)\), then \(A\) and \(B\) are not independent.

General Multiplication Rule

If \(A\) and \(B\) are any two events, then \[P(A \text{ and }B) = P(A|B)P(B).\]

Notice that this is just the conditional probability formula, rewritten in terms of \(P(A \text{ and }B)\)!
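Both multiplication rules can be sketched in a few lines of Python (variable names are my own). The first uses independent coin flips; the second uses the smallpox numbers from the previous section:

```python
from fractions import Fraction

# Independent processes: two flips of a fair coin
p_heads = Fraction(1, 2)
print(p_heads * p_heads)   # 1/4 -- P(heads on flip 1 and heads on flip 2)

# General multiplication rule:
# P(death and inoculation) = P(death | inoculation) * P(inoculation)
p_inoc = Fraction(244, 6224)
p_death_given_inoc = Fraction(6, 244)
print(p_death_given_inoc * p_inoc == Fraction(6, 6224))   # True
```

The last line recovers the joint probability \(6/6224\), confirming that the general multiplication rule is the conditional probability formula solved for \(P(A \text{ and } B)\).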