2 Working with Probabilities
- Due to the wide variety of types of random phenomena, an outcome can be virtually anything. In particular, an outcome does not have to be a number.
- The sample space is the set of all possible outcomes of the random phenomenon.
- An event is something that could happen. That is, an event is a collection of outcomes that satisfy some criteria. If the sample space outcomes are represented by rows in a spreadsheet, then an event is a subset of rows that satisfies some criteria.
- Events are typically denoted with capital letters near the start of the alphabet, with or without subscripts (e.g. $A$, $B$, $C$, $A_1$, $A_2$). Events can be composed from others using basic set operations like unions ($A \cup B$), intersections ($A \cap B$), and complements ($A^c$).
  - Read $A^c$ as “not $A$”.
  - Read $A \cap B$ as “$A$ and $B$”.
  - Read $A \cup B$ as “$A$ or $B$”. Note that unions ($\cup$, “or”) are always inclusive. $A \cup B$ occurs if $A$ occurs but $B$ does not, $B$ occurs but $A$ does not, or both $A$ and $B$ occur.
- A collection of events $A_1, A_2, \ldots$ are disjoint (a.k.a. mutually exclusive) if $A_i \cap A_j = \emptyset$ for all $i \neq j$. That is, multiple events are disjoint if none of the events have any outcomes in common.
- A probability measure, typically denoted $\textrm{P}$, assigns probabilities to events to quantify their relative likelihoods according to the assumptions of the model of the random phenomenon.
- The probability of event $A$, computed according to probability measure $\textrm{P}$, is denoted $\textrm{P}(A)$.
- A valid probability measure $\textrm{P}$ must satisfy the following three logical consistency “axioms”.
  - For any event $A$, $0 \le \textrm{P}(A) \le 1$.
  - If $\Omega$ represents the sample space then $\textrm{P}(\Omega) = 1$.
  - (Countable additivity.) If $A_1, A_2, \ldots$ are disjoint then $$\textrm{P}(A_1 \cup A_2 \cup \cdots) = \textrm{P}(A_1) + \textrm{P}(A_2) + \cdots$$
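- Additional properties of a probability measure follow from the axioms.
  - Complement rule. For any event $A$, $\textrm{P}(A^c) = 1 - \textrm{P}(A)$.
  - Subset rule. If $A \subseteq B$ then $\textrm{P}(A) \le \textrm{P}(B)$.
  - Addition rule for two events. If $A$ and $B$ are any two events, then $$\textrm{P}(A \cup B) = \textrm{P}(A) + \textrm{P}(B) - \textrm{P}(A \cap B)$$
  - Law of total probability. If $C_1, C_2, \ldots$ are disjoint events with $C_1 \cup C_2 \cup \cdots = \Omega$ (the whole sample space), then for any event $A$ $$\textrm{P}(A) = \textrm{P}(A \cap C_1) + \textrm{P}(A \cap C_2) + \cdots$$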
- The axioms of a probability measure are minimal logical consistency requirements that ensure that probabilities of different events fit together in a valid, coherent way.
- A single probability measure corresponds to a particular set of assumptions about the random phenomenon.
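The rules above can be checked numerically on a small example. The following is a minimal sketch in R, assuming a hypothetical sample space of four outcomes with made-up probabilities; the names `omega`, `p`, `prob`, `A`, and `B` are illustrative only and do not come from the handout.

```r
# Hypothetical sample space with four outcomes and assumed probabilities
omega = c("a", "b", "c", "d")
p = c(a = 0.1, b = 0.2, c = 0.3, d = 0.4)

# Probability of an event = sum of the probabilities of its outcomes
prob = function(event) sum(p[event])

# Two illustrative events that overlap in outcome "b"
A = c("a", "b")
B = c("b", "c")

# Complement rule: P(A^c) versus 1 - P(A)
complement_lhs = prob(setdiff(omega, A))
complement_rhs = 1 - prob(A)

# Addition rule: P(A or B) versus P(A) + P(B) - P(A and B)
union_lhs = prob(union(A, B))
union_rhs = prob(A) + prob(B) - prob(intersect(A, B))

# Subset rule: "A and B" is a subset of A, so its probability cannot be larger
subset_ok = prob(intersect(A, B)) <= prob(A)

all.equal(complement_lhs, complement_rhs)
all.equal(union_lhs, union_rhs)
subset_ok
```

Comparing with `all.equal()` rather than `==` avoids spurious mismatches from floating-point rounding.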
Example 2.1 The probability that a randomly selected U.S. household has a pet dog is 0.47. The probability that a randomly selected U.S. household has a pet cat is 0.25. (These values are based on the 2018 General Social Survey (GSS).)
Represent the information provided using proper symbols.
Donny Don’t says: “the probability that a randomly selected U.S. household has a pet dog OR a pet cat is $0.47 + 0.25 = 0.72$.” Do you agree? What must be true for Donny to be correct? Explain. (Hint: for the remaining parts it helps to consider two-way tables.)
What is the smallest possible value of the probability that a randomly selected U.S. household has a pet dog AND a pet cat? Describe the (unrealistic) situation in which this extreme case would occur.
What is the largest possible value of the probability that a randomly selected U.S. household has a pet dog AND a pet cat? Describe the (unrealistic) situation in which this extreme case would occur. What would be the probability that a randomly selected U.S. household has a pet dog OR a pet cat in this scenario?
Donny Don’t says: “I remember hearing once that in probability OR means add and AND means multiply. So the probability that a randomly selected U.S. household has a pet dog AND a pet cat is $0.47 \times 0.25 = 0.1175$.” Do you agree? Explain.
According to the GSS, the probability that a randomly selected U.S. household has a pet dog AND a pet cat is
. Compute the probability that a randomly selected U.S. household has a pet dog OR a pet cat.
Compute and interpret
.
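The two-way tables suggested in the hint can be sketched in R. This is only an illustration under stated assumptions: `two_way_table` is a hypothetical helper (not part of the handout), and the only inputs are the two probabilities given in Example 2.1 plus a candidate value for the probability of having both pets. The sketch builds the table for the two extreme cases and reads off the probability of a dog OR a cat as one minus the probability of neither.

```r
# Hypothetical helper: two-way table of probabilities from P(dog), P(cat),
# and a candidate value for P(dog AND cat)
two_way_table = function(p_dog, p_cat, p_both) {
  tab = matrix(c(p_both,         p_dog - p_both,
                 p_cat - p_both, 1 - p_dog - p_cat + p_both),
               nrow = 2, byrow = TRUE,
               dimnames = list(dog = c("yes", "no"), cat = c("yes", "no")))
  # every cell must be a valid (nonnegative) probability
  stopifnot(all(tab >= -1e-12))
  tab
}

p_dog = 0.47
p_cat = 0.25

# Extreme cases for P(dog AND cat) that keep every cell nonnegative
smallest_both = max(0, p_dog + p_cat - 1)  # no household has both
largest_both = min(p_dog, p_cat)           # every cat household also has a dog

tab_min = two_way_table(p_dog, p_cat, smallest_both)
tab_max = two_way_table(p_dog, p_cat, largest_both)

# P(dog OR cat) = 1 - P(neither)
p_union_min_overlap = 1 - tab_min["no", "no"]
p_union_max_overlap = 1 - tab_max["no", "no"]

tab_min
tab_max
```

Any other candidate value between the two extremes can be passed as `p_both` to see how the remaining cells adjust.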
2.1 Equally Likely Outcomes and Uniform Probability Measures
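- For a sample space $\Omega$ with finitely many possible outcomes, assuming equally likely outcomes corresponds to a probability measure $\textrm{P}$ which satisfies $$\textrm{P}(A) = \frac{|A|}{|\Omega|} = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } \Omega} \qquad \text{when outcomes are equally likely}$$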
Example 2.2 Roll a fair four-sided die twice, and record the result of each roll in sequence.
- How many possible outcomes are there? Are they equally likely?
- Compute $\textrm{P}(A)$, where $A$ is the event that the sum of the two dice is 4.
- Compute $\textrm{P}(B)$, where $B$ is the event that the sum of the two dice is at most 3.
- Compute $\textrm{P}(C)$, where $C$ is the event that the larger of the two rolls (or the common roll if a tie) is 3.
- Compute and interpret
.
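The counting in Example 2.2 can also be done by brute-force enumeration. Below is a minimal sketch in R, assuming the 16 ordered pairs of rolls are equally likely; `rolls` and `prob_equally_likely` are illustrative names introduced here, not part of the handout.

```r
# All 16 ordered pairs (first roll, second roll) of a fair four-sided die
rolls = expand.grid(first = 1:4, second = 1:4)

# With equally likely outcomes:
# P(event) = (number of outcomes in the event) / (number of outcomes)
prob_equally_likely = function(in_event) sum(in_event) / nrow(rolls)

# Sum of the two rolls is 4
p_sum_is_4 = prob_equally_likely(rolls$first + rolls$second == 4)

# Sum of the two rolls is at most 3
p_sum_at_most_3 = prob_equally_likely(rolls$first + rolls$second <= 3)

# Larger of the two rolls (or the common roll if a tie) is 3
p_max_is_3 = prob_equally_likely(pmax(rolls$first, rolls$second) == 3)

c(p_sum_is_4, p_sum_at_most_3, p_max_is_3)
```

Listing the rows that satisfy a condition, for example `rolls[pmax(rolls$first, rolls$second) == 3, ]`, is a quick way to check a hand count.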
- The continuous analog of equally likely outcomes is a uniform probability measure. When the sample space is uncountable, size is measured continuously (length, area, volume) rather than discretely (counting).
Example 2.3 Regina and Cady are meeting for lunch. Suppose they each arrive uniformly at random at a time between noon and 1:00, independently of each other. Record their arrival times as minutes after noon, so noon corresponds to 0 and 1:00 to 60.
- Draw a picture representing the sample space.
- Compute the probability that the first person to arrive has to wait at most 15 minutes for the other person to arrive. In other words, compute the probability that they arrive within 15 minutes of each other.
- Compute the probability that the first person to arrive arrives before 12:15.
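Since the two arrival times are uniform and independent, the pair of arrival times is uniformly distributed over the 60-by-60 square, so probabilities are ratios of areas. One way to sketch the two calculations, writing $R$ and $Y$ for Regina's and Cady's arrival times in minutes after noon (symbols introduced here only for illustration), is to subtract the area of the complementary region in each case: two corner triangles with legs of length 45 in the first, and a 45-by-45 square in the second.

```latex
\begin{align*}
\textrm{P}(|R - Y| \le 15)
  &= 1 - \frac{2 \cdot \tfrac{1}{2}(45)(45)}{(60)(60)}
   = 1 - \frac{2025}{3600} = 0.4375, \\
\textrm{P}(\min(R, Y) < 15)
  &= 1 - \textrm{P}(R \ge 15,\, Y \ge 15)
   = 1 - \frac{(45)(45)}{(60)(60)} = 0.4375.
\end{align*}
```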
library(kableExtra) # provides kbl() and kable_styling()

# number of simulated repetitions
N_rep = 1000

# Simulate values uniformly between 0 and 60, independently
u1 = runif(N_rep, 0, 60)
u2 = runif(N_rep, 0, 60)

# waiting time
waiting_time = abs(u1 - u2)

# first arrival time
first_arrival_time = pmin(u1, u2)

# put the variables together in a data frame
meeting_sim = data.frame(u1, u2, waiting_time, first_arrival_time)

# first few rows (with kable formatting)
head(meeting_sim) |>
  kbl(digits = 3) |>
  kable_styling()
| u1 | u2 | waiting_time | first_arrival_time |
|---|---|---|---|
| 59.440 | 8.202 | 51.239 | 8.202 |
| 13.179 | 37.741 | 24.562 | 13.179 |
| 11.262 | 14.005 | 2.743 | 11.262 |
| 2.024 | 34.330 | 32.307 | 2.024 |
| 55.781 | 36.682 | 19.098 | 36.682 |
| 48.931 | 57.138 | 8.207 | 48.931 |
# Approximate probability that waiting time is less than 15
sum(waiting_time < 15) / N_rep
[1] 0.435
# Approximate probability that first arrival time is less than 15
sum(first_arrival_time < 15) / N_rep
[1] 0.438
# "Base R" plots
plot(u1, u2,
     col = ifelse(waiting_time < 15, "orange", "black"),
     xlab = "Regina's arrival time",
     ylab = "Cady's arrival time")
abline(a = 15, b = 1)
abline(a = -15, b = 1)

plot(u1, u2,
     col = ifelse(first_arrival_time < 15, "orange", "black"),
     xlab = "Regina's arrival time",
     ylab = "Cady's arrival time")

library(ggplot2)
# ggplots
ggplot(meeting_sim,
       aes(x = u1, y = u2, col = (waiting_time < 15))) +
  geom_point() +
  geom_abline(slope = c(1, 1), intercept = c(15, -15)) +
  labs(x = "Regina's arrival time",
       y = "Cady's arrival time")

ggplot(meeting_sim,
       aes(x = u1, y = u2, col = (first_arrival_time < 15))) +
  geom_point() +
  labs(x = "Regina's arrival time",
       y = "Cady's arrival time")