1 Randomness and Probability

Probability comes up in a wide variety of situations. Consider just a few examples.

The probability you win the next Powerball lottery if you purchase a single ticket, 4-8-15-16-42, plus the Powerball number, 23.
The probability that a “randomly selected” Cal Poly student is a California resident.
The probability that the high temperature in San Luis Obispo next Friday is above 90 degrees F.
The probability that the San Francisco 49ers win the next Super Bowl.
The probability that extraterrestrial life currently exists somewhere in the universe.
The probability that you ate an apple on April 17, 2009.

Example 1.1 How are the situations above similar, and how are they different? What is one feature that all of the situations have in common? Is the interpretation of “probability” the same in all situations? The goal here is to just think about these questions, and not to compute any probabilities (or to even think about how you would).

A phenomenon is random if there are multiple potential outcomes, and there is uncertainty about which outcome will occur.
Uncertainty is understood in broad terms, and in particular does not only concern future occurrences.
Many phenomena involve physical randomness, such as flipping a coin, drawing Powerballs at random from a bin, or random sampling and random assignment in statistical applications.
But in many other situations, randomness just vaguely reflects uncertainty.
Random does not mean haphazard. In a random phenomenon, while individual outcomes are uncertain, there is a regular distribution of outcomes over a large number of (hypothetical) repetitions.
Also, random does not necessarily mean equally likely. In a random phenomenon, certain outcomes or events might be more or less likely than others.
The probability of an event associated with a random phenomenon is a number in the interval $[0, 1]$ measuring the event’s likelihood or degree of uncertainty. A probability can take any value on the continuous scale from 0% to 100%.
There are two main interpretations of probability.
- Long run relative frequency. The probability of an event can be interpreted as the proportion of times that the event would occur in a very large number of hypothetical repetitions of the random phenomenon.
- Subjective probability. There are many situations where the outcome is uncertain, but it does not make sense to consider the situation as repeatable. In such situations, a subjective (a.k.a., personal) probability describes the degree of likelihood a given individual ascribes to a certain event. Think of subjective probabilities as measuring relative degrees of likelihood rather than long run relative frequencies.
Fortunately, the mathematics of probability works the same way regardless of the interpretation. In either case, the same basic logical consistency requirements must be satisfied.
A simulation involves an artificial recreation of the random phenomenon, usually using a computer. The probability of an event can be approximated by simulating the random phenomenon a large number of times and determining the proportion of simulated repetitions on which the event occurred out of the total number of repetitions in the simulation.

Example 1.2 One of the oldest documented problems in probability is the following: If three fair six-sided dice are rolled, what is more likely: a sum of 9 or a sum of 10?

Explain how you could conduct a simulation to investigate this question.
Use the simulation results to approximate the probability that the sum is 9; repeat for a sum of 10.
It can be shown that the theoretical probability that the sum is 9 is 25/216 ≈ 0.116. Write a clearly worded sentence interpreting this probability as a long run relative frequency.
It can be shown that the theoretical probability that the sum is 10 is 27/216 = 0.125. How many times more likely is a sum of 10 than a sum of 9?

# single repetition

# 3 rolls
rolls = sample(1:6, size = 3, replace = TRUE)
rolls

[1] 6 3 2

# find the sum
sum(rolls)

[1] 11

# number of repetitions; each rep is set of three rolls
N_reps = 10000

# numeric vector to store results
sum_of_rolls = numeric(N_reps)

# for loop for the simulation
for (i in 1:N_reps) {
  
  # roll the dice
  rolls = sample(1:6, size = 3, replace = TRUE)
  
  # compute the sum of the rolls, and store the value
  sum_of_rolls[i] = sum(rolls) 
}

# display the results of the first few repetitions
head(sum_of_rolls)

[1] 16  9 12 11 10  9

# count repetitions where sum is 9
sum(sum_of_rolls == 9)

[1] 1133

# proportion of repetitions where sum is 10
sum(sum_of_rolls == 10) / N_reps

[1] 0.129

# tabulate the results
table(sum_of_rolls)

sum_of_rolls
   3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18 
  48  141  285  478  674  896 1133 1290 1220 1191 1009  757  442  259  136   41

# create a "spike" plot (type = "h")
plot(table(sum_of_rolls),
     type = "h",
     xlab = "Sum of three rolls of a fair six-sided die",
     ylab = "Number of sets of three rolls")

Simulated distribution of sum of three rolls of a fair six-sided die

# Find and plot the running relative frequency of a sum of 10
running_frequency_of_10 = cumsum(sum_of_rolls == 10)

running_relative_frequency_of_10 = running_frequency_of_10 / (1:N_reps)

plot(1:N_reps,
     running_relative_frequency_of_10,
     type = "l",
     xlab = "Repetition number",
     ylab = "Long run relative frequency of 10")
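
The theoretical values quoted in Example 1.2 (25/216 and 27/216) can be checked by listing every possible outcome of three rolls. The following is a minimal sketch, assuming the three dice are fair so that all 216 ordered outcomes are equally likely; the variable names are illustrative and not part of the original example.

```r
# all 6^3 = 216 equally likely (first, second, third) outcomes
outcomes = expand.grid(die1 = 1:6, die2 = 1:6, die3 = 1:6)

# sum of the three dice for each outcome
outcome_sums = rowSums(outcomes)

# number of outcomes that give each possible sum
table(outcome_sums)

# theoretical probabilities of a sum of 9 and a sum of 10
p_9 = sum(outcome_sums == 9) / nrow(outcomes)
p_10 = sum(outcome_sums == 10) / nrow(outcomes)

# the two probabilities, and how many times more likely a sum of 10 is
c(p_9, p_10, p_10 / p_9)
```

Comparing these values with the simulated relative frequencies above shows how close 10000 repetitions gets to the theoretical probabilities.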

A probability can be interpreted as a theoretical long run relative frequency.
A probability can be approximated by a relative frequency from a large number of simulated repetitions, but there is some simulation margin of error, due to natural variability in the simulation.
The margin of error when approximating a single probability based on a simulated relative frequency is roughly on the order of $1 / \sqrt{N}$, where $N$ is the number of independently simulated values used to calculate the relative frequency. For example, if $N = 10000$ then the margin of error is roughly $1 / \sqrt{10000} = 0.01$. (But be careful when approximating conditional probabilities: there, $N$ counts only the repetitions that satisfy the condition, not all repetitions.)
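
To get a feel for the $1 / \sqrt{N}$ rule of thumb, the sketch below computes the rough margin of error for a few illustrative values of $N$ (chosen here only as examples), and then checks whether the simulated relative frequency of a sum of 10 from Example 1.2 (1290 out of 10000) falls within that margin of the theoretical value 27/216.

```r
# numbers of repetitions to compare (illustrative values)
N = c(100, 10000, 1000000)

# rough simulation margin of error for each N
margin_of_error = 1 / sqrt(N)
data.frame(N, margin_of_error)

# simulated relative frequency of a sum of 10 from Example 1.2,
# and the corresponding theoretical probability
simulated = 1290 / 10000
theoretical = 27 / 216

# is the simulated value within the rough margin of error for N = 10000?
abs(simulated - theoretical) <= 1 / sqrt(10000)
```

Note that each additional decimal place of precision requires about 100 times as many repetitions.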

Example 1.3 As of Jun 21, 2023, FiveThirtyEight listed the following probabilities for who would win the 2023 World Series.

Team	Probability
Atlanta Braves	19%
Tampa Bay Rays	16%
Los Angeles Dodgers	10%
Houston Astros	7%
New York Yankees	7%
Other

According to FiveThirtyEight (as of Jun 21, 2023):

Are these probabilities most naturally interpreted as long run relative frequencies or subjective probabilities? Explain.
What must be the probability that the Braves do not win the 2023 World Series? How many times more likely is it for the Braves to not win than to win?
What must be the probability that either the Braves or the Rays win?
What must be the probability that one of the above five teams is the World Series champion?
What must be the probability that a team other than the above five teams is the World Series champion? That is, what value goes in the “Other” row in the table?
Donny Don’t says, “These are subjective probabilities, so I can’t use them to perform a simulation.” Explain to Donny how you could conduct a simulation that reflects these probabilities, say using a spinner (like from a kids’ game).
What would you expect the results of 10000 repetitions of a simulation of the World Series champion to look like? Construct a table summarizing what you expect. Is this necessarily what would happen?
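
One way to carry out the spinner simulation on a computer is sketched below. It assumes the "Other" probability is whatever remains after the five listed teams, uses `sample` with the `prob` argument to play the role of the spinner, and sets an arbitrary seed only so the results are reproducible. The object names are illustrative. Try the questions above before running it.

```r
# probabilities from the table, as proportions
teams = c("Braves", "Rays", "Dodgers", "Astros", "Yankees", "Other")
listed = c(0.19, 0.16, 0.10, 0.07, 0.07)

# the six probabilities must sum to 1, which determines "Other"
probs = c(listed, 1 - sum(listed))
names(probs) = teams

# expected counts in 10000 repetitions
N_reps = 10000
expected = N_reps * probs

# simulate 10000 spins of the spinner (seed is arbitrary)
set.seed(538)
champion = sample(teams, size = N_reps, replace = TRUE, prob = probs)
simulated = table(factor(champion, levels = teams))

# compare expected and simulated counts
data.frame(team = teams,
           expected = as.vector(expected),
           simulated = as.vector(simulated))
```

The simulated counts should be close to, but not exactly equal to, the expected counts, reflecting natural variability in the simulation.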

We will use the two interpretations — long run relative frequencies and subjective probabilities — interchangeably.
With subjective probabilities it is often helpful to consider what might happen in a simulation.
It is also useful to consider long run relative frequencies in terms of relative degrees of likelihood.
Fortunately, the mathematics of probability works the same way regardless of the interpretation.
A probability takes a value on the sliding scale from 0% to 100%.
Don’t just focus on computation; always remember to properly interpret probabilities.

Example 1.4 In each of the following parts, which of the two probabilities, 1 or 2, is larger, or are they equal? You should answer conceptually without attempting any calculations. Explain your reasoning.

Consider a Cal Poly student who frequently has blurry, bloodshot eyes, generally exhibits slow reaction time, always seems to have the munchies, and disappears at 4:20 each day. Which of the following events, 1 or 2, has a higher probability? (Assume the two probabilities are not equal.)
1. The student has a GPA above 3.0.
2. The student has a GPA above 3.0 and smokes marijuana regularly.
Randomly select a man.
1. The probability that a randomly selected man is greater than six feet tall.
2. The probability that a randomly selected man who plays in the NBA is greater than six feet tall.
Randomly select a man.
1. The probability that a randomly selected man who is greater than six feet tall plays in the NBA.
2. The probability that a randomly selected man who plays in the NBA is greater than six feet tall.
Flip a coin which is known to be fair 10 times.
1. The probability that the results are, in order, HHHHHHHHHH.
2. The probability that the results are, in order, HHTHTTTHHT.
Flip a coin which is known to be fair 10 times.
1. The probability that all 10 flips land on H.
2. The probability that exactly 5 flips land on H.
In the Powerball lottery there are roughly 300 million possible winning number combinations, all equally likely.
1. The probability you win the next Powerball lottery if you purchase a single ticket, 4-8-15-16-42, plus the Powerball number, 23
2. The probability you win the next Powerball lottery if you purchase a single ticket, 1-2-3-4-5, plus the Powerball number, 6.
Continuing with the Powerball.
1. The probability that the numbers in the winning combination are not in sequence (e.g., 4-8-15-16-42-23).
2. The probability that the numbers in the winning combination are in sequence (e.g., 1-2-3-4-5-6).
Continuing with the Powerball.
1. The probability that you win the next Powerball lottery if you purchase a single ticket.
2. The probability that someone wins the next Powerball lottery. (FYI: especially when the jackpot is large, there are hundreds of millions of tickets sold.)
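
After reasoning through the coin flipping parts conceptually, the answers can be checked numerically. The sketch below assumes the 10 flips are independent and the coin is fair, so every specific ordered sequence of 10 flips has the same probability; `dbinom` gives the probability of a given number of H regardless of order. The variable names are illustrative.

```r
# probability of any one specific ordered sequence of 10 flips,
# e.g. HHHHHHHHHH or HHTHTTTHHT
p_specific_sequence = (1 / 2)^10

# probability that all 10 flips land on H (only one sequence qualifies)
p_all_heads = dbinom(10, size = 10, prob = 0.5)

# number of sequences with exactly 5 H, and the probability of exactly 5 H
n_sequences_5_heads = choose(10, 5)
p_exactly_5_heads = dbinom(5, size = 10, prob = 0.5)

c(p_specific_sequence, p_all_heads, p_exactly_5_heads)
```

A specific sequence is "the particular"; exactly 5 H is "the general", because many different sequences produce it.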

Warning! Your psychological judgment of probabilities is often inconsistent with the mathematical logic of probabilities.
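When interpreting probabilities, consider the conditions under which the probabilities were computed, and keep the direction of the conditioning straight. For example, the probability that an NBA player is over six feet tall is not the same as the probability that a man over six feet tall plays in the NBA.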
When interpreting probabilities, be careful not to confuse “the particular” with “the general”.
- “The particular:” A very specific event, surprising or not, often has low probability.
- “The general:” While a very specific event often has low probability, if there are many like events their combined probability can be high.
Even if an event has extremely small probability, given enough repetitions of the random phenomenon, the probability that the event occurs on at least one of the repetitions is often high.
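
The last point can be illustrated with the Powerball. The sketch below is a rough calculation under simplifying assumptions that are not in the original: each ticket wins with probability 1 in 300 million (the "roughly 300 million" figure above), and tickets are chosen independently and at random. The ticket counts are hypothetical values picked for illustration.

```r
# probability that a single ticket wins, using the "roughly 300 million" figure
p_win = 1 / 300e6

# hypothetical numbers of tickets sold (illustrative values)
n_tickets = c(1, 1e6, 100e6, 300e6, 600e6)

# probability that at least one ticket wins,
# assuming tickets are chosen independently and at random
p_at_least_one = 1 - (1 - p_win)^n_tickets

data.frame(n_tickets, p_at_least_one)
```

Under these assumptions, the probability that someone wins is roughly 63% when 300 million tickets are sold, even though the probability for any single ticket is essentially 0.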