# 2 Probability

A primary goal of statistics is to describe the real world based on limited observations. These observations may be influenced by random factors, such as measurement error or environmental conditions. This chapter introduces probability, which is designed to describe random events. Later, we will see that the theory of probability is so powerful that we intentionally introduce randomness into experiments and studies so we can make precise statements from data.

## 2.1 Probability Basics

**Definitions**

An *experiment* is a process that produces an *observation*. An *outcome* is a possible observation. The set of all possible outcomes is called the *sample space*. An *event* is a subset of the sample space.

**Example** Roll a die and observe the number of dots on the top face. This is an experiment, with six possible outcomes. The sample space is the set \(S = \{1,2,3,4,5,6\}\). The event “roll higher than 3” is the set \(\{4,5,6\}\).

**Example** Stop a random person on the street and ask them what month they were born. This experiment has the twelve months of the year as possible outcomes. An example of an event \(E\) might be that they were born in a summer month, \(E = \{June, July, August\}\).

**Example** Suppose a traffic light stays red for 90 seconds each cycle. While driving you arrive at this light, and observe the amount of time until the light turns green. The sample space is the interval of real numbers \([0,90]\). The event “you didn’t have to wait” is the set \(\{0\}\).

**Definition 2.1** The *probability* of an event \(E\) is a number \(P(E)\) between 0 and 1 (inclusive), so \(0 \leq P(E) \leq 1\).

Events are a fundamental concept in probability. Therefore, it is important to know some basic set theory, which we summarize in the following definition.

**Definition 2.2** Let \(A\) and \(B\) be events in a sample space \(S\).

- \(A \cap B\) is the set of outcomes that are in *both* \(A\) and \(B\).
- \(A \cup B\) is the set of outcomes that are in *either* \(A\) or \(B\) (or both).
- \(\overline{A}\) is the set of outcomes that are *not* in \(A\) (but are in \(S\)).
- \(A \setminus B\) is the set of outcomes that are in \(A\) and not in \(B\).

Probabilities obey some important rules:

**Theorem 2.1** Let \(A\) and \(B\) be events in the sample space \(S\).

- \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
- \(P(A) = 1 - P(\overline{A})\), where \(\overline{A} = S \setminus A\).
- If \(A\) and \(B\) are disjoint, then \(P(A \cup B) = P(A) + P(B)\).
- \(P(S) = 1\).
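The rules in Theorem 2.1 can be checked numerically for a single die roll, where each of the six equally likely faces has probability \(1/6\). In the sketch below, the events `A` and `B` are illustrative choices, not taken from the examples above.

```
S <- 1:6                                 # sample space for one die roll
A <- c(1, 2, 3)                          # example event: "roll at most 3"
B <- c(2, 4, 6)                          # example event: "roll an even number"
P <- function(E) length(E) / length(S)   # equally likely outcomes

P(union(A, B)) == P(A) + P(B) - P(intersect(A, B))  # inclusion-exclusion: TRUE
P(A) == 1 - P(setdiff(S, A))                        # complement rule: TRUE
P(S) == 1                                           # the sample space has probability 1
```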

One way to assign probabilities to events is *empirically*, by repeating the experiment many times and observing the proportion of times the event occurs. While this can only approximate the true probability, it is sometimes the only approach possible. For example, in the United States the probability of being born in October is noticeably higher than the probability of being born in January, and these values can only be estimated by observing actual patterns of human births.

Another method is to make an assumption that all outcomes are equally likely, usually because of some physical property of the experiment. For example, because (high quality) dice are close to perfect cubes, one believes that all six sides of a die are equally likely to occur. Using the additivity of disjoint events (rule 3 in the theorem above),

\[ P(\{1\}) + P(\{2\}) + P(\{3\}) + P(\{4\}) + P(\{5\}) + P(\{6\}) = P(\{1,2,3,4,5,6\}) = 1 \]

Since all six probabilities are equal and sum to 1, the probability of each face occurring is \(1/6\). In this case, the probability of an event \(E\) can be computed by counting the number of elements in \(E\) and dividing by the number of elements in \(S\).

**Example** Suppose that two six-sided dice are rolled and the numbers appearing on the dice are observed. The sample space, \(S\), is given by

\[ \begin{Bmatrix} (1,1), (1,2), (1,3), (1,4), (1,5), (1,6) \\ (2,1), (2,2), (2,3), (2,4), (2,5), (2,6) \\ (3,1), (3,2), (3,3), (3,4), (3,5), (3,6) \\ (4,1), (4,2), (4,3), (4,4), (4,5), (4,6) \\ (5,1), (5,2), (5,3), (5,4), (5,5), (5,6) \\ (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) \end{Bmatrix} \]

By the symmetry of the dice, we expect all 36 possible outcomes to be equally likely. So the probability of each outcome is \(1/36\).

The event “The sum of the dice is 6” is represented by

\[ E = \{(1,5), (2,4), (3,3), (4,2), (5,1)\} \]

The probability that the sum of two dice is 6 is given by \[ P(E) = \frac{|E|}{|S|} = \frac{5}{36}, \] which can be obtained by simply counting the number of elements in each set above.
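This count can be reproduced in R by listing all 36 outcomes with `expand.grid` and selecting the ones whose sum is 6:

```
S <- expand.grid(die1 = 1:6, die2 = 1:6)   # all 36 equally likely outcomes
E <- S[S$die1 + S$die2 == 6, ]             # outcomes where the sum is 6
nrow(E)                                    # 5 outcomes in E
nrow(E) / nrow(S)                          # P(E) = 5/36
```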

## 2.2 Conditional Probability and Independence

Sometimes when considering multiple events, we have information about one of the events. If you roll two dice and one of them falls off of the table, while the other one shows a 6, that gives you a lot of information about whether the sum of the dice will be 4! Conditional probability formalizes this idea.

**Definition 2.3** The *conditional probability* of \(A\) given \(B\) is \[ P(A|B) = \frac{P(A \cap B)}{P(B)}, \] which is defined whenever \(P(B) > 0\).

**Example** What is the probability that both dice are 4, given that the sum of two dice is 8?

Solution: Let \(A\) be the event “both dice are 4” and \(B\) be the event “the sum is 8”. Then, \(P(A|B) = P(A \cap B)/P(B) = \frac{1/36}{5/36} = 1/5.\) Note that this is the hardest way to get an 8; the probability that one of the dice is 3 and the other is 5 is 2/5.
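The same answer can be obtained by enumeration in R, computing \(P(A \cap B)\) and \(P(B)\) by counting outcomes:

```
S <- expand.grid(die1 = 1:6, die2 = 1:6)   # all 36 equally likely outcomes
A <- S$die1 == 4 & S$die2 == 4             # "both dice are 4"
B <- S$die1 + S$die2 == 8                  # "the sum is 8"
sum(A & B) / sum(B)                        # P(A|B) = 1/5
```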

**Definition 2.4** Two events are said to be *independent* if knowledge that one event occurs doesn’t give any probabilistic information as to whether the other event occurs. Formally, we say that \(A\) and \(B\) are independent if \(P(A|B) = P(A)\) or, equivalently, that \(P(B|A) = P(B)\).

**Theorem 2.2 (The multiplication rule for independent events)** If \(A\) and \(B\) are independent, then \[ P(A \cap B) = P(A)P(B) \]

The multiplication rule is often used as the definition of independence, because it works even in the case when \(P(A) = 0\) or \(P(B) = 0\). However, the intuition is clearer in the conditional probability definition.

**Example** Two dice are rolled. Let \(A\) be the event “The first die is a 5”, let \(B\) be the event “The sum of the dice is 7”, and let \(C\) be the event “The sum of the dice is 8.” Show that \(A\) and \(B\) are independent, but \(A\) and \(C\) are dependent.

Note that \(P(B) = 6/36 = 1/6\). Now, \(P(B|A) = P(A \cap B)/P(A) = \frac{1/36}{1/6} = 1/6\). Therefore, \(A\) and \(B\) are independent. However, \(P(C) = 5/36\), which is not the same as \(P(C|A) = \frac{1/36}{1/6} = 1/6\). Therefore, \(A\) and \(C\) are not independent. We note here that \(B\) and \(C\) are also *not* independent.
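These calculations can also be verified by enumeration. Here `mean` of a logical vector gives the proportion of the 36 outcomes in each event, and `all.equal` is used instead of `==` to avoid floating-point comparison issues:

```
S <- expand.grid(die1 = 1:6, die2 = 1:6)
A <- S$die1 == 5                    # "the first die is a 5"
B <- S$die1 + S$die2 == 7           # "the sum of the dice is 7"
C <- S$die1 + S$die2 == 8           # "the sum of the dice is 8"

isTRUE(all.equal(mean(A & B), mean(A) * mean(B)))   # TRUE: A and B are independent
isTRUE(all.equal(mean(A & C), mean(A) * mean(C)))   # FALSE: A and C are dependent
```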

## 2.3 Counting Arguments

Using probability rules, one can often convert the computation of a probability to a problem of counting elements in a set. This leads to many problems which fall under the general title of enumerative combinatorics, a large and interesting field of mathematics. The interested reader should see ??? for some examples of this type of reasoning.

In this book, however, we will only use a few basic types of counting arguments.

**Proposition 2.1 (Rule of product)** If there are \(m\) ways to do something, and \(n\) ways to do another thing, then there are \(m \times n\) ways to do both things.

**Proposition 2.2 (Combinations)** The number of ways of choosing \(k\) distinct objects from a set of \(n\) is given by \[ {n \choose k} = \frac{n!}{k!(n - k)!} \]

The R command for computing \({10 \choose 3}\) is `choose(10,3)`.

**Example**

A coin is tossed 10 times. Some possible outcomes are HHHHHHHHHH, HTHTHTHTHT, and HHTHTTHTTT. Since each toss has two possibilities, the rule of product says that there are \(2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2 = 2^{10} = 1024\) possible outcomes for the experiment. We expect each possible outcome to be equally likely, so the probability of any single outcome is 1/1024.

Let \(E\) be the event “We flipped exactly three heads”. This might happen as the sequence HHHTTTTTTT, or TTTHTHTTHT, or many other ways. What is \(P(E)\)? To compute the probability, we need to count the number of ways that exactly three heads may appear. Since we must choose which three of the ten slots show heads, the answer is

\[ |E| = {10 \choose 3} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120. \]

Then \(P(E) = 120/1024 \approx 0.117\).
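Both the count and the final probability can be checked with R’s built-in functions:

```
choose(10, 3)                                   # 120 ways to place three heads
factorial(10) / (factorial(3) * factorial(7))   # the same count, from the formula
choose(10, 3) / 2^10                            # P(E) = 120/1024, about 0.117
```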

## 2.4 Simulations

One of the advantages of using R is that we can often estimate probabilities using simulations. That’s not to downplay the importance of being able to make computations regarding probabilities, but that is a topic for another course.

For the purposes of simulation, one of the most important ways of creating a vector is with the `sample()` command. `sample()` is one of the first functions that we have come across, so take a second and type `?sample` on the command line to see the help page.

That’s a lot of information! Some important things to digest are:

- `sample` takes up to 4 arguments, though only the first argument `x` is required.
- The parameter `x` is the vector of elements from which you are sampling.
- `size` is the number of samples you wish to take.
- `replace` determines whether you are sampling with replacement or not. Sampling without replacement means that `sample` will not pick the same value twice, and this is the default behavior. Pass `replace = TRUE` to `sample` if you wish to sample with replacement.
- `prob` is a vector of probabilities or weights associated with `x`. It should be a vector of nonnegative numbers of the same length as `x`. If the sum of `prob` is not 1, it will be normalized.

To get a random number between 1 and 10 (inclusive), use

`sample(x = 1:10,size = 1)`

`## [1] 2`

To get 3 *distinct* numbers between 1 and 10, use

`sample(x = 1:10,size = 3,replace = FALSE)`

`## [1] 6 4 10`

Note: You don’t have to type `replace = FALSE`, because `FALSE` is the default value for whether sampling is done with replacement. You also don’t have to type `x = ...` or `size = ...` as long as these are the first and second arguments. However, it is sometimes clearer to explicitly name the arguments to complicated functions like `sample`. Use your best judgment, and include the parameter name if there is any doubt.

To simulate 8 rolls of a six-sided die, use

`sample(x = 1:6,size = 8,replace = TRUE)`

`## [1] 2 5 1 2 3 1 5 6`

Note that you can store these in variables:

`RollDice <- sample(x = 1:6,size = 8,replace = TRUE)`

and see what the sum of the dice would be:

`sum(RollDice)`

`## [1] 24`

OR, if you **only** care about the sum of the dice, you could even do

`SumOfDice <- sum(sample(x = 1:6,size = 8,replace = TRUE))`

Finally, if you want to sample from sets other than the first few integers, you can specify the vector from which to sample. The following takes a random sample of size 2 from the first 8 prime numbers, with repeats possible.

```
primes <- c(2,3,5,7,11,13,17,19)
sample(x = primes,size = 2, replace = TRUE)
```

`## [1] 5 7`

### 2.4.1 Using replicate to simulate experiments

The function `replicate` is an example of an *implicit loop* in R. The call `replicate(n, expr)` repeats the expression stored in `expr` a total of `n` times, and stores the results in a vector.

**Example**

Estimate the probability that the sum of two dice is 8.

The plan is to use `sample` to simulate rolling two dice. We will say that success is a sum of 8. Then, we use the `replicate` function to repeat the experiment many times, and take the mean number of successes to estimate the probability. Since the final R command is complicated, we will build it up piece by piece.

First, simulate a roll of two dice:

```
Dice <- sample(x = 1:6, size = 2, replace = TRUE) # roll two dice
Dice # display our roll
```

`## [1] 4 1`

`sum(Dice)`

`## [1] 5`

`sum(Dice) == 8 # test for success`

`## [1] FALSE`

We now replicate the above code. You can use curly braces `{` and `}` to replicate more than one R command, separating the commands with semicolons. When we do this, only the result of the last command is saved in the vector by `replicate`. First, we test that our replication works by doing only a few trials:

`replicate(20, { Dice <- sample(x = 1:6, size = 2, replace = TRUE); sum(Dice) == 8 })`

```
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## [12] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
```

Finally, we want to compute the probability of success. R counts `TRUE` as 1 and `FALSE` as 0. If we take the average of a vector of `TRUE`/`FALSE` values, we get the number of `TRUE` values divided by the length of the vector, which is exactly the proportion of times that success occurred. We replicate 10000 times for a more accurate estimate.

`mean(replicate(10000, { Dice <- sample(x = 1:6, size = 2, replace = TRUE); sum(Dice) == 8 }))`

`## [1] 0.1416`

`5/36`

`## [1] 0.1388889`

We compare the simulated result to the true result, which is \(5/36 \approx 0.139\). Generally, the more samples you take using `replicate()`, the more accurate you can expect your simulation to be. It is often a good idea to repeat a simulation a couple of times to get an idea of how much variance there is in the results.
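One way to see that variance is to run the entire simulation several times. The sketch below wraps the simulation in a function (the name `estimate` is our own choice) and runs it five times; each run should land near \(5/36 \approx 0.139\).

```
estimate <- function(n) {
  mean(replicate(n, {
    Dice <- sample(x = 1:6, size = 2, replace = TRUE)   # roll two dice
    sum(Dice) == 8                                      # test for success
  }))
}
replicate(5, estimate(10000))   # five estimates, all near 5/36
```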

**Example**

A fair coin is repeatedly tossed. Estimate the probability that you observe Heads for the third time on the 10th toss.

As before, we build this up from scratch. Here is a sample of ten tosses of a coin:

```
coinToss <- sample(c("H", "T"), 10, replace = TRUE)
coinToss
```

`## [1] "H" "T" "H" "H" "H" "T" "H" "H" "T" "H"`

In order for this to be a success, we need there to be exactly 3 heads, so we count the number of heads:

`sum(coinToss == "H")`

`## [1] 7`

Then we check whether it is equal to 3:

`sum(coinToss == "H") == 3`

`## [1] FALSE`

Next, we also need to make sure that we *didn’t* have 3 heads in the first nine tosses. So, we look only at the first nine tosses:

`coinToss[1:9]`

`## [1] "H" "T" "H" "H" "H" "T" "H" "H" "T"`

and check that exactly two heads were observed in the first nine tosses:

`sum(coinToss[1:9] == "H") == 2`

`## [1] FALSE`

Note that both of those have to be true in order for this to be a success:

`sum(coinToss == "H") == 3 && sum(coinToss[1:9] == "H") == 2 #OR`

`## [1] FALSE`

`sum(coinToss == "H") == 3 && coinToss[10] == "H" #OR`

`## [1] FALSE`

`sum(coinToss[1:9] == "H") == 2 && coinToss[10] == "H"`

`## [1] FALSE`

We put this inside `replicate`:

```
mean(replicate(10000, {coinToss <- sample(c("H", "T"), 10, replace = TRUE);
(sum(coinToss == "H") == 3) && (sum(coinToss[1:9] == "H") == 2)}))
```

`## [1] 0.0332`

**NOTE:** I *strongly* recommend that you follow the workflow presented above; namely,

- Write code that performs the experiment a single time.
- Add semicolons to the end of each line and place the code inside `mean(replicate(1000, { HERE }))`.

It is much easier to troubleshoot your code this way, as you can test each line of your simulation separately.

**Example (The Birthday Problem)**

Estimate the probability that out of 25 randomly selected people, no two will have the same birthday. Assume that all birthdays are equally likely and that there are no leap-day babies.

In order to do this, we need to be able to determine whether all of the elements in a vector are unique. R has many, many functions that can be used with vectors. For most things that you want to do, there will be an R function that does it. In this case it is `anyDuplicated()`, which returns the location of the first duplicate if there are any, and zero otherwise.

The important thing to learn here isn’t necessarily this particular function, but rather the fact that most tasks are possible via some built in functionality.

`sample(x = 1:365, size = 25, replace = TRUE)`

```
## [1] 8 96 261 331 60 59 337 95 348 10 281 130 4 272 331 148 22
## [18] 195 346 276 192 106 363 81 173
```

`anyDuplicated(sample(x = 1:365, size = 25, replace = T))`

`## [1] 0`

```
trials <- replicate(n = 10000,expr = anyDuplicated(sample(x = 1:365, size = 25, replace = T)))
head(trials)
```

`## [1] 25 12 0 22 10 0`

`mean(trials == 0)`

`## [1] 0.4272`

Here, we have built up the simulation from the ground floor, first simulating the 25 birthdays, then determining if any two birthdays are the same, and then replicating the experiment. Note the use of `mean`

in the last line to compute the proportion of successes. “Success” here is the event “no two people have the same birthday”, and the probability of this event is approximately 0.43. Interestingly, we see that it is actually quite likely (about 57%) that a group of 25 people will contain two with the same birthday.
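The same simulation can be repeated for a range of group sizes to see how quickly shared birthdays become likely. This is a sketch; the function name `no_match_prob` is our own, and 2000 trials per group size keeps it fast at some cost in accuracy.

```
no_match_prob <- function(k, trials = 2000) {
  mean(replicate(trials,
    anyDuplicated(sample(x = 1:365, size = k, replace = TRUE)) == 0))
}
sizes <- c(5, 10, 15, 20, 25, 30, 40, 50)
round(sapply(sizes, no_match_prob), 2)   # probability of no shared birthday drops quickly
```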

**Challenge**

Modify the above code to take into account leap years. Is it reasonable to believe that birthdays throughout the year are equally likely? Later in this book, we will use birthday data to get a better approximation of the probability that out of 25 randomly selected people, no two will have the same birthday.

**Example** Three numbers are picked uniformly at random from the interval \((0,1)\). What is the probability that a triangle can be formed whose side lengths are the three numbers that you chose?

Solution: We need to check whether the sum of the two smaller numbers is larger than the largest number. We use the `sort` command to sort the three numbers into increasing order, as follows:

`mean(replicate(10000, {x = sort(runif(3,0,1)); sum(x[1:2]) > x[3];}))`

`## [1] 0.5026`

### 2.4.2 Simulating Conditional Probability

Simulating conditional probabilities can be a bit more challenging. In order to estimate \(P(A|B)\), we will estimate \(P(A \cap B)\) and \(P(B)\), and then divide the two estimates. This is not the most efficient way to estimate \(P(A|B)\), but it is easy to do with the tools that we have already developed.

**Example**

Two dice are rolled. Estimate the conditional probability that the sum of the dice is at least 10, given that at least one of the dice is a 6.

First, we estimate the probability that the sum of the dice is at least 10 *and* at least one of the dice is a 6.

```
probAB <- mean(replicate(10000,
{ dieRoll <- sample(1:6, 2, replace = TRUE);
sum(dieRoll) >= 10 && 6 %in% dieRoll
}))
probAB
```

`## [1] 0.1367`

Next, we estimate the probability that at least one of the dice is a 6.

```
probB <- mean(replicate(10000,6 %in% sample(1:6, 2, replace = TRUE)))
probB
```

`## [1] 0.3027`

Finally, we take the quotient.

`probAB/probB`

`## [1] 0.4516022`

Note that the correct answer is \(P(A\cap B)/P(B) = \frac{5/36}{11/36} = 5/11 \approx 0.4545\).
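A more direct approach is to simulate many rolls, keep only the trials where \(B\) occurred, and compute the proportion of those in which \(A\) also occurred. This sketch stores the 10000 simulated rolls as the columns of a matrix:

```
rolls <- replicate(10000, sample(1:6, 2, replace = TRUE))   # 2 x 10000 matrix
hasSix <- apply(rolls, 2, function(d) 6 %in% d)   # B: at least one die is a 6
sumTen <- colSums(rolls) >= 10                    # A: the sum is at least 10
mean(sumTen[hasSix])                              # estimate of P(A|B), near 5/11
```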

## 2.5 Exercises

- Roll two dice, one white and one red. Consider these events:
- A: The sum is 7
- B: The white die is odd
- C: The red die has a larger number showing than the white
- D: The dice match (doubles)

Which pairs of events are disjoint? Which pairs are independent?

- When rolling two dice, what is the probability that one die is twice the other?

- Consider an experiment where you roll two dice, and subtract the smaller value from the larger value (getting 0 in case of a tie).
- What is the probability of getting 0?
- What is the probability of getting 4?

- Estimate the probability that exactly 3 Heads are obtained when 7 coins are tossed.

- Estimate the probability that a 10 is obtained when two dice are rolled.

- Estimate the probability that the sum of five dice is between 15 and 20, inclusive.

- Rolling two dice
- Simulate rolling two dice and adding their values. Do 100000 simulations and make a bar chart showing how many of each outcome occurred.
- You can buy *trick* dice, which look (sort of) like normal dice. One die has numbers 5, 5, 5, 5, 5, 5. The other has numbers 2, 2, 2, 6, 6, 6. Simulate rolling the two trick dice and adding their values. Do 100000 simulations and make a bar chart showing how many of each outcome occurred.
- *Sicherman dice* also look like normal dice, but have unusual numbers. One die has numbers 1, 2, 2, 3, 3, 4. The other has numbers 1, 3, 4, 5, 6, 8. Simulate rolling the two Sicherman dice and adding their values. Do 100000 simulations and make a bar chart showing how many of each outcome occurred.
- How do your answers compare to part a?

- In a room of 200 people (including you), estimate the probability that at least one other person will be born on the same day as you.

- In a room of 100 people, estimate the probability that at least two people were not only born on the same day, but also during the same hour of the same day. (For example, both were born between 2 and 3.)

- Suppose a die is tossed repeatedly, and the cumulative sum of all tosses seen is maintained. Estimate the probability that the cumulative sum is ever exactly 20. (Hint: the function `cumsum` computes the cumulative sums of a vector.)

- If 100 balls are randomly placed into 20 urns, estimate the probability that at least one of the urns is empty.

Suppose a die is tossed three times. Let \(A\) be the event “The first toss is a 5”. Let \(B\) be the event “The first toss is the largest number rolled”. Determine, via simulation or otherwise, whether \(A\) and \(B\) are independent.

- A standard deck of cards has 52 cards, four each of 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A. In blackjack, a player gets two cards and adds their values. Cards count as their usual numbers, except Aces are 11 (or 1), while K, Q, and J are all 10.
- How many two card hands are possible?
- “Blackjack” means getting an Ace and a value-ten card. What is the probability of getting a blackjack?
- What is the probability of getting 19? (The probability that the sum of your cards is 19, using Ace as 11.)
- Use R to simulate dealing two cards, and compute these probabilities experimentally.

- Assuming that there are no leap-day babies and that all birthdays are equally likely, estimate the probability that no **three** people have the same birthday in a group of 50 people. (Hint: `myVec[duplicated(myVec)]` lists the values of `myVec` that have been seen before.)

- Ultimate frisbee players are so poor they don’t own coins. So, team captains decide which team will play offense first by flipping frisbees before the start of the game. Rather than flip one frisbee and call a side, each team captain flips a frisbee and one captain calls whether the two frisbees will land on the same side, or on different sides. Presumably, they do this instead of just flipping one frisbee because a frisbee is not obviously a fair coin: the probability of one side seems likely to be different from the probability of the other side.
- Suppose you flip two fair coins. What is the probability they show different sides?
- Suppose two captains flip frisbees. Assume the probability that a frisbee lands convex side up is \(p\). Compute the probability (in terms of \(p\)) that the two frisbees match.
- Make a graph of the probability of a match in terms of \(p\).
- One reddit user flipped a frisbee 800 times and found that in practice, the convex side lands up 45% of the time. When captains flip, what is the probability of “same”? What is the probability of “different”?
- What advice would you give to an ultimate frisbee team captain?
- Is the two frisbee flip better than a single frisbee flip for deciding the offense?

- Suppose you have two coins (or Frisbees) that land with Heads facing up with probability \(p\), where \(0 < p < 1\). One coin is red and the other is white. You toss both coins. Find the probability that the red coin is Heads, given that the red coin and the white coin are different.

- Here is the distribution of milk chocolate M&M’s by color: \[ \begin{array}{cccccc}
Yellow & Red & Orange & Brown & Green & Blue \\
0.14 & 0.13 & 0.20 & 0.13 & 0.16 & 0.24
\end{array}
\]
- What is the probability that a randomly selected M&M is not green?
- What is the probability that a randomly selected M&M is red, orange, or yellow?
- Estimate the probability that a random selection of four M&M’s will contain a blue one.
- Estimate the probability that a random selection of six M&M’s will contain all six colors.

- Bob Ross was a painter with a PBS television show, “The Joy of Painting”, that ran for 11 years.
- 91% of Bob’s paintings contain a tree, and 85% contain two or more trees. What is the probability that he painted a second tree, given that he painted a tree?
- 18% of Bob’s paintings contain a cabin. Given that he painted a cabin, there is a 35% chance the cabin is on a lake. What is the probability that a Bob Ross painting contains both a cabin and a lake?

(Source: https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/)

- Suppose that you have 10 boxes, numbered 0-9. Box \(i\) contains \(i\) red marbles and \(9 - i\) blue marbles. You perform the following experiment: pick a box at random, draw a marble, and record its color. Replace the marble in the box, and draw another marble *from the same box*, recording its color. Replace that marble, and draw a third marble from the same box, recording its color. So, all three marbles are drawn from the same box.
- If you draw three consecutive red marbles, what is the probability that the 4th marble will also be red?
- If you draw three consecutive red marbles, what is the probability that you chose the 9th box?