# 1 Probability: A Measure of Uncertainty

## 1.1 Introduction

Definition 1.1 “A probability is simply a number between 0 and 1 that measures the uncertainty of a particular event.” (Albert and Hu, 2019, p. 17)

## 1.2 Classical View of a Probability

Only applicable when the outcomes are equally likely.

$$\text{Prob}(\text{Outcome}) = \frac{1}{\text{Number of outcomes}} \tag{1.1}$$

Classical View: What is the probability of rolling doubles with two dice?

There are $6 \times 6 = 36$ equally likely ways of rolling two distinguishable dice, and exactly six of them are doubles. Each single outcome has probability $\frac{1}{36}$ by Equation 1.1, so under the classical viewpoint the probability of doubles is $6 \times \frac{1}{36} = \frac{1}{6} \approx 0.167$.
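
The counting argument above can be checked by listing the outcomes. The following is a minimal base R sketch, not code from the book; the names `outcomes`, `n_outcomes`, `n_doubles`, and `p_doubles` are illustrative assumptions.

```r
# All ordered results of rolling two distinguishable dice
outcomes <- expand.grid(Die1 = 1:6, Die2 = 1:6)

n_outcomes <- nrow(outcomes)                       # 6 * 6 = 36
n_doubles  <- sum(outcomes$Die1 == outcomes$Die2)  # (1,1), (2,2), ..., (6,6)

# Classical probability: favourable outcomes / equally likely outcomes
p_doubles <- n_doubles / n_outcomes
p_doubles
```

The same pattern (enumerate, count, divide) applies to any event, as long as the listed outcomes really are equally likely.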

## 1.3 Frequency View of a Probability

Applicable when it is possible to repeat the random experiment many times under the same conditions.

Frequency View: What is the probability of rolling doubles with two dice?

Imagine rolling two dice many times under similar conditions. Each time the two dice are rolled, we observe whether the result is a double or not. The probability of doubles is then approximated by the relative frequency

$$\text{Prob}(\text{doubles}) \approx \frac{\text{number of doubles}}{\text{number of experiments}}$$

Listing 1.1: Rolling two dice: pb

    set.seed(42)
    n <- 1e5  # number of simulated experiments

    # Roll two fair dice once
    two_rolls <- function() {
      sample(1:6, size = 2, replace = TRUE)
    }

    # One row per experiment; Match is TRUE when both dice show the same face
    d <-
      tibble::as_tibble(t(replicate(n, two_rolls())), .name_repair = "unique") |>
      dplyr::rename(Die1 = ...1, Die2 = ...2) |>
      dplyr::mutate(Match = Die1 == Die2)
    #> New names:
    #> • `` -> `...1`
    #> • `` -> `...2`

Listing 1.2: Show the simulation result of rolling two dice: pb

    str(d)
    #> tibble [100,000 × 3] (S3: tbl_df/tbl/data.frame)
    #>  $ Die1 : int [1:100000] 1 1 2 2 1 1 6 2 3 1 ...
    #>  $ Die2 : int [1:100000] 5 1 4 2 4 5 4 2 1 3 ...
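
To turn simulated rolls into a frequency-view estimate, one can average the indicator of doubles. The following is a minimal base R sketch rather than the tibble pipeline of Listing 1.1; the names `rolls`, `is_double`, and `p_hat` are illustrative assumptions. With this many rolls the estimate should land close to the classical value of 1/6, though the exact digits depend on the seed.

```r
set.seed(42)
n <- 1e5  # number of simulated experiments

# n x 2 matrix: each row holds one roll of two fair dice
rolls <- matrix(sample(1:6, size = 2 * n, replace = TRUE), ncol = 2)

# TRUE where both dice show the same face
is_double <- rolls[, 1] == rolls[, 2]

# Relative frequency = number of doubles / number of experiments
p_hat <- mean(is_double)
p_hat
```

Rerunning with a smaller `n` (say 100) gives a visibly noisier estimate, which is why the frequency view speaks of many repetitions.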

• BET 2 – If B occurs (you observe a red ball in the above experiment), you win $100. Otherwise, you win nothing.

Mark in bold the bet that you prefer.

1. Let B represent choosing red from a box of 7 red and 3 white balls. Again compare BET 1 with BET 2 – which bet do you prefer?

2. Let B represent choosing red from a box of 3 red and 7 white balls. Again compare BET 1 with BET 2 – which bet do you prefer?

3. Based on your answers to (a), (b), (c), circle the interval of values that contains your subjective probability P(S).

   30%-50%

### 1.10.9 Frequency of Vowels in Huckleberry Finn

Suppose you choose a page at random from the book Huckleberry Finn by Mark Twain and find the first vowel on the page.

1. If you believe it is equally likely to find any one of the five possible vowels, fill in the probabilities of the vowels below.

   | Vowel       | a   | e   | i   | o   | u   |
   |-------------|-----|-----|-----|-----|-----|
   | Probability | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |

2. Based on your knowledge about the relative use of the different vowels, assign probabilities to the vowels.

   | Vowel       | a   | e   | i    | o    | u   |
   |-------------|-----|-----|------|------|-----|
   | Probability | 0.3 | 0.3 | 0.15 | 0.15 | 0.1 |

3. Do you think it is appropriate to apply the classical viewpoint to probability in this example? (Compare your answers to parts a and b.)

   No. The vowels are not equally likely, because their empirical frequencies in written text differ.

4. On each of the first fifty pages of Huckleberry Finn, your author found the first five vowels. Here is a table of frequencies of the five vowels:

   | Vowel       | a     | e     | i     | o     | u     | Sum   |
   |-------------|-------|-------|-------|-------|-------|-------|
   | Frequency   | 61    | 63    | 34    | 70    | 22    | 250   |
   | Probability | 0.244 | 0.252 | 0.136 | 0.280 | 0.088 | 1.000 |

   Use this data to find approximate probabilities for the vowels.

       f <- c(61, 63, 34, 70, 22)
       f / sum(f)
       sum(0.244, 0.252, 0.136, 0.280, 0.088)
       #> [1] 0.244 0.252 0.136 0.280 0.088
       #> [1] 1

### 1.10.10 Purchasing Boxes of Cereal

Suppose a cereal box contains one of four different posters denoted A, B, C, and D. You purchase four boxes of cereal and you count the number of posters (among A, B, C, D) that you do not have.
The possible number of “missing posters” is 0, 1, 2, and 3.

1. Assign probabilities if you believe the outcomes are equally likely.

   | Number of missing posters | 0    | 1    | 2    | 3    |
   |---------------------------|------|------|------|------|
   | Probability               | 0.25 | 0.25 | 0.25 | 0.25 |

2. Assign probabilities if you believe that the outcomes 0 and 1 are most likely to happen.

   | Number of missing posters | 0    | 1    | 2    | 3    |
   |---------------------------|------|------|------|------|
   | Probability               | 0.35 | 0.35 | 0.20 | 0.10 |

3. Suppose you purchase many groups of four cereals, and for each purchase, you record the number of missing posters. The number of missing posters for 20 purchases is displayed below. For example, in the first purchase, you had 1 missing poster, in the second purchase, you also had 1 missing poster, and so on.

   1, 1, 1, 2, 1, 1, 0, 0, 2, 1, 2, 1, 3, 1, 2, 1, 0, 1, 1, 1

       miss <- c(1, 1, 1, 2, 1, 1, 0, 0, 2, 1, 2, 1, 3, 1, 2, 1, 0, 1, 1, 1)
       p0 <- sum(miss == 0) / 20
       p1 <- sum(miss == 1) / 20
       p2 <- sum(miss == 2) / 20
       p3 <- sum(miss == 3) / 20
       print(c(p0, p1, p2, p3))
       #> [1] 0.15 0.60 0.20 0.05

   Using these data, assign probabilities.

   | Number of missing posters | 0    | 1    | 2    | 3    |
   |---------------------------|------|------|------|------|
   | Probability               | 0.15 | 0.60 | 0.20 | 0.05 |

4. Based on your work in part c, is it reasonable to assume that the four outcomes are equally likely? Why?

   No. The relative frequencies from the 20 purchases (0.15, 0.60, 0.20, 0.05) are far from 0.25 each: exactly one poster was missing in 60% of the purchases, while three posters were missing in only 5%.

### 1.10.11 Writing Sample Spaces 1

For the following random experiments, give an appropriate sample space for the random experiment. You can use any method (a list, a tree diagram, a two-way table) to represent the possible outcomes.

1. You simultaneously toss a coin and roll a die.

   {1,T}, {2,T}, {3,T}, {4,T}, {5,T}, {6,T}, {1,H}, {2,H}, {3,H}, {4,H}, {5,H}, {6,H}

2. Construct a word from the five letters a, a, e, e, s.

       # All 5! orderings of the five letters, reduced to the distinct words
       (df <- as.data.frame(
         gtools::permutations(5, 5, c('a', 'a', 'e', 'e', 's'),
                              set = FALSE, repeats.allowed = FALSE)
       ) |>
         dplyr::distinct())
       #>    V1 V2 V3 V4 V5
       #> 1   a  a  e  e  s
       #> 2   a  a  e  s  e
       #> 3   a  a  s  e  e
       #> 4   a  e  a  e  s
       #> 5   a  e  a  s  e
       #> 6   a  e  e  a  s
       #> 7   a  e  e  s  a
       #> 8   a  e  s  a  e
       #> 9   a  e  s  e  a
       #> 10  a  s  a  e  e
       #> 11  a  s  e  a  e
       #> 12  a  s  e  e  a
       #> 13  e  a  a  e  s
       #> 14  e  a  a  s  e
       #> 15  e  a  e  a  s
       #> 16  e  a  e  s  a
       #> 17  e  a  s  a  e
       #> 18  e  a  s  e  a
       #> 19  e  e  a  a  s
       #> 20  e  e  a  s  a
       #> 21  e  e  s  a  a
       #> 22  e  s  a  a  e
       #> 23  e  s  a  e  a
       #> 24  e  s  e  a  a
       #> 25  s  a  a  e  e
       #> 26  s  a  e  a  e
       #> 27  s  a  e  e  a
       #> 28  s  e  a  a  e
       #> 29  s  e  a  e  a
       #> 30  s  e  e  a  a

3. Suppose a person lives at point 0 and each second she randomly takes a step to the right or a step to the left. You observe the person’s location after four steps.

       # Each column is one possible sequence of four steps (L = left, R = right)
       t(gtools::permutations(2, 4, c('L', 'R'), set = FALSE, repeats.allowed = TRUE))
       #>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
       #> [1,] "L"  "L"  "L"  "L"  "L"  "L"  "L"  "L"  "R"  "R"   "R"   "R"   "R"   "R"
       #> [2,] "L"  "L"  "L"  "L"  "R"  "R"  "R"  "R"  "L"  "L"   "L"   "L"   "R"   "R"
       #> [3,] "L"  "L"  "R"  "R"  "L"  "L"  "R"  "R"  "L"  "L"   "R"   "R"   "L"   "L"
       #> [4,] "L"  "R"  "L"  "R"  "L"  "R"  "L"  "R"  "L"  "R"   "L"   "R"   "L"   "R"
       #>      [,15] [,16]
       #> [1,] "R"   "R"
       #> [2,] "R"   "R"
       #> [3,] "R"   "R"
       #> [4,] "L"   "R"

4. In the first round of next year’s baseball playoff, the two teams, say the Phillies and the Diamondbacks, play in a best-of-five series where the first team to win three games wins the playoff.

5. A couple decides to have children until a boy is born.

   {B, GB, GGB, GGGB, …}

6. A roulette game is played with a wheel with 38 slots numbered 0, 00, 1, …, 36. Suppose you place a $10 bet that an even number (not 0) will come up in the wheel. The wheel is spun.

7. Suppose three batters, Marlon, Jimmy, and Bobby, come to bat during one inning of a baseball game. Each batter can either get a hit, walk, or get out.
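
For the three-batter experiment, the sample space can be listed mechanically. The following is a minimal base R sketch, not an answer taken from the book; it assumes every combination of one result per batter counts as a possible outcome, and the names `results` and `batters` are illustrative.

```r
results <- c("hit", "walk", "out")

# One row per outcome: a result for each of the three batters
batters <- expand.grid(
  Marlon = results,
  Jimmy  = results,
  Bobby  = results,
  stringsAsFactors = FALSE
)

nrow(batters)     # 3^3 = 27 outcomes
head(batters, 3)  # first few rows of the sample space
```

Listing the outcomes this way says nothing about their probabilities; the 27 rows need not be equally likely.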