15.5 Exercises

i2ds: Exercises

The following exercises further explore the interplay between simulations and analytic solutions by examining three types of problems:

  • other Bayesian situations
  • other versions of the Monty Hall problem
  • related probability puzzles

15.5.1 More Bayesian situations

The mammography problem shown in Section 15.2 is a special case of a more general class of Bayesian situations, which require the computation of an inverse conditional probability \(p(C|T)\) when given a prior probability \(p(C)\) and two conditional probabilities \(p(T|C)\) and \(p(T|\neg C)\).

In 2021, an intriguing version of this problem is the following: The list of rapid SARS-CoV-2 self tests of the BfArM contains over 450 tests. Each test is characterized by its sensitivity \(p({T+}|infected)\) and specificity \(p({T-}|not\ infected)\). The current local prevalence of SARS-CoV-2 infections (as of 2021-05-20) is 100 out of 100,000 people.

  1. Download the table of all tests and compute their mean sensitivity and specificity values.

  2. Compute two conditional probabilities:

    • What is the average probability of being infected upon a positive test result (i.e., \(PPV = p(infected|{T+})\)?
    • What is the average probability of not being infected upon a negative test result (i.e., \(NPV = p(not\ infected|{T-})\)?
  3. How would the values of 2. change, if the local prevalence of SARS-CoV-2 infections rose to 500 out of 100,000 people?

  4. Conduct a simulation by enumeration to compute both values for a population of 10,000 people.

  5. Bonus: Try visualizing your results (in any way, i.e., not necessarily by using the riskyr package).


The following table summarizes the mean sensitivity and specificity values of all tests:

Table 15.1: Summary of 455 tests.
n sens_mean sens_min sens_max sens_sd spec_mean spec_min spec_max spec_sd
455 95.2191 80.2 100 3.111103 99.28365 96.7 100 0.6847694

The results of 2. and 3. could be computed by the Bayesian formula (see in Section 15.2). The following visualizations result from creating riskyr scenarios.

  • Solutions for a prevalence of 100 out of 100,000 people:

  • Solutions for a prevalence of 500 out of 100,000 people:

15.5.2 Solving Bayesian situations by sampling

In Section 15.2, we solved so-called Bayesian situations (and the mammography problem) by enumerating cases. However, we can also use a sampling approach to address and solve such problems.

  1. Use a sampling approach for solving the mammography problem (discussed in Section 15.2).

  2. Adopt either approach (i.e., either enumeration and sampling) for solving the cab problem (originally introduced by Kahneman & Tversky, 1972a; and discussed by, e.g., Bar-Hillel, 1980):

A cab was involved in a hit-and-run accident at night.
Two cab companies, the Green and the Blue, operate in the city.
You are given the following data:
1. 85% of the cabs in the city are Green and 15% are Blue.
2. A witness identified the cab as a Blue cab.
The court tested his ability to identify cabs under the appropriate visibility conditions. When presented with a sample of cabs (half of which were Blue and half of which were Green) the witness correctly identified each color in 80% of the cases and erred in 20% of the cases.

What is the probability that the cab involved in the accident was Blue rather than Green?

Hint: Despite their superficial differences, the structure of the cab problem is almost identical to the mammography problem (e.g., compare Figures 4 and 9 of Neth, Gradwohl, et al., 2021).

  1. Discuss the ways in which the enumeration and sampling approaches differ from each other and from deriving an analytic solution. What are the demands of each method and the key trade-offs between them?

15.5.3 Monty Hall reloaded

In Section 15.3.1, we introduced the notorious brain teaser known as the Monty Hall dilemma (Savant, 1990; MHD, popularized by Selvin, 1975) and showed how it can be solved by a simulation that randomly samples environmental states (i.e., the setup of games) and actions (e.g., the host’s and player’s choices).

An erroneous MHD simulation

A feature of simulations is that they are only as good as the thinking that created them. And just like statements or scientific theories, they are not immune to criticism, and can be wrong and misleading. In the following, we illustrate this potential pitfall of simulations by creating an enumeration that yields an erroneous result.

So stay alert — and try to spot any errors we make! Importantly, the error is not in the method or particular type of simulation, but in our way of thinking about the problem.

  1. Setup

Without considering any specific constraints, the original problem description of the MHD describes a 3x3x3 grid of cases:

  • 3 car locations
  • 3 initial player picks
  • 3 doors that Monty could open

The following tibble grid_MH creates all 27 cases that are theoretically possible:

# Setup of options:
car    <- 1:3
player <- 1:3
open   <- 1:3

# Table/space of all 27 options:
grid_MH <- expand_grid(car, player, open)
# names(grid_MH) <- c("car", "player", "open")

grid_MH <- tibble::as_tibble(grid_MH) %>% 
  arrange(car, player)

#> # A tibble: 27 × 3
#>     car player  open
#>   <int>  <int> <int>
#> 1     1      1     1
#> 2     1      1     2
#> 3     1      1     3
#> 4     1      2     1
#> 5     1      2     2
#> 6     1      2     3
#> # … with 21 more rows
  1. Adding constraints

However, not all these cases are possible, as Monty cannot open just any of the three doors. More specifically, the problem description imposes two key constraints on Monty’s action:

  • Condition 1: Monty cannot open the door actually containing the car.

  • Condition 2: Monty cannot open the door initially chosen by the player.

This allows us to eliminate some cases, or describe which of the 27~theoretical cases are actually possible in this problem:

grid_MH$possible <- TRUE  # initialize

grid_MH$possible[grid_MH$open == grid_MH$car]    <- FALSE  # condition 1
grid_MH$possible[grid_MH$open == grid_MH$player] <- FALSE  # condition 2

#> # A tibble: 27 × 4
#>     car player  open possible
#>   <int>  <int> <int> <lgl>   
#> 1     1      1     1 FALSE   
#> 2     1      1     2 TRUE    
#> 3     1      1     3 TRUE    
#> 4     1      2     1 FALSE   
#> 5     1      2     2 FALSE   
#> 6     1      2     3 TRUE    
#> # … with 21 more rows
  1. Eliminating impossible cases

Next, we remove the impossible cases and consider only possible ones (i.e., possible == TRUE):

grid_MH_1 <- grid_MH[grid_MH$possible == TRUE, ]

#> # A tibble: 12 × 4
#>     car player  open possible
#>   <int>  <int> <int> <lgl>   
#> 1     1      1     2 TRUE    
#> 2     1      1     3 TRUE    
#> 3     1      2     3 TRUE    
#> 4     1      3     2 TRUE    
#> 5     2      1     3 TRUE    
#> 6     2      2     1 TRUE    
#> # … with 6 more rows

Note that we used base R functions to add a new variable (possible), determine its values (by logical indexing) and then filter its rows (by logical indexing) to leave only possible cases in grid_MH_1.

The following tidyverse solution replaces the last two steps by a simple dplyr pipe (and would have worked without the additional possible variable):

grid_MH_1 <- grid_MH %>%
  filter(open != car, open != player)  # condition 1 and 2

#> # A tibble: 12 × 4
#>     car player  open possible
#>   <int>  <int> <int> <lgl>   
#> 1     1      1     2 TRUE    
#> 2     1      1     3 TRUE    
#> 3     1      2     3 TRUE    
#> 4     1      3     2 TRUE    
#> 5     2      1     3 TRUE    
#> 6     2      2     1 TRUE    
#> # … with 6 more rows

ToDo: Visualize the 12 possible cases in a 3x3x3 cube.

  1. Determine wins by strategy

Finally, we can determine the number of wins, based on the player’s strategy (i.e., staying with the original choice, or switching to the other remaining door):

  • Win by staying: Staying wins whenever the location of the car matches the initial choice of the player.

  • Win by switching: As we are only left with possible cases, switching wins whenever staying fails to win.

grid_MH_1$win_stay   <- FALSE  # initialize
grid_MH_1$win_switch <- FALSE

grid_MH_1$win_stay[grid_MH_1$car == grid_MH_1$player] <- TRUE  # win by staying
grid_MH_1$win_switch[grid_MH_1$win_stay == FALSE]     <- TRUE  # win by switching 

#> # A tibble: 12 × 6
#>     car player  open possible win_stay win_switch
#>   <int>  <int> <int> <lgl>    <lgl>    <lgl>     
#> 1     1      1     2 TRUE     TRUE     FALSE     
#> 2     1      1     3 TRUE     TRUE     FALSE     
#> 3     1      2     3 TRUE     FALSE    TRUE      
#> 4     1      3     2 TRUE     FALSE    TRUE      
#> 5     2      1     3 TRUE     FALSE    TRUE      
#> 6     2      2     1 TRUE     TRUE     FALSE     
#> # … with 6 more rows

Of course we could have used dplyr mutate() commands, rather than base R indexing to initialize and define these win_stay() and win_switch variables. (We will show a tidyverse solution below.)

  1. Count wins by strategy

To compare both strategies, we can now count how often each strategy wins:

# Result*(erroneous): 
(n_win_stay   <- sum(grid_MH_1$win_stay))    # win by staying
#> [1] 6
(n_win_switch <- sum(grid_MH_1$win_switch))  # win by switching
#> [1] 6
  1. Interpret solution

This analysis suggests that the chances are 50:50 (or rather 6:6) — which is wrong!

But where is the error? And can we fix this simulation and reach the right solution by enumerating cases?

Analysis: What went wrong?

The previous simulation illustrates a crucial problem with enumerating cases: The cases noted in the reduced grid_MH_1 are not equiprobable (i.e., expected to appear with an equal probability). To see this, let’s take another look at our reduced grid_MH_1:

# Re-inspect grid_MH_1: 
grid_MH_1 %>% filter(car == 1)     # car behind door 1:
#> # A tibble: 4 × 6
#>     car player  open possible win_stay win_switch
#>   <int>  <int> <int> <lgl>    <lgl>    <lgl>     
#> 1     1      1     2 TRUE     TRUE     FALSE     
#> 2     1      1     3 TRUE     TRUE     FALSE     
#> 3     1      2     3 TRUE     FALSE    TRUE      
#> 4     1      3     2 TRUE     FALSE    TRUE
grid_MH_1 %>% filter(player == 1)  # player picks door 1:
#> # A tibble: 4 × 6
#>     car player  open possible win_stay win_switch
#>   <int>  <int> <int> <lgl>    <lgl>    <lgl>     
#> 1     1      1     2 TRUE     TRUE     FALSE     
#> 2     1      1     3 TRUE     TRUE     FALSE     
#> 3     2      1     3 TRUE     FALSE    TRUE      
#> 4     3      1     2 TRUE     FALSE    TRUE

Although the car is located equally often overall behind each door (i.e, in 4 out of 12 cases) and the player initially chooses each door with an equal chance (i.e, in 4 out of 12 cases), the combinations of car and player show an odd regularity that is not motivated by the original problem. Specifically, the three possible cases in which car == player appear twice in the table, but really describe the same original configuration of the game. Given the problem description, each case for which car != player should be as likely as each case for which car == player. However, our analysis doubled those cases in which Monty had two possible options for opening a door (i.e., the cases of car == player). Put another way, the set of cases in grid_MH_1 assumes that the player — by some prescient miracle — is twice as likely to choose the door with the car than to choose a different door. As staying is the successful strategy in this case, this analysis overestimates the overall success rate of staying with the initial choice.

By selecting cases that were possible, we implicitly conditionalized the number of cases on the number of Monty’s options for opening a door. This is essentially the same error that people intuitively make when assuming there is a 50:50 chance after Monty has opened a door. Somewhat ironically, this error was obscured further by disguising it as a seemingly sophisticated simulation!

The lesson to learn here is not that simulations are bad, but that they must not be trusted blindly. As simulations explicate assumptions and the consequences of mechanisms that we cannot solve analytically, the results from a simulation are only valid if its assumptions are sound and its mechanism is implemented in a faithful manner. Crucially, the error here was not caused by our method (i.e, using a particular type of simulation), but in an inappropriate conceptualization of the problem (assuming that all possible cases were equiprobable). So how could we create a correct simulation by enumeration?

Correction 1

Actually, a correct solution is even simpler than our initial one.

In the following alternative, we only consider possible cases prior to Monty’s actions:

# space of only 9 options:
grid_MH_2 <- expand_grid(car, player)
# names(grid_MH_2) <- c("car", "player")
# grid_MH_2 <- tibble::as_tibble(grid_MH_2) %>% arrange(car)

#> # A tibble: 9 × 2
#>     car player
#>   <int>  <int>
#> 1     1      1
#> 2     1      2
#> 3     1      3
#> 4     2      1
#> 5     2      2
#> 6     2      3
#> # … with 3 more rows

Importantly, given the problem description, these 9 cases actually are equi-probable! And since Monty can always open some door (as at least one of the non-chosen doors does not contain the car), we do not even need to consider his action to determine our chances of winning by staying or switching. In fact, we can count all cases of wins, based on strategy, exactly as before — independently of Monty’s actions:

grid_MH_2$win_stay   <- FALSE  # initialize
grid_MH_2$win_switch <- FALSE  # initialize

grid_MH_2$win_stay[grid_MH_2$car == grid_MH_2$player] <- TRUE  # from 1
grid_MH_2$win_switch[grid_MH_2$win_stay == FALSE]     <- TRUE  # from 2

#> # A tibble: 9 × 4
#>     car player win_stay win_switch
#>   <int>  <int> <lgl>    <lgl>     
#> 1     1      1 TRUE     FALSE     
#> 2     1      2 FALSE    TRUE      
#> 3     1      3 FALSE    TRUE      
#> 4     2      1 FALSE    TRUE      
#> 5     2      2 TRUE     FALSE     
#> 6     2      3 FALSE    TRUE      
#> # … with 3 more rows

# Result: 
(n_win_stay   <- sum(grid_MH_2$win_stay))    # win by staying
#> [1] 3
(n_win_switch <- sum(grid_MH_2$win_switch))  # win by switching
#> [1] 6

This yields the correct solution: \(p(win\ by\ staying) = 3/9 = 1/3\), but \(p(win\ by\ switching) = 6/9 = 2/3\).

In case this result seems odd or counter-intuitive: Contrast the original problem with an alternative one in which Monty opens a door prior to the player’s initial choice (see Monty first below). In the Monty first case, the player really faces a 50:50 chance (and will often prefer to stick with his or her initial choice, which requires an additional psychological explanation). What is different in the original version, in which the player chooses first? The difference is that — when opening a door — Monty Hall always needs to avoid the car. But in the original problem, the host also needs to conditionalize his action on the player’s initial choice. Thus, Monty Hall adds information to the problem. Whereas the chance of winning by sticking with the initial choice remains fixed at \(p(win\ by\ staying) = 3/9 = 1/3\), the chance of winning by switching to the other door is twice as high: \(p(win\ by\ switching) = 6/9 = 2/3\).

A tidyverse variant

The expand.grid() functions above (and its tidyr variant expand_grid()) create all combinations of its constituent vectors with repetitions. This generated many impossible cases, which we eliminated by imposing the constraints of the problem.

However, by only taking into account the location of the car, our setup above ignored that there are two goats (e.g., g_1 and g_2) and two distinct arrangements of them, given a specific car location:

car <- 1:3
g_1 <- 1:3
g_2 <- 1:3

# All 27 combinations:
(t_all <- expand_grid(car, g_1, g_2))
#> # A tibble: 27 × 3
#>     car   g_1   g_2
#>   <int> <int> <int>
#> 1     1     1     1
#> 2     1     1     2
#> 3     1     1     3
#> 4     1     2     1
#> 5     1     2     2
#> 6     1     2     3
#> # … with 21 more rows

# Eliminate impossible cases:
t_all %>%
  filter((car != g_1), (car != g_2), (g_1 != g_2))
#> # A tibble: 6 × 3
#>     car   g_1   g_2
#>   <int> <int> <int>
#> 1     1     2     3
#> 2     1     3     2
#> 3     2     1     3
#> 4     2     3     1
#> 5     3     1     2
#> 6     3     2     1

Importatnly, this shows that there exist 6 equiprobable setups, rather than just 3 possible car locations: Each car location corresponds to two possible arrangements of the goats.

This raises the question:

  • Can we solve the problem by taking into account the goat locations?

Let’s repeating the above simulation with a more complicated starting grid:

# Table/space of all 27 x 9 options:
grid_MH_3 <- expand_grid(car, g_1, g_2, player, open)

# Eliminate impossible cases:
grid_MH_3 <- grid_MH_3 %>%
  filter((car != g_1), (car != g_2), (g_1 != g_2)) %>%  # impossible setups
  filter(open != car, open != player)                   # condition 1 and 2

Note that this leaves 24 cases, rather than the 12 cases of grid_MH_1 above.

This time, we determine the wins by each strategy by using dplyr mutate() commands:

grid_MH_3 <- grid_MH_3 %>%
  # mutate(win_stay = FALSE,        # initialize
  #        win_switch = FALSE) %>%
  mutate(win_stay = (car == player),  # determine wins
         win_switch = (win_stay == FALSE))

This yields the following results:

(n_win_stay   <- sum(grid_MH_3$win_stay))    # win by staying
#> [1] 12
(n_win_switch <- sum(grid_MH_3$win_switch))  # win by switching
#> [1] 12

which — as staying and switching appear to be equally successful — is wrong again!

Why? Because we made exactly the same error as above: By conditionalized our (larger) number of cases based on Monty’s options for opening doors, we ended up with cases that are no longer equiprobable.

Before we correct the error, note that we could have combined all steps of this simulation in a single pipe:

# S1: Simulation in 1 pipe:
expand_grid(car, g_1, g_2, player, open) %>%
  filter((car != g_1), (car != g_2), (g_1 != g_2)) %>%  # eliminate impossible cases
  filter(open != car, open != player) %>%               # impose conditions 1 and 2
  mutate(win_stay = (car == player),                    # determine wins
         win_switch = (win_stay == FALSE)) %>%
  summarise(stay_wins = sum(win_stay),                  # summarise results
            switch_wins = sum(win_switch))
#> # A tibble: 1 × 2
#>   stay_wins switch_wins
#>       <int>       <int>
#> 1        12          12

Chunking all into one pipe allows us to easily change our simulation. Here, we can easily correct our error by simply ignoring (or commenting out) all of Monty’s actions of opening doors:

# S2: Simulation ignoring all of Monty's actions:
expand_grid(car, g_1, g_2, player, open) %>%
  filter((car != g_1), (car != g_2), (g_1 != g_2)) %>%  # eliminate impossible cases
  # filter(open != car, open != player) %>%             # DO NOT impose conditions 1 and 2
  mutate(win_stay = car == player,                      # determine wins
         win_switch = win_stay == FALSE) %>%
  summarise(stay_wins = sum(win_stay),                  # summarise results
            switch_wins = sum(win_switch))
#> # A tibble: 1 × 2
#>   stay_wins switch_wins
#>       <int>       <int>
#> 1        18          36

The simplicity of constructing a simulation pipe also allows us to test other variants. For instance, we could ask ourselves:

  • Would only using the condition that Monty cannot open the door with the car (but ignoring that Monty cannot open the door initially chosen by the player) yield the correct solution?

  • Would only using the condition that Monty cannot open the door chosen by the player (but ignoring that Monty cannot open the door with the car) yield the correct solution?

We can easily check this by separating condition 1 and 2 (and only using one of them):

# S3a: Impose only condition 1:
expand_grid(car, g_1, g_2, player, open) %>%
  filter((car != g_1), (car != g_2), (g_1 != g_2)) %>%  # eliminate impossible cases
  filter(open != car) %>%                               # DO impose condition 1
  # filter(open != player) %>%                          # BUT NOT condition 2
  mutate(win_stay = car == player,                      # determine wins
         win_switch = win_stay == FALSE) %>%
  summarise(stay_wins = sum(win_stay),                  # summarise results
            switch_wins = sum(win_switch))
#> # A tibble: 1 × 2
#>   stay_wins switch_wins
#>       <int>       <int>
#> 1        12          24

# S3b: Impose only condition 2:
expand_grid(car, g_1, g_2, player, open) %>%
  filter((car != g_1), (car != g_2), (g_1 != g_2)) %>%  # eliminate impossible cases
  # filter(open != car) %>%                             # DO NOT impose condition 1
  filter(open != player) %>%                            # BUT condition 2
  mutate(win_stay = car == player,                      # determine wins
         win_switch = win_stay == FALSE) %>%
  summarise(stay_wins = sum(win_stay),                  # summarise results
            switch_wins = sum(win_switch))
#> # A tibble: 1 × 2
#>   stay_wins switch_wins
#>       <int>       <int>
#> 1        12          24

The answer in both cases is yes — we obtain a correct solution. However, both of these last simulations contain cases that could not occur in any actual game: Monty Hall would either open doors chosen by the player (in S3a) or open doors containing the car (in S3b). Thus, these simulations would yield correct solutions, but still not provide a faithful representation of the problem:

  • In the MHD, simulations that ignore all of Monty’s actions or only impose one of two conditions on his actions for opening a door yield the correct result (i.e., always switching is twice as likely to win the car as staying). However, using both of these conditions to eliminate possible cases (i.e., taking into account what he would actually do on any particular game) yields an intuitively plausible, but incorrect solution.

Overall, these explorations enable an important insight:

  • There are always multiple simulations possible. Not only are there many different ways of getting a wrong solution, but also different ways of obtaining a correct solution. Obtaining a correct result is no guarantee that a simulation (i.e., the process of using a particular method or strategy) was correct as well.

This is a fundamental point, that may seem quite trivial when illustrating it with a simpler problem: Stating that the sum of \(1+1\) is 2 does not guarantee that we actually know how to add numbers. What if we had memorized the answer? Does this mean that memorized answers are always correct? What if we actually multiplied both addends (i.e., \(1\cdot1\)) and then doubled the result (i.e., \(2\cdot 1\)) to obtain their sum? This seems absurd, but would yield the correct answer in this particular case (but wrong answers for \(1+2\)). Simulations often suffer from similar complications: They typically recruit complex machinery to answer quite basic questions. If we do not understand how they work, they can easily lead to erroneous results or to correct results for the wrong reasons. Thus, to productively use and enable trust in simulations, it is essential that they are transparent (i.e., we also understand how they work).

Combinations vs. permutations

An alternative conceptualization of the problem setup could start with the three objects, but rather than expanding all possible combinations and then eliminating cases, we could only enumerate actually possible cases by arranging them (i.e., all their permutations without repetitions).

Start with the three objects:

objects <- c("car", "goat 1", "goat 2")

# Function for all permutations:
# See <https://blog.ephorie.de/learning-r-permutations-and-combinations-with-base-r> 
# for explanation of recursion!
perm <- function(v) {
  n <- length(v)
  if (n == 1) v
  else {
    X <- NULL
    for (i in 1:n) X <- rbind(X, cbind(v[i], perm(v[-i])))

# 6 possible arrangements:
#>      [,1]     [,2]     [,3]    
#> [1,] "car"    "goat 1" "goat 2"
#> [2,] "car"    "goat 2" "goat 1"
#> [3,] "goat 1" "car"    "goat 2"
#> [4,] "goat 1" "goat 2" "car"   
#> [5,] "goat 2" "car"    "goat 1"
#> [6,] "goat 2" "goat 1" "car"

Note: See the blog post on Learning R Permutations and Combinations at Learning Machines for the distinction between combinations and permutations.

This shows that there are only 6 initial setups of the objects.

Variations of the MHD

Several variants of the MHD can help to render its premises and solution more clear. Thus, use what you have learned above to adapt the simulation in the following ways:

  1. More doors: Extend your simulation to a version of the problem with 100 doors. After the player picks an initial door, Monty opens 98 of the other doors (all revealing goats) before you need to decide whether to stay or switch. How does this affect the winning probabilities for staying vs. switching?

Hint: As it is impractical to represent a game’s setup with 100 columns, this version requires a more abstract representation of the environment and/or Monty Hall’s actions. (For instance, the setup could be expressed by merely noting the car’s location…)

  1. Monty’s mind: The classic version of the problem crucially assumes that Monty knows the door with the prize and curates the choices of the available doors correspondingly. Determine the player’s winning probabilities for staying vs. switching for the following alternative versions of the game:

    • Monty first: The host first opens a door to reveal a goat, before the player selects one of the closed doors as her initial choice. The player then is offered an option to switch. Should she do so?

    • Ignorant Monty: The host does not know the car’s location and opens a random door to reveal either a goat or the car. Should the player switch whenever a goat is revealed?

    • Angelic Monty: The host offers the option to switch only when the player has chosen incorrectly. What and how successful is the player’s best strategy?

    • Monty from Hell: The host offers the option to switch only when the player’s initial choice is the winning door. What and how successful is the player’s best strategy?

Additional variations

Many additional versions of the MHD exist. For instance, the three prisoners problem (see below), Bertrand’s box paradox, and the boy or girl paradox) (see Bar-Hillel & Falk, 1982 for psychological examinations.)

15.5.4 The sock drawer puzzle

The puzzles from the classic book Fifty challenging problems in probability (by Mosteller, 1965) can be solved analytically (Figure 15.5 shows the book’s cover). However, if our mastery of the probability calculus is a bit rusty, we can also find solutions by simulation.

Fifty challenging problems in probability (by Mosteller, 1965) is a slim book, but can provide many hours of frustration or delight.

Figure 15.5: Fifty challenging problems in probability (by Mosteller, 1965) is a slim book, but can provide many hours of frustration or delight.

The problem

  1. Use a simulation for solving The sock drawer puzzle (by Mosteller, 1965, p. 1):
The sock drawer puzzle (i.e., Puzzle 1 by Mosteller, 1965).

Figure 15.6: The sock drawer puzzle (i.e., Puzzle 1 by Mosteller, 1965).

  1. Use an adapted simulation for solving the following variation of the puzzle:

A drawer contains red socks and black socks.
When two socks are drawn at random, the probability that both are of the same color is \(\frac{1}{2}\).
(c) How small can the number of socks in the drawer be?
(d) What are the first solutions with an odd number of red socks and an even number of black socks?

Note that the question in (c) is identical to (a), but the second sentence of the introduction has slightly changed: Rather than asking for a pair of red socks, we now require that the probability for both socks of a pair having the same color is \(\frac{1}{2}\).

1. Solution to (a)

The following preliminary thoughts help to explicate the problem and discover some constraints:

  • The puzzle demands that a pair of socks is drawn. Thus, socks in the drawer are drawn at random (e.g., in the dark) and without replacement (i.e., we cannot draw the same sock twice).

  • Given that there are only two colors of socks (red r, vs. black, b), drawing a pair of socks can generally yield four possible outcomes: bb, br, rb, and rr. The events bb and rr can only occur if there are at least two socks of the corresponding color in the drawer.

  • For the probability of the target event rr to be \(\frac{1}{2} = 50\%\), there must be more red socks than black socks in the drawer. Otherwise, the number of events with one or two black socks would outnumber our target event bb.

  • For the target event of rr to occur, there need to be at least two red socks (i.e., \(|r| \geq 2\)).

  • The simplest conceivable composition of the drawer is rrb.

A sampling solution:

  • Drawing without replacement in R: Use the sample() function
# For a given drawer:
drawer <- c("r", "r", "b")

# drawing once:
sample(x = drawer, size = 2, replace = FALSE)
#> [1] "r" "b"

# drawing N times:
N <- 10000
tr <- 0  # count target events "two red"

# loop: 
for (i in 1:N){
  s <- sample(x = drawer, size = 2, replace = FALSE)
  if (sum(s == "r") == 2){  # criterion: 2 red socks
    tr <- tr + 1

# probability of tr:
#> [1] 0.3346

Target event needs to count the number of red socks (i.e., number of times of drawing the element "r"): Test in R: sum(s == "r") must equal 2.

Create a function that abstracts (or encapsulates) some steps:

draw_socks <- function(drawer, N = 1000){
  # initialize: 
  prob <- NA
  tr <- 0  # count target events "two red"
  # loop: 
  for (i in 1:N){
    s <- sample(x = drawer, size = 2, replace = FALSE)
    if (sum(s == "r") == 2){  # criterion: 2 red socks
      tr <- tr + 1
  # probability of tr:
  prob <- tr/N


  • In this function, the size of each draw (size = 2), our event of interest (sum(s == "r")), and the frequency of this event (2) were hard-encoded into the function. If we ever needed more flexibility, they could be provided as additional arguments.

  • The function returns a probability (i.e., a number between 0 and 1). The accuracy of its value depends on the number of samples N. Higher samples will yield more stable estimates of this value.

Some initial plausibility checks:

# Check:
draw_socks(c("r", "g", "b"))
#> [1] 0
draw_socks(c("b", "b", "r"))
#> [1] 0
draw_socks(c("b", "r"))
#> [1] 0
draw_socks(c("r", "r"))
#> [1] 1

## Note errors for:
# draw_socks(c("r"))
# draw_socks(c("b"))

Note that varying the value of N varies the variability of our estimate:

drawer <- c("r", "r", "b")

draw_socks(drawer,    1)  # 10^0
draw_socks(drawer,   10)  # 10^1
draw_socks(drawer,  100)  # 10^2
draw_socks(drawer, 1000)  # 10^3
draw_socks(drawer, 10^6)  # beware of computation times

Note that the range of values theoretically possible can vary between 0 and 1. However, whereas it actually varies between those extremes for N = 1, the chance of repeatedly obtaining such an extreme outcome decreases with increasing values of N. Thus, our estimates get more stable as N increases.

Note also that there is a considerable trade-off between accuracy and speed (due to using a loop for sampling events). Using values beyond N = 10^6 takes considerably longer, without changing our estimates much. Thus, we face a classic dilemma between being “quick and noisy” vs. being “slow and accurate.”

What is the lowest value of N that yields a reasonable accuracy? This raises the question: What is “reasonable?” Actually, this is one of the most important questions in many contexts of data analysis. In the present context, our target probability is specified as the fraction \(\frac{1}{2}\). Does this mean that we would only consider a probability value of exactly \(0.5\) as accurate, but any nearby values (e.g., \(0.495\) or \(.503\)) as incaccurate? Realistically, we want some reasonable stability of the 2nd decimal digit (i.e., the percentage value):

round(draw_socks(drawer, 10^4), 2)  # still varies on the 2nd digit
round(draw_socks(drawer, 10^5), 2)  # hardly varies any more

Our tests show that a default value of \(N = 10^5\) is good enough for (or “satisfies”) our present purposes.

Now we can “solve” the problem for specific drawer combinations. We will initially use a pretty coarse estimate (quick and noisy) and increase accuracy if we reach a promising range:

draw_socks(c("r", "r", "b"), 10^3) 
#> [1] 0.335
draw_socks(c("r", "r", "b", "b"), 10^3) 
#> [1] 0.171
draw_socks(c("r", "r", "r", "b"), 10^3) 
#> [1] 0.475

The last one seems pretty close. Let’s zoom in by increasing our accuracy:

draw_socks(c("r", "r", "r", "b"), 10^5)  # varies around target value of 0.50

1. Solution to (b)

Part (b) imposes an additional constraint on the drawer composition: The number of black socks must be even (which excludes our first solution).

Automating the drawer composition by making it a function of two parameters:

# Two numeric parameters: 
n_r <- 2  # number of red socks
n_b <- 1  # number of black socks

# Function to make drawer (in random order):
make_drawer <- function(n_r, n_b){

  drawer <- c(rep("r", n_r), rep("b", n_b))  # ordered drawer
  sample(drawer, size = n_r + n_b)  # a random permutation


# Checks: 
make_drawer(4, 0)
#> [1] "r" "r" "r" "r"
make_drawer(3, 1)
#> [1] "r" "r" "r" "b"
make_drawer(2, 2)
#> [1] "r" "b" "r" "b"

Note that the sample() function in make_drawer() only creates a random permutation of the drawer elements. If we trust that the sample() function in draw_socks() draws randomly, we could omit permuting the drawer contents.

Combine both function to create a drawer and estimate the probability for N draws from it in one function:

estimate_p <- function(n_r, n_b, N){
  drawer <- make_drawer(n_r, n_b)
  draw_socks(drawer = drawer, N = N)

# Checks:
estimate_p(1, 1, 10)
#> [1] 0
estimate_p(2, 0, 10)
#> [1] 1
estimate_p(2, 2, 100)
#> [1] 0.19
estimate_p(3, 1, 1000)
#> [1] 0.482

Prepare a grid search for a range of specific parameter combinations:

# Parameters: 
n_r <- 2:20  # number of red socks
n_b <- 1:20  # number of black socks

grid <- expand.grid(n_r = n_r, n_b = n_b)
#> [1] 380   2

Importantly, when conducting simulations based on sampling, we should always keep an eye on the computational demands of our simulation.

In the present case, we are considering 380 parameter combinations. If we want to estimate the probability of our target event on the basis of N draws, the number of samples needed overall varies as a function of N. For instance, if the simulation for N = 1,000 takes several seconds, the same simulation for N = 10,000 will take minutes, and aiming for N = 1,000,000 may take hours or days.

To anticipate and avoid computational explosions, some pragmatic considerations are in order. Two simple approaches for addressing this problem include:

  1. Narrow down the set of candidates considered based on a piori constraints.

  2. Identify plausible candidates and increase the accuracy of the analysis only for those candidates.

Using what we know about the problem allows us to narrow down the range of parameters considered in two ways:

# Exclude undesired cases:
grid <- dplyr::filter(grid, n_b %% 2 == 0)  # require even values of n_b
#> [1] 190   2

# Exclude impossible cases:
grid <- dplyr::filter(grid, n_r > n_b)  # require more red than blue socks
#> [1] 90  2

The even-number constraint reduced the number of candidate pairs of parameters in grid by 50% and the second step shaved off another 50% of cases to consider. This still leaves us with 90 parameter combinations to consider.

We can now compute the probability for each drawer combination in grid. However, to still use our computational resouces wisely, the second parameter to play with is the number of random draws N in each simulation. To use our limited computing resources wisely, we adopt a multi-step procedure that incrementally narrows down the set of candidates considered:

  • For speed’s sake, we start with a coarse estimate (i.e., estimate \(p_{e}\) for a relatively low value of N = 1000).
  • Narrow down the set of candidate parameters (by their distance to the target probability of \(p_{t} = \frac{1}{2}\)).
  • Investigate only the set of plausible candiates (e.g., \(.40 \leq p_{e} \leq .60\)) more accurately (with a higher value of N).
  • Repeat.

The following code pipes narrow down the set of candidate parameters in three steps:

# Determine probability for each combination in grid:

# 1. Coarse search for candidates: 
grid2 <- grid %>% 
  mutate(N = 10^3,
         prob = purrr::pmap_dbl(list(n_r, n_b, N), estimate_p), 
         dist = abs(prob - .500)) %>%
  filter(dist <= .10)

# 2. Zoom into the subset of plausible candidates:
grid3 <- grid2 %>%
  select(n_r, n_b) %>%
  mutate(N = 10^4,
         prob = purrr::pmap_dbl(list(n_r, n_b, N), estimate_p), 
         dist = abs(prob - .500)) %>%
  filter(dist < .02)

# 3. Check the remaining candidates with greater accuracy:
grid4 <- grid3 %>%
  select(n_r, n_b) %>%
  mutate(N = 10^6,
         prob = purrr::pmap_dbl(list(n_r, n_b, N), estimate_p), 
         dist = abs(prob - .500)) %>%
  filter(dist < .005)

Note some additional pragmatic considerations: The goal of our step-wise procedure is to identify plausible candidates and investigate those more thoroughly. Importantly, there are two types of errors:

  • Missing solutions occur when we exclude parameters that actually happen to be a true solution,

  • False positives occur when we accept a candidate that actually turns out to be false.

There is a trade-off between both errors: The more liberal (vs. restrictive) we are in accepting candidates, the less (vs. more) likely a miss will occur. Importantly, any miss (i.e., excluded parameters) will never appear again, whereas false positives can still be identified later. Thus, we need to protect ourselves against misses, rather than false positives. This is why the first two steps are more liberal in accepting candidates. This costs us extra computation time, but raises our confidence that we ultimately arrive at correct solutions.

This incremental procedure leaves us with only a few (i.e., typically one or two) results in grid4. To further increase our confidence in these results, we could repeat the procedure several times and/or increase the demands on the maximal distance dist from our target probability \(p_{t} = \frac{1}{2}\)). Ultimately, any solution that we are willing to accept as an answer within a sampling approach will always depend on what we consider to be good enough.

Possible alternatives and improvements

There are many ways for improving (or increasing the speed) of our simulation futher. Some examples include:

  • Use a while loop that increases parameter values (for n_r and n_b) until the computed probability falls within a threshold around the desired value.

  • Use hill climbing heuristics to detect trends in computed values (to exclude going further into the wrong direction).

  • Use numerical optimization methods to find optimal parameter combinations (within constraints).

2. Solution to (c) and (d)

Some preliminary thoughts may help solving the modified puzzle:

  • If the probability for both socks being of the same color is \(p_{same} = \frac{1}{2}\), the the complementary probability that both socks are of different color is \(p_{diff} = \frac{1}{2}\) as well.

  • As the problem asks for socks of the “same color,” any answer will be symmetrical in some sense: If \(m\) red and \(n\) black socks provide a solution for one color (e.g., red), then \(n\) red and \(m\) black socks provide a solution for the other color (e.g., black). This can be used to constrain a grid search in a simulation.

  • This symmetry may motivate the hypotheses that we need a symmetrical or balanced drawer combination and that the drawer containing four socks \(rrbb\) (i.e., two red and two black socks) provides a first solution. However, when enumerating all combinations for drawing a pair of socks out of \(r_{1}r_{2}b_{1}b_{2}\), we quickly see that the possible pairs with two different colors outnumber the two same-color pairs. Thus, our naive hypotheses are misguided and we may actually need an imbalanced drawer to obtain a 50:50 chance for drawing two socks of the same color.

  • Asking that a random pair of socks must be of the same color is less restrictive than asking for two red socks. To implement this less constrained version, we need to adjust our criterion for counting a successful draw in draw_socks().

Analytical solutions

Here are some hints for using R to solve the puzzle in an analytical fashion (i.e., using the mathematics provided by probability theory):

For the original puzzle (a) and (b):

The probability for randomly drawing a red pair is

\(P(rr) = \binom{r}{2} / \binom{r+b}{2}\)

Implementation in R with the choose() function:

# Function: 
prob_rr <- function(n_r, n_b){
  # Frequencies: 
  n_rr  <- choose(n_r, 2)
  n_all <- choose(n_r + n_b, 2)

  # Probability: 
  p_rr <- n_rr/n_all

# Check:
prob_rr( 1, 1)
prob_rr( 2, 2)
prob_rr( 3, 1)  # 1st solution
prob_rr(10, 4)  # close candidate
prob_rr(15, 6)  # 2nd solution
prob_rr(20, 8)  # close candidate

For the puzzle variations (c) and (d):

The probability for both socks in a random pair being of the same color is:

\(P(sc) = (\binom{r}{2} + \binom{b}{2}) / \binom{r+b}{2}\)

Implementation in R:

# Function: 
prob_sc <- function(n_r, n_b){
  # Frequencies: 
  n_rr  <- choose(n_r, 2)
  n_bb  <- choose(n_b, 2)
  n_all <- choose(n_r + n_b, 2)

  # Probability: 
  p_sc <- (n_rr + n_bb)/n_all

# Check:
prob_sc( 1, 1)
prob_sc( 2, 2)   # etc. 

# Possible solutions:
prob_sc( 3, 6)   # 1st solution
prob_sc(15, 10)  # 2nd solution
prob_sc(21, 28)  # 3rd solution

Given these analytic (and fast) formulas, we can conduct complete grid searches for a much larger grid:

# Grid search (for much larger grid):

# Parameters: 
n_r <- 1:1000  # number of red socks
n_b <- 1:500   # number of black socks

grid <- expand.grid(n_r = n_r, n_b = n_b)
dim(grid)  # Note the number of cases (rows)!

# (a)+(b) Drawing two red socks: 
grid_rr <- grid %>% 
  mutate(p_rr = purrr::pmap_dbl(list(n_r, n_b), prob_rr),
         dist = abs(p_rr - .500)) %>%
  filter(dist < .000001) # OR: 
  # filter(dplyr::near(dist, 0))

knitr::kable(grid_rr, digits = 3, caption = "Probability of 2 red socks = 1/2.")

# (c) Both socks having the same color:
grid_sc <- grid %>% 
  mutate(p_sc = purrr::pmap_dbl(list(n_r, n_b), prob_sc),
         dist = abs(p_sc - .500)) %>%
  filter(n_r %% 2 == 1, n_b %% 2 == 0, dist < .000001) # OR: 
  # filter(dplyr::near(dist, 0))

knitr::kable(grid_sc, digits = 3, caption = "Probability of same color = 1/2.")

15.5.5 Related probability puzzles

Solve the following puzzles (also from Mosteller, 1965) by conducting similar simulations:

The Successive wins puzzle (i.e., Puzzle 2 by Mosteller, 1965).

Figure 15.7: The Successive wins puzzle (i.e., Puzzle 2 by Mosteller, 1965).

A possible solution (by enumerating cases):

# Possible events: ---- 
father   <- c("win", "lose")
champion <- c("win", "lose")

# Order 1: ----

# Possible cases:
FCF <- expand.grid(father, champion, father)
names(FCF) <- c("father_1", "champion", "father_2")

# Condition: Winning 2 consecutive sets:
win_FCF <- FCF %>%
  filter(champion == "win", father_1 == "win" | father_2 == "win")

# Order 2: ----

# Possible cases:
CFC <- expand.grid(champion, father, champion)
names(CFC) <- c("champion_1", "father", "champion_2")

# Condition: Winning 2 consecutive sets:
win_CFC <- CFC %>%
  filter(father == "win", champion_1 == "win" | champion_2 == "win")

# Comparison of winning cases: ----
#>   father_1 champion father_2
#> 1      win      win      win
#> 2     lose      win      win
#> 3      win      win     lose
#>   champion_1 father champion_2
#> 1        win    win        win
#> 2       lose    win        win
#> 3        win    win       lose

# Inspection:
# - Rows 2 and 3 of both tables involve events with the same probability
#   (i.e., winning once against the champion and the father, in 2 orders).
# - However, the probability of Row 1 differs across tables:
#   Winning twice against the father is more likely than winning twice against the chamption. 
#   Hence, playing FCF increases Elmer's winning chances beyond playing CFC. 
The flippant juror puzzle (i.e., Puzzle 3 by Mosteller, 1965).

Figure 15.8: The flippant juror puzzle (i.e., Puzzle 3 by Mosteller, 1965).

The prisoner’s dilemma puzzle (i.e., Puzzle 13 by Mosteller, 1965).

Figure 15.9: The prisoner’s dilemma puzzle (i.e., Puzzle 13 by Mosteller, 1965).

Hint: Note the structural similarity of The prisoner’s dilemma puzzle to the Monty Hall dilemma.

15.5.6 Coin tossing reloaded

This exercise involves one of the most famous phenomena in the psychology of human judgment and decision making. The question How do people perceive random events? is the topic of a heated debate and countless publications in the most prestigious journals.

The (mis-)perception of randomness?

A popular task presented in this context is the following:

  • Which of the following sequences is more likely when repeatedly tossing a fair coin (with two possible outcomes: heads H or tails T):

    • (a) H H H H
    • (b) H T H T
    • (c) H H H T

Most people intuitively think that regularity is positively correlated to rarity. As sequences (a) and (b) both seem somewhat regular, the more irregular sequence (c) seems more likely than (b), and both (b) and (c) appear more likely than (a). However, the intended solution of those who typically ask the question (e.g., Kahneman & Tversky, 1972b) is: “All three sequences are equally likely.” When contrasted with this solution, the intuitive answer appears to illustrate a bias in the human perception of random events.

The question whether human perception of chance or randomness is biased or adaptive, is a controversial issue in psychology. Its answer depends — as always — on what exact assumptions we make about the question or its premises. Given appropriate assumptions, seemingly adaptive or biased distortions can even emerge as two sides of the same coin. In this exercise, we will develop two sampling models to argue for both positions and explicate how their premises and conclusions differ.

The standard argument

The standard recipe for demonstrating a bias in human cognition proceeds as follows:

  1. Task: Present some seemingly simple puzzle.
  2. Data: Collect “evidence” to show that people get it wrong.
  3. Solution: Present some normative model that seems to solve the puzzle.
  4. Conclude: People are biased.
  5. Explanation: Make up some evolutionary just-so story to “explain” why people are biased.

This recipe is widely used (e.g., see Wikipedia’s list of cognitive biases) and — if we manage to ignore the logical flaw that many “biases” come in two opposite varieties — highly influential and successful (in terms of publishing papers or winning prizes). Steps 1–4 are necessary for scoring an effect in an audience or publishing a paper (even though the “evidence” only adds some optional spice, as anyone looking at the puzzle will intuitively know how people will respond). The “explanation” in Step 5 is optional, but helps for a more serious impression or scientific flavor.

The people primarily interested in demonstrating (or “coining”) some bias usually stop here. But rather than just showing a curious effect, the more interesting questions are:

  • Why do people seem to get this wrong?
  • What are the reasons that humans have evolved in this particular direction?

It’s always easy to answer this question concerning the adaptive nature of the phenomenon by an evolutionary “just so” story (e.g., one that involves two systems, one of which fails to monitor the other)17. However, if we think carefully about the issue, it often turns out that there are good reasons for a seemingly distortive mechanism. These reasons typically have to do with systematic structures in our environment and the adaptive nature of cognition picking up these regularities.

The normative model used in Step 3 of the recipe typically appears as some mathematical formula (e.g., Bayes’ theorem) or some other form of probability theory. That’s also where R and simulations by sampling can come into play. The standard way of demonstrating the correct answer to queries about the probability of a sequence is the combinatorial tree representing all sequences of a given length. For instance, the blog post The Solution to my Viral Coin Tossing Poll (of the Learning Machines blog, on 2021-04-28) provides the following R code (adapted to our simpler version of the problem):

# Events/patterns of interest: 
a <- c("H", "H", "H", "H")
b <- c("H", "T", "H", "T")
c <- c("H", "H", "H", "T")

# Generate all possible cases (of length k=4):
k <- 4
all_cases <- expand.grid(rep(list(c("H", "T")), k))

# As matrix:
mx <- as.matrix(all_cases)  
# t(mx)  # transposes mx

# Count: 
sum(colSums(t(mx) == a) == k)  # how many rows of mx equal a?
#> [1] 1
sum(colSums(t(mx) == b) == k)  # how many rows of mx equal b?
#> [1] 1
sum(colSums(t(mx) == c) == k)  # how many rows of mx equal c?
#> [1] 1

Interpretation: This code provides a solution to the problem by first enumerating all possible cases and then counting how often each pattern of interest occurs in them. The key step here consists in using the expand.grid() function to generate all possible sequences of events H and T with a length of \(k=4\) (i.e., the combinatorial tree). This grid is then turned into a matrix (a step that could be skipped). The sum() statements count the number of cases in which the columns of the transposed matrix exactly match our sequences of interest (a, b, or c). However, if one understands what expand.grid() does, this counting step is also not needed (but may be helpful when considering longer sequences).

Overall, the code appears to demonstrate that — given all possible series with H and T of length 4 — each sequence of interest occurs exactly once. Hence, it is concluded that all three sequences of interest are equally likely (i.e., occurring with a probability of \(P(a) = P(b) = P(c) = \frac{1}{16}\)). The silent premise made here is: A fair coin is thrown exactly \(k=4\) times, but the probability of each possible outcome corresponds to the long-term outcomes if this process was repeated many (or an infinite number of) times.

Your tasks are to conduct two simulations with slightly different setups:

  1. The code above enumerated the number of possible coin sequences of length \(k=4\). Alternatively, we can simulate an “experiment” that tosses a fair coin many times (e.g., observe 1 million tosses) and then count the actual occurrence of each event (a, b, or c) in this sequence. What would we conclude from this procedure/measure?

  2. A critic claims that people never experience infinite random events (like 1 million tosses of a coin). Instead, they experience many shorter sequences (e.g., 50,000 sequences of length 20). What matters for the human experience of a pattern’s frequency is how many of these sequences contain the event in question. This can be quantified by the average wait time for each event or the probability of non-occurrence of each event in the set of all sequences. What would we conclude from this procedure/measure?

Conduct a simulation for each of these points and compare their results.

Argument by Simulation 1

Strategy: Sample a very long sequence of events and count number of occurrences:

  • Generate one very long series of (e.g., 1 million) random coin tosses

  • Optional: Convert the series and the events of interest into strings (to facilitate pattern matching).

  • Iterate through the sequence and ask: How often does each sequence occur?

We first use sample() to simulate 1 million tosses of a fair coin:

tosses <- sample(c("H", "T"), size = 10^6, replace = TRUE, prob = c(.5, .5))
# tosses

To faciliate finding our patterns of interest in tosses, we convert all into strings:

# Convert to strings:
toss_s <- paste(tosses, collapse = "")
# toss_s
#> [1] 1000000

# Events (as strings):
(a_s <- paste(a, collapse = ""))
#> [1] "HHHH"
(b_s <- paste(b, collapse = ""))
#> [1] "HTHT"
(c_s <- paste(c, collapse = ""))
#> [1] "HHHT"

As a quick check of pattern matching (using regular expressions) reveals that the corresponding counts provided by standard functions would be wrong, as they would fail to count overlapping matches:

gregexpr(pattern = "HH", text = "HHHHH")
stringr::str_locate_all(string = "HHHH", pattern = "HH")

Thus, we write an iterative function count_occurrences() that moves a window of a pattern size n_p over the entire width of a text string n_t and counts any match it encounters:

# Count number of substring occurrences (including overlapping ones):
count_occurrences <- function(pattern, text){
  count <- 0  # initialize 
  n_p <- nchar(pattern)  # width of pattern
  n_t <- nchar(text)     # width of text 
  for (i in (1:(n_t - n_p + 1))){
    # Current candidate: 
    candidate <- substr(text, start = i, stop = (i + n_p - 1))
    if (candidate == pattern){  # pattern is found:  
      count <- count + 1        # increment counter 
  } # for end. 

# Check:
count_occurrences("XX", "X_X")
#> [1] 0
count_occurrences("XX", "XX_X")
#> [1] 1
count_occurrences("XX", "XX_XX")
#> [1] 2
count_occurrences("XX", "XXX")
#> [1] 2
count_occurrences("XX", "XXXXX")
#> [1] 4

Our checks verify that count_occurrences() also counts overlapping patterns. We can use this function to count the number of times each pattern of interest occurs in our tosses (both as strings):

# Counts:
count_occurrences(a_s, toss_s)  # counts for a
#> [1] 62793
count_occurrences(b_s, toss_s)  # b
#> [1] 62302
count_occurrences(c_s, toss_s)  # c
#> [1] 62569

Interpretation: The counts indicate that three events appear about equally often in our sample (with minor differences being due to random fluctuations). Hence, we conclude that the three patterns of events/sequences are equally likely.

Counter-argument by Simulation 2

Strategy: Sample a large number of shorter sequences and count the number/probability of non-occurrence for each event of interest:

  1. Assume that people naturally encounter many finite sequences (e.g., 50,000 sequences of length 20), rather than one infinite one (Hahn & Warren, 2009).

  2. What matters for our natural experience of patterns may be the average waiting time (AWT) of an event (Gardner, 1988, p. 61ff.). As sequences consisting entirely of H or of T require clusters of identical events (as HHH can only occur after HH, etc.), the AWT for an event of HHHH is longer than the average wait time for HTHT.

  3. The average wait time corresponds to the probability of non-occurrence of an event in a sequence (of finite length).

Note that the 1st point is an assumption regarding our natural environments: We experience many finite sequences of events, rather than infinite sequences. The 2nd point proposes an alternative normative theory (or a notion of probability in terms of AWT). The 3rd point is merely technical: We can measure a particular event’s AWT as its probability of non-occurrence in a large number of finite sequences.

To explore the consequences of these assumptions, we use sample() to simulate a much shorter sequence of 20 tosses of a fair coin, but collect 50,000 such sequences in a vector toss_s (representing each toss as a string of characters):

# Tossing a fair coin 10 times (as string): 
n_toss <- 20
toss <- sample(c("H", "T"), size = n_toss, replace = TRUE, prob = c(.5, .5))
(toss_s <- paste(toss, collapse = ""))

# Sample many finite sequences:
n_seq  <- 10^6/n_toss 
toss_s <- rep(NA, n_seq)

for (i in 1:n_seq){
  toss <- sample(c("H", "T"), size = n_toss, replace = TRUE, prob = c(.5, .5))
  toss_s[i] <- paste(toss, collapse = "")  # as string


Note that the total number of coin tosses considered here is the same as in Simulation 1. If we wanted, we could also have recycled our data (from toss_s above) and split it into 5^{4} sequences of length 20 each. If we trust our random generator, sampling new coin tosses here should not make a difference.

At this point, we could be tempted to count the number of occurrences of each pattern of interest in our shorter strings:

## Red herring step: 

# Count number of occurrences of each pattern: 

# Counts (using base::sapply functions):
sum(sapply(X = toss_s, FUN = count_occurrences, pattern = a_s))
#> [1] 52969
sum(sapply(X = toss_s, FUN = count_occurrences, pattern = b_s))
#> [1] 53000
sum(sapply(X = toss_s, FUN = count_occurrences, pattern = c_s))
#> [1] 53111

# Using purrr::map functions:
# sum(purrr::map_dbl(toss_s, count_occurrences, pattern = a_s))
# sum(purrr::map_dbl(toss_s, count_occurrences, pattern = b_s))
# sum(purrr::map_dbl(toss_s, count_occurrences, pattern = c_s))

Note that these counts are considerably lower than those above. This is due to the fact that a very long sequence was cut into many pieces, so that the first and last elements of each sequence can no longer match a pattern of length \(k=4\) (in both directions). However, as we are only interested in comparing the relative counts for each pattern, our conclusion would not change.

Interpretation: It appears that the three events of interest still occur equally often. Hence, we would conclude that the three patterns of events/sequences are equally likely.

However, rather than counting the total number of appearances of a pattern, we wanted to consider the probability of non-occurrence of a pattern in a finite sequence:

## Intended step/strategy:

# Consider average wait times (Gardner, 1988, p. 61f.) or  
# p(non-occurrence) in a string.  
# Ask how often did something NOT occur:

# Counts of non-occurrence (using sapply):
sum(sapply(X = toss_s, FUN = count_occurrences, pattern = a_s) == 0)
#> [1] 26223
sum(sapply(X = toss_s, FUN = count_occurrences, pattern = b_s) == 0)
#> [1] 17960
sum(sapply(X = toss_s, FUN = count_occurrences, pattern = c_s) == 0)
#> [1] 12708

# Using purrr::map functions: 
# sum(purrr::map_dbl(toss_s, count_occurrences, pattern = a_s) == 0)
# sum(purrr::map_dbl(toss_s, count_occurrences, pattern = b_s) == 0)
# sum(purrr::map_dbl(toss_s, count_occurrences, pattern = c_s) == 0)

These counts show some curious differences — and their magnitude seems too large to be due to random fluctuations: As a higher count of non-occurrence corresponds to a lower frequency, the frequency ranks fr of our three events of interest appear as fr(a) < fr(b) < fr(c) (i.e., they point in the intuitive direction).

To investigate this in more detail, we count the frequency of all possible events in mx (i.e., all possible toss outcomes of length \(k=4\)) in our set of coin toss_s:

# Analysis: Compute p-non-occurrence for ALL possible outcomes in mx:

# Initialize:
(nr_events <- nrow(mx))
#> [1] 16
n_no_occurrence <- rep(NA, nr_events)
cur_event <- rep(NA, nr_events)

# Note: Event/outcomes in row 10 of mx: 
# paste(mx[10, ], collapse = "")

# Loop through all possible events of mx:
# (Note: This may take a few seconds/minutes!) 
for (i in 1:nr_events){
  cur_event[i] <- paste(mx[i, ], collapse = "")
  n_no_occurrence[i] <- sum(sapply(X = toss_s, FUN = count_occurrences, 
                                   pattern = cur_event[i]) == 0)

Given that this loop cycles 16 times through 5^{4} sequences of length 20, it may take a few seconds or minutes. After its evaluation, the vector n_no_occurrence contains the number of non-occurrences of every pattern in all sequences. Thus, n_no_occurrence can easily be translated into the probability of non-occurrence for every pattern:

# Probability (of non-occurrence): 
(p_no_occurrence <- n_no_occurrence/n_seq)
#>  [1] 0.52446 0.25442 0.30896 0.25392 0.30624 0.35842 0.30418 0.25526 0.25416
#> [10] 0.30794 0.35920 0.31068 0.25406 0.31004 0.25660 0.52616

Figure 15.10 illustrates the probabilities of non-occurrence for each possible pattern of 4 coin tosses in finite strings of length 20. The dashed vertical line indicates the symmetry axis due to there being two binary outcomes. As any deviations from symmetry would indicate random fluctuations, the degree of the curve’s symmetry allows judging the robustness of our simulated data.

Probability of non-occurrence (of substring patterns in finite strings of length 20) (cf. Figure 3 of Hahn & Warren, 2009).

Figure 15.10: Probability of non-occurrence (of substring patterns in finite strings of length 20) (cf. Figure 3 of Hahn & Warren, 2009).

Interpretation: When representing “probability” as the likelihood of (non-)occurrence in finite sequences, the chance that (a) HHHH does not occur is higher than that (b) HTHT does not occur, and both (a) and (b) are higher than the probability that (c) HHHT does not occur. Expressed in terms of frequency: Finite sequences in which the sub-sequences (c) HHHT or (b) HTHT occur (one or more times) are more frequent than sequences in which (a) HHHH occurs — exactly as we would predict intuitively. Thus, rather than illustrating a cognitive bias, our intuition may accurately reflect the finite way in which we naturally experience random sequences.


Our two simulations seem to support conflicting conclusions: Depending on our underlying notion of probability, human perception of randomness either appears systematically biased or well-adapted to the actual likelihood of the relevant events. This may partly explain why there are huge theoretical debates regarding the biased or adaptive nature of human behavior (here: probability judgments or the perception of random events). Importantly, our simulations also reveal the reasons for these conflicting results and interpretations: We first counted the number of pattern matches in a very long (near-infinite) sequence of coin tosses, before counting the number of patterns not occurring in shorter (finite) sequences of coin tosses. Which simulation we should trust depends on additional assumptions that determine which of these setups and measures makes more sense.


When evaluating the scientific debate with a sober mind, we see that there is no paradox or logical contradiction involved. Instead, the two positions in this debate adopt different perspectives and start from different premises regarding the data and measures that must be considered when evaluating the likelihood of random events. Overall, this shows that carefully conducting formal simulations helps explicating hidden premises and explains seemingly contradictory conclusions.


The simulation that expressed “rarity” in terms of average waiting times and the non-occurrence in finite sequences is based on Hahn & Warren (2009), who point out that different sequences of the same length can have different average waiting times (AWT) and demonstrate that a sequence’s AWT corresponds to its probability of non-occurrence in finite sequences.

However, their key argument is subtle and must not be overinterpreted. For instance, showing that the AWTs (or non-occurrence) of sequences of identical length do vary neither implies that AWTs are a suitable proxy for judging probability, nor that people estimate AWTs when making probabilistic judgments. Especially when comparing two sequences (in a paired comparison task), AWTs do not predict which sequence will occur before another (in a concurrent or competitive sense). This is due to the fact that — when simultaneously waiting for the first of two sequences to occur — we are dealing with dependent and non-transitive probabilities (Gardner, 1988). For a discussion of related issues, see Sun et al. (2010) and Hahn & Warren (2010).


Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44(3), 211–233. https://doi.org/10.1016/0001-6918(80)90046-3
Bar-Hillel, M., & Falk, R. (1982). Some teasers concerning conditional probabilities. Cognition, 11(2), 109–122. https://doi.org/10.1016/0010-0277(82)90021-X
Gardner, M. (1988). Time travel and other mathematical bewilderments. W.H. Freeman.
Hahn, U., & Warren, P. A. (2009). Perceptions of randomness: Why three heads are better than four. Psychological Review, 116(2), 454–461. https://doi.org/10.1037/a0015241
Hahn, U., & Warren, P. A. (2010). Why three heads are a better bet than four: A reply to Sun, Tweney, and Wang (2010). Psychological Review, 117(2), 706–711. https://doi.org/10.1037/a0019037
Kahneman, D., & Tversky, A. (1972a). On prediction and judgement. ORI Research Monographs, 1(12), 430–454.
Kahneman, D., & Tversky, A. (1972b). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3), 430–454. https://doi.org/10.1016/0010-0285(72)90016-3
Mosteller, F. (1965). Fifty challenging problems in probability with solutions. Addison-Wesley.
Neth, H., Gradwohl, N., Streeb, D., Keim, D. A., & Gaissmaier, W. (2021). Perspectives on the 2x2 matrix: Solving semantically distinct problems based on a shared structure of binary contingencies. Frontiers in Psychology, 11, 567817. https://doi.org/10.3389/fpsyg.2020.567817
Savant, M. vos. (1990). Ask Marilyn. Parade Magazine, (September 9), p. 15.
Selvin, S. (1975). A problem in probability. American Statistician, 29, 67. https://doi.org/10.1080/00031305.1975.10479121
Sun, Y., Tweney, R. D., & Wang, H. (2010). Occurrence and nonoccurrence of random sequences: Comment on Hahn and Warren (2009). Psychological Review, 117(2), 697–705. https://doi.org/10.1037/a0018994

  1. In my view, so-called dual systems accounts are serious rivals to Sigmund Freud’s account of three instances (id, ego, super-ego) for the title of the most successful theory in psychology that remains frustratingly unproductive. The key problem of both theories is that they can always explain everything — including the opposite of everything.↩︎