3  Sampling the Imaginary

Learning Objectives:

This chapter teaches the basic skills for working with samples from the posterior distribution. We’ll begin to use samples to summarize and simulate model output. The skills learned here will apply to every problem in the remainder of the book, even though the details of the models and how the samples are produced will vary.

The chapter exploits the fact that people are better in counts than in probabilities. We will take the probability distributions from the previous chapter and sampling from them to produce counts.

Vampire Example

ORIGINAL

Definition 3.1 : Probability Notation

As a repetition and to get a better understanding of the following formulae in this section I read a very basic introduction into probability notation

\(P(A)\) means “Probability Of Event A”.

Probability of an Event

What is the probability of \(2\) after throwing a dice? \[P(A) = \frac{1}{6}\]

\(P(B \mid A)\) means “Probability of Event B given A”.

This is a conditional probability. After event \(A\) has happened, what is the probability of \(B\)? If the events are independent from each other than the changes do not influence each other.

Independent Conditional Events

The probability to throw \(2\) in a second dice throw are still \(P(B \mid A) = \frac{1}{6}\). If the events are dependent of each other then “Probability of event A and event B equals the probability of event A times the probability of event B given event A.”

\[ P(A \space and\space B) = P(A) \times P(B \mid A) \]

Dependent Conditional Events

Typical example is removing marble from a bag without replacement. Let’s take a red marble from a bag of \(5\) red and \(5\) blue marbles without replacement. Here the probability of a red marble (in the second draw) given the probability of a red marble (in the first draw) is

\[ \frac{5}{10} \times \frac{4}{9} = \frac{2}{9} \]

a) Medical Test Scenario with Bayes theorem

“suppose there is a blood test that correctly detects vampirism 95% of the time. In more precise and mathematical notation, \(Pr(\text{positive test result} \mid vampire) = 0.95\). It’s a very accurate test, nearly always catching real vampires. It also make mistakes, though, in the form of false positives. One percent of the time, it incorrectly diagnoses normal people as vampires, \(Pr(\text{positive test result}|mortal) = 0.01\). The final bit of information we are told is that vampires are rather rare, being only \(0.1\%\) of the population, implying \(Pr(vampire) = 0.001\). Suppose now that someone tests positive for vampirism. What’s the probability that he or she is a bloodsucking immortal?” (McElreath, 2020, p. 49) (pdf)

“The correct approach is just to use Bayes’ theorem to invert the probability, to compute \(Pr(vampire|positive)\). The calculation can be presented as:

\[ \begin{align*} Pr(vampire\mid positive) = \frac{Pr(positive\mid vampire) Pr(vampire)}{Pr(positive)}\\ \text{where Pr(positive) is the average probability of a positive test result, that is,}\\ Pr(positive) = Pr(positive \mid vampire) Pr(vampire) \\ + Pr(positive \mid mortal) 1 − Pr(vampire) \end{align*} \tag{3.1}\] ” (McElreath, 2020, p. 49) (pdf)

R Code 3.1 a: Performing vampire calculation with Bayes’ theorem in Base R

Code
## R code 3.1a Vampire ##################
pr_positive_vampire_a <- 0.95
pr_positive_mortal_a <- 0.01
pr_vampire_a <- 0.001
pr_positive_a <- pr_positive_vampire_a * pr_vampire_a +
  pr_positive_mortal_a * (1 - pr_vampire_a)

( 
  pr_vampire_positive_a <- pr_positive_vampire_a * pr_vampire_a / pr_positive_a
)
#> [1] 0.08683729

There is only an 0.0868373% chance that the suspect is actually a vampire.

“Most people find this result counterintuitive. And it’s a very important result, because it mimics the structure of many realistic testing contexts, such as HIV and DNA testing, criminal profiling, and even statistical significance testing (see the Rethinking box at the end of this section). Whenever the condition of interest is very rare, having a test that finds all the true cases is still no guarantee that a positive result carries much information at all. The reason is that most positive results are false positives, even when all the true positives are detected correctly” (McElreath, 2020, p. 49) (pdf)

b) Medical test scenario with natural frequencies

“There is a way to present the same problem that does make it more intuitive” (McElreath, 2020, p. 50) (pdf)

(1)  In a population of 100,000 people, 100 of them are vampires.
(2)  Of the 100 who are vampires, 95 of them will test positive for vampirism.
(3)  Of the 99,900 mortals, 999 of them will test positive for vampirism.

There are 999 + 95 = 1094 people tested positive. But from these people only 95 / (999 + 95) = 8.6837294 % are actually vampires.

Or with a slightly different wording it is still easier to understand:

  1. We can just count up the number of people who test positive: \(95 + 999 = 1094\).
  2. Out of these \(1094\) positive tests, \(95\) of them are real vampires, so that implies:

\[ Pr(positive \mid vampire) = \frac{95}{1094} \]

R Code 3.2 a: Performing vampire calculation with frequencies in Base R

Code
pr_vampire_a2 <-  100 / 100000
pr_positive_vampire_a2 <-  95 / 100
pr_positive_mortal_a2  <-  999 / 99900
pr_positive_a2 <-  95 + 999

(
  pr_vampire_positive_a2 <-  
    pr_positive_vampire_a2 * 100 / pr_positive_a2
)
#> [1] 0.08683729

“The second presentation of the problem, using counts rather than probabilities, is often called the frequency format or natural frequencies. Why a frequency format helps people intuit the correct approach remains contentious. Some people think that human psychology naturally works better when it receives information in the form a person in a natural environment would receive it. In the real world, we encounter counts only. No one has ever seen a probability, the thinking goes. But everyone sees counts (”frequencies”) in their daily lives.” (McElreath, 2020, p. 50) (pdf)

“many scientists are uncomfortable with integral calculus, even though they have strong and valid intuitions about how to summarize data. Working with samples transforms a problem in calculus into a problem in data summary, into a frequency format problem. An integral in a typical Bayesian context is just the total probability in some interval. That can be a challenging calculus problem. But once you have samples from the probability distribution, it’s just a matter of counting values in the interval. An empirical attack on the posterior allows the scientist to ask and answer more questions about the model, without relying upon a captive mathematician. For this reason, it is easier and more intuitive to work with samples from the posterior, than to work with probabilities and integrals directly.” (McElreath, 2020, p. 51) (pdf)

TIDYVERSE

a) Medical Test Scenario with Bayes theorem

Vectors in Base R are tibble columns in tidyverse

Whenever there is a calculation with vectors the pendant in tidyverse mode is to generate columns in a tibble with tibble::tibble() or if there is already a data frame with dplyr::mutate() and to do the appropriate calculation with these columns.

The following R Code 3.1 transformed the Base R calculation R Code 3.3 into a computation using the tidyverse approach.

R Code 3.3 b: Performing the calculation using Bayes’ theorem with tidyverse approach in R

Code
## R code 3.1b Vampire ##########

tibble::tibble(pr_positive_vampire_b = .95,
       pr_positive_mortal_b  = .01,
       pr_vampire_b          = .001) |> 
  dplyr::mutate(pr_positive_b = pr_positive_vampire_b * pr_vampire_b
         + pr_positive_mortal_b * (1 - pr_vampire_b)) |> 
  dplyr::mutate(pr_vampire_positive_b = 
           pr_positive_vampire_b * pr_vampire_b / pr_positive_b) |> 
  dplyr::glimpse()
#> Rows: 1
#> Columns: 5
#> $ pr_positive_vampire_b <dbl> 0.95
#> $ pr_positive_mortal_b  <dbl> 0.01
#> $ pr_vampire_b          <dbl> 0.001
#> $ pr_positive_b         <dbl> 0.01094
#> $ pr_vampire_positive_b <dbl> 0.08683729

b) Medical test scenario with natural frequencies

R Code 3.4 b: Performing the calculation using frequencies with tidyverse approach in R

Code
tibble::tibble(pr_vampire_b2          = 100 / 100000,
       pr_positive_vampire_b2 = 95 / 100,
       pr_positive_mortal_b2  = 999 / 99900)  |>  
  dplyr::mutate(pr_positive_b2 = 95 + 999) |> 
  dplyr::mutate(pr_vampire_positive_b2 = 
                  pr_positive_vampire_b2 * 100 / pr_positive_b2) |> 
  dplyr::glimpse()
#> Rows: 1
#> Columns: 5
#> $ pr_vampire_b2          <dbl> 0.001
#> $ pr_positive_vampire_b2 <dbl> 0.95
#> $ pr_positive_mortal_b2  <dbl> 0.01
#> $ pr_positive_b2         <dbl> 1094
#> $ pr_vampire_positive_b2 <dbl> 0.08683729

3.1 Sampling from a grid-approximate posterior

3.1.1 ORIGINAL

Before we are going to draw samples from the posterior distribution we need to compute the distribution similar as we had done in the globe tossing example.

“Here’s a reminder for how to compute the posterior for the globe tossing model, using grid approximation. Remember, the posterior here means the probability of p conditional on the data.” (McElreath, 2020, p. 52) (pdf)

R Code 3.5 a: Generate the posterior distribution for the globe-tossing example (Original)

Code
### R code 3.2a Grid Globe ##########################
# change prob_b to prior
# change prob_data to likelihood
# added variables: n_grid_a, n_success_a, n_trials_a

n_grid_a <- 1000L # number of grid points
n_success_a <- 6L # observed water
n_trials_a <-  9L # number of trials


p_grid_a <- seq(from = 0, to = 1, length.out = n_grid_a)
prior_a <- rep(1, n_grid_a) # = prior, = uniform distribution, 1000 times 1
likelihood_a <- 
  dbinom(n_success_a, size = n_trials_a, prob = p_grid_a) # = likelihood
posterior_a <- likelihood_a * prior_a
posterior_a <- posterior_a / sum(posterior_a)

“Now we wish to draw 10,000 samples from this posterior. Imagine the posterior is a bucket full of parameter values, numbers such as \(0.1, 0.7, 0.5, 1,\) etc. Within the bucket, each value exists in proportion to its posterior probability, such that values near the peak are much more common than those in the tails. We’re going to scoop out 10,000 values from the bucket. Provided the bucket is well mixed, the resulting samples will have the same proportions as the exact posterior density. Therefore the individual values of \(p\) will appear in our samples in proportion to the posterior plausibility of each value.” (McElreath, 2020, p. 52) (pdf)

R Code 3.6 a: Draw 1000 samples from the posterior distribution (Original)

Code
n_samples_a <- 1e4

base::set.seed(3) # for reproducibility

## R code 3.3a Sample Globe ##########################################
samples_a <- sample(p_grid_a, 
                    prob = posterior_a, # from previous code chunk
                    size = n_samples_a, replace = TRUE)

The probability of each value is given by posterior_a, which we computed with R Code 3.5.

Note 3.1 : Details for a better understanding and comparison with the tidyverse version

To compare with the tidyverse version, I collected the three vectors with base::cbind() into a matrix and displayed the first six lines with utils::head(). Additionally I also displayed the first 10 values of samples_a vector.

R Code 3.7 a: Excursion for better comparison (Base R)

Code
# display grid results to compare with variant b
d_a <- cbind(p_grid_a, prior_a, likelihood_a, posterior_a) 
head(d_a, 10)
#>          p_grid_a prior_a likelihood_a  posterior_a
#>  [1,] 0.000000000       1 0.000000e+00 0.000000e+00
#>  [2,] 0.001001001       1 8.425225e-17 8.433659e-19
#>  [3,] 0.002002002       1 5.375951e-15 5.381333e-17
#>  [4,] 0.003003003       1 6.105137e-14 6.111249e-16
#>  [5,] 0.004004004       1 3.419945e-13 3.423368e-15
#>  [6,] 0.005005005       1 1.300676e-12 1.301978e-14
#>  [7,] 0.006006006       1 3.872087e-12 3.875963e-14
#>  [8,] 0.007007007       1 9.734489e-12 9.744233e-14
#>  [9,] 0.008008008       1 2.162473e-11 2.164638e-13
#> [10,] 0.009009009       1 4.370695e-11 4.375070e-13
Code
# display sample results to compare with variant b
head(samples_a, 10)
#>  [1] 0.5645646 0.6516517 0.5475475 0.5905906 0.5955956 0.7877878 0.7267267
#>  [8] 0.4914915 0.7507508 0.4494494

R Code 3.8 a: Scatterplot of the drawn samples (Base R)

Code
## R code 3.4a Globe Scatterplot #############
plot(samples_a)

Graph 3.1: Scatterplot of the drawn samples (Base R)

In fig-globe-glossing-plot-a “it’s as if you are flying over the posterior distribution, looking down on it. There are many more samples from the dense region near 0.6 and very few samples below 0.25. On the right, the plot shows the density estimate computed from these samples.” (McElreath, 2020, p. 53) (pdf)

R Code 3.9 a: Density estimate of the drawn samples (Rethinking)

Code
## R code 3.5a Globe Density plot #############
rethinking::dens(samples_a)

Graph 3.2: Density estimate of the drawn samples (Rethinking)

3.1.2 TIDYVERSE

R Code 3.10 b: Generate the posterior distribution form the globe-tossing example (Tidyverse)

Code
## R code 3.2b Grid Globe #####################

# how many grid points would you like?
n_grid_b <- 1000L
n_success_b <- 6L
n_trials_b  <- 9L

d_b <-
  tibble::tibble(p_grid_b = seq(from = 0, to = 1, length.out = n_grid_b),
         prior_b  = 1) |>  # flat uniform prior, vector 1L recycling
  dplyr::mutate(likelihood_b = 
                  stats::dbinom(n_success_b, size = n_trials_b, prob = p_grid_b)) |> 
  dplyr::mutate(posterior_b = 
                  (likelihood_b * prior_b) / sum(likelihood_b * prior_b))

head(d_b)
#> # A tibble: 6 × 4
#>   p_grid_b prior_b likelihood_b posterior_b
#>      <dbl>   <dbl>        <dbl>       <dbl>
#> 1  0             1     0           0       
#> 2  0.00100       1     8.43e-17    8.43e-19
#> 3  0.00200       1     5.38e-15    5.38e-17
#> 4  0.00300       1     6.11e-14    6.11e-16
#> 5  0.00400       1     3.42e-13    3.42e-15
#> 6  0.00501       1     1.30e-12    1.30e-14

Variable names changed

I have changed McElreath’s variable name prob_p and prob_data as prior_x and likelihood_x, where x stands for a (Base R) or b (Tidyverse).

To see the difference between grid and samples I will add “_sample” to all the other variable names.

R Code 3.11 b: Draw 1000 samples from the posterior distribution (Tidyverse)

Code
## R code 3.3b Sample Globe #####################
# how many samples would you like?
n_samples_b <- 1e4

# make it reproducible
base::set.seed(3)

df_samples_b <-
  d_b |> 
    dplyr::slice_sample(n = n_samples_b, weight_by = posterior_b, replace = T)

df_samples_b <- df_samples_b |>
    dplyr::rename(samples_b = p_grid_b,
           likelihood_samples_b = likelihood_b,
           prior_samples_b = prior_b,
           posterior_samples_b = posterior_b)
head(df_samples_b)
#> # A tibble: 6 × 4
#>   samples_b prior_samples_b likelihood_samples_b posterior_samples_b
#>       <dbl>           <dbl>                <dbl>               <dbl>
#> 1     0.565               1                0.225             0.00225
#> 2     0.652               1                0.272             0.00272
#> 3     0.548               1                0.210             0.00210
#> 4     0.591               1                0.245             0.00245
#> 5     0.596               1                0.248             0.00248
#> 6     0.788               1                0.192             0.00192

identical(samples_a, df_samples_b$samples_b) = TRUE.

The column samples_b is identical with vector samples_a, because I have used in both sampling processes base::set.seed(3), so that I (and you) could reproduce the data.

There are different possibilities to display data frames respectively tibbles:

  1. You can use the internal print facility of tibbles. It shows only the first ten rows and all columns that fit on the screen. You see an example in R Code 3.11.
  2. With the utils::str() function you will get a result with shorter figures that is better adapted to a small screen.
  3. Another alternative is the tidyverse approach of dplyr::glimpse().
  4. With skimr::skim() you will get a compact summary of all data.

R Code 3.12 b: Excursion: Printing varieties for better comparison (Tidyverse)

Code
glue::glue('USING THE str() FUNCTION:')
utils::str(df_samples_b)
glue::glue('#####################################################\n\n')
glue::glue('USING THE dplyr::glimpse() FUNCTION:')
dplyr::glimpse(df_samples_b)
glue::glue('#####################################################\n\n')
glue::glue('USING THE skimr::skim() FUNCTION:')
skimr::skim(df_samples_b)
#> USING THE str() FUNCTION:
#> tibble [10,000 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ samples_b           : num [1:10000] 0.565 0.652 0.548 0.591 0.596 ...
#>  $ prior_samples_b     : num [1:10000] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ likelihood_samples_b: num [1:10000] 0.225 0.272 0.21 0.245 0.248 ...
#>  $ posterior_samples_b : num [1:10000] 0.00225 0.00272 0.0021 0.00245 0.00248 ...
#> #####################################################
#> 
#> USING THE dplyr::glimpse() FUNCTION:
#> Rows: 10,000
#> Columns: 4
#> $ samples_b            <dbl> 0.5645646, 0.6516517, 0.5475475, 0.5905906, 0.595…
#> $ prior_samples_b      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ likelihood_samples_b <dbl> 0.22455994, 0.27190272, 0.20966655, 0.24460869, 0…
#> $ posterior_samples_b  <dbl> 0.0022478473, 0.0027217490, 0.0020987643, 0.00244…
#> #####################################################
#> 
#> USING THE skimr::skim() FUNCTION:
Data summary
Name df_samples_b
Number of rows 10000
Number of columns 4
_______________________
Column type frequency:
numeric 4
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
samples_b 0 1 0.64 0.14 0.15 0.55 0.65 0.74 0.97 ▁▂▇▇▂
prior_samples_b 0 1 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
likelihood_samples_b 0 1 0.20 0.07 0.00 0.16 0.23 0.26 0.27 ▁▁▂▃▇
posterior_samples_b 0 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ▁▁▂▃▇

We can plot the left panel of Figure 3.1 from the book with ggplot2::geom_point(). But before we do, we’ll need to add a variable numbering the samples. This is necessary as the x-parameter of the plot.

R Code 3.13 b: Scatterplot of the drawn samples (Tidyverse)

Code
## R code 3.4b Globe Scatterplot ########################
df_samples_b |> 
  dplyr::mutate(sample_number = 1:dplyr::n()) |> 
  
  ggplot2::ggplot(ggplot2::aes(x = sample_number, y = samples_b)) +
  ggplot2::geom_point(alpha = 1/10) +
  ggplot2::scale_y_continuous("proportion of water (p)", limits = c(0, 1)) +
  ggplot2::xlab("sample number") +
  ggplot2::theme_bw()

Graph 3.3: Scatterplot of the drawn samples (Tidyverse)

If you hover over this link from Graph 3.1 you can compare the Base R version with the tidyverse result.

Instead of the rethinking::dens() we’ll plot the density in the right panel of the books Figure 3.1 with ggplot2::geom_density().

R Code 3.14 b: Density estimate of the drawn samples with 1e4 grid points (Tidyverse)

Code
## R code 3.5b(1) Globe Density ###########################
df_samples_b |> 
  ggplot2::ggplot(ggplot2::aes(x = samples_b)) +
  ggplot2::geom_density(fill = "grey") +
  ggplot2::scale_x_continuous("proportion of water (p)", limits = c(0, 1)) +
  ggplot2::theme_bw()

Graph 3.4: Density estimate of the drawn samples with 1e4 grid points (Tidyverse)

Compare this somewhat smoother tidyverse plot with Graph 3.2.

“You can see that the estimated density is very similar to ideal posterior you computed via grid approximation. If you draw even more samples, maybe 1e5 or 1e6, the density estimate will get more and more similar to the ideal.” (McElreath, 2020, p. 53) (pdf)

Here’s what it looks like with 1e6.

R Code 3.15 b: Density estimate of the drawn samples with 1e6 grid points (Tidyverse)

Code
base::set.seed(3)

## R code 3.5b(2) Globe Density ###########################
d_b |> 
  dplyr::slice_sample(n = 1e6, weight_by = posterior_b, replace = T) |>
  
  ggplot2::ggplot(ggplot2::aes(x = p_grid_b)) +
  ggplot2::geom_density(fill = "grey") +
  ggplot2::scale_x_continuous("proportion of water (p)", limits = c(0, 1)) +
  ggplot2::theme_bw()

Graph 3.5: Density estimate of the drawn samples with 1e6 grid points (Tidyverse)

3.2 Sampling to Summarize

All we have done so far is crudely replicate the posterior density we had already computed in the previous chapter. Now it is time to use these samples to describe and understand the posterior.

The description to understand the posterior can be divided into three inquiries:

  1. Questions about intervals of defined boundaries. See Section 3.2.1.
  2. Questions about intervals of defined probability mass. See Section 3.2.2.
  3. Questions about point estimates. See: Section 3.2.3.

3.2.1 Intervals of Defined Boundaries

3.2.1.1 ORIGINAL

3.2.1.1.1 Grid-approximate Posterior

For instance: What is the probability that the proportion of water is less than 0.5?

“Using the grid-approximate posterior, you can just add up all of the probabilities, where the corresponding parameter value is less than 0.5:” (McElreath, 2020, p. 53) (pdf)

R Code 3.16 a: Define boundaries using the grid-approximate posterior (Base R)

Code
## R code 3.6a Grid Posterior Boundary ##############################
# add up posterior probability where p < 0.5
sum(posterior_a[p_grid_a < 0.5])
#> [1] 0.1718746

About 17% of the posterior probability is below 0.5.

3.2.1.1.2 Samples from the Posterior

But this easy calculation based on grid approximation is often not practical when there are more parameters. In this case you can draw samples from the posterior. But this approach requires a different calculation:

“This approach does generalize to complex models with many parameters, and so you can use it everywhere. All you have to do is similarly add up all of the samples below 0.5, but also // divide the resulting count by the total number of samples. In other words, find the frequency of parameter values below 0.5” (McElreath, 2020, p. 53/54)” (McElreath, 2020, p. 53) (pdf)

R Code 3.17 a: Compute posterior probability below 0.5 using the sampling approach (Base R)

Code
## R code 3.7a Sample Boundary #############################
(p_boundary_a <- sum(samples_a < 0.5) / 1e4)
#> [1] 0.1629

Different values with samples from the posterior

In comparison with the value of the posterior probability below 0.5 in the book of 0.1726 the result in 0.1629 from R Code 3.17 is quite different.

The reason for the difference is that you can’t get the same values in a sampling processes. This is the nature of randomness. And McElreath did not include the base::set.seed() function for (exact) reproducibility.

Using the same approach, you can ask how much posterior probability lies between 0.5 and 0.75:

R Code 3.18 a: Compute posterior probability between 0.5 and 0.75 using the sampling approach (Base R)

Code
## R code 3.8a Sample Interval #############################
(p_boundary_a8 <- sum(samples_a > 0.5 & samples_a < 0.75) / 1e4)
#> [1] 0.6061

3.2.1.2 TIDYVERSE

3.2.1.2.1 Grid-approximate Posterior

To get the proportion of water less than some value of p_grid_b within the {tidyverse}, you might first filter() by that value and then take the sum() within summarise(). (Kurz)

R Code 3.19 b: Compute posterior probability below 0.5 using the grid approach (Tidyverse)

Code
## R code 3.6b Grid Boundary ######################
# add up posterior probability where p < 0.5
d_b |> dplyr::filter(p_grid_b < 0.5) |> 
    dplyr::summarize(sum = base::sum(posterior_b))
#> # A tibble: 1 × 1
#>     sum
#>   <dbl>
#> 1 0.172
3.2.1.2.2 Samples from the Posterior

Kurz offers several methods to calculate the posterior probability below 0.5:

  1. Method: filter() & n() within summarize()
  2. Method: count() followed by mutate()
  3. Method: Logical condition within mean()

R Code 3.20 b: Compute posterior probability below 0.5 using the sampling approach with different methods (Tidyverse)

Code
# add up all posterior probabilities of samples under .5
## R code 3.7b Sample Boundary #########################

###### Method (1) #######
method_1 <- 
  df_samples_b |> 
      dplyr::filter(samples_b < .5) |> 
      dplyr::summarize(sum = dplyr::n() / n_samples_b)

glue::glue('Method 1:\n')
method_1
glue::glue('##################################################\n\n')

###### Method (2) #######
method_2 <- 
  df_samples_b |> 
    dplyr::count(samples_b < .5) |> 
    dplyr::mutate(probability = n_samples_b / base::sum(n_samples_b))

glue::glue('Method 2:\n')
method_2
glue::glue('##################################################\n\n')

###### Method (3) #######
method_3 <- 
  df_samples_b |> 
      dplyr::summarize(sum = mean(samples_b < .5))

glue::glue('Method 3:\n')
method_3
#> Method 1:
#> # A tibble: 1 × 1
#>     sum
#>   <dbl>
#> 1 0.163
#> ##################################################
#> 
#> Method 2:
#> # A tibble: 2 × 3
#>   `samples_b < 0.5`     n probability
#>   <lgl>             <int>       <dbl>
#> 1 FALSE              8371           1
#> 2 TRUE               1629           1
#> ##################################################
#> 
#> Method 3:
#> # A tibble: 1 × 1
#>     sum
#>   <dbl>
#> 1 0.163

To determine the posterior probability between 0.5 and 0.75, you can use & within filter(). Just multiply that result by 100 to get the value in percent.

R Code 3.21 b: Compute posterior probability between 0.5 and 0.75 using the sampling approach (Tidyverse)

Code
## R code 3.8b Sample Interval ##############
df_samples_b |> 
    dplyr::filter(samples_b > .5 & samples_b < .75) |> 
    dplyr::summarize(sum = dplyr::n() / n_samples_b)
#> # A tibble: 1 × 1
#>     sum
#>   <dbl>
#> 1 0.606

And, of course, you can do this calculation with the other methods as well.

To produce the top part of Figure 3.2 of the book we apply following code lines:

R Code 3.22 b: Posterior distribution produced with {tidyverse} approach

Code
## R Code Fig 3.2 Upper Part #############
# upper left panel
p1 <-
  df_samples_b  |>  
  ggplot2::ggplot(ggplot2::aes(x = samples_b, y = posterior_samples_b)) +
  ggplot2::geom_line() +
  ggplot2::geom_area(data = df_samples_b  |>  
         dplyr::filter(samples_b < .5), fill = "deepskyblue") +
  ggplot2::labs(x = "proportion of water (p)", y = "density")  +
  ggplot2::theme_bw()

# upper right panel
p2 <- 
  df_samples_b  |>  
  ggplot2::ggplot(ggplot2::aes(x = samples_b, y = posterior_samples_b)) +
  ggplot2::geom_line() +
  ggplot2::geom_area(data = df_samples_b  |>  
         dplyr::filter(samples_b > .5 & samples_b < .75), fill = "deepskyblue") +
  ggplot2::labs(x = "proportion of water (p)", y = "density") +
  ggplot2::theme_bw()

library(patchwork)
p1 + p2

Graph 3.6: Upper part of SR2 Figure 3.2: Intervals of defined boundaries produced with {tidyverse} tools: Left: The blue area is the posterior probability below a parameter value of 0.5. Right: The posterior probability between 0.5 and 0.75.

3.2.2 Intervals of Defined Probability Mass

3.2.2.1 ORIGINAL

3.2.2.1.1 Quantiles

Definition 3.2 : Several Terms for Intervals of Defined Probability Mass

The abbreviation for all three versions of probability mass intervals is “CI”.

McElreath tries to avoid the semantic of “confidence” or “credibility” because

“What the interval indicates is a range of parameter values compatible with the model and data. The model and data themselves may not inspire confidence, in which case the interval will not either.” (McElreath, 2020, p. 54) (pdf)

The values of these intervals can be found easier by using samples from the posterior than by using a grid approximation.

“Suppose for example you want to know the boundaries of the lower \(80\%\) posterior probability. You know this interval starts at \(p = 0\). To find out where it stops, think of the samples as data and ask where the \(80th\) percentile lies:” (McElreath, 2020, p. 54) (pdf)

R Code 3.23 a: Compute posterior probability intervals lower \(80\%\) and from 10th to 90th percentile

Code
## R code 3.9a Quantile 0.8 #######################
q8 <- quantile(samples_a, 0.8)

## R code 3.10a PI = Quantile 0.1-0.9 ###################
q1_9 <- quantile(samples_a, c(0.1, 0.9))
  • Lower \(80\%\) = 0.7627628
  • Between \(10\%\) = 0.4514515 and \(90\%\) = 0.8148148
3.2.2.1.2 Percentile Interval (PI)

The second calculation returns the middle 80% of the distribution

“Intervals of this sort, which assign equal probability mass to each tail, are very common in the scientific literature. We’ll call them percentile intervals (PI). These intervals do a good job of communicating the shape of a distribution, as long as the distribution isn’t too asymmetrical. But in terms of supporting inferences about which parameters are consistent with the data, they are not perfect.” (McElreath, 2020, p. 55) (pdf)

But they do not work well with highly skewed data. See Graph 3.7 where the posterior is consistent with observing three waters in three tosses and a uniform (flat) prior.

R Code 3.24 a: Skewed posterior distribution observing three waters in three tosses and a uniform (flat) prior (Base R)

Code
## R code 3.11a Skewed data #####################
p_grid_skewed_a <- seq(from = 0, to = 1, length.out = 1000)
prior_skewed_a <- rep(1, 1000)
likelihood_skewed_a <- dbinom(3, size = 3, prob = p_grid_skewed_a)
posterior_skewed_a <- likelihood_skewed_a * prior_skewed_a
posterior_skewed_a <- posterior_skewed_a / sum(posterior_skewed_a)

base::set.seed(3) # added to make sampling distribution reproducible (pb)
samples_skewed_a <- sample(p_grid_skewed_a, 
             size = 1e4, replace = TRUE, prob = posterior_skewed_a)

# added to show the skewed posterior distribution (pb)
rethinking::dens(samples_skewed_a)

Graph 3.7: Skewed posterior distribution observing three waters in three tosses and a uniform (flat) prior. It is highly skewed, having its maximum value at the boundary where p equals 1.

“[The percentile interval] assigns 25% of the probability mass above and below the interval. So it provides the central 50% probability. But in this example, it ends up excluding the most probable parameter values, near p = 1. So in terms of describing the shape of the posterior distribution—which is really all these intervals are asked to do—the percentile interval can be misleading.” (McElreath, 2020, p. 56) (pdf)

rethinking::PI() is just a shorthand for the base R stats::quantile() function.

R Code 3.25 a: Computing the Percentile Interval (PI)

Code
## R code 3.12a(1) PI ############################
rethinking::PI(samples_skewed_a, prob = 0.5)
#>       25%       75% 
#> 0.7087087 0.9349349

Procedure 3.1 : Probability mass (PI) prob = 0.6 equals the interval between \(20-80\%\)

rethinking::PI() is just a shorthand for the base R stats::quantile() function. Instead of providing the interval explicitly (for instance \(20-80\%\)) we just say that we want the central \(60\%\) = PI of prob = 0.6.

rethinking::PI() simplifies the following three steps:

  1. We divide always the percentage assigned to the prob value by 2: 60% / 2 = 30%$
  2. We subtract, respectively add this value to 50: 50% - 30% = 20% and 50% + 30% = 80%
  3. The result is the probability mass between 20-80%.

R Code 3.26 a: Compute PI between \(20-80\%\) (Rethinking and Base R)

Code
## R code 3.12a(2) PI ############################

glue::glue('Central Propbability Mass calculated with rethinking::PI()\n')
rethinking::PI(samples_skewed_a, prob = 0.6)
glue::glue('########################################################\n\n')
glue::glue('Central Propbability Mass calculated with stats::quantile()\n')
quantile(samples_skewed_a, prob = c(.20, .80))
#> Central Propbability Mass calculated with rethinking::PI()
#>       20%       80% 
#> 0.6706707 0.9481481 
#> ########################################################
#> 
#> Central Propbability Mass calculated with stats::quantile()
#>       20%       80% 
#> 0.6706707 0.9481481
3.2.2.1.3 Highest Posterior Density Interval (HPDI)

To include the most probable parameter value, the modus or MAP we should calculate the HPDI:

“The HPDI is the narrowest interval containing the specified probability mass. If you think about it, there must be an infinite number of posterior intervals // with the same mass. But if you want an interval that best represents the parameter values most consistent with the data, then you want the densest of these intervals. That’s what the HPDI is.” (McElreath, 2020, p. 56/57)

R Code 3.27 a: Compute Highest Posterior Density Interval (HPDI) (Rethinking / Base R)

Code
## R code 3.13a HPDI ###############################
rethinking::HPDI(samples_skewed_a, prob = 0.5)
#>      |0.5      0.5| 
#> 0.8418418 0.9989990

Note 3.2 : HPDI versus PI

Advantages of HPDI

  1. HPDI captures the parameters with highest posterior probability
  2. HPDI is noticeably narrower than PI. In our example 0.157 versus rround(rethinking::PI(samples_skewed_a, prob = 0.5)[[2]] - rethinking::PI(samples_skewed_a, prob = 0.5)[[1]], 3)`.

But this is only valid if we have a very skewed distribution.

Disadvantages of HPDI

  1. HPDI is more computationally intensive than PI
  2. HPDI is sensitive of the number of samples drawn (= greater simulation variance)
  3. HPDI is more difficult to understand

Remember, the entire posterior distribution is the Bayesian estimate

“Overall, if the choice of interval type makes a big difference, then you shouldn’t be using intervals to summarize the posterior. Remember, the entire posterior distribution is the Bayesian ‘estimate.’ It summarizes the relative plausibilities of each possible value of the parameter. Intervals of the distribution are just helpful for summarizing it. If choice of interval leads to different inferences, then you’d be better off just plotting the entire posterior distribution.” (McElreath, 2020, p. 58) (pdf)

“[I]n most cases, these two types of interval are very similar.58 They only look so different in this case because the posterior distribution is highly skewed.” (McElreath, 2020, p. 57) (pdf)n

R Code 3.28 a: Compare PI and HPDI of symmetric distributions: Six ‘Water’, 9 tosses

Code
pi8a <- rethinking::PI(samples_a, prob = 0.8)
hpdi8a <- rethinking::HPDI(samples_a, prob = 0.8)

pi95a <- rethinking::PI(samples_a, prob = 0.95)
hpdi95a <- rethinking::HPDI(samples_a, prob = 0.95)
  • PI 80% = 0.4514515, 0.8148148 versus HPDI 80% = 0.4874875, 0.8448448.
  • PI 95% = 0.3493493, 0.8788789 versus HPDI 95% = 0.3703704, 0.8938939.

The 95% interval in frequentist statistics is just a convention, there are no analytical reasons why you should choose exactly this interval. But convenience is not a serious criterion. So what to do instead?

Instead of 95% intervals use the widest interval that excludes the value you want to report or provide a series of nested intervals

“If you are trying to say that an interval doesn’t include some value, then you might use the widest interval that excludes the value. Often, all compatibility intervals do is communicate the shape of a distribution. In that case, a series of nested intervals may be more useful than any one interval. For example, why not present 67%, 89%, and 97% intervals, along with the median? Why these values? No reason. They are prime numbers, which makes them easy to remember. But all that matters is they be spaced enough to illustrate the shape of the posterior. And these values avoid 95%, since conventional 95% intervals encourage many readers to conduct unconscious hypothesis tests.” (McElreath, 2020, p. 56) (pdf)

Boundaries versus Probability Mass Intervals

Boundaries:

  • We ask for a probability of frequencies.
  • Result is a percentage of probability.
  • Boundaries are grounded on the probability (prob) values (\(0-1\)).

Probability Mass:

  • We ask for a specified amount of posterior probability.
  • Result is the probability value of the percentage of frequencies we looked for.
  • Probability Mass is grounded on the percentage of probabilities (\(0-100\%\)).

3.2.2.2 TIDYVERSE

3.2.2.2.1 Quantiles

Kurz offers again different methods — this time to calculate probability mass:

  1. Method: Since we saved our samples_b samples within the df_samples_b tibble, we’ll have to index with $ within stats::quantile().
  2. Method: For an alternative approach, we could dplyr::select() the samples_b vector, extract it from the tibble with dplyr::pull(), and then pump it into stats::quantile().

pull() is similar to $. It’s mostly useful because it looks a little nicer in pipes, it also works with remote data frames, and it can optionally name the output. (dplyr::pull() help file “Extract a single column”)

  1. Method: We might also use stats::quantile() within dplyr::summarize().

R Code 3.29 b: Compute posterior probability mass intervals of \(80\%\) with different methods

Code
## R code 3.9b Quantile 0.8 diff. approaches   #######################
###### Method (1) #######
method_1 <- q80 <- 
  quantile(df_samples_b$samples_b, probs = .8)

glue::glue('Method 1:\n')
method_1
glue::glue('##################################################\n\n')

###### Method (2) #######
method_2 <- 
  df_samples_b |> 
    dplyr::pull(samples_b) |> 
    quantile(probs = .8)

glue::glue('Method 2:\n')
method_2
glue::glue('##################################################\n\n')

###### Method (3) #######
method_3 <- 
  df_samples_b |> 
    dplyr::summarize(q80_2 = quantile(samples_b, probs = .8))

glue::glue('Method 3:\n')
method_3
#> Method 1:
#>       80% 
#> 0.7627628 
#> ##################################################
#> 
#> Method 2:
#>       80% 
#> 0.7627628 
#> ##################################################
#> 
#> Method 3:
#> # A tibble: 1 × 1
#>   q80_2
#>   <dbl>
#> 1 0.763
3.2.2.2.2 Percentile Interval (PI)

Here’s the summarize() approach with two probabilities returning the percentile interval (PI).

R Code 3.30 b: Compute probability mass of 80% with summarize method

Code
## R code 3.10b(1) PI = Quantile 0.1-0.9 ###################
df_samples_b |> 
    dplyr::summarize(q10 = quantile(samples_b, probs = .1),
              q90 = quantile(samples_b, probs = .9))
#> # A tibble: 1 × 2
#>     q10   q90
#>   <dbl> <dbl>
#> 1 0.451 0.815

Kurz refers also to a now deprecated calculation using the vector feature of R to summarize different quantiles with one line.


samples_b |> dplyr::summarize(q10_90 = quantile(samples_b, probs = c(.1, .9)))

This produces nowadays the following warning message:

Watch out!

Warning: Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0.

Please use reframe() instead.

When switching from summarise() to reframe(), remember that reframe() always returns an ungrouped data frame and adjust accordingly.

Therefore I am going to change dplyr::summarize() to dplyr::reframe().

R Code 3.31 b: Using dplyr::reframe() to compute probability mass intervals of 80%

Code
## R code 3.10b(2) PI = Quantile 0.1-0.9 ###################
df_samples_b |> 
    dplyr::reframe(q10_90 = quantile(samples_b, probs = c(.1, .9)))
#> # A tibble: 2 × 1
#>   q10_90
#>    <dbl>
#> 1  0.451
#> 2  0.815

Note 3.3 : Excursion: Using dplyr::reframe() and friends

From the help file of dplyr::reframe():

While summarise() requires that each argument returns a single value, and mutate() requires that each argument returns the same number of rows as the input, reframe() is a more general workhorse with no requirements on the number of rows returned per group.

reframe() creates a new data frame by applying functions to columns of an existing data frame. It is most similar to summarise(), with two big differences:

  • reframe() can return an arbitrary number of rows per group, while summarise() reduces each group down to a single row.
  • reframe() always returns an ungrouped data frame, while summarise() might return a grouped or rowwise data frame, depending on the scenario.

We expect that you’ll use summarise() much more often than reframe(), but reframe() can be particularly helpful when you need to apply a complex function that doesn’t return a single summary value.

See also the appropriate section in the blog post about changes in {dplyr} 1.1.0. The name reframe() is in accordance with tibble::enframe() and tibble::deframe():

  • enframe(): Takes a vector, returns a data frame
  • deframe(): Takes a data frame, returns a vector
  • reframe(): Takes a data frame, returns a data frame

The functions of the tidyverse approach typically returns a data frame. But sometimes you just want your values in a numeric vector for the sake of quick indexing. In that case, base R stats::quantile() shines: (Kurz in the section: Intervals of defined mass)

R Code 3.32 b: Using stats::quantile() to get a vector of a probability mass calculation

Code
## R code 3.12b(1) PI = Quantile 0.1-0.9#############################
(q10_q90 = quantile(df_samples_b$samples_b, probs = c(.1, .9)))
#>       10%       90% 
#> 0.4514515 0.8148148

This is the same method as in R base with the difference that we are working with tibbles and need therefore to use the $ operator.

To produce the bottom part of Figure 3.2 of the book we apply following code lines.

R Code 3.33 b: Plots of defined mass intervals lower of 80% and the middle 80%

Code
## R Code Fig 3.2b Lower Part #############
p1 <-
  df_samples_b |> 
  ggplot2::ggplot(ggplot2::aes(x = samples_b, y = posterior_samples_b)) +
  ggplot2::geom_line() +
  ggplot2::geom_area(data = df_samples_b |> 
                dplyr::filter(samples_b < q80), fill = "deepskyblue") +
  ggplot2::annotate(geom = "text",
           x = .25, y = .0025,
           label = "lower 80%") +
  ggplot2::labs(x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme_bw()

# upper right panel
p2 <- 
  df_samples_b |> 
  ggplot2::ggplot(ggplot2::aes(x = samples_b, y = posterior_samples_b)) +
  ggplot2::geom_line() +
  ggplot2::geom_area(data = df_samples_b |> 
            dplyr::filter(samples_b > q10_q90[[1]] & 
                    samples_b < q10_q90[[2]]), fill = "deepskyblue") +
  ggplot2::annotate(geom = "text",
           x = .25, y = .0025,
           label = "middle 80%") +
  ggplot2::labs(x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme_bw()

library(patchwork)
p1 + p2

Graph 3.8: Lower part of SR2 Figure 3.2: Intervals of defined mass produced with {tidyverse} tools: Left: Lower 80% posterior probability exists below a parameter value of about 0.75. Right: Middle 80% posterior probability lies between the 10% and 90% quantiles.

Again we will demonstrate the misleading character of Percentile Intervals (PIs) with a very skewed distribution.

We’ve already defined p_grid_b and prior_b within d_b, above. Here we’ll reuse them and create a new tibble by updating all the columns with the skewed parameters of three ‘Water’ in three tosses.

To see the difference how the skewed distribution is different to the Figure 3.2 lower part, I will draw the appropriate figure here myself.

R Code 3.34 b: Skewed posterior distribution observing three waters in three tosses and a uniform (flat) prior (Tidyverse)

Code
## R code 3.11b Skewed data #########################
# here we update the `dbinom()` parameters 
# for values for a skewed distribution
# assuming three trials results in 3 W (Water)

n_samples_skewed_b <- 1e4
n_success_skewed_b <- 3
n_trials_skewed_b  <- 3

# update `d_b` to d_skewed_b
d_skewed_b <-
  d_b |> 
    dplyr::mutate(likelihood_skewed_b = 
        stats::dbinom(n_success_skewed_b,
           size = n_trials_skewed_b, prob = p_grid_b)) |> 
    dplyr::mutate(posterior_skewed_b  = 
          (likelihood_skewed_b * prior_b) / 
          sum(likelihood_skewed_b * prior_b))

# make the next part reproducible
base::set.seed(3)

# here's our new samples tibble
samples_skewed_b <- 
    d_skewed_b |> 
    dplyr::slice_sample(n = n_samples_skewed_b, 
        weight_by = posterior_skewed_b, replace = T) |> 
    dplyr::rename(p_skewed_b = p_grid_b,
           prior_skewed_b = prior_b)


# added to see the skewed distribution
samples_skewed_b |>  
  ggplot2::ggplot( ggplot2::aes(x = p_skewed_b, y = posterior_skewed_b)) +
  ggplot2::geom_line() +
  ggplot2::theme_bw()

Graph 3.9: Skewed posterior distribution observing three waters in three tosses and a uniform (flat) prior. The distribution is highly skewed, having its maximum value where p equals 1.

To see how the skewed distribution is different to the book’s Figure 3.2 lower part, I will draw the appropriate figure here.

R Code 3.35 b: Lower 80% and middle 80% of probability mass intervals in the skewed distribution

Code
## R code 3.12b(2) PI < 80 & middle 80% #############################
p1 <-
  samples_skewed_b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_skewed_b, y = posterior_skewed_b)) +
  ggplot2::geom_line() +
  ggplot2::geom_area(data = samples_skewed_b |> 
      dplyr::filter(p_skewed_b < q80), fill = "deepskyblue") +
  ggplot2::annotate(geom = "text",
           x = .25, y = .0025,
           label = "lower 80%") +
  ggplot2::labs(x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme_bw()

# upper right panel
p2 <- 
  samples_skewed_b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_skewed_b, y = posterior_skewed_b)) +
  ggplot2::geom_line() +
  ggplot2::geom_area(data = samples_skewed_b |> 
      dplyr::filter(p_skewed_b > q10_q90[[1]] & 
                    p_skewed_b < q10_q90[[2]]), fill = "deepskyblue") +
  ggplot2::annotate(geom = "text",
           x = .25, y = .0025,
           label = "middle 80%") +
  ggplot2::labs(x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme_bw()

library(patchwork)
p1 + p2

Graph 3.10: Probability mass intervals in a skewed distribution. Left: Lower 80%. right: Middle 80% = 80% Percentile Interval (PI)
Introducing tidybayes

Introducing the {tidybayes} package by Matthew Kay

The {tidybayes} package by Matthew Kay offers an array of convenience functions for summarizing Bayesian models.

{tidybayes} is an R package that aims to make it easy to integrate popular Bayesian modeling methods into a tidy data + ggplot workflow. It builds on top of (and re-exports) several functions for visualizing uncertainty from its sister package, {ggdist}.

For the following parts the section on Point Summaries and Intervals and the reference on Point and interval summaries for tidy data frames of draws from distributions are especially important.

The mentioned vignettes above are long articles I haven’t read yet. I plan to do this in the next future but for now I am concentrating and trusting on Kurz’ text to apply and explain the most important function parallel to McElreath book chapters.

Many function names used by Kurz do not apply anymore

I had difficulties to use Kurz’s functions because there was an overhaul in the naming scheme of {tidybayes} version 1.0 and a deprecation of horizontal shortcut geoms and stats in {tidybayes} 2.1. Because {tidybayes} integrates function of the sister package {ggdist} the function descriptions and references of {ggdist} are also important to consult. For instance all the function on point and interval summaries are now documented in {ggdist}.

There is a systematic change of the function names: The h (for ‘horizontal’) in parameter name of the point estimate was removed. For instance: Instead of median_qih(), median_hdih() and median_hdcih() it is now tidybayes::median_qi(), tidybayes::median_hdi() and tidybayes::median_hdci(). Similar with point_inverhalh(), mean\_\* and mode\_\*.


There is another adaption compared to the Kurz’ version: I can’t reproduce the codes with his samples (my samples_b) data frame, because the data has changed values recently to the skewed sampling version. To get the same results as Kurz I have to use in my naming scheme the skewed version.

The {tidybayes} package contains a family of functions that make it easy to summarize a distribution with a measure of central tendency accompanied by intervals. With tidybayes::median_qi(), we will ask for the median and quantile-based intervals — just like we’ve been doing with stats::quantile(). (Kurz)

R Code 3.36 : Computing the median quantile interval with {tidybayes}

Code
tidybayes::median_qi(samples_skewed_b$p_skewed_b, .width = .5)
#>           y      ymin      ymax .width .point .interval
#> 1 0.8428428 0.7087087 0.9349349    0.5 median        qi

Note how the .width argument within tidybayes::median_qi() worked the same way the prob argument did within rethinking::PI(). With .width = .5, we indicated we wanted a quantile-based 50% interval, which was returned in the ymin and ymax columns.

Note 3.4 : Point and interval summaries for tidy data frames of draws from distributions

The qi in tidybayes::median_qi() stands for “quantile interval”. The explanation of this function family (median_qi(), mean_qi() and mode_qi()) is now documented in the {ggdist} package.

The qi-variants is the short form of the function family of tidybayes::point_interval(..., .point = median, .interval = qi).

There are several point intervals that {tidybayes} respectively {ggdist} computes:

  • qi yields the quantile interval (also known as the percentile interval or equi-tailed interval) as a 1x2 matrix.
  • hdi yields the highest-density interval(s) (also known as the highest posterior density interval). Note: If the distribution is multimodal, hdi may return multiple intervals for each probability level (these will be spread over rows). You may wish to use hdci (below) instead if you want a single highest-density interval, with the caveat that when the distribution is multimodal hdci is not a highest-density interval.
  • hdci yields the highest-density continuous interval, also known as the shortest probability interval. Note: If the distribution is multimodal, this may not actually be the highest-density interval (there may be a higher-density discontinuous interval, which can be found using hdi).
  • ll and ul yield lower limits and upper limits, respectively (where the opposite limit is set to either Inf or -Inf). (ggdist reference)

The {tidybayes} framework makes it easy to request multiple types of intervals. In the following code chunk we’ll request 50%, 80%, and 99% intervals.

R Code 3.37 b: Requesting multiple type of intervals with {tidybayes}

Code
## R code 3.12b(3) different PIs ##############
tidybayes::median_qi(samples_skewed_b$p_skewed_b, .width = c(.5, .8, .99))
#>           y      ymin      ymax .width .point .interval
#> 1 0.8428428 0.7087087 0.9349349   0.50 median        qi
#> 2 0.8428428 0.5705706 0.9749750   0.80 median        qi
#> 3 0.8428428 0.2562563 0.9989990   0.99 median        qi

The .width column in the output indexed which line presented which interval. The value in the y column remained constant across rows. That’s because that column listed the measure of central tendency, the median in this case.

3.2.2.2.3 Highest Posterior Density Interval (HPDI)

Now let’s use the rethinking::HPDI() function to return 50% highest posterior density intervals (HPDIs).

R Code 3.38 b: Compute Highest Posterior Density Interval (HPDI) (Rethinking / Tidyverse)

Code
## R code 3.13b(1) HPDI ###############################
rethinking::HPDI(samples_skewed_b$p_skewed_b, prob = .5)
#>      |0.5      0.5| 
#> 0.8418418 0.9989990

The reason I introduce {tidybayes} now is that the functions of the {brms} package only support percentile-based intervals of the type we computed with quantile() and median_qi(). But {tidybayes} also supports HPDIs.

: Two Version for Highest Density Intervals

As already mentioned in Note 3.4 there is hdiand hcdi. Both functions produce the same result in unimodal distributions.

But in the case of an extreme skewed distribution like our data frame with the observation of 3 ‘Water’ with 3 tosses the tidybayes::hdi() functions generates an error.

Error in quantile.default(dist_y, probs = 1 - .width) :
missing values and NaN’s not allowed if ‘na.rm’ is FALSE

My error message with hdi is in contrast to Kurz’ version where this function seems to work. BTW: I got the same error message providing the vector samples_skewed_a. from the Base R version.

R Code 3.39 : High Density Intervals with {tidybayes}

Code
## R code 3.13b(2a) HPDI 6 Water, 9 tosses #####################
tidybayes::mode_hdi(df_samples_b$samples_b, .width = .5)
tidybayes::mode_hdci(df_samples_b$samples_b, .width = .5)
#>           y      ymin      ymax .width .point .interval
#> 1 0.6508383 0.5685686 0.7597598    0.5   mode       hdi
#>           y      ymin      ymax .width .point .interval
#> 1 0.6508383 0.5685686 0.7597598    0.5   mode      hdci
Code
## R code 3.13b(2b) HPDI 3 Water, 3 tosses #####################
# tidybayes::mode_hdi(samples_skewed_b$p_skewed_b, .width = .5) # generates error
tidybayes::mode_hdci(samples_skewed_b$p_skewed_b, .width = .5)
#>           y      ymin     ymax .width .point .interval
#> 1 0.9995616 0.8418418 0.998999    0.5   mode      hdci

In contrast to R Code 3.37 and R Code 3.38 we used this time the mode as the measure of central tendency. With this family of {tidybayes} functions, you specify the measure of central tendency in the prefix (i.e., mean, median, or mode) and then the type of interval you’d like (i.e., qi() or hdci()).

If all you want are the intervals without the measure of central tendency or all that other technical information, {tidybayes} also offers the handy qi() and hdi() functions.

R Code 3.40 : Numbered R Code Title

Code
## R code 3.12b4 PI 0.5 ##############################
tidybayes::qi(samples_skewed_b$p_skewed_b, .width = .5)
#>           [,1]      [,2]
#> [1,] 0.7087087 0.9349349

We have now all necessary skills to plot book’s Figure 3.3:

R Code 3.41 b: Plot the difference between percentile interval (PI) and highest posterior density intervals (HPDI)

Code
## R code Figure 3.3 #######################
# left panel
p1 <-
  samples_skewed_b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_skewed_b, y = posterior_skewed_b)) +
  # check out our sweet `qi()` indexing
  ggplot2::geom_area(data = samples_skewed_b |> 
              dplyr::filter(p_skewed_b >
                tidybayes::qi(samples_skewed_b$p_skewed_b, .width = .5)[1] & 
                     p_skewed_b <
                tidybayes::qi(samples_skewed_b$p_skewed_b, .width = .5)[2]),
                fill = "deepskyblue") +
  ggplot2::geom_line() +
  ggplot2::labs(subtitle = "50% Percentile Interval",
       x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme_bw()

# right panel
p2 <-
  samples_skewed_b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_skewed_b, y = posterior_skewed_b)) +
  ggplot2::geom_area(data = samples_skewed_b |> 
              dplyr::filter(p_skewed_b > 
                 tidybayes::hdci(samples_skewed_b$p_skewed_b, .width = .5)[1] & 
                       p_skewed_b < 
                 tidybayes::hdci(samples_skewed_b$p_skewed_b, .width = .5)[2]),
                 fill = "deepskyblue") +
  ggplot2::geom_line() +
  ggplot2::labs(subtitle = "50% HPDI",
       x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme_bw()

# combine!
library(patchwork)
p1 | p2

Graph 3.11: Reproduction of Figure 3.3 (p.57): The difference between percentile and highest posterior density compatibility intervals. The posterior density here corresponds to a flat prior and observing three water samples in three total tosses of the globe. Left: 50% percentile interval. This interval assigns equal mass (25%) to both the left and right tail. As a result, it omits the most probable parameter value, where p equals 1. Right: 50% highest posterior density interval, HPDI. This interval finds the narrowest region with 50% of the posterior probability. Such a region always includes the most probable parameter value.

Comparing the two panels of the plot you can see that in contrast to the 50% HPDI the 50% of PI does not include the highest probability value.

{magrittr} and native R pipe

Kurz uses the {magrittr} paipe whereas I am using the native R pipe. These two pipes are not in every aspect equivalent. One difference is the dot (.) syntax, “since the dot syntax is a feature of {magrittr} and not of base R.” (Understanding the native R pipe).

The native pipe is available starting with R 4.1.0. It is constructed with | followed by > resulting in the symbol |> to differentiate it from the {magrittr} pipe (%>%). To understand the details of the differences of |> and the native R pipe |> read this elaborated blog article by Isabella Velásquez, an employee of Posit (formerly RStudio).

So the following trick does not work with the native R pipe:

In the geom_area() line for the HPDI plot, did you notice how we replaced data = samples_skewed_b with data = .? When using the pipe (i.e., %>%), you can use the . as a placeholder for the original data object. It’s an odd and handy trick to know about.

Therefore I had to replace the dot with the name of the data frame.

Learn more of the pipe function %>% of the {magrittr} package and about the base R native forward pipe operator.

PI and HPDI are only different if you have a very skewed distribution. This means that in unimodal somewhat normal distribution hdi and hdci are exactly the same and pretty similar to qi calculation. In skewed distribution they differ. assertion:

b: High Density Intervals with {tidybayes}

Code
## R code 3.13b(2a) HPDI 6 Water, 9 tosses #####################
tidybayes::mode_hdi(df_samples_b$samples_b, .width = .5)
tidybayes::mode_hdci(df_samples_b$samples_b, .width = .5)
tidybayes::mode_qi(df_samples_b$samples_b, .width = .5)
#>           y      ymin      ymax .width .point .interval
#> 1 0.6508383 0.5685686 0.7597598    0.5   mode       hdi
#>           y      ymin      ymax .width .point .interval
#> 1 0.6508383 0.5685686 0.7597598    0.5   mode      hdci
#>           y      ymin      ymax .width .point .interval
#> 1 0.6508383 0.5475475 0.7427427    0.5   mode        qi
Code
## R code 3.13b(2b) HPDI 3 Water, 3 tosses #####################
# tidybayes::mode_hdi(samples_skewed_b$p_skewed_b, .width = .5) # generates error
tidybayes::mode_hdci(samples_skewed_b$p_skewed_b, .width = .5)
tidybayes::mode_qi(samples_skewed_b$p_skewed_b, .width = .5)
#>           y      ymin     ymax .width .point .interval
#> 1 0.9995616 0.8418418 0.998999    0.5   mode      hdci
#>           y      ymin      ymax .width .point .interval
#> 1 0.9995616 0.7087087 0.9349349    0.5   mode        qi

Because of the disadvantages of HPDI (more computationally intensive, greater simulation variance and harder to understand) Kurz will primarily stick to the PI-based intervals. And he will not use the 5.5% and 94.5% quantiles that are percentile intervals boundaries, corresponding to an 89% compatibility interval but stick to the 95% standard (frequentist) confidence interval.

3.2.3 Point Estimates

3.2.3.1 ORIGINAL

3.2.3.1.1 Measures of Central Tendency

“Given the entire posterior distribution, what value should you report? This seems like an innocent question, but it is difficult to answer. The Bayesian parameter estimate is precisely the entire posterior distribution, which is not a single number, but instead a function that maps each unique parameter value onto a plausibility value. So really the most important thing to note is that you don’t have to choose a point estimate. It’s hardly ever necessary and often harmful. It discards information.” (McElreath, 2020, p. 58) (pdf)

But whenever you have to do it, you must choose between mean, median and MAP (= the parameter value with highest posterior probability, a maximum a posteriori, essentially the “peak” or mode of the posterior distribution.

In the very skewed globe tossing example where we observed 3 waters out of 3 tosses they are all different.

R Code 3.42 a: Compute MAP, median and mean

Code
## R code 3.14a MAP (from grid) ###############
map_skewed_a_1 <- p_grid_skewed_a[which.max(posterior_skewed_a)]

## R code 3.15a MAP from posterior samples ################
map_skewed_a_2 <- rethinking::chainmode(samples_skewed_a, adj = 0.01)

## R code 3.16a mean and median #############
mean_skewed_a <- mean(samples_skewed_a)
median_skewed_a <- median(samples_skewed_a)

Results:

Difference between MAP and mode calculation?

I have observed small differences between the MAP calculation in R Code 3.42 and the mode calculation of other packages. I wonder why all these other methods give 0.95/0.96 whereas the MAP calculation results in 0.99/1.0.

One explanation I could think is that the mode is defined by the maximum frequency of observations, whereas the MAP is calculated from the maximum of the weighted probability frequency .

But I am not sure. One thing I can say with some certainty is that MAP seems a complicated theoretical construct and a difficult computational procedure.

R Code 3.43 a: Compare MAP and mode calculation of different packages

Code
m_skewed_1 <- modeest::mlv(samples_skewed_a, method = "mfv")
m_skewed_2 <- DescTools::Mode(samples_skewed_a)
m_skewed_3 <- bayestestR::map_estimate(samples_skewed_a)

Results of mode calculation with several packages:

The graphical representation as shown in Figure 3.4 will be calculated in the tidyverse version of this section. See: Graph 3.12 for the left panel and Graph 3.14 for the right panel of Figure 3.4.

3.2.3.1.2 Loss function to support particular decisions

“One principled way to go beyond using the entire posterior as the estimate is to choose a loss function.” (McElreath, 2020, p. 59) (pdf)

Different loss functions imply different point estimates. (McElreath, 2020, p. 59) (pdf)

“Here’s an example to help us work through the procedure. Suppose I offer you a bet. Tell me which value of p, the proportion of water on the Earth, you think is correct. I will pay you \(\$100\), if you get it exactly right. But I will subtract money from your gain, proportional to the distance of your decision from the correct value. Precisely, your loss is proportional to the absolute value of \(d − p\), where \(d\) is your decision and \(p\) is the correct answer. We could change the precise dollar values involved, without changing the important aspects of this // problem. What matters is that the loss is proportional to the distance of your decision from the true value. Now once you have the posterior distribution in hand, how should you use it to maximize your expected winnings? It turns out that the parameter value that maximizes expected winnings (minimizes expected loss) is the median of the posterior distribution.” (McElreath, 2020, p. 59/60) (pdf)

“Calculating expected loss for any given decision means using the posterior to average over our uncertainty in the true value. Of course we don’t know the true value, in most cases. But if we are going to use our model’s information about the parameter, that means using the entire posterior distribution. So suppose we decide \(p = 0.5\) will be our decision.” (McElreath, 2020, p. 60) (pdf).

R Code 3.44 a: Calculated expected loss for \(p = 0.5\)

Code
## R code 3.17a weighted average loss ##################
loss_avg_a <- sum(posterior_skewed_a * abs(0.5 - p_grid_skewed_a))

## R code 3.18a for every possible value #############################
loss_a <- sapply(p_grid_skewed_a, 
     function(d) sum(posterior_skewed_a * abs(d - p_grid_skewed_a)))

## R code 3.19a minimized loss value #############################
loss_min_a <- p_grid_skewed_a[which.min(loss_a)]

Results:

  • Weighted average loss value = 0.3128752.
  • Parameter value that minimizes the loss = 0.8408408. This is the posterior median that we already have calculated in R Code 3.42. Because of sampling variation it is not identical but pretty close (0.8428428 versus 0.8408408).

Learnings: Point estimates

“In order to decide upon a point estimate, a single-value summary of the posterior distribution, we need to pick a loss function. Different loss functions nominate different point estimates. The two most common examples are the absolute loss as above, which leads to the median as the point estimate, and the quadratic loss \((d − p)^{2}\), which leads to the posterior mean (mean(samples_a)) as the point estimate. When the posterior distribution is symmetrical and normal-looking, then the median and mean converge to the same point, which relaxes some anxiety we might have about choosing a loss function. For the original globe tossing data (6 waters in 9 tosses), for example, the mean and median are barely different.” (McElreath, 2020, p. 60) (pdf)

“Usually, research scientists don’t think about loss functions. And so any point estimate like the mean or MAP that they may report isn’t intended to support any particular decision, but rather to describe the shape of the posterior. You might argue that the decision to make is whether or not to accept an hypothesis. But the challenge then is to say what the relevant costs and benefits would be, in terms of the knowledge gained or lost. Usually it’s better to communicate as much as you can about the posterior distribution, as well as the data and the model itself, so that others can build upon your work. Premature decisions to accept or reject hypotheses can cost lives.” (McElreath, 2020, p. 61) (pdf)

3.2.3.2 TIDYVERSE

3.2.3.2.1 Measures of Central Tendency

If we sort the posterior values of d_skewed_b (three tosses with three W) from highest to lowest values then the first row (= the maximum value of posterior_skewed_b) will give us the MAP as the corresponding p_grid_b value. Additionally the {tidybayes} has other options to compute the MAP. To calculate mean and median we will use the Base R functions. In the following code chunk I have collected all these different calculations.

R Code 3.45 b: Compute MAP, median and mean

Code
## R code 3.14b MAP (from grid) ###############################
glue::glue('MAP computed with dplyr::arrange(dplyr::desc())\n')
map_skewed_b <- d_skewed_b |> 
  dplyr::arrange(dplyr::desc(posterior_skewed_b)) 

map_skewed_b[1, c(1,6)]
glue::glue('#####################################################\n\n')

## R code 3.15b MAP (from posterior sample) ###############################
glue::glue('MAP computed with tidybayes::mode_qi()\n')
samples_skewed_b |> tidybayes::mode_qi(p_skewed_b)
glue::glue('#####################################################\n\n')

glue::glue('MAP computed with tidybayes::mode_hdci()\n')
samples_skewed_b |> tidybayes::mode_hdci(p_skewed_b)
glue::glue('#####################################################\n\n')

glue::glue('MAP computed with tidybayes::Mode()\n')
tidybayes::Mode(samples_skewed_b$p_skewed_b)
glue::glue('#####################################################\n\n')

## R code 3.16b mean and median #############################
glue::glue('Mean & Median computed with mean() & median()\n')
samples_skewed_b |> 
  dplyr::summarize(mean   = mean(p_skewed_b),
            median = median(p_skewed_b))
#> MAP computed with dplyr::arrange(dplyr::desc())
#> # A tibble: 1 × 2
#>   p_grid_b posterior_skewed_b
#>      <dbl>              <dbl>
#> 1        1            0.00400
#> #####################################################
#> 
#> MAP computed with tidybayes::mode_qi()
#> # A tibble: 1 × 6
#>   p_skewed_b .lower .upper .width .point .interval
#>        <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
#> 1       1.00  0.399  0.994   0.95 mode   qi       
#> #####################################################
#> 
#> MAP computed with tidybayes::mode_hdci()
#> # A tibble: 1 × 6
#>   p_skewed_b .lower .upper .width .point .interval
#>        <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
#> 1       1.00  0.475      1   0.95 mode   hdci     
#> #####################################################
#> 
#> MAP computed with tidybayes::Mode()
#> [1] 0.9995616
#> #####################################################
#> 
#> Mean & Median computed with mean() & median()
#> # A tibble: 1 × 2
#>    mean median
#>   <dbl>  <dbl>
#> 1 0.803  0.843

We can now plot a graph to reproduce the left panel of the books Figure 3.4 where we will see the three different measures of central tendency. I will compare the skewed (three ‘W’, three tosses) with the somewhat normal version (six ‘W’, nine tosses).

R Code 3.46 : Posterior density for skewed and symmetric distribution

Code
## 1. bundle three types of estimates into a tibble. #######
point_estimates_b1 <-
  dplyr::bind_rows(samples_skewed_b |> tidybayes::mean_qi(p_skewed_b),
            samples_skewed_b |> tidybayes::median_qi(p_skewed_b),
            samples_skewed_b |> tidybayes::mode_qi(p_skewed_b)) |> 
  dplyr::select(p_skewed_b, .point) |> 

## 2. create two columns to annotate the plot #######
  dplyr::mutate(x = p_skewed_b + c(-.03, .03, -.03),
       y = c(.0005, .0012, .002))

## 3. plot #######
samples_skewed_b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_skewed_b)) +
  ggplot2::geom_area(ggplot2::aes(y = posterior_skewed_b),
            fill = "deepskyblue") +
  ggplot2::geom_vline(xintercept = point_estimates_b1$p_skewed_b) +
  ggplot2::geom_text(data = point_estimates_b1,
            ggplot2::aes(x = x, y = y, label = .point),
            angle = 90) +
  ggplot2::labs(x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw()

Graph 3.12: Posterior distribution after observing 3 ‘water’ in 3 tosses of the globe. Vertical lines show the locations of the mode, median, and mean. All three measures of central tendency differ because of the skewness of the distribution. Therefore each point implies a different loss function.
Code
## 0. compute mode with different method
mode2_b <- df_samples_b |>
  dplyr::arrange(dplyr::desc(posterior_samples_b)) |> 
  dplyr::slice(1) |> 
  dplyr::rename(.point = prior_samples_b) |> 
  dplyr::select(samples_b, .point) |> 
  dplyr::mutate(.point = "mode2")

## 1. bundle three types of estimates into a tibble ######
point_estimates_b2 <-
  dplyr::bind_rows(df_samples_b |> tidybayes::mean_qi(samples_b),
            df_samples_b |> tidybayes::median_qi(samples_b),
            df_samples_b |> tidybayes::mode_qi(samples_b)) |> 
  dplyr::select(samples_b, .point) |> 
  dplyr::bind_rows(mode2_b) |>
  
## 2. create two columns to annotate the plot #######
  dplyr::mutate(x = c(.55, .55, .75, .75),
         y = c(.0006, .0011, .0016, .0021))

## 3. plot ##########################################
df_samples_b |> 
  ggplot2::ggplot(ggplot2::aes(x = samples_b)) +
  ggplot2::geom_area(ggplot2::aes(y = posterior_samples_b),
           fill = "deepskyblue") +
  ggplot2::geom_vline(xintercept = point_estimates_b2$samples_b) +
  ggplot2::geom_text(data = point_estimates_b2,
            ggplot2::aes(x = x, y = y, label = .point)) +
  ggplot2::labs(x = "proportion of water (p)",
       y = "density") +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw() + 

## 4. annotation (arrows & text) ####################
  ggplot2::geom_segment(
    ggplot2::aes(x = 0.4, y = .0005, xend = point_estimates_b2$samples_b[[1]], yend = .0005),
    arrow = grid::arrow(length = grid::unit(0.5, "cm"))) +
  ggplot2::geom_segment(
    ggplot2::aes(x = 0.4, y = .001, xend = point_estimates_b2$samples_b[[2]], yend = .001),
    arrow = grid::arrow(length = grid::unit(0.5, "cm"))) +  
  ggplot2::geom_segment(
    ggplot2::aes(x = 0.85, y = .0015, xend = point_estimates_b2$samples_b[[3]], yend = .0015),
    arrow = grid::arrow(length = grid::unit(0.5, "cm"))) +
  ggplot2::geom_segment(
    ggplot2::aes(x = 0.85, y = .002, xend = point_estimates_b2$samples_b[[4]], yend = .002),
    arrow = grid::arrow(length = grid::unit(0.5, "cm"))
)

Graph 3.13: Point estimates in the almost symmetrical distribution of 6 ‘water’ in 9 tosses. Vertical lines show the locations of the mode, median, and mean. All three points are in a similar locatioon and have approximately the same loss function.

MAP (mode) is not the highest point in the more symmetric version (6 ‘W’, n=9)

I wounder if the calculation of the MAP of the somewhat symmetrical version is correct, because the MAP or mode is not the highest value in the distribution. In addition to calculate the mode with {tidybayes}, I have also used R code 3.14b from R Code 3.45 to compute MAP with dplyr::arrange(dplyr::desc()).

It turns out that there is difference: In contrast to {tidybayes} arranging the data frame results in a MAP value (mode2) which is indeed the highest point of the distribution. I don’t know how to interpret this disparity.

3.2.3.2.2 Loss function to support particular decisions

R Code 3.47 b: Calculated expected loss for \(p = 0.5\) with proportional and quadratic loss function

The absolute proportional loss \(d-p\) for the decision \(p = 0.5\) results into the median.

Code
## R code 3.17b weighted average loss ##################
loss_avg_b <- d_skewed_b |> 
    dplyr::summarise(`expected loss` = 
               base::sum(posterior_skewed_b * base::abs(0.5 - p_grid_b)))

## R code 3.18b for every possible value #############################

## write function
make_loss_b <- function(our_d) {
  d_skewed_b |> 
    dplyr::mutate(loss_b = posterior_skewed_b * base::abs(our_d - p_grid_b)) |> 
    dplyr::summarise(weighted_average_loss_b = base::sum(loss_b))
}

## calculate loss for all possible values 
glue::glue("Every possible loss values for decision 0.5 with proportional loss function\n")

l_b <-
  d_skewed_b |> 
  dplyr::select(p_grid_b) |> 
  dplyr::rename(decision_b = p_grid_b) |> 
  dplyr::mutate(weighted_average_loss_b = purrr::map(decision_b, make_loss_b)) |> 
  tidyr::unnest(weighted_average_loss_b) 
head(l_b)

## R code 3.19b minimized loss value #############################
# this will help us find the x and y coordinates for the minimum value
loss_min_b <-
    l_b |> 
    dplyr::filter(weighted_average_loss_b == base::min(weighted_average_loss_b)) |> 
    base::as.numeric()
#> Every possible loss values for decision 0.5 with proportional loss function
#> # A tibble: 6 × 2
#>   decision_b weighted_average_loss_b
#>        <dbl>                   <dbl>
#> 1    0                         0.800
#> 2    0.00100                   0.799
#> 3    0.00200                   0.798
#> 4    0.00300                   0.797
#> 5    0.00400                   0.796
#> 6    0.00501                   0.795

Results:

  • Weighted average loss value = 0.3128752.
  • Parameter value that minimizes the loss = 0.8408408. This is the posterior median that we already have calculated in R Code 3.45. Because of sampling variation it is not identical but pretty close (0.8428428 versus 0.8408408).

The quadratic loss \((d−p)^{2}\) for the decision \(p = 0.5\) suggests we should use the mean.

Code
## R code 3.17b weighted average loss ##################
loss_avg_b2 <- d_skewed_b |> 
    dplyr::summarise(`expected loss` = 
               base::sum(posterior_skewed_b * base::sqrt(abs(0.5 - p_grid_b))))

## R code 3.18b for every possible value #############################
# amend our loss function

make_loss2_b <- function(our_d2) {
  d_skewed_b |> 
    dplyr::mutate(loss2_b = posterior_skewed_b * (our_d2 - p_grid_b)^2) |> 
    dplyr::summarise(weighted_average_loss2_b = base::sum(loss2_b))
}


# remake our `l` data
glue::glue("Every possible loss values for decision 0.5 with quadratic loss function\n")

l2_b <-
    d_skewed_b |> 
    dplyr::select(p_grid_b) |> 
    dplyr::rename(decision2_b = p_grid_b) |> 
    dplyr::mutate(weighted_average_loss2_b = purrr::map(decision2_b, make_loss2_b)) |> 
    tidyr::unnest(weighted_average_loss2_b)
head(l2_b)

## R code 3.19b minimized loss value #############################
# update to the new minimum loss coordinates


loss_min_b2 <-
    l2_b |> 
    dplyr::filter(weighted_average_loss2_b == base::min(weighted_average_loss2_b)) |> 
    base::as.numeric()
#> Every possible loss values for decision 0.5 with quadratic loss function
#> # A tibble: 6 × 2
#>   decision2_b weighted_average_loss2_b
#>         <dbl>                    <dbl>
#> 1     0                          0.667
#> 2     0.00100                    0.666
#> 3     0.00200                    0.664
#> 4     0.00300                    0.663
#> 5     0.00400                    0.661
#> 6     0.00501                    0.659

Results:

  • Weighted average loss value = 0.5390867.
  • Parameter value that minimizes the loss = 0.8008008. This is the posterior mean that we already have calculated in R Code 3.45. Because of sampling variation it is not identical but pretty close (0.8027632 versus 0.8008008).

Now we’re ready to reproduce the right panel of Figure 3.4., e.g., displaying the the loss function and computing the minimum loss value. Remember: Different loss functions imply different point estimates.

R Code 3.48 b: Plot expected loss for \(p = 0.5\) with proportional and quadratic loss function

The absolute proportional loss \(d-p\) for the decision \(p = 0.5\) results into the median.

Code
## right panel figure 3.4 ########################

l_b |>   
  ggplot2::ggplot(ggplot2::aes(x = decision_b, y = weighted_average_loss_b)) +
  ggplot2::geom_area(fill = "deepskyblue") +
  ggplot2::geom_vline(xintercept = loss_min_b[1], color = "black", linetype = 3) +
  ggplot2::geom_hline(yintercept = loss_min_b[2], color = "black", linetype = 3) +
  ggplot2::ylab("expected proportional loss") +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw()

Graph 3.14: Expected loss under the rule that loss is proportional to absolute distance of decision (horizontal axis) from the true value. The point marks the value of p that minimizes the expected loss, the posterior median.

We saved the exact minimum value as loss_min_b[1], which is 0.8408408. Within sampling error, this is the posterior median as depicted by our samples (0.8428428 versus 0.8408408).

The quadratic loss \((d−p)^{2}\) for the decision \(p = 0.5\) suggests we should use the mean.

Code
# update the plot
l2_b |>   
  ggplot2::ggplot(ggplot2::aes(x = decision2_b, y = weighted_average_loss2_b)) +
  ggplot2::geom_area(fill = "deepskyblue") +
  ggplot2::geom_vline(xintercept = loss_min_b2[1], color = "black", linetype = 3) +
  ggplot2::geom_hline(yintercept = loss_min_b2[2], color = "black", linetype = 3) +
  ggplot2::ylab("expected proportional loss") +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw()

Graph 3.15: Expected loss under the rule that loss is quadratic to the distance of decision (horizontal axis) from the true value. The point marks the value of p that minimizes the expected loss, the posterior mean

Based on quadratic loss \((d−p)^{2}\), the exact minimum value is 0.8008008. Within sampling error, this is the posterior mean of our samples (0.8027632 versus 0.8008008).

3.3 Sampling to simulate prediction

3.3.1 ORIGINAL

Procedure 3.2 : 5 Reasons to simulate implied observations

“Another common job for samples is to ease simulation of the model’s implied observations. Generating implied observations from a model is useful for at least four five reasons.” (McElreath, 2020, p. 61) (pdf)

  1. Model design
  2. Model checking
  3. Software validation
  4. Research design
  5. Forecasting

“We will call such simulated data dummy data, to indicate that it is a stand-in for actual data.” (McElreath, 2020, p. 62) (pdf)

3.3.1.1 Dummy data

Likelihood functions work in both directions:

  • They help to infer the plausibility of each possible value of p. “With the globe tossing model, the dummy data arises from a binomial likelihood:” (McElreath, 2020, p. 62) (pdf)
  • They can be used to simulate the observations that the model implies. “You could use base::sample() to do this, but R provides convenient sampling functions for all the ordinary probability distributions, like the binomial.” (McElreath, 2020, p. 62) (pdf)

“Given a realized observation, the likelihood function says how plausible the observation is. And given only the parameters, the likelihood defines a distribution of possible observations that we can sample from, to simulate observation.” (McElreath, 2020, p. 62) (pdf)

R Code 3.49 a: Suppose \(N = 2\), two tosses of the globe with \(p = 0.7\)

Code
## R code 3.20a #############################
(
  two_tosses_prob_a <- dbinom(0:2, size = 2, prob = 0.7)
)
#> [1] 0.09 0.42 0.49

Interpretation of the observed result of two tosses with a probability of \(p = 0.7\)

With \(N = 2\) there are only three possibilities: 0, 1 or 2 water.

  • 9% chance of observing 0 water.
  • 42% chance of observing 1 water.
  • 49% chance of observing 2 water.

With different \(p\) you will get a different result.

Code
base::set.seed(3) # for reproducibility

## R code 3.21a #############################
(
  two_tosses_sim_a <- rbinom(1, size = 2, prob = 0.7)
)
#> [1] 2

Interpretation of the simulated result of two tosses with a probability of \(p = 0.7\)

One simulation (trial) consists with \(N = 2\) of two simulated tosses:

  • In our simulation (trial) of two simulated tosses \(N = 2\) we got 2 water.

Without the base::set.seed() function the result will change. Try it.

With rbinom() you can also generate more than one simulation (trial) at a time.

R Code 3.50 a: Using rbinom() to simulate two tosses to generate 10 and 100,000 dummy observations

Code
base::set.seed(3)
## R code 3.22a #############################
rbinom(10, size = 2, prob = 0.7)
#>  [1] 2 1 2 2 1 1 2 2 1 1
Code
base::set.seed(3)
## R code 3.23a #############################
dummy_w_a <- rbinom(1e5, size = 2, prob = 0.7)
table(dummy_w_a) / 1e5
#> dummy_w_a
#>       0       1       2 
#> 0.09000 0.42051 0.48949

Now let’s simulate the same amount of tosses (sample size = 9) as we have used previously. I will provide two kinds of R graphs: One for rethinking::simplehist() as in the book and one using base R with graphics::hist().

R Code 3.51 a: Distribution of simulated sample observations from 9 tosses of the globe

Code
base::set.seed(3)
## R code 3.24a with hist() ##############
dummy_w_a <- rbinom(1e5, size = 9, prob = 0.7)
hist(dummy_w_a, xlab = "dummy water count")

Graph 3.16: Distribution of simulated sample observations from 9 tosses of the globe. These samples assume the proportion of water is 0.7. The plot uses the base R hist() function
Code
base::set.seed(3)
## R code 3.24a with simplehist() #############################
dummy_w_a <- stats::rbinom(1e5, size = 9, prob = 0.7)
rethinking::simplehist(dummy_w_a, xlab = "dummy water count")

Graph 3.17: Distribution of simulated sample observations from 9 tosses of the globe. These samples assume the proportion of water is 0.7. the plot uses the `rethinking::simplehist()``` function

Note that in our example using base::set.seed(3) the simulation does not generate the water in its true proportion of \(0.7\).

“That’s the nature of observation: There is a one-to-many relationship between data and data-generating processes. You should [delete the base::set.seed() line and] experiment with sample size, the size input in the code above, as well as the prob, to see how the distribution of simulated samples changes shape and location.” (McElreath, 2020, p. 63) (pdf)

Sampling distributions versus Samples drawn from the posterior

“Many readers will already have seen simulated observations. Sampling distributions are the foundation of common non-Bayesian statistical traditions. In those approaches, inference about parameters is made through the sampling distribution. In this book, inference about parameters is never done directly through a sampling distribution. The posterior distribution is not sampled, but deduced logically. Then samples can be drawn from the posterior, as earlier in this chapter, to aid in inference. In neither case is ‘sampling’ a physical act. In both cases, it’s just a mathematical device and produces only small world (Chapter 2) numbers.” (McElreath, 2020, p. 63) (pdf)

3.3.1.2 Model checking

Two main purposes:

  1. Checking if software is working correctly
  2. Evaluation the adequacy of the model

“All that is required is averaging over the posterior density for \(p\), while computing the predictions. For each possible value of the parameter \(p\), there is an implied distribution of outcomes. So if you were to compute the sampling distribution of outcomes at each value of \(p\), then you could average all of these prediction distributions together, using the posterior probabilities of each value of \(p\), to get a posterior predictive distribution.” (McElreath, 2020, p. 65) (pdf)

Procedure 3.3 a: How to generate the posterior predictive distribution

  1. Use rbinom() to generate random binomial samples.
  2. Use as prob value not a single value but samples from the posterior

I will not reproduce Figure 3.6 in the upcoming section. The complex procedure to plot the graph does not add much understanding in Bayesian statistics. But the content of the picture itself is very important to understand how the posterior predictive distribution is generated.


Top panel shows the posterior probability, middle panel the sampling distribution and the bottom panel the posterior predictive distribution

Graph 3.18: “Simulating predictions from the total posterior. Top: The familiar posterior distribution for the globe tossing data. Ten example parameter values are marked by the vertical lines. Values with greater posterior probability indicated by thicker lines. Middle row: Each of the ten parameter values implies a unique sampling distribution of predictions. Bottom: Combining simulated observation distributions for all parameter values (not just the ten shown), each weighted by its posterior probability, produces the posterior predictive distribution. This distribution propagates uncertainty about parameter to uncertainty about prediction.” (McElreath, 2020, p. 65) (pdf)

We will generates 10,000 (1e4) simulated predictions of 9 globe tosses (size = 9)

R Code 3.52 a: Simulate predicted observations for a single value (\(p = 0.6\)) and with samples of the posterior

Generating 10,000 (1e4) simulated predictions of 9 globe tosses (size = 9).

Code
base::set.seed(3)
## R code 3.25a #############################
w_a1 <- rbinom(1e4, size = 9, prob = 0.6)
rethinking::simplehist(w_a1)

Graph 3.19: Simulate predicted observations for a single value p value of 0.6

The predictions are stored as counts of water, so the theoretical minimum is zero and the theoretical maximum is nine. Although our fix prob value = 0.6 the samples show 5 as the mode of the randomly generated distribution.

Code
base::set.seed(3)
## R code 3.26a #############################
w_a2 <- rbinom(1e4, size = 9, prob = samples_a)
rethinking::simplehist(w_a2)

Graph 3.20: Simulate predicted observations with samples of the posterior

The code propagate parameter uncertainty into the predictions by replacing a fixed prob value by the vector samples_a. The symbol samples_a is the same list of random samples from the posterior distribution that we have calculated in ?cnj-sample-globe-tossing and used in previous sections.

“For each sampled value, a random binomial observation is generated. Since the sampled values appear in proportion to their posterior probabilities, the resulting simulated observations are averaged over the posterior. You can manipulate these simulated observations just like you manipulate samples from the posterior—you can compute intervals and point statistics using the same procedures.” (McElreath, 2020, p. 66) (pdf)

But Graph 3.20 just predicts the posterior distribution as our model sees the data. To evaluate the adequacy of the model for certain purposes it is important to investigate it under different point of views. McElreath proposes two look at the data in additional two ways:

  1. Plot the distribution of the longest run of either water or land.
  2. Plot the distribution of the number of switches from water to land and reverse.

The calculation and reproduction of Figure 3.7 is demonstrated in the next tidyverse section.

3.3.2 TIDYVERSE

3.3.2.1 Dummy data

R Code 3.53 b: Suppose \(N = 2\), two tosses of the globe with \(p = 0.7\)

Code
## R code 3.20b ######################
tibble::tibble(n      = 2,
       `p(w)` = .7,
       w      = 0:n) |> 
  dplyr::mutate(density = stats::dbinom(w, size = n, prob = `p(w)`))
#> # A tibble: 3 × 4
#>       n `p(w)`     w density
#>   <dbl>  <dbl> <int>   <dbl>
#> 1     2    0.7     0    0.09
#> 2     2    0.7     1    0.42
#> 3     2    0.7     2    0.49
Code
## R code 3.21b = 3.21a ######################
base::set.seed(3)
stats::rbinom(1, size = 2, prob = .7)
#> [1] 2

Compare the results with the calculation in R Code 3.49.

b: Using stats::rbinom() to simulate two tosses to generate 10 and 100,000 dummy observations

Code
base::set.seed(3)
## R code 3.22b = 3.22a #############################
stats::rbinom(10, size = 2, prob = 0.7)
#>  [1] 2 1 2 2 1 1 2 2 1 1
Code
## R code 3.23b ########################
n_draws_b <- 1e5
base::set.seed(3)
dummy_w_b <- tibble::tibble(draws = stats::rbinom(n_draws_b, size = 2, prob = .7)) 

dummy_w_b |> 
    dplyr::count(draws) |> 
    dplyr::mutate(proportion = n / base::nrow(dummy_w_b))
#> # A tibble: 3 × 3
#>   draws     n proportion
#>   <int> <int>      <dbl>
#> 1     0  9000      0.09 
#> 2     1 42051      0.421
#> 3     2 48949      0.489

The simulation updated to \(n=9\) and plotting the tidyverse version of Figure 3.5.

R Code 3.54 b: Distribution of simulated sample observations from 9 tosses of the globe

Code
## R code 3.24b with ggplot2 #########################
n_draws_b <- 1e5

base::set.seed(3)
dummy_w2_b <- tibble::tibble(draws = stats::rbinom(n_draws_b, size = 9, prob = .7))

p1 <- dummy_w2_b |> 
    ggplot2::ggplot(ggplot2::aes(x = draws)) + 
    ggplot2::geom_histogram(binwidth = 1, center = 0,
                 fill = "deepskyblue", color = "black", 
                 linewidth = 1/10) +
    # breaks = 0:10 * 2 = equivalent in Kurz's versions:  breaks = 0:4 * 2
    ggplot2::scale_x_continuous("dummy water count", breaks = 0:10 * 2) +
    ggplot2::ylab("frequency") +
    ggplot2::coord_cartesian(xlim = c(0, 9)) +
    ggplot2::theme(panel.grid = ggplot2::element_blank()) + 
    ggplot2::theme_bw()

p2 <- dummy_w2_b |> 
    ggplot2::ggplot(ggplot2::aes(x = draws)) + 
    ggplot2::geom_histogram(binwidth = 1, center = 0,
                 fill = "deepskyblue", color = "black", 
                 linewidth = 1/10) +
    ## breaks = 0:10 * 2 = equivalent in Kurz's versions:  breaks = 0:4 * 2
    ## I decided to set a break at each of the draws: breaks = 0:9 * 1
    ggplot2::scale_x_continuous("dummy water count", breaks = 0:9 * 1) +
    ggplot2::ylab("frequency") +
    ## I did not zoom into the graph because doesn't look so nice
    ## for instance the last line in Kurz’ version is not visible
    # coord_cartesian(xlim = c(0, 9)) +
    ggplot2::theme(panel.grid = ggplot2::element_blank()) +
    ggplot2::theme_bw()

library(patchwork)
p1 + p2

Graph 3.21: Distribution of simulated sample observations from 9 tosses of the globe. These samples assume the proportion of water is 0.7. The plot uses the {ggplot2} functions. The left panel is Kurz’s original, the right one is my version slightly changed.

McElreath suggested to play around with different values of size and prob. But instead of reproducing all codes block introduced by Kurz, I will just replicate the first chunk, because it has some interesting and (for me) unfamiliar functions.

R Code 3.55 b: Simulated sample observations, using different sizes and probabilities

Code
n_draws <- 1e5
simulate_binom <- function(n, probability) {
  base::set.seed(3)
  stats::rbinom(n_draws, size = n, prob = probability) 
}

d9_b <-
  tidyr::crossing(n9_b           = c(3, 6, 9),
           probability9_b = c(.3, .6, .9)) |> 
  dplyr::mutate(draws9_b = purrr::map2(n9_b, probability9_b, simulate_binom)) |> 
  dplyr::ungroup() |> 
  dplyr::mutate(n           = stringr::str_c("n = ", n9_b),
         probability = stringr::str_c("p = ", probability9_b)) |> 
  tidyr::unnest(draws9_b)

d9_b |> 
    dplyr::slice_sample(n = 10) |> 
    dplyr::arrange(n9_b)
#> # A tibble: 10 × 5
#>     n9_b probability9_b draws9_b n     probability
#>    <dbl>          <dbl>    <int> <chr> <chr>      
#>  1     3            0.3        2 n = 3 p = 0.3    
#>  2     3            0.9        3 n = 3 p = 0.9    
#>  3     3            0.9        3 n = 3 p = 0.9    
#>  4     3            0.9        3 n = 3 p = 0.9    
#>  5     3            0.6        3 n = 3 p = 0.6    
#>  6     6            0.9        5 n = 6 p = 0.9    
#>  7     6            0.9        5 n = 6 p = 0.9    
#>  8     6            0.6        3 n = 6 p = 0.6    
#>  9     9            0.6        7 n = 9 p = 0.6    
#> 10     9            0.3        3 n = 9 p = 0.3

I am still not very experienced with tidyr::crossing() and tidyr::unnest():

Instead of utils::head() I used dplyr::slice_sample() and ordered the result by the first column. I think this will get a better glimpse on the data as just the first 6 rows.

Code
d9_b |> 
  ggplot2::ggplot(ggplot2::aes(x = draws9_b)) +
  ggplot2::geom_histogram(binwidth = 1, center = 0,
                 fill = "deepskyblue", color = "white", linewidth = 1/10) +
  ggplot2::scale_x_continuous("dummy water count", breaks = 0:4 * 2) +
  ggplot2::ylab("frequency") +
  ggplot2::coord_cartesian(xlim = c(0, 9)) +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw() +
  ggplot2::facet_grid(n9_b ~ probability9_b)

Graph 3.22: ?(caption)

3.3.2.2 Model checking

Checking if the software works correctly

On software checking Kurz refers to some material, that is too special at the moment for me. I will it include here, even if I had not read it. Maybe I will come later here again and pick up these resources when I have more experiences with Bayesian statistics and the necessary tools.

There are three computing steps to reproduce the three levels of Figure 3.6 copied here as Graph 3.18.

  1. Posterior probability (top level)
  2. Sampling distributions for nine values (0.1-0.9) (middle level)
  3. Posterior predictive distribution (bottom level)

R Code 3.56 b: Posterior probability

Posterior probability

Code
## data wrangling ####################
n2_b <- 1001L
n_success <- 6L
n_trials  <- 9L

d2_b <-
  tibble::tibble(p_grid2_b = base::seq(from = 0, to = 1, length.out = n2_b),
         # note we're still using a flat uniform prior
         prior2_b  = 1) |> 
  dplyr::mutate(likelihood2_b = 
                stats::dbinom(n_success, size = n_trials, prob = p_grid2_b)) |> 
  dplyr::mutate(posterior2_b = 
                (likelihood2_b * prior2_b) / sum(likelihood2_b * prior2_b))
head(d2_b)


## plot ##############################
d2_b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_grid2_b, y = posterior2_b)) +
  ggplot2::geom_area(color = "deepskyblue", fill = "deepskyblue") +
  ggplot2::geom_segment(data = d2_b |>   
     dplyr::filter(p_grid2_b %in% 
         c(base::seq(from = .1, to = .9, by = .1), 3 / 10)),
     ## Note how we Wweight the widths of the vertical lines 
     ## by the posterior density `posterior2_b`
         ggplot2::aes(xend = p_grid2_b, yend = 0, linewidth = posterior2_b),
         color = "black", show.legend = F) +
  ggplot2::geom_point(data = d2_b |>  
     dplyr::filter(p_grid2_b %in% 
         c(base::seq(from = .1, to = .9, by = .1), 3 / 10))) +
  ggplot2::annotate(geom = "text", 
         x = .08, y = .0025,
         label = "Posterior probability") +
  ggplot2::scale_linewidth_continuous(range = c(0, 1)) +
  ggplot2::scale_x_continuous("probability of water", breaks = 0:10 / 10) +
  ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme()
#> # A tibble: 6 × 4
#>   p_grid2_b prior2_b likelihood2_b posterior2_b
#>       <dbl>    <dbl>         <dbl>        <dbl>
#> 1     0            1      0            0       
#> 2     0.001        1      8.37e-17     8.37e-19
#> 3     0.002        1      5.34e-15     5.34e-17
#> 4     0.003        1      6.07e-14     6.07e-16
#> 5     0.004        1      3.40e-13     3.40e-15
#> 6     0.005        1      1.29e-12     1.29e-14

Graph 3.23: Reproduction of the top part of figure 3.6

At first I thought I do not need to refresh the original grid approximation from R Code 3.10 as I have it stored it with the unique name d_b. But it turned out that the above code with n_grid_b = 1000L does not work, because it draws no vertical lines by the posterior density. Instead one has to sample 1001 times.

I noticed that with most sample numbers the plot does not work correctly. It worked with 1071. The sequence 1011, 1021, 1031, 1041, 1051, 1061 misses just one vertical line at \(p = 0.7\) An exception is 1041, which misses \(p = 0.6\). I do not know why this happens.

Code
## data wrangling #####################
n_draws_b <- 1e5

simulate_binom <- function(probability) {
  base::set.seed(3)
  stats::rbinom(n_draws_b, size = 9, prob = probability) 
}

d_small_b <-
  tibble::tibble(probability = base::seq(from = .1, to = .9, by = .1)) |> 
  dplyr::mutate(draws = purrr::map(probability, simulate_binom)) |>  
  tidyr::unnest(draws) |> 
  dplyr::mutate(label = stringr::str_c("p = ", probability))

utils::head(d_small_b)

## plot ###############################
d_small_b |>
  ggplot2::ggplot(ggplot2::aes(x = draws)) +
  ggplot2::geom_histogram(binwidth = 1, center = 0, color = "black",
                 fill = "deepskyblue", linewidth = 1/10) +
  ggplot2::scale_x_continuous(NULL, breaks = 0:3 * 3) +
  ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  ggplot2::labs(subtitle = "Sampling distributions") +
  ggplot2::coord_cartesian(xlim = c(0, 9)) +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw() +
  ggplot2::facet_wrap(~ label, ncol = 9)
#> # A tibble: 6 × 3
#>   probability draws label  
#>         <dbl> <int> <chr>  
#> 1         0.1     0 p = 0.1
#> 2         0.1     2 p = 0.1
#> 3         0.1     0 p = 0.1
#> 4         0.1     0 p = 0.1
#> 5         0.1     1 p = 0.1
#> 6         0.1     1 p = 0.1

Graph 3.24: Reproduction of the middle part of figure 3.6
Code
## data wrangling ########################################
# how many samples would you like?
n_samples_b <- 1e4

# make it reproducible
base::set.seed(3)

d2_samples_b <-
  d2_b |> 
  dplyr::slice_sample(n = n_samples_b, weight_by = posterior2_b, replace = T)  |>  
  dplyr::mutate(w = purrr::map_dbl(p_grid2_b, stats::rbinom, n = 1, size = 9))

dplyr::glimpse(d2_samples_b)

## plot ###################################
d2_samples_b |> 
  ggplot2::ggplot(ggplot2::aes(x = w)) +
  ggplot2::geom_histogram(binwidth = 1, center = 0, color = "black",
                 fill = "deepskyblue", linewidth = 1/10) +
  ggplot2::scale_x_continuous("number of water samples",
                     breaks = 0:3 * 3) +
  ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  ggplot2::ggtitle("Posterior predictive distribution") +
  ggplot2::coord_cartesian(xlim = c(0, 9),
                  ylim = c(0, 3000)) +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw()
#> Rows: 10,000
#> Columns: 5
#> $ p_grid2_b     <dbl> 0.564, 0.651, 0.487, 0.592, 0.596, 0.787, 0.727, 0.490, …
#> $ prior2_b      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ likelihood2_b <dbl> 0.22408531, 0.27179502, 0.15128823, 0.24557832, 0.248256…
#> $ posterior2_b  <dbl> 0.0022408531, 0.0027179502, 0.0015128823, 0.0024557832, …
#> $ w             <dbl> 4, 7, 3, 3, 7, 6, 8, 2, 6, 4, 5, 5, 8, 6, 4, 6, 8, 2, 6,…

Graph 3.25: Reproduction of the bottom part of figure 3.6

In Figure 3.7, McElreath checked the adequacy of the model with

  1. the longest sequence of the sample values and
  2. number of switches between water and land.

But he didn’t show who to calculate and produce his Figure 3.7. To compute these alternative views and replicate the graph Kurz used the (for me until now unknown) base::rle() function in a brilliant manner. Before we replicate his computation let’s look more into the details of base::rle()

R Code 3.57 b: How the base::rle() function works for checking our model

Code
tosses <- c("w", "l", "w", "w", "w", "l", "w", "l", "w")
base::rle(tosses)

## In the left panel of Figure 3.7 we are interested in maximal length
rle(tosses)$lengths  |>  max()

## In the right panel of Figure 3.7 we are interested in maximal switches
rle(tosses)$lengths |>  length() - 1
#> Run Length Encoding
#>   lengths: int [1:7] 1 1 3 1 1 1 1
#>   values : chr [1:7] "w" "l" "w" "l" "w" "l" "w"
#> [1] 3
#> [1] 6

The observed events in our globe tossing example are: w,l,w,w,w,l,w,l,w. The maximal length of consecutive water events is: rle(tosses)$lengths |> max() = 3.

Because rle() aggregated the consecutive length of water and land we can use this result as the maximum length of switches between water and land: rle(tosses)$lengths |> length() - 1 = 6

Different observations versus switches

I think there is a small miscalculation in Kurz’ text: The number of switches is the length of different observation minus one. There are 7 different observation but only 6 switches. This also conforms to the book.

Now we will follow Kurz’ procedure of calculating and plotting Figure 3.7.

Procedure 3.4 : How the base::rle() function works

He reasoned:

  1. We’ve been using stats::rbinom() with the size parameter set to 9 for our simulations for a certain number of observation, e.g., 10 experiments á la 9 tosses each. This gives us the aggregated results of the number of water events: rbinom(10, size = 9, prob = .6) = 7, 5, 6, 8, 7, 5, 6, 3, 3, 4
  2. But what we need is to simulate nine individual trials many times over: rbinom(9, size = 1, prob = .6) = 0, 1, 1, 1, 0, 0, 0, 0, 0
  3. We simulate the result of individual trials many times and add their outcomes into a tibble d2_samples_b.
Code
set.seed(3)

d2_samples_b <-
  d2_samples_b |> 
  dplyr::mutate(iter  = 1:dplyr::n(),
         draws = purrr::map(p_grid2_b, rbinom, n = 9, size = 1)) |> 
  tidyr::unnest(draws)

dplyr::glimpse(d2_samples_b)
#> Rows: 90,000
#> Columns: 7
#> $ p_grid2_b     <dbl> 0.564, 0.564, 0.564, 0.564, 0.564, 0.564, 0.564, 0.564, …
#> $ prior2_b      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ likelihood2_b <dbl> 0.2240853, 0.2240853, 0.2240853, 0.2240853, 0.2240853, 0…
#> $ posterior2_b  <dbl> 0.002240853, 0.002240853, 0.002240853, 0.002240853, 0.00…
#> $ w             <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7, 7, 7, 7, 7, 3,…
#> $ iter          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,…
#> $ draws         <int> 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1,…
  1. Now we can use the base:rle()function to compute consecutive length and number of switches:

R Code 3.58 b: Replication of left panel of Figure 3.7

Code
d2_samples_b |>
  dplyr::group_by(iter) |> 
  dplyr::summarize(longest_run_length = base::rle(draws)$lengths |> 
              base::max()) |>
  
  ggplot2::ggplot(ggplot2::aes(x = longest_run_length)) +
  ggplot2::geom_histogram(ggplot2::aes(fill = longest_run_length == 3),
                 binwidth = 0.2, center = 0,
                 color = "white", linewidth = 1/10) +
  ggplot2::scale_fill_viridis_d(option = "C", end = .9) +
  ggplot2::scale_x_continuous("longest run length", breaks = 0:3 * 3) +
  ggplot2::ylab("frequency") +
  ggplot2::coord_cartesian(xlim = c(0, 9)) +
  ggplot2::theme_bw() +
  ggplot2::theme(legend.position = "none",
        panel.grid = ggplot2::element_blank()) 

Graph 3.26: The length of the maximum run of water or land

b: Replication of right panel of Figure 3.7

Code
d2_samples_b |>
  dplyr::group_by(iter) |>
  dplyr::summarize(number_of_switches = base::rle(draws)$lengths |>
                     base::length()) |>
  
  ggplot2::ggplot(ggplot2::aes(x = number_of_switches)) +
  ggplot2::geom_histogram(ggplot2::aes(fill = number_of_switches == 6),
                 binwidth = 0.2, center = 0,
                 color = "white", linewidth = 1/10) +
  ggplot2::scale_fill_viridis_d(option = "C", end = .9) +
  ggplot2::scale_x_continuous("number of switches", breaks = 0:3 * 3) +
  ggplot2::ylab("frequency") +
  ggplot2::coord_cartesian(xlim = c(0, 9)) +
  ggplot2::theme_bw() +
  ggplot2::theme(legend.position = "none",
        panel.grid = ggplot2::element_blank())

Graph 3.27: The number of switches between water and land samples

Check model adequacy with alternative views of the same posterior predictive distribution

For me there are two conclusion to draw from the section on model checking:

  1. It is possible and also important to look at the data generated by the model from different angels.

“Instead of considering the data as the model saw it, as a sum of water samples, now we view the data as both the length of the maximum run of water or land (left) and the number of switches between water and land samples (right).” (McElreath, 2020, p. 67) (pdf)

  1. To generate alternative views at the requires some ingenuity and an evaluation if the model is adequate. For instance: 6 switches in 9 tosses in not consistent with the distribution of the longest runs. There should be fewer switches.

3.3.2.3 Let’s practice with {brms}

The content of this section is more or less copied from Kurz without much knowledge of {brms}. The following code chunks are therefore my first acquaintance with the {brms} package and my reassurance that the package installation did work and to test if I get the same results as Kurz.

How to use brms

I still have to read How to use {brms} and the many packages vignettes in the help file or at the website.

With {brms}, we’ll fit the primary model of \(w=6\) and \(n=9\) much like we did at the end of Chapter 2.

R Code 3.59 b: Fit model and compute posterior summary for b_Intercept, the probability of a “w”

Code
b3.1 <-
  brms::brm(data = list(w = 6), 
      family = binomial(link = "identity"),
      w | trials(9) ~ 0 + Intercept,
      # this is a flat prior
      brms::prior(beta(1, 1), class = b, lb = 0, ub = 1),
      iter = 5000, warmup = 1000,
      seed = 3,
      file = "fits/b03.01")

brms::posterior_summary(b3.1)["b_Intercept", ] |> 
  round(digits = 2)
#>  Estimate Est.Error      Q2.5     Q97.5 
#>      0.64      0.14      0.34      0.88

Explication of the result

  • Estimate is the posterior mean,
  • the two Q columns are the quantile-based 95% intervals,
  • the Est.Error is the posterior standard deviation.
Note

I have already some experience with the beta distribution from my study of Bayesian Statistics the Fun Way: Understanding Statistics and Probability With Star Wars, LEGO, and Rubber Ducks by Will Kurt (Kurt 2019). See also my Quarto book notes. There is more about the beta distribution in ?sec-chap12.

But I will still have to learn about the syntax of the first three lines. The rest of the model description is more or less self-explanatory.

Much like the way we used the dplyr::slice_sample() function to simulate probability values, above, we can do so with the brms::fitted() function. But we will have to specify scale = "linear" in order to return results in the probability metric. By default, brms::fitted() will return summary information. Since we want actual simulation draws, we’ll specify summary = F.

R Code 3.60 b: Simulation of probability values

Code
f <-
  brms:::fitted.brmsfit(b3.1, 
         summary = F,
         scale = "linear") |> 
  data.frame() |> 
  rlang::set_names("p")

dplyr::glimpse(f)
#> Rows: 16,000
#> Columns: 1
#> $ p <dbl> 0.6318994, 0.8105015, 0.7677781, 0.7250286, 0.7265799, 0.7376768, 0.…

How to use S3 methods without attached package?

I had problems with the following code chunk. See my postings at Kurz’s repo.

Now we can display the density and the posterior predictive distribution

R Code 3.61 b: Model b3.1: Density and posterior predictive distribution

Code
f |> 
  ggplot2::ggplot(ggplot2::aes(x = p)) +
  ggplot2::geom_density(fill = "deepskyblue", color = "deepskyblue") +
  ggplot2::annotate(geom = "text", x = .08, y = 2.5,
           label = "Posterior probability") +
  ggplot2::scale_x_continuous("probability of water",
                     breaks = c(0, .5, 1),
                     limits = 0:1) +
  ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw()

Graph 3.28: Density plot of the fittet {brms} model

The graphic should look like the top part of Graph 3.18.

R Code 3.62 b: Posterior predictive distribution for model b3.1

Code
base::set.seed(3)
f <-
  f |> 
  dplyr::mutate(w2 = stats::rbinom(dplyr::n(), size = 9,  prob = p))

# the plot
f |> 
  ggplot2::ggplot(ggplot2::aes(x = w2)) +
  ggplot2::geom_histogram(binwidth = 1, center = 0, color = "black",
                 fill = "deepskyblue", linewidth = 1/10) +
  ggplot2::scale_x_continuous("number of water samples", breaks = 0:3 * 3) +
  ggplot2::scale_y_continuous(NULL, breaks = NULL, limits = c(0, 5000)) +
  ggplot2::ggtitle("Posterior predictive distribution") +
#  ggplot2::coord_cartesian(xlim = c(0, 9)) +
  ggplot2::theme(panel.grid = ggplot2::element_blank()) +
  ggplot2::theme_bw()

Graph 3.29: Simulation to predict posterior distribution

Much like we did with samples, we can use this distribution of probabilities to predict histograms of \(w\) counts. With those in hand, we can make an analogue to the histogram in the bottom panel of Figure 3.6 (copied here into the text as Graph 3.18)

3.4 Practice

3.4.1 Easy

R Code 3.63 : Posterior distribution for the globe tossing example exercises

Code
## R code 3.27a for exercises  #######################

p_grid_3ea <- seq(from = 0, to = 1, length.out = 1000)
prior_3ea <- rep(1, 1000)
likelihood_3ea <- dbinom(6, size = 9, prob = p_grid_3ea)
posterior_3ea <- likelihood_3ea * prior_3ea
posterior_3ea <- posterior_3ea / sum(posterior_3ea)
set.seed(100)
samples_3ea <- sample(p_grid_3ea, prob = posterior_3ea, size = 1e4, replace = TRUE)
hist(samples_3ea,
     xlab = "Probability",
     ylab = "Frequency",
     main = "Posterior distribution for the globe tossing example")

Code
## R code 3.27b for exercises  #######################

df_3eb <- tibble::tibble(p_grid_3eb = base::seq(from = 0, to = 1, length.out = 1000),
               prior_3eb = base::rep(1, 1000),
               likelihood_3eb = stats::dbinom(6, size = 9, prob = p_grid_3eb),
               posterior_3eb = likelihood_3eb * prior_3eb,
               posterior_std_3eb = posterior_3eb / sum(posterior_3eb))

base::set.seed(100)
samples_3eb <- df_3eb |> 
  dplyr::slice_sample(n = 1e4, 
                      weight_by = posterior_std_3eb, replace = TRUE) |> 
  dplyr::rename(p_samples_3eb = p_grid_3eb)

samples_3eb |> 
  ggplot2::ggplot(ggplot2::aes(x = p_samples_3eb)) +
  ggplot2::geom_histogram(bins = 20, fill = "grey", color = "black") +
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Propability",
                y = "Frequency",
                title = "Posterior distribution for the globe tossing example")

Original: 0.7137137, 0.3573574, 0.5985986, 0.7177177, 0.6296296, …

Tidyerse: 0.7137137, 0.3573574, 0.5985986, 0.7177177, 0.6296296, …

3.4.1.1 Exercises 3E1 - 3E7

Exercises 3E1 - 3E7

  • 3E1. How much posterior probability lies below \(p = 0.2\)?
  • 3E2. How much posterior probability lies above \(p = 0.8\)?
  • 3E3. How much posterior probability lies between \(p = 0.2\) and \(p = 0.8\)?
  • 3E4. \(20\%\) of the posterior probability lies below which value of \(p\)?
  • 3E5. \(20\%\) of the posterior probability lies above which value of \(p\)?
  • 3E6. Which values of \(p\) contain the narrowest interval equal to \(66\%\) of the posterior probability?
  • 3E7. Which values of \(p\) contain \(66\%\) of the posterior probability, assuming equal posterior probability both below and above the interval?

R Code 3.64 a: Exercises 3E1 - 3E7 (Original)

Code
## exercises 3E1a - 3E7a ####################

sum(samples_3ea < 0.2) / 1e4                            ## 3.7a
sum(samples_3ea > 0.8) / 1e4                            ## 3.7a
sum(samples_3ea > 0.2 & samples_3ea < 0.8) / 1e4        ## 3.8a
quantile(samples_3ea, 0.2)                       ## 3.9a
quantile(samples_3ea, 0.8)                       ## 3.10a
rethinking::HPDI(samples_3ea, prob = 0.66)              ## 3.13a
rethinking::PI(samples_3ea, prob = 0.66)                ## 3.12a1

## same as above
## quantile(samples_3ea, prob = c(.1667, .8333))    ## 3.12a2
#> [1] 4e-04
#> [1] 0.1116
#> [1] 0.888
#>       20% 
#> 0.5185185 
#>       80% 
#> 0.7557558 
#>     |0.66     0.66| 
#> 0.5085085 0.7737738 
#>       17%       83% 
#> 0.5025025 0.7697698

  • 3E1. 0.04% of the posterior probability lies below \(p = 0.2\).
  • 3E2. 11.16% of the posterior probability lies above \(p = 0.8\).
  • 3E3. 88.8% posterior probability lies between \(p = 0.2\) and \(p = 0.8\).
  • 3E4. \(20\%\) of the posterior probability lies below \(p =\) 0.519.
  • 3E5. \(20\%\) of the posterior probability lies above \(p =\) 0.756.
  • 3E6. \(66\%\) of the narrowest interval lies between 0.509 and 0.774.
  • 3E7. The interval between 0.503 and 0.77 contains \(66\%\) of the posterior probability, assuming equal posterior probability both below and above the interval.

R Code 3.65 b: Exercises 3E1 - 3E7 (Tidyverse)

Code
## exercises 3E1b - 3E7b ####################


(e3b1 <- samples_3eb |> 
  dplyr::filter(p_samples_3eb < 0.2) |> 
  dplyr::summarize(sum = dplyr::n() / 1e4) |>
  dplyr::pull())  
                                                        ## 3.7b
(e3b2 <- samples_3eb |> 
  dplyr::filter(p_samples_3eb > 0.8) |> 
  dplyr::summarize(sum = dplyr::n() / 1e4) |>            ## 3.7b
  dplyr::pull())

(e3b3 <- samples_3eb |> 
  dplyr::filter(p_samples_3eb > 0.2 & p_samples_3eb < 0.8) |> 
  dplyr::summarize(sum = dplyr::n() / 1e4) |>            ## 3.8b
  dplyr::pull())

(e3b4 <- stats::quantile(samples_3eb$p_samples_3eb, probs = .2))

(e3b5 <- stats::quantile(samples_3eb$p_samples_3eb, probs = .8))

(e3b6 <- tidybayes::mode_hdci(samples_3eb$p_samples_3eb, .width = .66))

## tidybayes::mean_hdci(samples_3eb$p_samples_3eb, .width = .66)

## same as above
(e3b7 <- tidybayes::qi(samples_3eb$p_samples_3eb, .width = .66))


### R Code 3.7b with alternative methods #########

#base::sum(samples_3eb$p_samples_3eb < 0.2) / 1e4           ## 3.7b
#
# samples_3eb |> 
#   dplyr::summarize(sum = mean(p_samples_3eb > .8))       ## 3.7b
# 
# samples_3eb |> 
#     dplyr::count(p_samples_3eb > .8) |> 
#     dplyr::mutate(probability = 1e4 / base::sum(1e4))   ## 3.7b
#> [1] 4e-04
#> [1] 0.1116
#> [1] 0.888
#>       20% 
#> 0.5185185 
#>       80% 
#> 0.7557558 
#>           y      ymin      ymax .width .point .interval
#> 1 0.6559935 0.5205205 0.7877878   0.66   mode      hdci
#>           [,1]      [,2]
#> [1,] 0.5025025 0.7697698

  • 3E1. 0.04% of the posterior probability lies below \(p = 0.2\).
  • 3E2. 11.16% of the posterior probability lies above \(p = 0.8\).
  • 3E3. 88.8% posterior probability lies between \(p = 0.2\) and \(p = 0.8\).
  • 3E4. \(20\%\) of the posterior probability lies below \(p =\) 0.519.
  • 3E5. \(20\%\) of the posterior probability lies above \(p =\) 0.756.
  • 3E6. \(66\%\) of the narrowest interval lies between 0.521 and 0.788.
  • 3E7. The interval between 0.503 and 0.77 contains \(66\%\) of the posterior probability, assuming equal posterior probability both below and above the interval.

Note 3.5 : Using dplyr::between()

I learned from Jake Thompson that I could have used dplyr::between() for the exercise 3E3: mean(dplyr::between(p_samples_3eb$samples_3eb, 0.2, 0.8)) = 0.888

There is a small difference between rethinking::HPDI() (0.5085085 resp. 0.7737738) and tidybayes::mode_hdci() (0.5205205 and 0.7877878).

3.4.2 Middle

3.4.2.1 3M1

Exercise 3M1: Suppose the globe tossing data had turned out to be 8 water in 15 tosses. Construct the posterior distribution, using grid approximation. Use the same flat prior as before.

R Code 3.66 : Exercise 3M1a (Original)

Code
p_grid_3m1a <- seq(from = 0, to = 1, length.out = 1000)
prior_3m1a <- rep(1, 1000)
likelihood_3m1a <- stats::dbinom(8, size = 15, prob = p_grid_3m1a)
posterior_3m1a <-  likelihood_3m1a * prior_3m1a
posterior_std_3m1a <-  posterior_3m1a / sum(posterior_3m1a) 
plot(p_grid_3m1a, posterior_std_3m1a, type = "l", 
     xlab = "Proportion of water (p)",
     ylab = "Posterior density")
grid(lty = 1)

map_3m1a <- p_grid_3m1a[which.max(posterior_std_3m1a)]

Graph 3.30: Posterior distribution for 8 ‘W’ in 15 tosses (Original)

The modus (MAP) is at p = 0.534 with a posterior density of 0.00325.

R Code 3.67 : Exercise 3M1b (Tidyverse)

Code
df_3m1b <- tibble::tibble(
  p_grid_3m1b = base::seq(from = 0, to = 1, length.out = 1000),
  prior_3m1b = base::rep(1, 1000),
  likelihood_3m1b = stats::dbinom(x = 8, size = 15, prob = p_grid_3m1b),
  posterior_3m1b =  likelihood_3m1b * prior_3m1b,
  posterior_std_3m1b = posterior_3m1b / sum(posterior_3m1b)
  ) 

df_3m1b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_grid_3m1b, y = posterior_std_3m1b)) +
  ggplot2::geom_line() +
  ggplot2::labs(x = "Proportion of water (p)",
                y = "Posterior density") +
  ggplot2::theme_bw()

map_3m1b <- df_3m1b |> 
  dplyr::arrange(dplyr::desc(posterior_std_3m1b)) |> 
  dplyr::summarize(dplyr::first(p_grid_3m1b)) |> 
  dplyr::pull(1)

Graph 3.31: Posterior distribution for 8 ‘W’ in 15 tosses (Tidyverse)

The modus (MAP) is at p = 0.5335335 with a posterior density of 0.0032519.


3.4.2.2 3M2

Exercise 3M2: Draw \(10,000\) samples from the grid approximation from above. Then use the samples to calculate the \(90\%\) and \(99\%\) HPDI for \(p\).

R Code 3.68 a: Exercise 3M2 (Original)

Code
## 1e4 samples #############
set.seed(42)
samples_3m2a <-  sample(x = p_grid_3m1a,
                       prob = posterior_std_3m1a, 
                       size = 1e4,
                       replace = TRUE)

## Density with rethinking::dens() ########
rethinking::dens(samples_3m2a, adj = 1)

## Parameters #########
(pi_3m2a_1 <- rethinking::PI(samples_3m2a, prob = .9))
(pi_3m2a_2 <- rethinking::PI(samples_3m2a, prob = .99))
(hpdi_3m2a_1 <- rethinking::HPDI(samples_3m2a, prob = .9))
(hpdi_3m2a_2 <- rethinking::HPDI(samples_3m2a, prob = .99))
(mean_3m2a <- mean(samples_3m2a))
(median_3m2a <- median(samples_3m2a))
#>        5%       95% 
#> 0.3323323 0.7207708 
#>        1%      100% 
#> 0.2382382 0.8148148 
#>      |0.9      0.9| 
#> 0.3393393 0.7267267 
#>     |0.99     0.99| 
#> 0.2312312 0.8048048 
#> [1] 0.5295147
#> [1] 0.5305305

Graph 3.32: 10,000 samples from the grid approximation 8 water with 15 tosses (Original)
  • Mean: 0.53
  • Median: 0.531
Prior = 1 Low High Diff
PI .90 0.3323323 0.7207708 0.3884384
PI .99 0.2382382 0.8148148 0.3884384
HPDI .90 0.3393393 0.7267267 0.3873874
HPDI .99 0.2312312 0.8048048 0.5735736

R Code 3.69 b: Exercise 3M2 (Tidyverse)

Code
## 1e4 samples ##########
base::set.seed(42)
samples_3m2b <- df_3m1b |> 
  dplyr::slice_sample(n = 1e4, 
                      weight_by = posterior_std_3m1b, 
                      replace = TRUE) |> 
  dplyr::rename(p_samples_3m2b = p_grid_3m1b)  

## Density with ggplot2
samples_3m2b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_samples_3m2b)) +
  ggplot2::geom_density(adjust = 1.0) + # 1.0 = default value
  ggplot2::labs(x = "N = 10000, Bandwith = 1.0",
                y = "Density") +
  ggplot2::theme_bw()


## Parameters ########
(
  param_3m2b <-  dplyr::bind_rows(
    ggdist::mean_qi(samples_3m2b$p_samples_3m2b, .width = c(0.9, 0.99)),
    ggdist::median_qi(samples_3m2b$p_samples_3m2b, .width = c(0.9, 0.99)),
    ggdist::mode_qi(samples_3m2b$p_samples_3m2b, .width = c(0.9, 0.99)),
    ggdist::mode_hdci(samples_3m2b$p_samples_3m2b, .width = c(0.9, 0.99)),
    ggdist::mode_hdi(samples_3m2b$p_samples_3m2b, .width = c(0.9, 0.99))
  )
)
#>            y      ymin      ymax .width .point .interval
#> 1  0.5295147 0.3323323 0.7207708   0.90   mean        qi
#> 2  0.5295147 0.2382382 0.8148148   0.99   mean        qi
#> 3  0.5305305 0.3323323 0.7207708   0.90 median        qi
#> 4  0.5305305 0.2382382 0.8148148   0.99 median        qi
#> 5  0.5117147 0.3323323 0.7207708   0.90   mode        qi
#> 6  0.5117147 0.2382382 0.8148148   0.99   mode        qi
#> 7  0.5117147 0.3333333 0.7217217   0.90   mode      hdci
#> 8  0.5117147 0.2312312 0.8048048   0.99   mode      hdci
#> 9  0.5117147 0.3333333 0.7217217   0.90   mode       hdi
#> 10 0.5117147 0.2312312 0.8048048   0.99   mode       hdi

Graph 3.33: 10,000 samples from the grid approximation 8 water with 15 tosses (Tidyverse)

3.4.2.3 3M3

Exercise 3M3: Construct a posterior predictive check for this model and data. This means simulate the distribution of samples, averaging over the posterior uncertainty in \(p\). What is the probability of observing 8 water in 15 tosses?

In the following set of tabs “Original” is abbreviated with “O” and Tidyverse with “T”.

R Code 3.70 a: Exercise 3M3 – Data (Original)

Code
base::set.seed(42)

## R code 3.23a #############################
dummy_3m3a <- stats::rbinom(1e4, size = 15, prob = samples_3m2a)
as.data.frame(table(dummy_3m3a))
#>    dummy_3m3a Freq
#> 1           0    3
#> 2           1   45
#> 3           2   99
#> 4           3  289
#> 5           4  540
#> 6           5  829
#> 7           6 1134
#> 8           7 1421
#> 9           8 1419
#> 10          9 1356
#> 11         10 1195
#> 12         11  817
#> 13         12  483
#> 14         13  280
#> 15         14   74
#> 16         15   16

R Code 3.71 a: Exercise 3M3 – Histogram (Original)

Code
rethinking::simplehist(dummy_3m3a, xlab = "Dummy water count")

R Code 3.72 a: Exercise 3M3 – Result (Original)

Code
(
  mean_dummy_3m3a <- mean(dummy_3m3a == 8)
)
#> [1] 0.1419

The probability of observing 8 water in 15 tosses = 14.19%.

R Code 3.73 b: Exercise 3M3 – Data (Tidyverse)

Code
base::set.seed(42)

## R code 3.23b #############################

samples_3m2b <- samples_3m2b |> 
  dplyr::mutate(dummy_3m3b = stats::rbinom(1e4, size = 15, prob = p_samples_3m2b))
  
samples_3m2b |> 
  dplyr::group_by(dummy_3m3b) |> 
  dplyr::summarize(dummy_cnt = dplyr::n())
#> # A tibble: 16 × 2
#>    dummy_3m3b dummy_cnt
#>         <int>     <int>
#>  1          0         3
#>  2          1        45
#>  3          2        99
#>  4          3       289
#>  5          4       540
#>  6          5       829
#>  7          6      1134
#>  8          7      1421
#>  9          8      1419
#> 10          9      1356
#> 11         10      1195
#> 12         11       817
#> 13         12       483
#> 14         13       280
#> 15         14        74
#> 16         15        16

R Code 3.74 b: Exercise 3M3 – Histogram (Tidyverse)

Code
samples_3m2b |> 
  ggplot2::ggplot(ggplot2::aes(x = dummy_3m3b)) +
  ggplot2::geom_histogram(bins = 16, fill = "grey", color = "black") +
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Dummy water count",
                y = "Frequnecy")

Graph 3.34: Posterior predictive check for 8 water with 15 tosses

R Code 3.75 b: Exercise 3M3 – Result (Tidyverse)

Code
(
  mean_dummy_3m3b <- samples_3m2b |> 
  dplyr::summarize(mean(dummy_3m3b == 8)) |> 
  dplyr::pull()
)
#> [1] 0.1419

The probability of observing 8 water in 15 tosses = 14.19%.


3.4.2.4 3M4

Exercise 3M4: Using the posterior distribution constructed from the new (8/15) data, now calculate the probability of observing 6 water in 9 tosses.

There are two possible approaches for the solution:

  • S1: Random generation of the binomial distribution with the probability of the sampling distribution.
  • S2: Computing the density for 6/9 but with the posterior distribution from 8/15 as prior.

R Code 3.76 a: Exercise 3M4 (Original: Method S1)

Code
set.seed(100)
samples_3m4a <- rbinom(1e4, size = 9, prob = samples_3m2a)
mean(samples_3m4a == 6)
#> [1] 0.1765

R Code 3.77 a: Exercise 3M4 (Original: Method S2)

Code
#|label: exercise-3m4a2

likelihood_6of9_3m4a <-  stats::dbinom(6, size = 9, prob = p_grid_3m1a)
prior_6of9_3m4a <-  posterior_std_3m1a
(
  posterior_6of9_3m4a <- sum(likelihood_6of9_3m4a * prior_6of9_3m4a)
)
#> [1] 0.1763898

R Code 3.78 b: Exercise 3M4 (Tidyverse: Method S1)

Code
set.seed(100)
samples_3m2b |> 
  dplyr::mutate(
    p_6of9_3m4b = stats::rbinom(1e4, size = 9, prob = p_samples_3m2b)
    ) |> 
  dplyr::summarize(sum = base::mean(p_6of9_3m4b == 6)) |> 
  dplyr::pull()
#> [1] 0.1765

R Code 3.79 b: Exercise 3M4 (Tidyverse Method S2)

Code
df_3m1b |> 
  dplyr::mutate(
    likelihood_6of9_3m4b = stats::dbinom(6, size = 9, prob = p_grid_3m1b),
    prior_6of9_3m4b = posterior_std_3m1b) |> 
  dplyr::summarize(sum = base::sum(likelihood_6of9_3m4b * prior_6of9_3m4b)) |> 
  dplyr::pull()
#> [1] 0.1763898

3.4.2.5 3M5

Exercise 3M5: Start over at Exercise 3M1 (Section 3.4.2.1), but now use a prior that is zero below \(p = 0.5\) and a constant above \(p = 0.5\). This corresponds to prior information that a majority of the Earth’s surface is water. Repeat each problem above and compare the inferences. What difference does the better prior make? If it helps, compare inferences (using both priors) to the true value \(p = 0.7\).

R Code 3.80 a: Exercise 3M5: Posterior Distribution (~ 3M1: Original)

Code
## repr 3M1a ###########
p_grid_3m5a <- seq(from = 0, to = 1, length.out = 1000)
prior_3m5a <- ifelse(p_grid_3m5a < 0.5, 0, 1)
likelihood_3m5a <- stats::dbinom(8, size = 15, prob = p_grid_3m5a)
posterior_3m5a <-  likelihood_3m5a * prior_3m5a
posterior_std_3m5a <-  posterior_3m5a / sum(posterior_3m5a) 
plot(p_grid_3m5a, posterior_std_3m5a, type = "l", lty = "dashed",
     xlab = "Proportion of water (p)",
     ylab = "Posterior density")
lines(p_grid_3m1a, posterior_std_3m1a)
abline(v = .7, col = "red")
grid(lty = 1)

Graph 3.35: Posterior distribution for 8 ‘W’ in 15 tosses with prior < 0.5 = 0 otherwise prior = 1 (Original)

Compared with the true value of \(0.7\) the new prior (\(0\) under \(p = 0.5\), otherwise \(p = 1\)) is better, but there is still a big difference between the true value and the mode of the probability density.

R Code 3.81 a: Exercise 3M5.2: HPDI of 90% from 1e4 samples (~3M2: Original)

Code
## repr 3M2a #############
set.seed(42)
samples_3m5a <- sample(x = p_grid_3m5a, 
                      size = 1e4, 
                      replace = TRUE, 
                      prob = posterior_std_3m5a
                      )
hpdi_3m5a <- rethinking::HPDI(samples_3m5a, .9)

max_3m2a <- max(posterior_std_3m1a)
max_3m5a <- max(posterior_std_3m5a)
Prior Low High Diff Max Density
1 0.3393393 0.7267267 0.3873874 0.0032519
0/1 0.5005005 0.7117117 0.2112112 0.0054363

The HPDI interval is smaller with the the new prior \(p < .5 = 0 \operatorname{and} p >= .5 = 1\) than with a constant prior of \(1\). The new prior is better as it gives more precise information.

The maximum posterior density of the new prior \(0/1\) is higher than the maximum with the prior of \(1\). Values of \(p < 0.5\) are no longer taking up posterior density because they are \(0\). This can also be seen in Graph 3.35.

R Code 3.82 a: Exercise 3M5.3: Posterior predictive check (~3M3: Original)

Code
## repr 3M3a ##############
set.seed(42)
dummy_3m5a_1 <- rbinom(1e4, 15, samples_3m5a)
mean_3m5a_1 <- mean(dummy_3m5a_1 == 8)

## repr 3M4a ##############
set.seed(42)
dummy_3m5a_2 <- rbinom(1e4, 9, samples_3m5a)
mean_3m5a_2 <- mean(dummy_3m5a_2 == 6)

## compare with p = 0.7 ##############
set.seed(42)
dummy_3m5a_3 <- rbinom(1e4, 15, 0.7)
mean_3m5a_3 <- mean(dummy_3m5a_3 == 8)

set.seed(42)
dummy_3m5a_4 <- rbinom(1e4, 9, 0.7)
mean_3m5a_4 <- mean(dummy_3m5a_4 == 6)
Name Mean of 15/08 Mean of 09/06
Samples 0.1516 0.2401
Value 0.7 0.0822 0.265

The mean of the samples of 6 water observations with 9 tosses are higher than 8 water with 15 tosses. This represents the fact that the real value of 0.7 is nearer with 6/9 = 0.667 than with 8/15 = 0.533. Also note that a predictive check with the real value of \(0.7\) shows a very small mean value with 15/8 but a much higher mean with 9/6.

R Code 3.83 b: Exercise 3M5: Posterior Distribution (~ 3M1: Tidyverse)

Code
## repr 3M1b ##############
df_3m5b <- tibble::tibble(
  p_grid_3m5b = base::seq(from = 0, to = 1, length.out = 1000),
  prior_3m5b = base::ifelse(p_grid_3m5b < 0.5, 0, 1),
  likelihood_3m5b = stats::dbinom(x = 8, size = 15, prob = p_grid_3m5b),
  posterior_3m5b =  likelihood_3m5b * prior_3m5b,
  posterior_std_3m5b = posterior_3m5b / sum(posterior_3m5b)
  )

ggplot2::ggplot() +
ggplot2::geom_line(data = df_3m5b,
                mapping = ggplot2::aes(x = p_grid_3m5b, y = posterior_std_3m5b,
                                       linetype = "ifelse(p < 0.5, 0, 1)")) +
ggplot2::geom_line(data = df_3m1b,
                mapping = ggplot2::aes(x = p_grid_3m1b, y = posterior_std_3m1b,
                                       linetype = "1")) +
ggplot2::scale_linetype_manual("Prior:", 
                        values = c("solid", "dashed")) +
ggplot2::geom_vline(xintercept = .7, color = "red") +
ggplot2::labs(x = "Proportion of water (p)",
              y = "Posterior density") +
ggplot2::theme_bw()

Graph 3.36: Posterior distribution for 8 ‘W’ in 15 tosses with prior < 0.5 = 0 otherwise prior = 1 (Tidyverse)

Compared with the true value of \(0.7\) the new prior (\(0\) under \(p = 0.5\), otherwise \(p = 1\)) is better, but there is still a big difference between the true value and the mode of the probability density.

R Code 3.84 b: Exercise 3M5.2: HPDI of 90% from 1e4 samples (~3M2: Tidyverse)

Code
## repr 3M2b ##############
set.seed(42)
samples_3m5b <- df_3m5b |> 
  dplyr::slice_sample(n = 1e4,
                      weight_by = posterior_std_3m5b,
                      replace = TRUE) |> 
  dplyr::rename(p_samples_3m5b = p_grid_3m5b)


hpdi_3m2b <- rethinking::HPDI(samples_3m2b$p_samples_3m2b, .9)
hpdi_3m5b <- rethinking::HPDI(samples_3m5b$p_samples_3m5b, .9)

max_3m2b <- max(df_3m1b$posterior_std_3m1b)
max_3m5b <- max(df_3m5b$posterior_std_3m5b)
Prior Low High Diff Max Density
1 0.3393393 0.7267267 0.3873874 0.0032519
0/1 0.5005005 0.7117117 0.2112112 0.0054363

The HPDI interval is smaller with the the new prior \(p < .5 = 0 \operatorname{and} p >= .5 = 1\) than with a constant prior of \(1\). The new prior is better as it gives more precise information.

The maximum posterior density of the new prior \(0/1\) is higher than the maximum with the prior of \(1\). Values of \(p < 0.5\) are no longer taking up posterior density because they are \(0\). This can also be seen in Graph 3.36.

R Code 3.85 b: Exercise 3M5.3: Posterior predictive check (~3M3: Tidyverse)

Code
## repr 3M3b ##############
set.seed(42)
dummy_3m5b_1 <- rbinom(1e4, 15, samples_3m5b$p_samples_3m5b)
mean_3m5b_1 <- mean(dummy_3m5b_1 == 8)

## repr 3M4b ##############
set.seed(42)
dummy_3m5b_2 <- rbinom(1e4, 9, samples_3m5b$p_samples_3m5b)
mean_3m5b_2 <- mean(dummy_3m5b_2 == 6)

## compare with p = 0.7 ##############
set.seed(42)
dummy_3m5b_3 <- rbinom(1e4, 15, 0.7)
mean_3m5b_3 <- mean(dummy_3m5b_3 == 8)

set.seed(42)
dummy_3m5b_4 <- rbinom(1e4, 9, 0.7)
mean_3m5b_4 <- mean(dummy_3m5b_4 == 6)
Name Mean of 15/08 Mean of 09/06
Samples 0.1516 0.2401
Value 0.7 0.0822 0.265

The mean of the samples of 6 water observations with 9 tosses are higher than 8 water with 15 tosses. This represents the fact that the real value of 0.7 is nearer with 6/9 = 0.667 than with 8/15 = 0.533. Also note that a predictive check with the real value of \(0.7\) shows a very small mean value with 15/8 but a much higher mean with 9/6.


3.4.2.6 3M6

Exercise 3M6: Suppose you want to estimate the Earth’s proportion of water very precisely. Specifically, you want the 99% percentile interval of the posterior distribution of \(p\) to be only \(0.05\) wide. This means the distance between the upper and lower bound of the interval should be \(0.05\).

  • N tosses: How many times will you have to toss the globe to do this?
  • Width: How influences the sample size the difference between samples?

R Code 3.86 a: Exercise 3M6-1: How many times will you have to toss the globe to get a \(0.05\) wide interval with \(PI = .99\)? (Original)

Although McElreath states explicitly that he won’t require an exact answer but rather wants to see the chosen approach (Winter 2019 Homework) I was curious about the figure and how to program this process. It turned out that the chosen set.seed() number has effects how many times one has to toss the globe.

Code
pi99_3m6a <- function(N, seed) {
  set.seed(seed)
  p_true <-  0.7
  W <- rbinom(1, size = N, prob = p_true)
  p_grid_3m6a <- seq(from = 0, to = 1, length.out = 1000)
  prior_3m6a <- rep(1, 1000)
  likelihood_3m6a <- dbinom(W, size = N, prob = p_grid_3m6a)
  posterior_3m6a <- likelihood_3m6a * prior_3m6a
  posterior_std_3m6a <-  posterior_3m6a / sum(posterior_3m6a)
  
  samples_3m6a <-
    sample(p_grid_3m6a,
           prob = posterior_std_3m6a,
           size = 1e4,
           replace = TRUE)
  rethinking::PI(samples_3m6a, 0.99)
}


tictoc::tic()
N = 1
my_seed = c(3, 42, 100, 150)
s = ''

for (i in 1:length(my_seed)) {
  PI99 <- pi99_3m6a(N, my_seed[i])
  while (as.numeric(PI99[2] - PI99[1] > 0.05)) {
    N <- N + 1
    PI99 <- pi99_3m6a(N, my_seed[i])
  }
  s[i] <- glue::glue('With set.seed({my_seed[i]}) we need to toss the globe {N} times.\n')
  N = 1
}
s
tictoc::toc()
#> [1] "With set.seed(3) we need to toss the globe 2102 times."  
#> [2] "With set.seed(42) we need to toss the globe 2208 times." 
#> [3] "With set.seed(100) we need to toss the globe 2304 times."
#> [4] "With set.seed(150) we need to toss the globe 2115 times."
#> 8.309 sec elapsed

R Code 3.87 a: Exercise 3M6-2: Demonstrate the variation of the interval width with different number of simulations (Original)

Code
f_3m6a <- function(N) {
  p_true <-  0.7
  W <- rbinom(1, size = N, prob = p_true)
  p_grid_3m6a <- seq(from = 0, to = 1, length.out = 1000)
  prior_3m6a <- rep(1, 1000)
  likelihood_3m6a <- dbinom(W, size = N, prob = p_grid_3m6a)
  posterior_3m6a <- likelihood_3m6a * prior_3m6a
  posterior_std_3m6a <-  posterior_3m6a / sum(posterior_3m6a)
  
  samples_3m6a <-
    sample(p_grid_3m6a,
           prob = posterior_std_3m6a,
           size = 1e4,
           replace = TRUE)
  pi_99_3m6a <-  rethinking::PI(samples_3m6a, 0.99)
  as.numeric(pi_99_3m6a[2] - pi_99_3m6a[1])
}


Nlist <- c(20, 50, 100, 200, 500, 1000, 2000) 
Nlist <- rep(Nlist, each = 100) 
width <- sapply(Nlist, f_3m6a) 
plot(Nlist, width) 
abline(h = 0.05, col = "red")

The difference between samples shrinks as the sample size increases.

What are we looking at in this plot? The horizontal is sample size. The points are individual interval widths, one for each simulation. The red line is drawn at a width of \(0.05\). Looks like we need more than \(2000\) tosses of the globe to get the interval to be that precise.

The above is a general feature of learning from data: The greatest returns on learning come early on. Each additional observation contributes less and less. So it takes very much effort to progressively reduce our uncertainty. So if your application requires a very precise estimate, be prepared to collect a lot of data. Or to change your approach. (Solution, Week-01, Winter 2019)

R Code 3.88 b: Exercise 3M6-1: How many times will you have to toss the globe to get a \(0.05\) wide interval with \(PI = .99\)? (Tidyverse)

Because of the slow performance of manipulating whole data frame, I am going to use here just one set.seed() function. This is only to show that we get the same result as in the first loop of R Code 3.86.

Code
pi99_3m6b <- function(N, seed) {
  base::set.seed(seed)
  p_true <-  0.7
  W <- stats::rbinom(1, size = N, prob = p_true)
  df_3m6b <-  tibble::tibble(p_grid_3m6b = 
                            base::seq(from = 0, to = 1, length.out = 1000)) |> 
    dplyr::mutate(prior_3m6b = base::rep(1, 1000),
           likelihood_3m6b = stats::dbinom(W, size = N, prob = p_grid_3m6b),
           posterior_3m6b = likelihood_3m6b * prior_3m6b,
           posterior_std_3m6b = posterior_3m6b / sum(posterior_3m6b))
  
  samples_3m6b <- df_3m6b |> 
    dplyr::slice_sample(n = 1e4,
                        weight_by = posterior_std_3m6b,
                        replace = TRUE)
  
  rethinking::PI(samples_3m6b$p_grid_3m6b, 0.99)
}

tictoc::tic()
N = 1
my_seed = 3 ## just one loop!
s = ''

for (i in 1:length(my_seed)) {
  PI99_3m6b <- pi99_3m6b(N, my_seed[i])
  while (as.numeric(PI99_3m6b[2] - PI99_3m6b[1] > 0.05)) {
    N <- N + 1
    PI99_3m6b <- pi99_3m6b(N, my_seed[i])
  }
  s[i] <- glue::glue('With set.seed({my_seed[i]}) we need to toss the globe {N} times.\n')
  N = 1
}
s
tictoc::toc()
#> [1] "With set.seed(3) we need to toss the globe 2102 times."
#> 35.169 sec elapsed

Compare the execution time of just one set.seed() calculation with four in R Code 3.86. Using data frames turns out about 16(!) times slower the computation with vectors. (Already just one set.seed() computation with tibbles is 4 times slower than the processing time of 4 set.seed() loops with base R vectors.)

R Code 3.89 b: Exercise 3M6-2: Demonstrate the variation of the interval width with different number of simulations (Tidyverse)

Code
f_3m6b <- function(N) {
  p_true <-  0.7
  W <- stats::rbinom(1, size = N, prob = p_true)
  df_3m6b <-  tibble::tibble(p_grid_3m6b = 
                            base::seq(from = 0, to = 1, length.out = 1000)) |> 
    dplyr::mutate(prior_3m6b = base::rep(1, 1000),
           likelihood_3m6b = stats::dbinom(W, size = N, prob = p_grid_3m6b),
           posterior_3m6b = likelihood_3m6b * prior_3m6b,
           posterior_std_3m6b = posterior_3m6b / sum(posterior_3m6b))
  
  samples_3m6b <- df_3m6b |> 
    dplyr::slice_sample(n = 1e4,
                        weight_by = posterior_std_3m6b,
                        replace = TRUE)
  pi_99_3m6b <-  rethinking::PI(samples_3m6b$p_grid_3m6b, 0.99)
  as.numeric(pi_99_3m6b[2] - pi_99_3m6b[1])
}


Nlist_3m6b <- c(20, 50, 100, 200, 500, 1000, 2000) 


df_3m6b_2 <- 
  tibble::tibble(Nlist_3m6b = base::rep(Nlist_3m6b, each = 100)) |> 
  dplyr::mutate(width_3m6b = purrr::map_dbl(Nlist_3m6b, f_3m6b))

df_3m6b_2 |> 
  ggplot2::ggplot(ggplot2::aes(x = Nlist_3m6b,
                             y = width_3m6b)) +
  ggplot2::geom_point(alpha = 1/10) +
  ggplot2::labs(x = "Number of samples",
                y = "Width of PI = 0.99") +
  ggplot2::geom_hline(yintercept = .05, color = "red") +
  ggplot2::theme_bw()

The difference between samples shrinks as the sample size increases.

What are we looking at in this plot? The horizontal is sample size. The points are individual interval widths, one for each simulation. The red line is drawn at a width of \(0.05\). Looks like we need more than \(2000\) tosses of the globe to get the interval to be that precise.

The above is a general feature of learning from data: The greatest returns on learning come early on. Each additional observation contributes less and less. So it takes very much effort to progressively reduce our uncertainty. So if your application requires a very precise estimate, be prepared to collect a lot of data. Or to change your approach. (Solution, Week-01, Winter 2019)


3.4.3 Hard

R Code 3.90 : Reported first and second born children in 100 two-child families

Code
## R code 3.28 ###############
birth1 <- c(
  1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1,
  1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1,
  0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0,
  1, 0, 1, 1, 1, 0, 1, 1, 1, 1
)

birth2 <- c(
  0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0,
  1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1,
  0, 0, 0, 1, 1, 1, 0, 0, 0, 0
)


## R code 3.30 ###############
n_boys <- sum(birth1) + sum(birth2)

200 Births of these are boys:

  • 1st born: 51
  • 2nd born: 60,
  • total: 111

3.4.3.1 3H1

Exercise 3H1: Using grid approximation, compute the posterior distribution for the probability of a birth being a boy. Assume a uniform prior probability. Which parameter value maximizes the posterior probability?

R Code 3.91 a: Posterior distribution for the probability of a birth being a boy (Original)

Code
p_grid_3h1a <- seq(from = 0,
                   to = 1,
                   length.out = 1e3)
prior_3h1a <- rep(0.5, 1e3)
likelihood_3h1a <- dbinom(x = sum(birth1) + sum(birth2),
                          size = 200,
                          prob = p_grid_3h1a)
posterior_3h1a <-  prior_3h1a * likelihood_3h1a
posterior_std_3h1a <- posterior_3h1a / sum(posterior_3h1a)
plot(p_grid_3h1a, posterior_std_3h1a, type = "l")
map_3h1a <- p_grid_3h1a[which.max(posterior_std_3h1a)]

Graph 3.37: Posterior distribution of boys rate probability (Original)

The mode or MAP is at 0.555, e.g. the birth ratio for boys is 55.5%.

R Code 3.92 b: Posterior distribution for the probability of a birth being a boy (Tidyverse)

Code
df_3h1b <- tibble::tibble(p_grid_3h1b = base::seq(
                   from = 0,
                   to = 1,
                   length.out = 1e3)) |> 
    dplyr::mutate(prior_3h1b = base::rep(0.5, 1e3),
           likelihood_3h1b = stats::dbinom(
                          x = base::sum(birth1) + base::sum(birth2),
                          size = 200,
                          prob = p_grid_3h1b),
           posterior_3h1b = prior_3h1b * likelihood_3h1b,
           posterior_std_3h1b = posterior_3h1b / sum(posterior_3h1b))

map_3h1b <- df_3h1b |> 
  dplyr::arrange(dplyr::desc(posterior_std_3h1b)) |> 
  dplyr::summarize(dplyr::first(p_grid_3h1b)) |> 
  dplyr::pull(1)

df_3h1b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_grid_3h1b,
                               y = posterior_std_3h1b)) +
  ggplot2::geom_line() +
  ggplot2::geom_vline(xintercept = map_3h1b, color = "red") +
  ggplot2::labs(x = "Ratio of boy births",
                y = "Posterior distribution") +
  ggplot2::theme_bw()

Graph 3.38: Posterior distribution of boys rate probability (Tidyverse)

The mode or MAP is at 0.555, e.g. the birth ratio for boys is 55.5%.

Note: {tidybayes} point intervals needs samples

I computed the mode by arranging the data frame with the highest value at the top. This is exactly the same value as in the book version. Another Base R solution would be df_3h1b$p_grid_3h1b[which.max(df_3h1b$posterior_std_3h1b)] = 55.4554555%.

I tried to calculate the mode also with the {tidybayes} package. But all mode values from this package (mode_qi(), mode_hdci(), mode_hdi() and Mode()) result in smaller values, for instance tidybayes::Mode(samples_3m2b$p_samples_3m2b) = 51.2%.

After many trials and exerpiments I came to the conclusion that above mentioned functions of the {tidybayes} package work only with generated samples. Therefore I do not have to address two columns as with the which.max() (one outside and one inside the square brackets) and I have to supply vectors with the above mentioned family of function form the {tidybayes} package.

Furthermore I noticed that all point_intervals without mode_, median_ and mean_ in front of the qi, ll, ul, hdi, hdci is not in {tidybayes} but in the accompanying package {ggdist} and need therefore as prefix ggdist::. Furthermore these functions needs numerical vectors and can’t therfore used with the pipe.


3.4.3.2 3H2

Exercise 3H2: Using the sample() function, draw \(10,000\) random parameter values from the posterior distribution you calculated above. Use these samples to estimate the \(50\%\), \(89\%\), and \(97\%\) highest posterior density intervals.

R Code 3.93 a: Draw \(10,000\) random parameter values from the posterior distribution you calculated above (Original)

Code
set.seed(100)
samples_3h2a <- sample(x = p_grid_3h1a,
                       size = 1e4,
                       replace = TRUE,
                       prob = posterior_std_3h1a)
rethinking::HPDI(samples_3h2a, c(.5, .89, .97))

map_3h2a <- rethinking::chainmode(samples_3h2a, adj = 1)

rethinking::dens(samples_3h2a, adj = 1)
abline(v = map_3h2a, col = "red")
#>     |0.97     |0.89      |0.5      0.5|     0.89|     0.97| 
#> 0.4824825 0.4994995 0.5265265 0.5725726 0.6076076 0.6296296

Graph 3.39: 10.000 samples from the posterior distribution of the birth data (Original)

The red line is the density mode (0.555) calculated with rethinking::chainmode().

R Code 3.94 b: Draw \(10,000\) random parameter values from the posterior distribution you calculated above (Tidyverse)

Code
set.seed(100)
samples_3h2b <-  df_3h1b |> 
  dplyr::slice_sample(n = 1e4,
                      replace = TRUE,
                      weight_by = posterior_std_3h1b)
(
  map_3h2b <- samples_3h2b |> 
    tidybayes::mode_hdi(p_grid_3h1b, .width = c(.5, .89, .97)) 
)

samples_3h2b |> 
  ggplot2::ggplot(ggplot2::aes(x = p_grid_3h1b)) +
  ggplot2::geom_density() +
  ggplot2::geom_vline(xintercept = map_3h2b[[1, 1]], color = "red") +
  ggplot2::labs(x = "Ratio of boy births",
                y = "Samples from the posterior distribution") +
  ggplot2::theme_bw()
#> # A tibble: 3 × 6
#>   p_grid_3h1b .lower .upper .width .point .interval
#>         <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
#> 1       0.555  0.531  0.577   0.5  mode   hdi      
#> 2       0.555  0.504  0.613   0.89 mode   hdi      
#> 3       0.555  0.483  0.632   0.97 mode   hdi

Graph 3.40: 10.000 samples from the posterior distribution of the birth data (Tidyverse)

The red line is the density mode (0.555) calculated with tidybayes::mode_hdi() and also displayed in the column p_grid_3h1b.


3.4.3.3 3H3

Exercise 3H3: Use rbinom() to simulate \(10,000\) replicates of \(200\) births. You should end up with \(10,000\) numbers, each one a count of boys out of \(200\) births. Compare the distribution of predicted numbers of boys to the actual count in the data (\(111\) boys out of \(200\) births). There are many good ways to visualize the simulations, but the dens() command (part of the {rethinking} package) is probably the easiest way in this case. Does it look like the model fits the data well? That is, does the distribution of predictions include the actual observation as a central, likely outcome?

R Code 3.95 a: Posterior predictive distribution with 10,000 replicates of 200 births (Original)

Code
set.seed(100)
dummy_data_3h3a <-  rbinom(n = 1e4,
                           size = 200,
                           prob = samples_3h2a)
head(dummy_data_3h3a, 10)

rethinking::dens(dummy_data_3h3a, 
                 adj = 1)
abline(v = sum(birth1 + birth2), col = "red")
#>  [1] 105 124  98 102  94  96  98 107 101 108

Graph 3.41: Create posterior predictive distribution with 10,000 replicates of 200 births (Original)

The model appears to fit well, because the observed value of 111 boys is in the middle of the posterior predictive distribution which has a mean of 110.7847 and a median of 111.

R Code 3.96 b: Posterior predictive distribution with 10,000 replicates of 200 births (Tidyverse)

Code
set.seed(100)
dummy_data_3h3b <-  stats::rbinom(n = 1e4,
                           size = 200,
                           prob = samples_3h2b$p_grid_3h1b
                           )
utils::head(dummy_data_3h3b, 10)

ggplot2::ggplot() +
  ggdist::stat_slabinterval(ggplot2::aes(x = dummy_data_3h3b),
                            .width = c(0.66, 0.89), color = "red",
                            slab_fill = "grey", slab_color = "black",
                            density = "bounded") +
  ggplot2::geom_vline(
    ggplot2::aes(xintercept = n_boys), linetype = "dashed", color = "red") +
  ggplot2::labs(x = "Number of Boys", y = "Density") +
  ggplot2::theme_bw()
#>  [1] 105 124  98 102  94  96  98 107 101 108

Graph 3.42: Create posterior predictive distribution with 10,000 replicates of 200 births (Tidyverse)

I have used the graphic function from the {ggdist} resp. {tidybayes} package. I learned about this familiy of functions from the solution by Jake Thompson.

The model appears to fit well, because the observed value of 111 boys is in the middle of the posterior predictive distribution which has a mean of 110.7847 and a median of 111.


3.4.3.4 3H4

Exercise 3H4: Now compare \(10,000\) counts of boys from \(100\) simulated first borns only to the number of boys in the first births, birth1. How does the model look in this light?

R Code 3.97 a: Compare \(10,000\) counts of boys from \(100\) simulated first borns (Original)

Code
set.seed(100)
dummy_data_3h4a <-  rbinom(n = 1e4,
                           size = 100,
                           prob = samples_3h2a)

utils::head(dummy_data_3h4a, 10)
map_3h4a <- rethinking::chainmode(dummy_data_3h4a, adj = 1)

rethinking::dens(dummy_data_3h4a, adj = 0.7)
abline(v = sum(birth1), col = "blue")
abline(v = map_3h4a, col = "red")
#>  [1] 51 60 59 48 56 49 50 52 46 62

Graph 3.43: Compare 10,000 counts of boys from 100 simulated first borns (Original)

Observed first born boys: 51

Simulation of first born boys:

  • Mean: 55.3639
  • Median: 55
  • Mode: 54.961

With the posterior predictive distribution for the first born boys the model does not appears to fit well, because the observed value of 51 boys (blue line) is not in the middle of the distribution.

R Code 3.98 b: Compare \(10,000\) counts of boys from \(100\) simulated first borns (Tidyverse)

Code
set.seed(100)
dummy_data_3h4b <-  stats::rbinom(n = 1e4,
                           size = 100,
                           prob = samples_3h2b$p_grid_3h1b
                           )
utils::head(dummy_data_3h4b, 10)

map_3h4b <- tidybayes::Mode(dummy_data_3h4b)

ggplot2::ggplot() +
  ggdist::stat_slabinterval(ggplot2::aes(x = dummy_data_3h4b),
                            .width = c(0.66, 0.89), color = "red",
                            slab_fill = "grey", slab_color = "black",
                            density = "bounded") +
  ggplot2::geom_vline(
    ggplot2::aes(xintercept = sum(birth1)), linetype = "dashed", color = "blue") +
  ggplot2::geom_vline(
    ggplot2::aes(xintercept = map_3h4b), linetype = "dashed", color = "red") +
  ggplot2::labs(x = "Number of 1st born boys", y = "Density") +
  ggplot2::theme_bw()
#>  [1] 51 60 59 48 56 49 50 52 46 62

Graph 3.44: Compare 10,000 counts of boys from 100 simulated first borns (Tidyverse)

Observed first born boys: 51

Simulation of first born boys:

  • Mean: 55.3639
  • Median: 55
  • Mode: 55

With the posterior predictive distribution for the first born boys the model does not appears so well as for all boys in R Code 3.96, because the observed value of 51 boys (blue line) is not in the middle of the distribution. However, it does not appear to be a large discrepancy, as the observed value is still within the middle 66% interval.


3.4.3.5 3H5

Exercise 3H5: The model assumes that sex of first and second births are independent. To check this assumption, focus now on second births that followed female first borns. Compare 10,000 simulated counts of boys to only those second births that followed girls. To do this correctly, you need to count the number of first borns who were girls and simulate that many births, 10,000 times. Compare the counts of boys in your simulations to the actual observed count of boys following girls. How does the model look in this light? Any guesses what is going on in these data?

Code
df_birth_3h5a <- as.data.frame(cbind(birth1, birth2))
df_birth_3h5a <- subset(df_birth_3h5a, birth1 == 0)

n_1st_girls_3h5a <- 100 - sum(birth1)
n_2nd_boys_3h5a <- sum(df_birth_3h5a$birth1) + sum(df_birth_3h5a$birth2)

p_grid_3h5a <- seq(from = 0,
                   to = 1,
                   length.out = 1e3)
prior_3h5a <- rep(0.5, 1e3)
likelihood_3h5a <- dbinom(x = n_2nd_boys_3h5a,
                              size = n_1st_girls_3h5a,
                              prob = p_grid_3h5a)
posterior_3h5a <-  prior_3h5a * likelihood_3h5a
posterior_std_3h5a <- posterior_3h5a / sum(posterior_3h5a)

set.seed(100)
dummy_data_3h5a <-  rbinom(n = 1e4,
                           size = n_1st_girls_3h5a,
                           prob = samples_3h2a)
rethinking::dens(dummy_data_3h5a, 
                 adj = 1,
                 xlab = "Number of 2nd born boys that follow a 1st born girl")
abline(v = n_2nd_boys_3h5a, col = "blue")

Graph 3.45: Second births of boys that followed female first borns (Original)

The model predicts 27 2nd born boys that follow a 1st born girl. But we have observed 39 boys. Our model underestimates the number of 2nd born boys so that our assumption that births are independent must be challenged.

Code
n_1st_girls_3h5b <- 100 - sum(birth1)
n_2nd_boys_3h5b <- sum(birth2[which(birth1 == 0)])


set.seed(100)
dummy_data_3h5b <- rbinom(1e4, size = n_1st_girls_3h5b, 
            prob = samples_3h2b$p_grid_3h1b)

map_3h5b <-  tidybayes::Mode(dummy_data_3h5b)

ggplot2::ggplot() +
  ggdist::stat_slabinterval(ggplot2::aes(x = dummy_data_3h5b),
                            .width = c(0.66, 0.89), color = "red",
                            slab_fill = "grey", slab_color = "black",
                            density = "bounded") +
  ggplot2::geom_vline(
    ggplot2::aes(xintercept = n_2nd_boys_3h5b), linetype = "dashed", color = "blue") +
  ggplot2::geom_vline(
    ggplot2::aes(xintercept = map_3h5b), linetype = "dashed", color = "red") +
  ggplot2::labs(x = "Number of 2nd born boys that follow a 1st born girl", 
       y = "Density") +
  ggplot2::theme_bw()

Graph 3.46: Second births of boys that followed female first borns (Tidyverse)

The model predicts 27 2nd born boys that follow a 1st born girl. But we have observed 39 boys. Our model underestimates the number of 2nd born boys so that our assumption that births are independent appears to be violated. We have generally more 2nd born boys (60% versus 51%) but especially more 2nd born boys when the first born was a girl (79.59%).

3.5 Session info

Code
utils::sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.1.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Vienna
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] brms_2.20.4         Rcpp_1.0.11         patchwork_1.1.3    
#>  [4] lubridate_1.9.3     forcats_1.0.0       stringr_1.5.1      
#>  [7] dplyr_1.1.4         purrr_1.0.2         readr_2.1.4        
#> [10] tidyr_1.3.0         tibble_3.2.1        ggplot2_3.4.4      
#> [13] tidyverse_2.0.0     glossary_1.0.0.9000
#> 
#> loaded via a namespace (and not attached):
#>   [1] svUnit_1.0.6         shinythemes_1.2.0    splines_4.3.2       
#>   [4] later_1.3.1          cellranger_1.1.0     xts_0.13.1          
#>   [7] rpart_4.1.21         lifecycle_1.0.4      StanHeaders_2.26.28 
#>  [10] processx_3.8.2       lattice_0.22-5       MASS_7.3-60         
#>  [13] insight_0.19.6       crosstalk_1.2.0      ggdist_3.3.0        
#>  [16] backports_1.4.1      magrittr_2.0.3       rmarkdown_2.25      
#>  [19] yaml_2.3.7           httpuv_1.6.12        skimr_2.1.5         
#>  [22] pkgbuild_1.4.2       gld_2.6.6            multcomp_1.4-25     
#>  [25] abind_1.4-5          expm_0.999-7         tidybayes_3.0.6     
#>  [28] TH.data_1.1-2        tensorA_0.36.2       sandwich_3.0-2      
#>  [31] rmutil_1.1.10        inline_0.3.19        spatial_7.3-17      
#>  [34] bridgesampling_1.1-2 commonmark_1.9.0     codetools_0.2-19    
#>  [37] DT_0.30              xml2_1.3.5           tidyselect_1.2.0    
#>  [40] shape_1.4.6          bayesplot_1.10.0     farver_2.1.1        
#>  [43] stable_1.1.6         matrixStats_1.1.0    stats4_4.3.2        
#>  [46] base64enc_0.1-3      jsonlite_1.8.7       e1071_1.7-13        
#>  [49] ellipsis_0.3.2       survival_3.5-7       emmeans_1.8.9       
#>  [52] tools_4.3.2          DescTools_0.99.50    glue_1.6.2          
#>  [55] gridExtra_2.3        xfun_0.41            cmdstanr_0.5.3      
#>  [58] distributional_0.3.2 loo_2.6.0            withr_2.5.2         
#>  [61] timeSeries_4031.107  fastmap_1.1.1        boot_1.3-28.1       
#>  [64] fansi_1.0.5          shinyjs_2.1.0        callr_3.7.3         
#>  [67] digest_0.6.33        timechange_0.2.0     R6_2.5.1            
#>  [70] mime_0.12            estimability_1.4.1   colorspace_2.1-0    
#>  [73] gtools_3.9.4         markdown_1.11        threejs_0.3.3       
#>  [76] modeest_2.4.0        utf8_1.2.4           generics_0.1.3      
#>  [79] data.table_1.14.8    class_7.3-22         httr_1.4.7          
#>  [82] prettyunits_1.2.0    htmlwidgets_1.6.2    pkgconfig_2.0.3     
#>  [85] dygraphs_1.1.1.6     gtable_0.3.4         Exact_3.2           
#>  [88] timeDate_4022.108    htmltools_0.5.7      clue_0.3-65         
#>  [91] scales_1.2.1         rethinking_2.40      lmom_3.0            
#>  [94] posterior_1.5.0      knitr_1.45           rstudioapi_0.15.0   
#>  [97] tzdb_0.4.0           reshape2_1.4.4       statip_0.2.3        
#> [100] coda_0.19-4          checkmate_2.3.0      nlme_3.1-163        
#> [103] curl_5.1.0           repr_1.1.6           proxy_0.4-27        
#> [106] zoo_1.8-12           rootSolve_1.8.2.4    parallel_4.3.2      
#> [109] miniUI_0.1.1.1       fBasics_4032.96      pillar_1.9.0        
#> [112] grid_4.3.2           vctrs_0.6.4          shinystan_2.6.0     
#> [115] promises_1.2.1       arrayhelpers_1.1-0   xtable_1.8-4        
#> [118] cluster_2.1.4        evaluate_0.23        mvtnorm_1.2-3       
#> [121] cli_3.6.1            compiler_4.3.2       rlang_1.1.2         
#> [124] crayon_1.5.2         rstantools_2.3.1.1   rversions_2.1.2     
#> [127] labeling_0.4.3       ps_1.7.5             plyr_1.8.9          
#> [130] stringi_1.8.1        rstan_2.32.3         viridisLite_0.4.2   
#> [133] QuickJSR_1.0.7       munsell_0.5.0        colourpicker_1.3.0  
#> [136] bayestestR_0.13.1    Brobdingnag_1.2-9    V8_4.4.0            
#> [139] Matrix_1.6-3         hms_1.1.3            stabledist_0.7-1    
#> [142] shiny_1.8.0          igraph_1.5.1         RcppParallel_5.1.7  
#> [145] readxl_1.4.3