10  Introduction to Prediction

Example 10.1 Do people prefer to use the word “data” as singular or plural? FiveThirtyEight conducted a poll to address this question (and others). Rather than simply ask whether the respondent considered “data” to be singular or plural, they asked which of the following sentences they prefer:

  1. Some experts say it’s important to drink milk, but the data is inconclusive.
  2. Some experts say it’s important to drink milk, but the data are inconclusive.

Suppose we wish to study the opinions of students in Cal Poly statistics classes regarding this issue. That is, let \(\theta\) represent the population proportion of students in Cal Poly statistics classes who prefer to consider data as a singular noun, as in option 1 above (“the data is”).

Before proceeding, consider the context of this problem and sketch your prior distribution for \(\theta\). What are the main features of your prior?

To illustrate ideas, we’ll start with a prior distribution which places probability 0.01, 0.05, 0.15, 0.30, 0.49 on the values 0.1, 0.3, 0.5, 0.7, 0.9, respectively.

  1. Before observing any data, suppose we plan to randomly select a single Cal Poly statistics student. Consider the unconditional prior probability that the selected student prefers data as singular. (This is called a prior predictive probability.) Explain how you could use simulation to approximate this probability.




  2. Compute the prior predictive probability from the previous part.




  3. Before observing any data, suppose we plan to randomly select a sample of 35 Cal Poly statistics students. Consider the unconditional prior distribution of the number of students in the sample who prefer data as singular. (This is called a prior predictive distribution.) Explain how you could use simulation to approximate this distribution. In particular, how could you use simulation to approximate the prior predictive probability that exactly 31 students in the sample prefer data as singular?




  4. Compute and interpret the prior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular.




Example 10.2 Continuing Example 10.1. Suppose that we observe a sample of 35 Cal Poly statistics students in which 31 prefer data as singular.

  1. Explain how you would use simulation to approximate the posterior distribution of \(\theta\).




  2. Find the posterior distribution of \(\theta\).




Example 10.3 Continuing Example 10.2. After observing a sample of 35 Cal Poly statistics students in which 31 prefer data as singular, suppose we plan to collect more data.

  1. Suppose we plan to randomly select an additional Cal Poly statistics student. Consider the posterior predictive probability that this student prefers data as singular. Explain how you could use simulation to estimate this probability.




  2. Compute the posterior predictive probability from the previous part.




  3. Suppose we plan to collect data on another sample of 35 Cal Poly statistics students. Consider the posterior predictive distribution of the number of students in the new sample who prefer data as singular. Explain how you could use simulation to approximate this distribution. In particular, how could you use simulation to approximate the posterior predictive probability that exactly 31 students in the sample prefer data as singular? (Of course, the sample size of the new sample does not have to be 35; however, we’re keeping it the same so we can compare the prior and posterior predictions.)




  4. Compute and interpret the posterior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular.




Example 10.4 Continuing Example 10.1. We’ll use a grid approximation and assume that any multiple of 0.0001 is a possible value of \(\theta\): 0, 0.0001, 0.0002, …, 0.9999, 1.

  1. Assume the prior distribution for \(\theta\) is proportional to \(\theta^2\). Plot this prior distribution and describe its main features. Is the prior distribution truly discrete or essentially continuous?




  2. Given the shape of the prior distribution, explain why we might not want to compute central prior credible intervals. Suggest an alternative approach, and compute a 98% prior credible interval for \(\theta\).




  3. Before observing any data, suppose we plan to randomly select a sample of 35 Cal Poly statistics students. Let \(y\) represent the number of students in the selected sample who prefer data as singular. Use simulation to approximate the prior predictive distribution of \(y\) and plot it. Is the prior predictive distribution truly discrete or essentially continuous?




  4. Find a 95% prior prediction interval for \(y\). Write a clearly worded sentence interpreting this interval in context.




Example 10.5 Continuing Example 10.4. Suppose that we observe a sample of 35 Cal Poly statistics students in which 31 prefer data as singular.

  1. Find the posterior distribution of \(\theta\), plot it and describe its main features, and find and interpret a 98% central posterior credible interval for \(\theta\). Is the posterior distribution truly discrete or essentially continuous?




  2. Suppose we plan to randomly select another sample of 35 Cal Poly statistics students. Let \(\tilde{y}\) represent the number of students in the new sample who prefer data as singular. Use simulation to approximate the posterior predictive distribution of \(\tilde{y}\) and plot it. Is the posterior predictive distribution truly discrete or essentially continuous? (Of course, the sample size of the new sample does not have to be 35. However, we’re keeping it the same so we can compare the prior and posterior predictions.)




  3. Find a 95% posterior prediction interval for \(\tilde{y}\). Write a clearly worded sentence interpreting this interval in context.




Example 10.6 Continuing Example 10.5. Now suppose instead of using the Cal Poly sample data (31/35) to form the posterior distribution of \(\theta\), we had used the data from the FiveThirtyEight study in which 865 out of 1093 respondents preferred data as singular.

  1. Find the posterior distribution of \(\theta\), plot it and describe its main features, and find and interpret a 98% central posterior credible interval for \(\theta\). How does the posterior based on the FiveThirtyEight data compare to the posterior distribution based on the Cal Poly sample data (31/35)? Why?




  2. Again, suppose we use the FiveThirtyEight data to form the posterior distribution of \(\theta\). Suppose we plan to randomly select a new sample of size 35. Let \(\tilde{y}\) represent the number in the new sample who prefer data as singular. Use simulation to approximate the posterior predictive distribution of \(\tilde{y}\) and plot it. Find a 95% posterior prediction interval for \(\tilde{y}\). How does the predictive distribution which uses the posterior distribution based on the FiveThirtyEight data compare to the one based on the Cal Poly sample data (31/35)? Why?




Example 10.7 Suppose we want to estimate \(\theta\), the population mean hours of sleep on a typical night for Cal Poly students. Assume that sleep hours, \(y\), for students follow a Normal distribution with unknown mean \(\theta\) and known standard deviation 1.5 hours. (Known population SD is an unrealistic assumption that we use for simplicity here.)

Assume a Normal(7.5, 0.5) prior distribution for \(\theta\).

  1. Describe what the prior distribution represents. Find and interpret a 98% prior credible interval.




  2. Explain how you would use simulation to approximate the prior predictive distribution of \(y\). Use simulation to approximate a 95% prior prediction interval, and interpret it.




  3. Now suppose you observe sleep hours (rounded to one decimal place) of 6.8 and 7.2 for a sample of 2 students. Explain how, in principle, you could use simulation to approximate the posterior distribution. (To preview what lies ahead: what issues would you experience with this method in practice?)




  4. Suppose that the posterior distribution is approximately Normal with mean 7.4 and SD 0.4. Describe what the posterior distribution represents. Find and interpret a 98% posterior credible interval.




  5. Explain how you would use simulation to approximate the posterior predictive distribution of \(y\). Use simulation to approximate a 95% posterior prediction interval, and interpret it.




  6. Now suppose you observe a very large sample of students (with a sample mean of 7.0), and the posterior distribution is Normal with mean 7.0 and SD 0.001. Without doing any calculations, approximate the posterior predictive distribution of \(y\) and a 95% posterior prediction interval.




10.1 Notes

10.1.1 Prior predictive probability of success for single trial

  1. Simulate a value of \(\theta\) from the prior distribution.
  2. Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner once and record the result, success or not.
  3. Repeat steps 1 and 2 many times, and find the proportion of repetitions which result in success. This proportion approximates the (unconditional) prior predictive probability of success.
n_rep = 1000

theta = seq(0.1, 0.9, 0.2)

prior = c(0.01, 0.05, 0.15, 0.30, 0.49)

# simulate values of theta from the prior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = prior)

# for each simulated value of theta, simulate a single trial from Binomial(1, theta): "success" = 1, "failure" = 0 
y_sim = rbinom(n_rep, size = 1, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim y_sim
0.9 1
0.9 1
0.9 0
0.9 1
0.7 1
0.9 1
0.9 1
0.9 1
0.9 1
0.9 1
sim |>
  mutate(theta = factor(theta_sim),
         y = factor(y_sim)) |>
  ggplot(aes(x = theta,
             y = y)) +
  geom_jitter(aes(color = theta, shape = y)) +
  scale_color_viridis_d(option = "magma", guide = "none") +
  scale_shape_manual(values = c(1, 19), guide = "none") +
  theme_bw()

sim |>
  mutate(theta = factor(theta_sim),
         y = factor(y_sim)) |>
  ggplot(aes(y)) +
  geom_bar(aes(fill = theta)) +
  scale_fill_viridis_d(option = "magma") +
  theme_bw()
(a) (theta, y) pairs for single trials: theta simulated from prior; y = 1 (success) with probability theta
(b) Summary of simulated y values to approximate prior predictive probability of success
Figure 10.1: Prior predictive simulation

Law of total probability calculation of prior predictive probability:

\[ 0.1(0.01) + 0.3(0.05) + 0.5(0.15) + 0.7(0.30) + 0.9(0.49) = 0.742 \]
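The chapter's code is in R, but this law of total probability sum is easy to check in a standalone sketch (shown here in Python); the values are the prior from the example:

```python
# prior predictive probability of success on a single trial
# = sum over theta of P(success | theta) * P(theta)
theta = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.01, 0.05, 0.15, 0.30, 0.49]

prior_predictive = sum(t * p for t, p in zip(theta, prior))
print(prior_predictive)  # approximately 0.742
```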

Figure 10.2: Probability of success on single trial, for different values of theta in Example 10.1. Color represents the prior probability of theta, with darker colors corresponding to higher prior probability.
Figure 10.3: Prior predictive probability of success on single trial

10.1.2 Prior predictive distribution of sample count of success

  1. Simulate a value of \(\theta\) from the prior distribution.
  2. Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner 35 times and count \(y\), the number of spins that land on success.
  3. Repeat steps 1 and 2 many times, and record the number of successes (out of 35) for each repetition. Summarize the simulated \(y\) values to approximate the prior predictive distribution. To approximate the prior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular, count the number of simulated repetitions that result in 31 successes (\(y = 31\)) and divide by the total number of simulated repetitions.
n_rep = 10000

theta = seq(0.1, 0.9, 0.2)

prior = c(0.01, 0.05, 0.15, 0.30, 0.49)

# simulate values of theta from the prior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = prior)

# for each simulated value of theta, simulate number of successes from Binomial(35, theta)

y_sim = rbinom(n_rep, size = 35, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim y_sim
0.9 32
0.1 2
0.3 5
0.9 30
0.7 30
0.9 33
0.7 26
0.9 31
0.7 18
0.5 17
theta_y_sim_plot <- sim |>
  mutate(theta = factor(theta_sim)) |>
  ggplot(aes(x = theta_sim,
             y = y_sim)) +
  geom_jitter(aes(color = theta)) +
  scale_color_viridis_d(option = "magma", guide = "none") +
  scale_x_continuous(breaks = theta) +
  labs(y = "y") +
  theme_bw()

theta_y_sim_plot

sim |>
  mutate(theta = factor(theta_sim),
         y = y_sim) |>
  ggplot(aes(y)) +
  geom_bar(aes(fill = theta)) +
  scale_fill_viridis_d(option = "magma") +
  theme_bw()
(a) (theta, y) pairs for samples of size 35: theta simulated from prior; y simulated from Binomial(35, theta)
(b) Summary of simulated y values to approximate prior predictive distribution of sample count of success
Figure 10.4: Prior predictive simulation

Law of total probability calculation for prior predictive probability that \(y = 31\) when \(n=35\)

\[\begin{align*} & \left(\binom{35}{31}(0.1)^{31}(1-0.1)^{4}\right)(0.01) + \left(\binom{35}{31}(0.3)^{31}(1-0.3)^{4}\right)(0.05)\\ & + \left(\binom{35}{31}(0.5)^{31}(1-0.5)^{4}\right)(0.15) + \left(\binom{35}{31}(0.7)^{31}(1-0.7)^{4}\right)(0.30)\\ & + \left(\binom{35}{31}(0.9)^{31}(1-0.9)^{4}\right)(0.49) \end{align*}\]
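The displayed sum can be evaluated numerically. A Python sketch using the Binomial pmf (computed directly from `math.comb`, since the chapter's R code uses `dbinom`):

```python
from math import comb

theta = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.01, 0.05, 0.15, 0.30, 0.49]
n, y = 35, 31

def binom_pmf(y, n, t):
    # Binomial(n, t) probability of exactly y successes
    return comb(n, y) * t**y * (1 - t)**(n - y)

# prior predictive P(y = 31) = sum over theta of P(y = 31 | theta) * P(theta)
prob = sum(binom_pmf(y, n, t) * p for t, p in zip(theta, prior))
print(round(prob, 4))  # approximately 0.0999
```

The total agrees with the "product" column total in the Bayes table of Section 10.1.4.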

(a) Sample-to-sample distribution of \(y\), the number of successes in samples of size \(n=35\) for different values of theta in Example 10.1. Color represents the prior probability of \(\theta\), with darker colors corresponding to higher prior probability.
(b) Prior predictive distribution of \(y\)
Figure 10.5: Prior predictive simulation

10.1.3 Posterior distribution - simulation

  1. Simulate a value of \(\theta\) from the prior distribution.
  2. Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner 35 times and count \(y\), the number of spins that land on success.
  3. Repeat steps 1 and 2 many times, and record the number of successes (out of 35) for each repetition.
  4. Discard any repetitions for which \(y\neq 31\). For the remaining repetitions (with \(y=31\)) summarize the simulated \(\theta\) values to approximate the posterior distribution of \(\theta\).
y_obs = 31

# same (theta, y) simulation as before; highlight y == 31
theta_y_sim_plot +
  annotate("rect", xmin = 0, xmax = 1,
           ymin = y_obs - 0.4, ymax = y_obs + 0.4, alpha = 0.5,
           color = bayes_col["posterior"],
           fill = bayes_col["posterior"]) +
  theme_bw()

# Only keep (theta, y) pairs with y = 31, and summarize the theta values
sim_posterior_table = sim |>
  filter(y_sim == y_obs) |>
  count(theta_sim, name = "freq") |>
  mutate(rel_freq = freq / sum(freq))



sim_posterior_table |>
  ggplot(aes(x = theta_sim,
             y = rel_freq)) +
  geom_point(col = bayes_col["posterior"], size = 3) +
  geom_line(linetype = "dashed", col = bayes_col["posterior"]) +
  scale_x_continuous(limits = c(0, 1), breaks = theta) +
  labs(x = "theta",
       y = "Posterior") +
  theme_bw()
(a) (theta, y) pairs for samples of size 35: theta simulated from the prior; y simulated from Binomial(35, theta)
(b) Summary of simulated theta values from repetitions with \(y=31\) to approximate the posterior distribution of theta given \(y=31\)
Figure 10.6: Approximating the posterior distribution via simulation

10.1.4 Posterior distribution - calculation

# observed data
n = 35
y_obs = 31

# likelihood of observed data for each theta
likelihood = dbinom(y_obs, n, theta)

# posterior is proportional to product of prior and likelihood
product = prior * likelihood
posterior = product / sum(product)

bayes_table = data.frame(theta,
                         prior,
                         likelihood,
                         product,
                         posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |>
  kable_styling()
theta prior likelihood product posterior
0.1 0.01 0.0000 0.0000 0.0000
0.3 0.05 0.0000 0.0000 0.0000
0.5 0.15 0.0000 0.0000 0.0000
0.7 0.30 0.0067 0.0020 0.0201
0.9 0.49 0.1998 0.0979 0.9799
Total 1.00 0.2065 0.0999 1.0000
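The posterior column of the table can be reproduced outside of R; a small Python check of the normalize-the-products step:

```python
from math import comb

theta = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.01, 0.05, 0.15, 0.30, 0.49]
n, y_obs = 35, 31

# likelihood of observing y = 31 in n = 35 trials, for each theta
likelihood = [comb(n, y_obs) * t**y_obs * (1 - t)**(n - y_obs) for t in theta]

# posterior is proportional to prior * likelihood, then normalized
product = [p * l for p, l in zip(prior, likelihood)]
total = sum(product)
posterior = [pr / total for pr in product]

print([round(p, 4) for p in posterior])  # approximately [0, 0, 0, 0.0201, 0.9799]
```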

10.1.5 Posterior predictive probability of success for single trial

Similar to the prior predictive, but now simulate \(\theta\) from the posterior distribution instead of the prior distribution.

  1. Simulate a value of \(\theta\) from the posterior distribution.
  2. Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner once and record the result, success or not.
  3. Repeat steps 1 and 2 many times, and find the proportion of repetitions which result in success. This proportion approximates the (unconditional) posterior predictive probability of success.
n_rep = 1000

# simulate values of theta from the posterior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = posterior)

# for each simulated value of theta, simulate a single trial from Binomial(1, theta): "success" = 1, "failure" = 0 
y_sim = rbinom(n_rep, size = 1, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim y_sim
0.9 1
0.9 1
0.9 1
0.9 0
0.9 1
0.9 1
0.7 0
0.9 1
0.9 1
0.9 1
sim |>
  mutate(theta = factor(theta_sim),
         y = factor(y_sim)) |>
  ggplot(aes(x = theta,
             y = y)) +
  geom_jitter(aes(color = theta, shape = y)) +
  scale_color_viridis_d(option = "magma", guide = "none", limits = factor(theta)) +
  scale_shape_manual(values = c(1, 19), guide = "none") +
  theme_bw()

sim |>
  mutate(theta = factor(theta_sim),
         y = factor(y_sim)) |>
  ggplot(aes(y)) +
  geom_bar(aes(fill = theta)) +
  scale_fill_viridis_d(option = "magma", limits = factor(theta)) +
  theme_bw()
(a) (theta, y) pairs for single trials: theta simulated from posterior; y = 1 (success) with probability theta
(b) Summary of simulated y values to approximate posterior predictive probability of success
Figure 10.7: Posterior predictive simulation

Law of total probability calculation of posterior predictive probability:

\[ 0.1(0.0000) + 0.3(0.0000) + 0.5(0.0000) + 0.7(0.0201 ) + 0.9(0.9799) = 0.8960 \]
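A quick check of this sum (a Python sketch; the posterior probabilities are taken from the Bayes table above):

```python
theta = [0.1, 0.3, 0.5, 0.7, 0.9]
posterior = [0.0000, 0.0000, 0.0000, 0.0201, 0.9799]

# posterior predictive P(success) = sum of theta * posterior probability
post_predictive = sum(t * p for t, p in zip(theta, posterior))
print(round(post_predictive, 4))  # approximately 0.896
```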

Figure 10.8: Probability of success on single trial, for different values of theta in Example 10.1. Color represents the posterior probability of \(\theta\), with darker colors corresponding to higher posterior probability.
Figure 10.9: Posterior predictive probability of success on single trial

10.1.6 Posterior predictive distribution of sample count of success

  1. Simulate a value of \(\theta\) from the posterior distribution.
  2. Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner 35 times and count \(y\), the number of spins that land on success.
  3. Repeat steps 1 and 2 many times, and record the number of successes (out of 35) for each repetition. Summarize the simulated \(y\) values to approximate the posterior predictive distribution. To approximate the posterior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular, count the number of simulated repetitions that result in 31 successes (\(y = 31\)) and divide by the total number of simulated repetitions.
n_rep = 10000

# simulate values of theta from the posterior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = posterior)

# for each simulated value of theta, simulate number of successes from Binomial(35, theta)
y_sim = rbinom(n_rep, size = 35, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim y_sim
0.9 31
0.9 34
0.9 31
0.9 33
0.9 27
0.9 31
0.9 30
0.9 33
0.9 30
0.9 33
sim |>
  mutate(theta = factor(theta_sim)) |>
  ggplot(aes(x = theta,
             y = y_sim)) +
  geom_jitter(aes(color = theta)) +
  scale_color_viridis_d(option = "magma", guide = "none", limits = factor(theta)) +
  labs(y = "y") +
  theme_bw()

sim |>
  mutate(theta = factor(theta_sim),
         y = y_sim) |>
  ggplot(aes(y)) +
  geom_bar(aes(fill = theta)) +
  scale_fill_viridis_d(option = "magma", limits = factor(theta)) +
  theme_bw()
(a) (theta, y) pairs for samples of size 35: theta simulated from posterior; y simulated from Binomial(35, theta)
(b) Summary of simulated y values to approximate posterior predictive distribution of sample count of success
Figure 10.10: Posterior predictive simulation

Law of total probability calculation for posterior predictive probability that \(y = 31\) when \(n=35\)

\[\begin{align*} & \left(\binom{35}{31}(0.1)^{31}(1-0.1)^{4}\right)(0.0000) + \left(\binom{35}{31}(0.3)^{31}(1-0.3)^{4}\right)(0.0000)\\ & + \left(\binom{35}{31}(0.5)^{31}(1-0.5)^{4}\right)(0.0000) + \left(\binom{35}{31}(0.7)^{31}(1-0.7)^{4}\right)(0.0201)\\ & + \left(\binom{35}{31}(0.9)^{31}(1-0.9)^{4}\right)(0.9799) \end{align*}\]
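As with the prior version, the displayed sum can be evaluated numerically; a Python sketch weighting the Binomial pmf by the posterior probabilities from the Bayes table:

```python
from math import comb

theta = [0.1, 0.3, 0.5, 0.7, 0.9]
posterior = [0.0000, 0.0000, 0.0000, 0.0201, 0.9799]
n, y = 35, 31

def binom_pmf(y, n, t):
    # Binomial(n, t) probability of exactly y successes
    return comb(n, y) * t**y * (1 - t)**(n - y)

# posterior predictive P(y = 31) = sum over theta of P(y = 31 | theta) * posterior
prob = sum(binom_pmf(y, n, t) * p for t, p in zip(theta, posterior))
print(round(prob, 4))  # approximately 0.196
```

Note that this is roughly twice the prior predictive probability of \(y = 31\), since the observed data has shifted most of the probability onto \(\theta = 0.9\).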

(a) Sample-to-sample distribution of \(y\), the number of successes in samples of size \(n=35\) for different values of theta in Example 10.1. Color represents the posterior probability of \(\theta\), with darker colors corresponding to higher posterior probability.
(b) Posterior predictive distribution of \(y\)
Figure 10.11: Posterior predictive distribution

10.1.7 Continuous theta: Prior distribution

theta = seq(0, 1, 0.0001)

# prior
prior = theta ^ 2
prior = prior / sum(prior)

# Lower bound of one-sided 98% prior credible interval
prior_cdf = cumsum(prior)
theta[max(which(prior_cdf <= 0.02))]
[1] 0.2714
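The grid computation can be mirrored in a standalone Python sketch. (As a sanity check: a prior proportional to \(\theta^2\) is essentially a Beta(3, 1) distribution, whose cdf is \(\theta^3\), so the 2nd percentile is \(0.02^{1/3} \approx 0.2714\).)

```python
# grid of theta values and prior proportional to theta^2
theta = [i / 10000 for i in range(10001)]
prior = [t**2 for t in theta]
total = sum(prior)
prior = [p / total for p in prior]

# lower bound of one-sided 98% prior credible interval:
# largest theta whose cumulative prior probability is at most 0.02
cdf = 0.0
lower = None
for t, p in zip(theta, prior):
    cdf += p
    if cdf <= 0.02:
        lower = t
print(lower)  # 0.2714
```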

10.1.8 Continuous theta: Prior predictive distribution

n = 35

n_sim = 10000

theta_sim = sample(theta, n_sim, replace = TRUE, prob = prior)

y_sim = rbinom(n_sim, n, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim y_sim
0.9664 34
0.7408 32
0.6907 23
0.9727 34
0.7652 27
0.6341 24
0.8949 32
0.6150 24
0.8281 30
0.8988 28
# lower bound of a one-sided 95% prior prediction interval for y
quantile(y_sim, 0.05)
5% 
12 

10.1.9 Continuous theta: Posterior distribution

# observed data
n = 35
y_obs = 31

# likelihood of observed data for each theta
likelihood = dbinom(y_obs, n, theta)

# posterior is proportional to product of prior and likelihood
product = prior * likelihood
posterior = product / sum(product)

posterior_cdf = cumsum(posterior)
# 98% central posterior credible interval
c(theta[max(which(posterior_cdf <= 0.01))],
  theta[min(which(posterior_cdf >= 0.99))])
[1] 0.7238 0.9651
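The same grid posterior can be checked with a standalone Python sketch. Since the posterior is proportional to prior times likelihood, the constant \(\binom{35}{31}\) can be dropped and the unnormalized weights computed directly:

```python
# grid posterior for Example 10.5: prior ∝ theta^2, y = 31 successes in n = 35
theta = [i / 10000 for i in range(10001)]
n, y_obs = 35, 31

# unnormalized posterior ∝ theta^2 * theta^31 * (1 - theta)^4
weight = [t**(2 + y_obs) * (1 - t)**(n - y_obs) for t in theta]
total = sum(weight)
posterior = [w / total for w in weight]

# endpoints of the 98% central posterior credible interval
cdf = 0.0
lower = upper = None
for t, p in zip(theta, posterior):
    cdf += p
    if cdf <= 0.01:
        lower = t
    if upper is None and cdf >= 0.99:
        upper = t
print(lower, upper)  # approximately 0.7238 and 0.9651
```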

10.1.10 Continuous theta: Posterior predictive distribution

n = 35

n_sim = 10000

theta_sim = sample(theta, n_sim, replace = TRUE, prob = posterior)

y_sim = rbinom(n_sim, n, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim y_sim
0.8964 31
0.8938 27
0.8737 32
0.8360 30
0.8597 28
0.8756 32
0.7553 29
0.8895 32
0.8856 31
0.8013 24
quantile(y_sim, c(0.025, 0.975))
 2.5% 97.5% 
   24    35 

10.1.11 Continuous theta: Posterior distribution after observing large sample

# observed data
n = 1093
y_obs = 865

# likelihood of observed data for each theta
likelihood = dbinom(y_obs, n, theta)

# posterior is proportional to product of prior and likelihood
product = prior * likelihood
posterior = product / sum(product)

posterior_cdf = cumsum(posterior)
# 98% central posterior credible interval
c(theta[max(which(posterior_cdf <= 0.01))],
  theta[min(which(posterior_cdf >= 0.99))])
[1] 0.7619 0.8190
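A standalone Python sketch of the same grid computation; with the large FiveThirtyEight sample, the unnormalized weights are computed in log space to avoid numerical underflow (the grid endpoints 0 and 1 are excluded since their posterior weight is 0):

```python
from math import log, exp

# grid posterior: prior ∝ theta^2, data 865/1093 (FiveThirtyEight)
theta = [i / 10000 for i in range(1, 10000)]
n, y_obs = 1093, 865

# log of unnormalized posterior ∝ theta^(2 + 865) * (1 - theta)^228
logw = [(2 + y_obs) * log(t) + (n - y_obs) * log(1 - t) for t in theta]
m = max(logw)
weight = [exp(lw - m) for lw in logw]
total = sum(weight)
posterior = [w / total for w in weight]

# endpoints of the 98% central posterior credible interval
cdf = 0.0
lower = upper = None
for t, p in zip(theta, posterior):
    cdf += p
    if cdf <= 0.01:
        lower = t
    if upper is None and cdf >= 0.99:
        upper = t
print(lower, upper)  # approximately 0.7619 and 0.8190
```

The interval is much narrower than the one based on the Cal Poly sample (31/35), reflecting the much larger sample size.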

10.1.12 Continuous theta: Posterior predictive distribution (after observing large sample)

n = 35

n_sim = 10000

theta_sim = sample(theta, n_sim, replace = TRUE, prob = posterior)

y_sim = rbinom(n_sim, n, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim y_sim
0.7889 30
0.7916 26
0.7911 29
0.7935 30
0.8025 25
0.7508 24
0.7958 26
0.8017 29
0.7906 30
0.7903 28
quantile(y_sim, c(0.025, 0.975))
 2.5% 97.5% 
   23    32 

10.1.13 Sleep hours: Prior distribution

98% prior credible interval for mean sleep hours

qnorm(c(0.01, 0.99), 7.5, 0.5)
[1] 6.336826 8.663174
Figure 10.12: Example 10.7: Prior distribution of mean hours of sleep per night

10.1.14 Sleep hours: Prior predictive distribution

  1. Simulate mean sleep hours \(\theta\) from Normal(7.5, 0.5) prior distribution
  2. Given \(\theta\), simulate sleep hours for a single student \(y\) from Normal(\(\theta\), 1.5) distribution
  3. Repeat many times to get many \((\theta, y)\) pairs.
  4. Summarize the simulated \(y\) values to approximate the prior predictive distribution of sleep hours
n_rep = 10000

# simulate theta from prior distribution
theta_sim = rnorm(n_rep, 7.5, 0.5)

# simulate y from Normal(theta, sigma) distribution
sigma = 1.5
y_sim = rnorm(n_rep, theta_sim, sigma)

# 95% prior prediction interval
quantile(y_sim, c(0.025, 0.975))
    2.5%    97.5% 
 4.08337 10.84144 
Figure 10.13: Example 10.7: Prior predictive distribution of sleep hours

10.1.15 Sleep hours: Posterior distribution

In principle, the posterior distribution of \(\theta\) given the observed sample (6.8, 7.2) can be found via the following.

  1. Simulate \(\theta\) from the Normal(7.5, 0.5) prior distribution
  2. Given \(\theta\), simulate a sample of size 2 from a \(N(\theta, 1.5)\) distribution, with values rounded to one decimal place.
  3. If the simulated sample is (6.8, 7.2), keep the repetition; otherwise, discard it.
  4. Repeat the process until enough non-discarded repetitions are obtained, all corresponding to the observed sample (6.8, 7.2). Summarize the simulated \(\theta\) values to approximate the posterior distribution.

However, the probability of simulating a sample that matches the observed data is essentially 0, simply because there are so many possible samples (even if we round values to one decimal place and only discard samples whose sample mean is not 7.0).

n_rep = 10

data.frame(rep = 1:n_rep,
       theta = rnorm(n_rep, 7.5, 0.5)) |>
  mutate(samples = map(theta, ~sort(round(rnorm(2, .x, sigma), 1)))) |>
  kbl(align = 'r', digits = 1) |>
  kable_styling()
rep theta samples
1 6.1 6.2, 6.3
2 7.2 4.4, 8.1
3 6.9 5.3, 7.4
4 7.7 6.0, 6.1
5 8.0 4.8, 8.2
6 6.9 6.2, 8.2
7 6.5 4.8, 7.5
8 7.5 6.9, 7.4
9 7.5 6.7, 7.5
10 6.9 7.9, 8.0
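To quantify how rarely the accept/reject scheme keeps a repetition, here is a standalone Python sketch of the same simulation (rounding to one decimal place, as in the example); even with 100,000 repetitions, only a tiny fraction match the observed sample:

```python
import random

random.seed(1)
n_rep = 100_000
observed = (6.8, 7.2)
accepted = 0

for _ in range(n_rep):
    theta = random.gauss(7.5, 0.5)  # theta from the Normal(7.5, 0.5) prior
    # sample of size 2 from Normal(theta, 1.5), rounded to one decimal place
    sample = sorted(round(random.gauss(theta, 1.5), 1) for _ in range(2))
    if tuple(sample) == observed:
        accepted += 1

print(accepted / n_rep)  # a tiny acceptance rate; most repetitions are discarded
```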
Figure 10.14: Example 10.7: Posterior distribution of mean hours of sleep per night

98% posterior credible interval for mean sleep hours

qnorm(c(0.01, 0.99), 7.4, 0.4)
[1] 6.469461 8.330539

10.1.16 Sleep hours: Posterior predictive distribution

  1. Simulate mean sleep hours \(\theta\) from Normal(7.4, 0.4) posterior distribution
  2. Given \(\theta\), simulate sleep hours for a single student \(y\) from Normal(\(\theta\), 1.5) distribution
  3. Repeat many times to get many \((\theta, y)\) pairs.
  4. Summarize the simulated \(y\) values to approximate the posterior predictive distribution of sleep hours
n_rep = 10000

# simulate theta from posterior distribution
theta_sim = rnorm(n_rep, 7.4, 0.4)

# simulate y from Normal(theta, sigma) distribution
sigma = 1.5
y_sim = rnorm(n_rep, theta_sim, sigma)

# 95% posterior prediction interval
quantile(y_sim, c(0.025, 0.975))
     2.5%     97.5% 
 4.129875 10.852964 
Figure 10.15: Example 10.7: Posterior predictive distribution of sleep hours

10.1.17 Sleep hours: Posterior predictive distribution after observing large sample

  1. Simulate mean sleep hours \(\theta\) from Normal(7.0, 0.001) posterior distribution
  2. Given \(\theta\), simulate sleep hours for a single student \(y\) from Normal(\(\theta\), 1.5) distribution
  3. Repeat many times to get many \((\theta, y)\) pairs.
  4. Summarize the simulated \(y\) values to approximate the posterior predictive distribution of sleep hours
n_rep = 10000

# simulate theta from posterior distribution
theta_sim = rnorm(n_rep, 7.0, 0.001)

# simulate y from Normal(theta, sigma) distribution
sigma = 1.5
y_sim = rnorm(n_rep, theta_sim, sigma)

# 95% posterior prediction interval
quantile(y_sim, c(0.025, 0.975))
    2.5%    97.5% 
3.999862 9.965420 
# Compare to Normal(7.0, 1.5)
qnorm(c(0.025, 0.975), 7.0, 1.5)
[1] 4.060054 9.939946
Figure 10.16: Example 10.7: Posterior predictive distribution of sleep hours (after observing large sample)
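As a check on the limiting argument (the posterior SD of 0.001 is negligible next to the data SD of 1.5, so the posterior predictive distribution is approximately Normal(7.0, 1.5)), the closed-form interval can be computed with Python's standard library:

```python
from statistics import NormalDist

# when the posterior for theta is nearly a point mass at 7.0, the posterior
# predictive distribution of y is approximately Normal(7.0, 1.5)
pred = NormalDist(mu=7.0, sigma=1.5)
lower, upper = pred.inv_cdf(0.025), pred.inv_cdf(0.975)
print(round(lower, 2), round(upper, 2))  # approximately 4.06 and 9.94
```

This agrees with the `qnorm` comparison above, and (up to simulation error) with the simulated 95% posterior prediction interval.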