A Bayesian analysis leads directly and naturally to making predictions about future observations from the random process that generated the data.
Prediction is also useful for tuning prior distributions, and for checking if model assumptions seem reasonable in light of observed data.
Example 10.1 Do people prefer to use the word “data” as singular or plural? FiveThirtyEight conducted a poll to address this question (and others). Rather than simply ask whether the respondent considered “data” to be singular or plural, they asked which of the following sentences the respondent preferred:
a. Some experts say it’s important to drink milk, but the data is inconclusive.
b. Some experts say it’s important to drink milk, but the data are inconclusive.
Suppose we wish to study the opinions of students in Cal Poly statistics classes regarding this issue. That is, let \(\theta\) represent the population proportion of students in Cal Poly statistics classes who prefer to consider data as a singular noun, as in option a) above (“the data is”).
Before proceeding, consider the context of this problem and sketch your prior distribution for \(\theta\). What are the main features of your prior?
To illustrate ideas, we’ll start with a prior distribution which places probability 0.01, 0.05, 0.15, 0.30, 0.49 on the values 0.1, 0.3, 0.5, 0.7, 0.9, respectively.
Before observing any data, suppose we plan to randomly select a single Cal Poly statistics student. Consider the unconditional prior probability that the selected student prefers data as singular. (This is called a prior predictive probability.) Explain how you could use simulation to approximate this probability.
Compute the prior predictive probability from the previous part.
Before observing any data, suppose we plan to randomly select a sample of 35 Cal Poly statistics students. Consider the unconditional prior distribution of the number of students in the sample who prefer data as singular. (This is called a prior predictive distribution.) Explain how you could use simulation to approximate this distribution. In particular, how could you use simulation to approximate the prior predictive probability that exactly 31 students in the sample prefer data as singular?
Compute and interpret the prior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular.
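The prior predictive computations in this example can be done directly by averaging over the prior distribution. A minimal sketch in R, using the example's prior:

```r
theta = seq(0.1, 0.9, 0.2)
prior = c(0.01, 0.05, 0.15, 0.30, 0.49)

# prior predictive probability that a single student prefers data as
# singular: law of total probability, averaging theta over the prior
sum(theta * prior)                   # 0.742

# prior predictive probability of exactly 31 successes in a sample of 35:
# average the Binomial(35, theta) probability of 31 over the prior
sum(dbinom(31, 35, theta) * prior)   # about 0.0999
```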
Example 10.2 Continuing Example 10.1. Suppose that we observe a sample of 35 Cal Poly statistics students in which 31 prefer data as singular.
Explain how you would use simulation to approximate the posterior distribution of \(\theta\).
Find the posterior distribution of \(\theta\).
Example 10.3 Continuing Example 10.2. After observing a sample of 35 Cal Poly statistics students in which 31 prefer data as singular, suppose we plan to collect more data.
Suppose we plan to randomly select an additional Cal Poly statistics student. Consider the posterior predictive probability that this student prefers data as singular. Explain how you could use simulation to estimate this probability.
Compute the posterior predictive probability from the previous part.
Suppose we plan to collect data on another sample of 35 Cal Poly statistics students. Consider the posterior predictive distribution of the number of students in the new sample who prefer data as singular. Explain how you could use simulation to approximate this distribution. In particular, how could you use simulation to approximate the posterior predictive probability that exactly 31 students in the sample prefer data as singular? (Of course, the sample size of the new sample does not have to be 35; however, we’re keeping it the same so we can compare the prior and posterior predictions.)
Compute and interpret the posterior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular.
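As with the prior predictive probabilities, these posterior predictive probabilities can be computed by averaging, now over the posterior distribution. A minimal sketch:

```r
theta = seq(0.1, 0.9, 0.2)
prior = c(0.01, 0.05, 0.15, 0.30, 0.49)

# posterior of theta after observing y = 31 in a sample of n = 35
likelihood = dbinom(31, 35, theta)
posterior = prior * likelihood / sum(prior * likelihood)

# posterior predictive probability that a single new student prefers
# data as singular
sum(theta * posterior)                   # about 0.896

# posterior predictive probability of exactly 31 successes in a new
# sample of 35
sum(dbinom(31, 35, theta) * posterior)   # about 0.196
```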
The predictive distribution of a measured variable \(y\) is the marginal distribution of the unobserved values of \(y\), after averaging over the uncertainty in the parameters \(\theta\).
A prior predictive distribution is calculated using the prior distribution of the parameters.
A posterior predictive distribution is calculated using the posterior distribution of the parameters, conditional on the observed data.
Be sure to carefully distinguish between posterior distributions and posterior predictive distributions (and between prior distributions and prior predictive distributions).
Prior and posterior distributions are distributions on values of the parameters. These distributions quantify the degree of uncertainty about unknown parameters \(\theta\) (before and after observing data).
Prior and posterior predictive distributions are distributions on potential values of the measured variable (data). Predictive distributions reflect sample-to-sample variability of the sample data, while also accounting for the uncertainty in the parameters.
Simulation is an effective tool in approximating posterior (or prior) predictive distributions.
Step 1: Simulate a value of \(\theta\) from its posterior distribution (or prior distribution).
Step 2: Given this value of \(\theta\), simulate a value of \(y\) from \(f(y|\theta)\), the data model conditional on \(\theta\).
Repeat many times to simulate many \((\theta, y)\) pairs, and summarize the values of \(y\) to approximate the posterior predictive distribution (or prior predictive distribution).
Example 10.4 Continuing Example 10.1. We’ll use a grid approximation and assume that any multiple of 0.0001 is a possible value of \(\theta\): 0, 0.0001, 0.0002, …, 0.9999, 1.
Assume the prior distribution for \(\theta\) is proportional to \(\theta^2\). Plot this prior distribution and describe its main features. Is the prior distribution truly discrete or essentially continuous?
Given the shape of the prior distribution, explain why we might not want to compute central prior credible intervals. Suggest an alternative approach, and compute a 98% prior credible interval for \(\theta\).
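For instance, since this prior piles most of its probability near 1, one natural alternative to a central interval is a one-sided interval that keeps the upper tail. A sketch using the grid approximation (the grid and prior follow Example 10.4; the choice of a lower-tail cutoff is one reasonable option, not the only one):

```r
# grid of theta values with prior proportional to theta^2
theta = seq(0, 1, 0.0001)
prior = theta ^ 2
prior = prior / sum(prior)

# one-sided 98% prior credible interval: put all 2% in the lower tail
prior_cdf = cumsum(prior)
lower = theta[min(which(prior_cdf >= 0.02))]
c(lower, 1)   # roughly (0.27, 1)
```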
Before observing any data, suppose we plan to randomly select a sample of 35 Cal Poly statistics students. Let \(y\) represent the number of students in the selected sample who prefer data as singular. Use simulation to approximate the prior predictive distribution of \(y\) and plot it. Is the prior predictive distribution truly discrete or essentially continuous?
Find a 95% prior prediction interval for \(y\). Write a clearly worded sentence interpreting this interval in context.
Example 10.5 Continuing Example 10.4. Suppose that we observe a sample of 35 Cal Poly statistics students in which 31 prefer data as singular.
Find the posterior distribution of \(\theta\), plot it and describe its main features, and find and interpret a 98% central posterior credible interval for \(\theta\). Is the posterior distribution truly discrete or essentially continuous?
Suppose we plan to randomly select another sample of 35 Cal Poly statistics students. Let \(\tilde{y}\) represent the number of students in the new sample who prefer data as singular. Use simulation to approximate the posterior predictive distribution of \(\tilde{y}\) and plot it. Is the posterior predictive distribution truly discrete or essentially continuous? (Of course, the sample size of the new sample does not have to be 35. However, we’re keeping it the same so we can compare the prior and posterior predictions.)
Find a 95% posterior prediction interval for \(\tilde{y}\). Write a clearly worded sentence interpreting this interval in context.
Example 10.6 Continuing Example 10.5. Now suppose instead of using the Cal Poly sample data (31/35) to form the posterior distribution of \(\theta\), we had used the data from the FiveThirtyEight study in which 865 out of 1093 respondents preferred data as singular.
Find the posterior distribution of \(\theta\), plot it and describe its main features, and find and interpret a 98% central posterior credible interval for \(\theta\). How does the posterior based on the FiveThirtyEight data compare to the posterior distribution based on the Cal Poly sample data (31/35)? Why?
Again, suppose we use the FiveThirtyEight data to form the posterior distribution of \(\theta\). Suppose we plan to randomly select a new sample of size 35. Let \(\tilde{y}\) represent the number in the new sample who prefer data as singular. Use simulation to approximate the posterior predictive distribution of \(\tilde{y}\) and plot it. Find a 95% posterior prediction interval for \(\tilde{y}\). How does the predictive distribution which uses the posterior distribution based on the FiveThirtyEight data compare to the one based on the Cal Poly sample data (31/35)? Why?
Be sure to distinguish between a prior/posterior distribution and a prior/posterior predictive distribution.
A prior/posterior distribution is a distribution on potential values of the parameter \(\theta\). These distributions quantify the degree of uncertainty about the unknown parameter \(\theta\) (before and after observing data).
A prior/posterior predictive distribution is a distribution on potential values of the data \(y\). Predictive distributions reflect sample-to-sample variability of the sample data, while accounting for the uncertainty in the parameters.
Even if parameters are essentially “known” — that is, even if the prior/posterior SD of parameters is small — there will still be sample-to-sample variability reflected in the predictive distribution of the data, mainly influenced by the size \(n\) of the sample being “predicted” (or unit-to-unit variability for the predictive distribution of a single observation).
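For a Normal data model with known SD \(\sigma\), as in Example 10.7 below, this can be made concrete: the predictive SD is \(\sqrt{\sigma^2 + \tau^2}\), where \(\tau\) is the SD of the prior/posterior distribution of \(\theta\). A minimal sketch (the seed and number of repetitions are arbitrary choices):

```r
# predictive SD combines data-model SD (sigma) and parameter SD (tau)
sigma = 1.5

pred_sd = function(tau) sqrt(sigma ^ 2 + tau ^ 2)
pred_sd(0.5)    # prior predictive SD in Example 10.7: about 1.58
pred_sd(0.001)  # theta essentially "known": predictive SD still about 1.5

# simulation check for tau = 0.5
set.seed(111)  # arbitrary seed for reproducibility
n_rep = 100000
theta_sim = rnorm(n_rep, 7.5, 0.5)
y_sim = rnorm(n_rep, theta_sim, sigma)
sd(y_sim)      # close to 1.58
```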
Example 10.7 Suppose we want to estimate \(\theta\), the population mean hours of sleep on a typical night for Cal Poly students. Assume that sleep hours, \(y\), for students follow a Normal distribution with unknown mean \(\theta\) and known standard deviation 1.5 hours. (Known population SD is an unrealistic assumption that we use for simplicity here.)
Assume a Normal(7.5, 0.5) prior distribution for \(\theta\).
Describe what the prior distribution represents. Find and interpret a 98% prior credible interval.
Explain how you would use simulation to approximate the prior predictive distribution of \(y\). Use simulation to approximate a 95% prior prediction interval, and interpret it.
Now suppose you observe sleep hours (rounded to one decimal place) of 6.8 and 7.2 for a sample of 2 students. Explain how, in principle, you could use simulation to approximate the posterior distribution. (To preview what lies ahead: what issues would you experience with this method in practice?)
Suppose that the posterior distribution is approximately Normal with mean 7.4 and SD 0.4. Describe what the posterior distribution represents. Find and interpret a 98% posterior credible interval.
Explain how you would use simulation to approximate the posterior predictive distribution of \(y\). Use simulation to approximate a 95% posterior prediction interval, and interpret it.
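The posterior predictive simulation can be sketched as follows, using the approximate Normal(7.4, 0.4) posterior from the previous part; the seed and number of repetitions are arbitrary choices:

```r
set.seed(222)  # arbitrary seed for reproducibility
n_rep = 100000

# simulate theta from the (approximate) Normal(7.4, 0.4) posterior
theta_sim = rnorm(n_rep, 7.4, 0.4)

# given theta, simulate sleep hours from Normal(theta, 1.5)
y_sim = rnorm(n_rep, theta_sim, 1.5)

# 95% posterior prediction interval;
# roughly 7.4 +/- 1.96 * sqrt(1.5^2 + 0.4^2)
quantile(y_sim, c(0.025, 0.975))
```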
Now suppose you observe a very large sample of students (with a sample mean of 7.0), and the posterior distribution is Normal with mean 7.0 and SD 0.001. Without doing any calculations, approximate the posterior predictive distribution of \(y\) and a 95% posterior prediction interval.
10.1 Notes
10.1.1 Prior predictive probability of success for single trial
Simulate a value of \(\theta\) from the prior distribution.
Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner once and record the result, success or not.
Repeat steps 1 and 2 many times, and find the proportion of repetitions which result in success. This proportion approximates the (unconditional) prior predictive probability of success.
n_rep = 1000

theta = seq(0.1, 0.9, 0.2)
prior = c(0.01, 0.05, 0.15, 0.30, 0.49)

# simulate values of theta from the prior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = prior)

# for each simulated value of theta, simulate a single trial from
# Binomial(1, theta): "success" = 1, "failure" = 0
y_sim = rbinom(n_rep, size = 1, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
Figure 10.2: Probability of success on single trial, for different values of theta in Example 10.1. Color represents the prior probability of theta, with darker colors corresponding to higher prior probability.
Figure 10.3: Prior predictive probability of success on single trial
10.1.2 Prior predictive distribution of sample count of success
Simulate a value of \(\theta\) from the prior distribution.
Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner 35 times and count \(y\), the number of spins that land on success.
Repeat steps 1 and 2 many times, and record the number of successes (out of 35) for each repetition. Summarize the simulated \(y\) values to approximate the prior predictive distribution. To approximate the prior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular, count the number of simulated repetitions that result in 31 successes (\(y = 31\)) and divide by the total number of simulated repetitions.
n_rep = 10000

theta = seq(0.1, 0.9, 0.2)
prior = c(0.01, 0.05, 0.15, 0.30, 0.49)

# simulate values of theta from the prior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = prior)

# for each simulated value of theta, simulate number of successes
# from Binomial(35, theta)
y_sim = rbinom(n_rep, size = 35, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
(a) Sample-to-sample distribution of \(y\), the number of successes in samples of size \(n=35\) for different values of theta in Example 10.1. Color represents the prior probability of \(\theta\), with darker colors corresponding to higher prior probability.
(b) Prior predictive distribution of \(y\)
Figure 10.5: Prior predictive simulation
10.1.3 Posterior distribution - simulation
Simulate a value of \(\theta\) from the prior distribution.
Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner 35 times and count \(y\), the number of spins that land on success.
Repeat steps 1 and 2 many times, and record the number of successes (out of 35) for each repetition.
Discard any repetitions for which \(y \neq 31\). For the remaining repetitions (with \(y = 31\)), summarize the simulated \(\theta\) values to approximate the posterior distribution of \(\theta\).
y_obs = 31

# same (theta, y) simulation as before; highlight y == 31
theta_y_sim_plot +
  annotate("rect", xmin = 0, xmax = 1,
           ymin = y_obs - 0.4, ymax = y_obs + 0.4, alpha = 0.5,
           color = bayes_col["posterior"],
           fill = bayes_col["posterior"]) +
  theme_bw()

# Only keep (theta, y) pairs with y = 31, and summarize the theta values
sim_posterior_table = sim |>
  filter(y_sim == y_obs) |>
  count(theta_sim, name = "freq") |>
  mutate(rel_freq = freq / sum(freq))

sim_posterior_table |>
  ggplot(aes(x = theta_sim, y = rel_freq)) +
  geom_point(col = bayes_col["posterior"], size = 3) +
  geom_line(linetype = "dashed", col = bayes_col["posterior"]) +
  scale_x_continuous(limits = c(0, 1), breaks = theta) +
  labs(x = "theta", y = "Posterior") +
  theme_bw()
(a) (theta, y) pairs for samples of size 35: theta simulated from the prior; y simulated from Binomial(35, theta)
(b) Summary of simulated theta values from repetitions with \(y=31\) values to approximate posterior distribution of theta given \(y=31\)
Figure 10.6: Approximating the posterior distribution via simulation
10.1.4 Posterior distribution - calculation
# observed data
n = 35
y_obs = 31

# likelihood of observed data for each theta
likelihood = dbinom(y_obs, n, theta)

# posterior is proportional to product of prior and likelihood
product = prior * likelihood
posterior = product / sum(product)

bayes_table = data.frame(theta, prior, likelihood, product, posterior)

bayes_table |> adorn_totals("row") |> kbl(digits = 4) |> kable_styling()
theta   prior   likelihood   product   posterior
0.1     0.01    0.0000       0.0000    0.0000
0.3     0.05    0.0000       0.0000    0.0000
0.5     0.15    0.0000       0.0000    0.0000
0.7     0.30    0.0067       0.0020    0.0201
0.9     0.49    0.1998       0.0979    0.9799
Total   1.00    0.2065       0.0999    1.0000
10.1.5 Posterior predictive probability of success for single trial
Similar to the prior predictive, but now simulate \(\theta\) from the posterior distribution instead of the prior distribution.
Simulate a value of \(\theta\) from the posterior distribution.
Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner once and record the result, success or not.
Repeat steps 1 and 2 many times, and find the proportion of repetitions which result in success. This proportion approximates the (unconditional) posterior predictive probability of success.
n_rep = 1000

# simulate values of theta from the posterior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = posterior)

# for each simulated value of theta, simulate a single trial from
# Binomial(1, theta): "success" = 1, "failure" = 0
y_sim = rbinom(n_rep, size = 1, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
Figure 10.8: Probability of success on single trial, for different values of theta in Example 10.1. Color represents the posterior probability of \(\theta\), with darker colors corresponding to higher posterior probability.
Figure 10.9: Posterior predictive probability of success on single trial
10.1.6 Posterior predictive distribution of sample count of success
Simulate a value of \(\theta\) from the posterior distribution.
Given the value of \(\theta\), construct a spinner that lands on success with probability \(\theta\). Spin the spinner 35 times and count \(y\), the number of spins that land on success.
Repeat steps 1 and 2 many times, and record the number of successes (out of 35) for each repetition. Summarize the simulated \(y\) values to approximate the posterior predictive distribution. To approximate the posterior predictive probability that exactly 31 students in a sample of size 35 prefer data as singular, count the number of simulated repetitions that result in 31 successes (\(y = 31\)) and divide by the total number of simulated repetitions.
n_rep = 10000

# simulate values of theta from the posterior
theta_sim = sample(theta, n_rep, replace = TRUE, prob = posterior)

# for each simulated value of theta, simulate number of successes
# from Binomial(35, theta)
y_sim = rbinom(n_rep, size = 35, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
(a) Sample-to-sample distribution of \(y\), the number of successes in samples of size \(n=35\) for different values of theta in Example 10.1. Color represents the posterior probability of theta, with darker colors corresponding to higher posterior probability.
10.1.8 Continuous theta: Prior predictive distribution
# grid of theta values, with prior proportional to theta ^ 2 (Example 10.4)
theta = seq(0, 1, 0.0001)
prior = theta ^ 2
prior = prior / sum(prior)

n = 35
n_sim = 10000

theta_sim = sample(theta, n_sim, replace = TRUE, prob = prior)
y_sim = rbinom(n_sim, n, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim   y_sim
0.9664      34
0.7408      32
0.6907      23
0.9727      34
0.7652      27
0.6341      24
0.8949      32
0.6150      24
0.8281      30
0.8988      28
quantile(y_sim, 0.05)
5%
12
10.1.9 Continuous theta: Posterior distribution
# observed data
n = 35
y_obs = 31

# likelihood of observed data for each theta
likelihood = dbinom(y_obs, n, theta)

# posterior is proportional to product of prior and likelihood
product = prior * likelihood
posterior = product / sum(product)

# bounds of 98% central posterior credible interval
posterior_cdf = cumsum(posterior)
c(theta[max(which(posterior_cdf <= 0.01))],
  theta[min(which(posterior_cdf >= 0.99))])
[1] 0.7238 0.9651
10.1.10 Continuous theta: Posterior predictive distribution
n = 35
n_sim = 10000

theta_sim = sample(theta, n_sim, replace = TRUE, prob = posterior)
y_sim = rbinom(n_sim, n, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim   y_sim
0.8964      31
0.8938      27
0.8737      32
0.8360      30
0.8597      28
0.8756      32
0.7553      29
0.8895      32
0.8856      31
0.8013      24
quantile(y_sim, c(0.025, 0.975))
2.5% 97.5%
24 35
10.1.11 Continuous theta: Posterior distribution after observing large sample
# observed data
n = 1093
y_obs = 865

# likelihood of observed data for each theta
likelihood = dbinom(y_obs, n, theta)

# posterior is proportional to product of prior and likelihood
product = prior * likelihood
posterior = product / sum(product)

# bounds of 98% central posterior credible interval
posterior_cdf = cumsum(posterior)
c(theta[max(which(posterior_cdf <= 0.01))],
  theta[min(which(posterior_cdf >= 0.99))])
[1] 0.7619 0.8190
10.1.12 Continuous theta: Posterior predictive distribution (after observing large sample)
n = 35
n_sim = 10000

theta_sim = sample(theta, n_sim, replace = TRUE, prob = posterior)
y_sim = rbinom(n_sim, n, theta_sim)

sim = data.frame(theta_sim, y_sim)

sim |> head(10) |> kbl() |> kable_styling()
theta_sim   y_sim
0.7889      30
0.7916      26
0.7911      29
0.7935      30
0.8025      25
0.7508      24
0.7958      26
0.8017      29
0.7906      30
0.7903      28
quantile(y_sim, c(0.025, 0.975))
2.5% 97.5%
23 32
10.1.13 Sleep hours: Prior distribution
98% prior credible interval for mean sleep hours
qnorm(c(0.01, 0.99), 7.5, 0.5)
[1] 6.336826 8.663174
Figure 10.12: Example 10.7: Prior distribution of mean hours of sleep per night
10.1.14 Sleep hours: Prior predictive distribution
Simulate mean sleep hours \(\theta\) from Normal(7.5, 0.5) prior distribution
Given \(\theta\), simulate sleep hours for a single student \(y\) from Normal(\(\theta\), 1.5) distribution
Repeat many times to get many \((\theta, y)\) pairs.
Summarize the simulated \(y\) values to approximate the prior predictive distribution of sleep hours
n_rep = 10000

# simulate theta from prior distribution
theta_sim = rnorm(n_rep, 7.5, 0.5)

# simulate y from Normal(theta, sigma) distribution
sigma = 1.5
y_sim = rnorm(n_rep, theta_sim, sigma)

# 95% prior prediction interval
quantile(y_sim, c(0.025, 0.975))
2.5% 97.5%
4.08337 10.84144
Figure 10.13: Example 10.7: Prior predictive distribution of sleep hours
10.1.15 Sleep hours: Posterior distribution
In principle, the posterior distribution of \(\theta\) given the observed sample (6.8, 7.2) can be found via the following.
Simulate \(\theta\) from the Normal(7.5, 0.5) prior distribution
Given \(\theta\), simulate a sample of size 2 from a Normal(\(\theta\), 1.5) distribution (with values rounded to one decimal place).
If the simulated sample is (6.8, 7.2), keep the repetition; otherwise discard it.
Repeat the process until enough non-discarded repetitions are obtained, all corresponding to samples of (6.8, 7.2). Summarize the simulated \(\theta\) values to approximate the posterior distribution.
However, the probability of producing a simulated sample that matches the observed data is essentially 0, simply because there are so many possible samples (even if we round values to one decimal place and only discard samples for which the sample mean is not 7.0).
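One workaround is to relax the matching criterion: keep a repetition whenever its sample mean is close to the observed mean of 7.0 (for this Normal model with known SD, the sample mean carries all the information in the data about \(\theta\)). A rough sketch; the tolerance 0.05, the seed, and the number of repetitions are all arbitrary choices:

```r
set.seed(333)  # arbitrary seed for reproducibility
n_rep = 200000

# simulate theta from the Normal(7.5, 0.5) prior
theta_sim = rnorm(n_rep, 7.5, 0.5)

# simulate the sample mean of n = 2 sleep values directly:
# ybar | theta ~ Normal(theta, 1.5 / sqrt(2))
ybar_sim = rnorm(n_rep, theta_sim, 1.5 / sqrt(2))

# keep repetitions whose sample mean is within 0.05 of the observed 7.0
keep = abs(ybar_sim - 7.0) < 0.05

# summarize the kept theta values; compare with the approximate
# Normal(7.4, 0.4) posterior of Example 10.7
mean(theta_sim[keep])
sd(theta_sim[keep])
```

Only a small fraction of repetitions survive the near-match filter, which previews why more efficient posterior simulation methods are needed.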