12 Beta-Binomial Model
- Bayesian analysis is based on the posterior distribution of parameters
given data . - The data
might be discrete (e.g., count data) or continuous (e.g., measurement data). - However, parameters
almost always take values on a continuous scale, even when the data are discrete. - Now we introduce continuous prior and posterior distributions to quantify uncertainty about parameters. Some general notation:
represents1 parameters of interest usually taking values on a continuous scale denotes observed sample data2 (discrete or continuous) denotes the prior distribution of , usually a probability density function (pdf) over possible values of denotes the likelihood function, a function of continuous for fixed denotes the posterior distribution of , the conditional distribution of given the data .
- Bayes rule works analogously for a continuous parameter
, given data - Predictive distribution can be found using a continuous analog of the law of total probability
However, we almost always use simulation to approximate predictive distributions.
Example 12.1 Recall Example 10.4 which concerned
Why might we not want to assume a Normal prior distribution for
?
In Example 10.4 we assumed a prior distribution for
which is proportional to . In that example, we used a grid approximation for the prior distribution, but now we’ll treat the distribution as continuous. Sketch this prior distribution.
Now we’ll consider a few more prior distributions. Sketch each of the following priors. How do they compare?
- proportional to
. (from previous) - proportional to
. - proportional to
. - proportional to
. - proportional to
.
- proportional to
- A continuous random variable
has a Beta distribution with shape parameters and if its density (pdf) satisfies3 - If
the distribution is symmetric about 0.5 - If
the distribution is skewed to the left (with greater density above 0.5 than below) - If
the distribution is skewed to the right (with greater density below 0.5 than above) - If
and , the Beta(1, 1) distribution is the Uniform distribution on (0, 1). - It can be shown that a Beta(
, ) distribution has
Example 12.2 Continuing Example 12.1.
- Each of the distributions in the previous example was a Beta distribution. For each distribution, identify the shape parameters and the prior mean and standard deviation.
Prior proportional to | Prior |
Prior |
Prior Mean | Prior SD |
---|---|---|---|---|
Assume a Beta(3, 1) prior distribution for
. Now suppose that 31 students in a sample of 35 Cal Poly statistics students prefer data as singular. Write the function that defines the shape of the likelihood as a function of .
Write the function that defines the shape of the posterior density as a function of
Is this a Beta distribution? If so, specify the values of and .
Starting with each of the prior distributions from the first part, find the posterior distribution of
based on this sample, and identify it as a Beta distribution by specifying the shape parameters and .
Prior | Posterior |
Posterior |
Posterior Mean | Posterior SD |
---|---|---|---|---|
Beta(3, 1) | ||||
Beta(6, 1) | ||||
Beta(1, 3) | ||||
Beta(3, 3) | ||||
Beta(6, 3) |
- For each of the posterior distributions in the previous part, compute the posterior mean and standard deviation. How does each posterior distribution compare to its respective prior distribution?
- Beta-Binomial model. If
has a Beta prior distribution and the conditional distribution of given is the Binomial distribution, then the posterior distribution of given is the Beta distribution4. - In a sense, you can interpret
as “prior successes” and as “prior failures”, but these are only “pseudo-observations”. Also, and are not necessarily integers.
Prior | Data | Posterior | |
---|---|---|---|
Successes | |||
Failures | |||
Total |
Example 12.3 In Example 10.4 we essentially assumed a Beta(3, 1) prior but we used grid approximation to compute the posterior. Now we’ll compare our results from that exercise to those using the continuous Beta(3, 1) prior.
Assume that
Plot the prior distribution, (scaled) likelihood, and posterior distribution.
Use software to find 50%, 80%, and 98% central posterior credible intervals.
Compare the results to those using the grid approximation.
Express the posterior mean as a weighted average of the prior mean and sample proportion. Describe what the weights are, and explain why they make sense.
- In the Beta-Binomial model, the posterior mean
can be expressed as a weighted average of the prior mean and the sample proportion . - As more data are collected, more weight is given to the sample proportion (and less weight to the prior mean).
- The prior “weight” is determined by
, which is sometimes called the concentration and measured in “pseudo-observations”. - Larger values of
indicate stronger prior beliefs, due to smaller prior variance, and give more weight to the prior mean. - The posterior variance generally gets smaller as more data are collected
Example 12.4 Continuing the previous examples, we’ll now consider predictive distributions. Assume that
Before observing any data, suppose we plan to randomly select a sample of 35 Cal Poly statistics students. Let
represent the number of students in the selected sample who prefer data as singular. Explain how we could use simulation to approximate the prior predictive distribution of . Then conduct the simulation and approximate the prior predictive distribution. Is the prior predictive distribution truly discrete or continuous? How does the prior predictive distribution compare to the one from Example 10.4?
Now suppose we observe 31 students in a sample of 35 Cal Poly statistics students prefer data as singular, so that the posterior distribution of
is the Beta(34, 5) distribution. Suppose we plan to randomly select another sample of 35 Cal Poly statistics students. Let represent the number of students in the selected sample who prefer data as singular. Explain how we could use simulation to approximate the posterior predictive distribution of . Then conduct the simulation and approximate the posterior predictive distribution. Is the posterior predictive distribution truly discrete or continuous? How does the posterior predictive distribution compare to the one from Example 10.5?
Use the simulation results to approximate a 95% posterior prediction interval for
. Write a clearly worded sentence interpreting this interval in context.
- You can tune the shape parameters —
(like “prior successes”) and (like “prior failures”) — of a Beta distribution to your prior beliefs in a few ways. Recall that is the “concentration” or “equivalent prior sample size”. - If prior mean
and prior concentration are specified then - If prior mode
and prior concentration (with ) are specified then - If prior mean
and prior sd are specified then - You can also specify two percentiles and use software to find
and . For example, you could specify the endpoints of a prior 98% credible interval.
12.1 Notes
12.1.1 Prior Beta distributions
Distribution | Proportional to | Mean | SD | |||
---|---|---|---|---|---|---|
a | Beta(3, 1) | 3 | 1 | 0.750 | 0.194 | |
b | Beta(6, 1) | 6 | 1 | 0.857 | 0.124 | |
c | Beta(1, 3) | 1 | 3 | 0.250 | 0.194 | |
d | Beta(3, 3) | 3 | 3 | 0.500 | 0.189 | |
e | Beta(6, 3) | 6 | 3 | 0.667 | 0.149 |
12.1.2 Posterior Beta distributions
Prior Distribution | Posterior proportional to | Posterior Distribution | Posterior Mean | Posterior SD | |
---|---|---|---|---|---|
a | Beta(3, 1) | Beta(34, 5) | 0.872 | 0.053 | |
b | Beta(6, 1) | Beta(37, 5) | 0.881 | 0.049 | |
c | Beta(1, 3) | Beta(32, 7) | 0.821 | 0.061 | |
d | Beta(3, 3) | Beta(34, 7) | 0.829 | 0.058 | |
e | Beta(6, 3) | Beta(37, 7) | 0.841 | 0.055 |
12.1.3 Posterior distribution
The Beta(3, 1) prior is proportional to
Given
The posterior density, as a function of
We can use stat_fun
to plot a function over a range. This works fine for the dbeta
prior and posterior, but unfortunately the likeihood, determined by dbinom
is not on a density scale.
# Beta prior
= 3
alpha_prior = 1
beta_prior
# data
= 35
n = 31
y
# Beta posterior
= alpha_prior + y
alpha_post = beta_prior + n - y
beta_post
# Plot
ggplot(data.frame(x = c(0, 1)),
aes(x = x)) +
# prior
stat_function(fun = dbeta,
args = list(shape1 = alpha_prior,
shape2 = beta_prior),
col = bayes_col["prior"],
lty = bayes_lty["prior"],
linewidth = 1) +
# likelihood
stat_function(fun = dbinom,
args = list(size = n,
x = y),
col = bayes_col["likelihood"],
lty = bayes_lty["likelihood"],
linewidth = 1) +
# posterior
stat_function(fun = dbeta,
args = list(shape1 = alpha_post,
shape2 = beta_post),
col = bayes_col["posterior"],
lty = bayes_lty["posterior"],
linewidth = 1) +
labs(x = "theta",
y = "") +
theme_bw()
The following code creates a scaled version of the likelihood function for plotting (and moves a few things around to create a legend).
# data
= 35
n = 31
y
# scaled likelihood, depends on data: n, y
<- function(theta) {
likelihood_scaled <- function(theta) {
likelihood dbinom(x = y, size = n, prob = theta)
}<- integrate(likelihood, lower = 0, upper = 1)[[1]]
scaling_constant likelihood(theta) / scaling_constant
}
# Plot
ggplot(data.frame(x = c(0, 1)),
aes(x = x)) +
# prior
stat_function(fun = dbeta,
args = list(shape1 = alpha_prior,
shape2 = beta_prior),
lty = bayes_lty["prior"],
linewidth = 1,
aes(color = "prior", linetype = "prior")) +
# (scaled) likelihood
stat_function(fun = likelihood_scaled,
lty = bayes_lty["likelihood"],
linewidth = 1,
aes(color = "likelihood", linetype = "likeihood")) +
# posterior
stat_function(fun = dbeta,
args = list(shape1 = alpha_post,
shape2 = beta_post),
lty = bayes_lty["posterior"],
linewidth = 1,
aes(color = "posterior", linetype = "posterior")) +
# Define color and add a legend
scale_color_manual(name = "",
breaks = c("prior", "likelihood", "posterior"),
values = bayes_col[c("prior", "likelihood", "posterior")]) +
scale_linetype_manual(name = "",
breaks = c("prior", "likelihood", "posterior"),
values = bayes_lty[c("prior", "likelihood", "posterior")]) +
labs(x = "theta",
y = "") +
theme_bw()
Warning: No shared levels found between `names(values)` of the manual scale and the
data's linetype values.
# 50% posterior credible interval
qbeta(c(0.25, 0.75), alpha_post, beta_post)
[1] 0.8398218 0.9105693
# 80% posterior credible interval
qbeta(c(0.10, 0.90), alpha_post, beta_post)
[1] 0.8005591 0.9345918
# 98% posterior credible interval
qbeta(c(0.01, 0.99), alpha_post, beta_post)
[1] 0.7239139 0.9650609
12.1.4 Prior predictive distribution
- Simulate
from the Beta(3, 1) prior distribution usingrbeta
. - Given
, simulate from the Binomial(35, ) distribution. - Simulate many
pairs and summarize the simulated values to approximate the prior predictive distribution.
= 35
n
= 10000
n_rep
# simulate theta from Beta prior distribution
= rbeta(n_rep, alpha_prior, beta_prior)
theta_sim
# simulate y from Binomial(n, theta)
= rbinom(n_rep, n, theta_sim)
y_sim
= data.frame(theta_sim, y_sim)
sim
# display a few simulated pairs
|> head(10) |> kbl() |> kable_styling() sim
theta_sim | y_sim |
---|---|
0.6280839 | 25 |
0.8274895 | 29 |
0.4882894 | 15 |
0.8900228 | 31 |
0.7641595 | 31 |
0.8126329 | 31 |
0.9561866 | 34 |
0.7012563 | 25 |
0.8672446 | 31 |
0.3422079 | 13 |
ggplot(data.frame(y_sim),
aes(x = y_sim)) +
geom_bar(aes(y = after_stat(prop)),
col = bayes_col["posterior_predict"],
fill = bayes_col["posterior_predict"],
width = 0.1) +
labs(x = "Number of successes",
y = "Simulated relative frequency") +
theme_bw()
quantile(y_sim, 0.05)
5%
12
12.1.5 Posterior predictive distribution
- Simulate
from the Beta(34, 5) posterior distribution usingrbeta
. - Given
, simulate from the Binomial(35, ) distribution. - Simulate many
pairs and summarize the simulated values to approximate the posterior predictive distribution.
= 35
n
= 10000
n_rep
# simulate theta from Beta posterior distribution
= rbeta(n_rep, alpha_post, beta_post)
theta_sim
# simulate y from Binomial(n, theta)
= rbinom(n_rep, n, theta_sim)
y_sim
= data.frame(theta_sim, y_sim)
sim
# display a few simulated pairs
|> head(10) |> kbl() |> kable_styling() sim
theta_sim | y_sim |
---|---|
0.8813914 | 29 |
0.9181022 | 33 |
0.8875739 | 30 |
0.8756327 | 30 |
0.9126112 | 30 |
0.7890674 | 27 |
0.6912154 | 24 |
0.8557116 | 30 |
0.8235766 | 30 |
0.8505959 | 29 |
ggplot(data.frame(y_sim),
aes(x = y_sim)) +
geom_bar(aes(y = after_stat(prop)),
col = bayes_col["posterior_predict"],
fill = bayes_col["posterior_predict"],
width = 0.1) +
labs(x = "Number of successes",
y = "Simulated relative frequency") +
theme_bw()
quantile(y_sim, c(0.025, 0.975))
2.5% 97.5%
25 35
is used to denote both: (1) the actual parameter (i.e., the random variable) itself, and (2) possible values of .↩︎When considering the prior predictive distribution
also denotes potential values of the data. For the posterior predictive distribution denotes potential new values of the data.↩︎The expression defines the shape of the Beta density. All that’s missing is the scaling constant which ensures that the total area under the density is 1. The actual Beta density formula, including the normalizing constant, is
where is the Gamma function. For a positive integer , . Also, .↩︎When the prior and posterior distribution belong to the same family, that family is called a conjugate prior distribution for the likelihood. So, the Beta distributions form a conjugate prior family for Binomial distributions.↩︎