13  Considering Prior Distributions

One of the most commonly asked questions when one first encounters Bayesian statistics is “how do we choose a prior?” While there is no single “perfect” prior in any situation, we’ll now discuss some issues to consider when choosing a prior. But first, here are a few big picture ideas to keep in mind:

Example 13.1 Tamika is a basketball player who throughout her career has had a probability of 0.5 of making any three point attempt. However, her coach is afraid that her three point shooting has gotten worse. To check this, the coach has Tamika shoot a series of three pointers; she makes 7 out of 24. Does the coach have evidence that Tamika has gotten worse?

Let \(\theta\) be the probability that Tamika successfully makes any three point attempt. Assume attempts are independent.

  1. Prior to collecting data, the coach decides that he’ll have convincing evidence that Tamika has gotten worse if the p-value is less than 0.025. Suppose the coach told Tamika to shoot 24 attempts and then stop and count the number of successful attempts. Use software to compute the p-value. Is the coach convinced that Tamika has gotten worse? (A code sketch for this part and part 2 follows this example.)




  2. Prior to collecting data, the coach decides that he’ll have convincing evidence that Tamika has gotten worse if the p-value is less than 0.025. Suppose the coach told Tamika to shoot until she makes 7 three pointers and then stop and count the number of total attempts. Use software to compute the p-value. Is the coach convinced that Tamika has gotten worse? (Hint: the total number of attempts has a Negative Binomial distribution.)




  3. Now suppose the coach takes a Bayesian approach and assumes a Beta(\(\alpha\), \(\beta\)) prior distribution for \(\theta\). Suppose the coach told Tamika to shoot 24 attempts and then stop and count the number of successful attempts. Identify the likelihood function and the posterior distribution of \(\theta\).




  4. Now suppose the coach takes a Bayesian approach and assumes a Beta(\(\alpha\), \(\beta\)) prior distribution for \(\theta\). Suppose the coach told Tamika to shoot until she makes 7 three pointers and then stop and count the number of total attempts. Identify the likelihood function and the posterior distribution of \(\theta\).




  5. Compare the Bayesian and frequentist approaches in this example. Does the “strength of the evidence” depend on how the data were collected?
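
A minimal sketch of the p-value calculations in parts 1 and 2, using base R’s pbinom and pnbinom and assuming a one-sided alternative (Tamika has gotten worse) with null value \(\theta = 0.5\). Note that R’s pnbinom counts the number of failures before the 7th success, so 24 or more total attempts corresponds to 17 or more failures.

# Part 1: successes out of 24 attempts is Binomial(24, theta)
# p-value = P(X <= 7) when theta = 0.5
pbinom(7, size = 24, prob = 0.5)
# approximately 0.032, not less than 0.025

# Part 2: shoot until the 7th success; 24 or more total attempts
# corresponds to 17 or more failures before the 7th success
# p-value = P(at least 17 failures) when theta = 0.5
pnbinom(16, size = 7, prob = 0.5, lower.tail = FALSE)
# approximately 0.017, less than 0.025

For parts 3 and 4, note that both stopping rules yield a likelihood proportional to \(\theta^7(1-\theta)^{17}\) as a function of \(\theta\), so a Beta(\(\alpha\), \(\beta\)) prior leads to the same Beta(\(\alpha + 7\), \(\beta + 17\)) posterior in either case, which is relevant to the comparison in part 5.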




Here are some recommendations from the Stan development team on choosing priors.

Example 13.2 Suppose we want to estimate \(\theta\), the population proportion of Cal Poly students who wore socks at any point yesterday.

  1. What are the possible values for \(\theta\)? What prior distribution might you consider a noninformative prior distribution?




  2. You might choose a Uniform(0, 1) prior, also known as a Beta(1, 1) prior. Recall how we interpreted the parameters \(\alpha\) and \(\beta\) in the Beta-Binomial model. Does the Beta(1, 1) distribution represent “no prior information”?





  3. Suppose in a sample of 20 students, 4 wore socks yesterday. How would you estimate \(\theta\) with a single number based only on the data?




  4. Assume a Beta(1, 1) prior and the 4/20 sample data. Identify the posterior distribution. Recall that one Bayesian point estimate of \(\theta\) is the posterior mean. Find the posterior mean of \(\theta\). Does this estimate let the “data speak entirely for itself”? (A code sketch follows this example.)




  5. How could you change \(\alpha\) and \(\beta\) in the Beta distribution prior to represent no prior information? Sketch the prior. Do you see any potential problems?




  6. Assume a Beta(0, 0) prior for \(\theta\) and the 4/20 sample data. Identify the posterior distribution. Find the posterior mode of \(\theta\). Does this estimate let the “data speak entirely for itself”?




  7. Now suppose the parameter you want to estimate is the odds that a student wore socks yesterday, \(\phi=\frac{\theta}{1-\theta}\). What are the possible values of \(\phi\)? What might a noninformative prior look like? Is this a proper prior?




  8. Assume a Beta(1, 1) prior for \(\theta\). Use simulation to approximate the prior distribution of the odds \(\phi\). Would you say this is a noninformative prior for \(\phi\)?
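
For part 4, here is a minimal sketch of the posterior calculation, using the conjugacy of the Beta-Binomial model (a Beta(\(\alpha\), \(\beta\)) prior with \(y\) successes in \(n\) trials gives a Beta(\(\alpha + y\), \(\beta + n - y\)) posterior). A simulation along the lines of part 8 appears in the notes at the end of this section.

# Beta(1, 1) prior, i.e., Uniform(0, 1)
alpha_prior = 1
beta_prior = 1

# observed data: 4 of 20 students wore socks
y = 4
n = 20

# posterior is Beta(1 + 4, 1 + 16) = Beta(5, 17)
alpha_post = alpha_prior + y
beta_post = beta_prior + n - y

# posterior mean of theta: 5 / 22, about 0.227,
# compared with the sample proportion 4 / 20 = 0.2
alpha_post / (alpha_post + beta_post)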




Example 13.3 Suppose that \(\theta\) represents the population proportion of adults who have a particular rare disease.

  1. Explain why you might not want to use a flat Uniform(0, 1) prior for \(\theta\).




  2. Assume a Uniform(0, 1) prior. Suppose you will test \(n=100\) suspected cases. Use simulation to approximate the prior predictive distribution of the number in the sample who have the disease. Does this seem reasonable?




  3. Assume a Uniform(0, 1) prior. Suppose that in \(n=100\) suspected cases, none actually has the disease. Find and interpret the posterior median. Does this seem reasonable?
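
For part 3, a minimal sketch: with a Uniform(0, 1), i.e., Beta(1, 1), prior and 0 successes in 100 cases, the posterior is Beta(1, 101), and the posterior median can be computed with qbeta.

# posterior of theta after observing 0 diseased among n = 100 cases:
# Beta(1 + 0, 1 + 100) = Beta(1, 101)
qbeta(0.5, shape1 = 1, shape2 = 101)
# posterior median is approximately 0.0068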




13.1 Notes

13.1.1 Improper Beta(0, 0) prior
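
A Beta(\(\alpha\), \(\beta\)) density is proportional to \(\theta^{\alpha - 1}(1-\theta)^{\beta - 1}\), so taking \(\alpha = \beta = 0\) gives a “density” proportional to \(\theta^{-1}(1-\theta)^{-1}\), which does not integrate to a finite value on (0, 1). The Beta(0, 0) prior is therefore improper, though the posterior in Example 13.2 is still proper as long as the data contain at least one success and one failure. One rough way to visualize its shape is to plot a proper Beta(\(\epsilon\), \(\epsilon\)) density for a small \(\epsilon\); the value 0.01 below is an arbitrary choice. Almost all of the density piles up near 0 and 1.

library(ggplot2)

# approximate the improper Beta(0, 0) prior with a proper Beta(0.01, 0.01)
ggplot(data.frame(theta = c(0.001, 0.999)),
       aes(x = theta)) +
  stat_function(fun = dbeta,
                args = list(shape1 = 0.01, shape2 = 0.01)) +
  labs(x = "theta", y = "density")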

13.1.2 Prior distribution of odds

  1. Simulate a value of \(\theta\) from the Beta(1, 1) prior distribution.
  2. Compute the odds \(\phi = \theta / (1 - \theta)\).
  3. Repeat many times and summarize the simulated \(\phi\) values to approximate the prior distribution of \(\phi\).

The distribution of \(\phi\) has an extremely long right tail, so the plot below is clipped at \(\phi = 30\).

library(ggplot2)

# simulate theta from the Beta(1, 1) prior
theta = rbeta(10000, 1, 1)

# compute the corresponding odds for each simulated theta
odds = theta / (1 - theta)

# histogram of the simulated odds, clipped at 30
# (bayes_col is a named vector of plot colors defined earlier)
ggplot(data.frame(odds),
       aes(x = odds)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 100,
                 col = bayes_col["prior"],
                 fill = bayes_col["prior"]) +
  scale_x_continuous(limits = c(0, 30)) +
  labs(x = "phi")

13.1.3 Prior predictive distribution

n_rep = 10000

# simulate theta from the Uniform(0, 1), i.e., Beta(1, 1), prior
theta_sim = rbeta(n_rep, 1, 1)

# for each simulated theta, simulate the number who have the disease
# among n = 100 suspected cases from a Binomial(100, theta) distribution
y_sim = rbinom(n_rep, 100, theta_sim)

# plot the simulated prior predictive distribution of the count
ggplot(data.frame(y_sim),
       aes(x = y_sim)) +
  geom_bar(aes(y = after_stat(prop)),
           col = bayes_col["posterior_predict"],
           fill = bayes_col["posterior_predict"],
           width = 0.1) +
  labs(x = "Number of successes",
       y = "Simulated relative frequency") +
  theme_bw()
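
Note that under a Uniform(0, 1) prior the prior predictive distribution of the count is, up to simulation error, uniform on 0, 1, …, 100: before seeing any data, 100 diseased cases out of 100 is treated as just as plausible as 0 out of 100, which is arguably not a reasonable representation of prior beliefs about a rare disease.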