6.1 Comparing Bayesian and frequentist interval estimates

The most widely used elements of “traditional” frequentist inference are confidence intervals and hypothesis tests (a.k.a. null hypothesis significance tests). The numerical results of Bayesian and frequentist analyses are often similar. However, the interpretations are very different.

Example 6.5 We’ll now compare the Bayesian credible intervals in Example 6.4 to frequentist confidence intervals. Recall the actual study data in which 75% of the 1502 American adults surveyed said they read a book in the last year.

  1. Compute a 98% confidence interval for \(\theta\).
  2. Write a clearly worded sentence reporting the confidence interval in context.
  3. Explain what “98% confidence” means.
  4. Compare the numerical results of the Bayesian and frequentist analysis. Are they similar or different?
  5. How does the interpretation of these results differ between the two approaches?
Solution to Example 6.5
  1. The observed sample proportion is \(\hat{p} = 0.75\) and its standard error is \(\sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.75(1-0.75)/1502} = 0.011\). The usual formula for a confidence interval for a population proportion is \[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \] where \(z^*\) is the multiple from a standard Normal distribution corresponding to the level of confidence (e.g., \(z^* = 2.33\) for 98% confidence). A 98% confidence interval for \(\theta\) is [0.724, 0.776]. (See the R sketch following this solution.)

  2. We estimate with 98% confidence that the population proportion of American adults who have read a book in the last year is between 0.724 and 0.776.

  3. Confidence is in the estimation procedure. Over many samples, 98% of samples will yield confidence intervals, computed using the above formula, that contain the true parameter value (a fixed number). The intervals change from sample to sample; the parameter is fixed.

  4. The numerical results are similar: the 98% posterior credible interval, [0.718, 0.771], is close to the 98% confidence interval, [0.724, 0.776]. Both support the conclusion that the percentage of American adults who have read at least one book in the past year is somewhere in the 70s.

  5. However, the interpretation of these results is very different between the two approaches.

    The Bayesian approach provides probability statements about the parameter: There is a 98% chance that \(\theta\) is between 0.718 and 0.771; our assessment is that \(\theta\) is 49 times more likely to lie inside the interval [0.718, 0.771] than outside.

    In the frequentist approach such a probability statement makes no sense. From the frequentist perspective, \(\theta\) is an unknown number: either that number is in the interval [0.724, 0.776] or it’s not; there’s no probability to it. Rather, the frequentist approach develops procedures based on the probability of what might happen over many samples. Notice that in the interpretation of what 98% confidence means above, the actual numbers [0.724, 0.776] did not appear. The confidence is in the procedure that produced the interval, not in the interval itself.
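
The arithmetic in part 1 of the solution can be reproduced in a few lines of R. This is just a sketch of the standard formula above, not code from the original study analysis.

```r
# 98% confidence interval for a population proportion
p_hat <- 0.75                            # observed sample proportion
n <- 1502                                # sample size
se <- sqrt(p_hat * (1 - p_hat) / n)      # standard error, about 0.011
z_star <- qnorm(0.99)                    # multiplier for 98% confidence, about 2.33
p_hat + c(-1, 1) * z_star * se           # [0.724, 0.776]
```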

Example 6.6 Have more than 70% of Americans read a book in the last year? We’ll now compare the Bayesian analysis in Example 6.4 to a frequentist (null) hypothesis (significance) test. Recall the actual study data in which 75% of the 1502 American adults surveyed said they read a book in the last year.

  1. Conduct an appropriate hypothesis test.
  2. Write a clearly worded sentence reporting the conclusion of the hypothesis test in context.
  3. Write a clearly worded sentence interpreting the p-value in context.
  4. Now back to the Bayesian analysis of Example 6.4. Compute the posterior probability that \(\theta\) is less than or equal to 0.70.
  5. Compare the numerical values of the posterior probability and the p-value. Are they similar or different?
  6. How does the interpretation of these results differ between the two approaches?
Solution to Example 6.6
  1. The null hypothesis is \(H_0:\theta = 0.7\). The alternative hypothesis is \(H_a:\theta>0.7\). The standard deviation of the null distribution is \(\sqrt{0.7(1-0.7)/1502} = 0.0118\). The standardized (test) statistic is \((0.75 - 0.7) / 0.0118 = 4.23\). With such a large sample size, the null distribution of sample proportions is approximately Normal, so the p-value is approximately 1 - pnorm(4.23) = 0.000012. (See the R sketch following this solution.)

  2. With a p-value of 0.000012 we have extremely strong evidence to reject the null hypothesis and conclude that more than 70% of Americans have read a book in the last year.

  3. Interpreting the p-value

    • If the population proportion of Americans who have read a book in the last year is equal to 0.7,
    • Then we would observe a sample proportion of 0.75 or more in about 0.0012% (about 1 in 100,000) of random samples of size 1502.
    • Since we actually observed a sample proportion of 0.75, which would be extremely unlikely if the population proportion were 0.7,
    • The data provide evidence that the population proportion is not 0.7.
  4. See Example 6.4 where we computed the posterior probability that \(\theta\) is greater than 0.7. The posterior probability that \(\theta\) is less than or equal to 0.7 is 0.000051.

    Note: in the frequentist hypothesis test, the null hypothesis \(H_0:\theta=0.7\) is operationally the same as \(H_0:\theta \le 0.7\); the test is conducted the same way and results in the same p-value. Computing the posterior probability that \(\theta\le 0.7\) is like computing the probability that the null hypothesis is true. Now, the p-value is not the probability that the null hypothesis is true, even though that is a common misinterpretation. But there is no direct Bayesian analog of a p-value, so this will have to do.

  5. The numerical results are similar; both the p-value and the posterior probability are on the order of 1/100000. Both reflect a strong endorsement of the conclusion that more than 70% of Americans have read a book in the past year.

  6. However, the interpretation of these results is very different between the two approaches.

    The Bayesian analysis computes a probability that \(\theta <0.7\): there’s an extremely small probability that \(\theta\) is less than 0.7, so we’d be willing to bet a very large amount of money that it’s not.

    But such a probability makes no sense from a frequentist perspective. From the frequentist perspective, the unknown parameter \(\theta\) is a number: either that number is greater than 0.7 or it’s not; there’s no probability to it. The p-value is a probability referring to what would happen over many samples.
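
The frequentist calculations in parts 1–3 of the solution can be reproduced in R as below. The final comment sketches the Bayesian computation in part 4 under the assumption that the posterior in Example 6.4 is a Beta distribution; `alpha_post` and `beta_post` are placeholders for that example's posterior parameters, which are not restated here.

```r
# Frequentist test of H0: theta = 0.7 versus Ha: theta > 0.7
p_hat <- 0.75                                   # observed sample proportion
n <- 1502                                       # sample size
theta_0 <- 0.7                                  # null value
se_null <- sqrt(theta_0 * (1 - theta_0) / n)    # null standard deviation, about 0.0118
z <- (p_hat - theta_0) / se_null                # standardized statistic, about 4.23
1 - pnorm(z)                                    # p-value, about 0.000012

# Bayesian analog (sketch): posterior probability that theta <= 0.7, assuming a
# Beta posterior from Example 6.4 with placeholder parameters alpha_post, beta_post:
# pbeta(0.7, alpha_post, beta_post)
```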

Since a Bayesian analysis treats parameters as random variables, it is possible to make probability statements about parameters. In contrast, a frequentist analysis treats unknown parameters as fixed — that is, not random — so probability statements do not apply to parameters. In a frequentist approach, probability statements (like “95% confidence”) are based on how the sample data would behave over many hypothetical samples.

In a Bayesian approach

  • Parameters are random variables and have distributions.
  • Observed data are treated as fixed, not random.
  • All inference is based on the posterior distribution of parameters, which quantifies our uncertainty about the parameters after observing the sample data.
  • The posterior (or prior) distribution can be used to make probability statements about parameters.
  • For example, “95% credible” quantifies our assessment that the parameter is 19 times more likely to lie inside the credible interval than outside. (Roughly, we’d be willing to bet at 19-to-1 odds on whether \(\theta\) is inside the interval [0.718, 0.771].)
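
As a small illustration of the odds interpretation (using a made-up Beta posterior, not the one from Example 6.4): a central 95% credible interval places 95% posterior probability inside the interval and 5% outside, so the implied odds are 0.95/0.05 = 19.

```r
# Illustration only: hypothetical Beta(30, 10) posterior for theta
alpha_post <- 30
beta_post <- 10
ci <- qbeta(c(0.025, 0.975), alpha_post, beta_post)   # central 95% credible interval
p_inside <- diff(pbeta(ci, alpha_post, beta_post))    # posterior probability inside, 0.95
p_inside / (1 - p_inside)                             # odds inside versus outside, 19
```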

In a frequentist approach

  • Parameters are treated as fixed (not random) but unknown numbers.
  • Data are treated as random.
  • All inference is based on the sampling distribution of the data, which quantifies how the data behave over many hypothetical samples.
  • For example, “95% confidence” is confidence in the procedure: confidence intervals vary from sample to sample; over many samples, 95% of confidence intervals contain the parameter being estimated.
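
The “confidence is in the procedure” idea can be illustrated with a short simulation. This sketch assumes a known true proportion of 0.75 and the same sample size as the book-reading study; the numbers are illustrative, not part of the original example.

```r
# Repeated sampling: how often does a 95% confidence interval for a proportion
# contain the true parameter value?
set.seed(123)
theta_true <- 0.75        # assumed true proportion (illustration only)
n <- 1502                 # sample size
n_reps <- 10000           # number of simulated samples

covered <- replicate(n_reps, {
  p_hat <- rbinom(1, n, theta_true) / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  lower <- p_hat - qnorm(0.975) * se
  upper <- p_hat + qnorm(0.975) * se
  lower <= theta_true & theta_true <= upper
})

mean(covered)             # proportion of intervals covering theta_true; close to 0.95
```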