2.6 A simple working example

We will illustrate some conceptual differences between the Bayesian and Frequentist statistical approaches by performing inference on a random sample \(\mathbf{Y} = [Y_1, Y_2, \dots, Y_N]\), where \(Y_i \stackrel{iid}{\sim} N(\mu, \sigma^2)\) for \(i = 1, 2, \dots, N\).

In particular, we set \(\pi(\mu, \sigma) = \pi(\mu) \pi(\sigma) \propto \frac{1}{\sigma}\). This is a standard non-informative improper prior (Jeffreys prior, see Chapter 3). That is, this prior is perfectly compatible with the sample information. Thus, the posterior distribution is

\[ \begin{aligned} \pi(\mu,\sigma|\mathbf{y}) &\propto \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\} \\ &= \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N \left((y_i-\bar{y}) - (\mu-\bar{y})\right)^2\right\} \\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-N}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\bar{y})^2\right\} \\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-(\alpha_n+1)}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}, \end{aligned} \]

where \(\bar{y} = \frac{\sum_{i=1}^N y_i}{N}\), \(\alpha_n = N-1\), and \(\hat{\sigma}^2 = \frac{\sum_{i=1}^N (y_i-\bar{y})^2}{N-1}\).

The first term in the last expression is the kernel of a normal density, \(\mu|\sigma,\mathbf{y} \sim N(\bar{y}, \sigma^2 / N)\). The second term is the kernel of an inverted gamma density (Zellner 1996b, 371), \(\sigma|\mathbf{y} \sim IG(\alpha_n, \hat{\sigma}^2)\). Therefore,

\[ \pi(\mu|\sigma,\mathbf{y}) = \frac{1}{\sqrt{2\pi \sigma^2 / N}} \exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}, \]

and

\[ \pi(\sigma|\mathbf{y}) = \frac{2}{\Gamma(\alpha_n / 2)} \left(\frac{\alpha_n \hat{\sigma}^2}{2}\right)^{\alpha_n / 2} \frac{1}{\sigma^{\alpha_n+1}} \exp\left\{-\frac{\alpha_n \hat{\sigma}^2}{2 \sigma^2}\right\}. \]

Observe that \(\mathbb{E}[\mu | \sigma, \mathbf{y}] = \bar{y}\); this is also the maximum likelihood (Frequentist) point estimate of \(\mu\) in this setting. In addition, the Frequentist \((1-\alpha)\%\) confidence interval and the Bayesian \((1-\alpha)\%\) credible interval have exactly the same form, \(\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\), where \(z_{\alpha/2}\) is the \(\alpha/2\) percentile of a standard normal distribution. However, the interpretations are entirely different. The confidence interval has a probabilistic interpretation under sampling variability of \(\bar{Y}\): in repeated sampling, \((1-\alpha)\%\) of the intervals \(\bar{Y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\) would include \(\mu\). However, given an observed realization of \(\bar{Y}\), say \(\bar{y}\), the probability of \(\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\) including \(\mu\) is either 1 or 0. This is why we refer to it as a \((1-\alpha)\%\) confidence interval. On the other hand, \(\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\) has a straightforward probabilistic interpretation in the Bayesian framework: there is a \((1-\alpha)\%\) probability that \(\mu\) lies within this interval.

If we want to get the marginal posterior density of \(\mu\),

\[\begin{align*} \pi(\mu|\mathbf{y})&=\int_{0}^{\infty} \pi(\mu,\sigma|\mathbf{y}) d\sigma\\ &\propto \int_{0}^{\infty} \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\} d\sigma\\ &= \int_{0}^{\infty} \left(\frac{1}{\sigma}\right)^{N+1} \exp\left\{-\frac{N}{2\sigma^2}\frac{\sum_{i=1}^N (y_i-\mu)^2}{N}\right\} d\sigma\\ &=\left[\frac{2}{\Gamma(N/2)}\left(\frac{N\sum_{i=1}^N (y_i-\mu)^2}{2N}\right)^{N/2}\right]^{-1}\\ &\propto \left[\sum_{i=1}^N (y_i-\mu)^2\right]^{-N/2}\\ &=\left[\sum_{i=1}^N ((y_i-\bar{y})-(\mu-\bar{y}))^2\right]^{-N/2}\\ &=[\alpha_n\hat{\sigma}^2+N(\mu-\bar{y})^2]^{-N/2}\\ &\propto \left[1+\frac{1}{\alpha_n}\left(\frac{\mu-\bar{y}}{\hat{\sigma}/\sqrt{N}}\right)^2\right]^{-(\alpha_n+1)/2}. \end{align*}\]

The fourth line arises from the kernel of an inverted gamma density with \(N\) degrees of freedom in the integral (Zellner 1996b).

The last expression represents the kernel of a Student’s \(t\)-distribution with \(\alpha_n = N - 1\) degrees of freedom, expected value equal to \(\bar{y}\), and variance \(\frac{\hat{\sigma}^2}{N} \left( \frac{\alpha_n}{\alpha_n - 2} \right)\). Therefore, \(\mu | \mathbf{y} \sim t \left( \bar{y}, \frac{\hat{\sigma}^2}{N} \left( \frac{\alpha_n}{\alpha_n - 2} \right), \alpha_n \right)\).

Observe that a \((1-\alpha)\%\) confidence interval and a \((1-\alpha)\%\) credible interval have the same form, \(\bar{y} \pm |t_{\alpha/2}^{\alpha_n}| \frac{\hat{\sigma}}{\sqrt{N}}\), where \(t_{\alpha/2}^{\alpha_n}\) is the \(\alpha/2\) percentile of a Student’s \(t\)-distribution. However, the interpretations are entirely different.

The mathematical similarity between the Frequentist and Bayesian expressions in this example arises from the use of an improper prior.

Example: Math test

You have a random sample of math scores of size \(N = 50\) from a normal distribution, \(Y_i \sim N(\mu, \sigma^2)\). The sample mean and variance are equal to 102 and 10, respectively. Assuming an improper prior equal to \(\frac{1}{\sigma}\), we proceed with the following tasks:

Compute the 95% confidence and credible intervals for \(\mu\).
Determine the posterior probability that \(\mu > 103\).

Using the fact that \(\mu | \mathbf{y} \sim t\left(\bar{y}, \frac{\hat{\sigma}^2}{N} \left( \frac{\alpha_n}{\alpha_n - 2} \right), \alpha_n \right)\), which implies that the confidence and credible intervals for \(\mu\) are given by:

\[ \begin{aligned} \bar{y} \pm |t_{\alpha/2}^{\alpha_n}| \frac{\hat{\sigma}}{\sqrt{N}}, \end{aligned} \]

where \(\bar{y} = 102\), \(\hat{\sigma}^2 = 10\), and \(\alpha_n = 49\). Thus, the 95% confidence and credible intervals for \(\mu\) are the same, namely \((101.1, 102.9)\), and the posterior probability that \(\mu > 103\) is 1.49% given the sample information.

# Load required package
library(metRology)

## 
## Attaching package: 'metRology'

## The following objects are masked from 'package:base':
## 
##     cbind, rbind

# Input values
n <- 50                # Sample size
sample_mean <- 102     # Sample mean
sample_var <- 10       # Sample variance
alpha <- n - 1         # Degrees of freedom
se <- sqrt(sample_var / n)  # Standard error

# Confidence interval for the mean (95%)
lower_bound <- sample_mean - abs(qt(0.025, df = alpha)) * se
upper_bound <- sample_mean + abs(qt(0.025, df = alpha)) * se

cat(sprintf("95%% CI: [%.3f, %.3f]\n", lower_bound, upper_bound))

## 95% CI: [101.101, 102.899]

# Probability that population mean is greater than y_cut
y_cut <- 103
prob_mu_gt_ycut <- 1 - pt.scaled(y_cut, df = alpha, mean = sample_mean, sd = se)

cat(sprintf("P(mu > %.1f) = %.4f\n", y_cut, prob_mu_gt_ycut))

## P(mu > 103.0) = 0.0150

References

———. 1996b. “Introduction to Bayesian Inference in Econometrics.”