2.7 A simple working example

We will illustrate some conceptual differences between the Bayesian and Frequentist statistical approaches by performing inference given a random sample $\mathbf{y}=[y_1,y_2,\dots,y_N]$, where $y_i\stackrel{iid}{\sim} N(\mu, \sigma^2)$, $i=1,2,\dots,N$.

In particular, we set $\pi(\mu,\sigma)=\pi(\mu)\pi(\sigma)\propto \frac{1}{\sigma}$. This is a standard non-informative improper prior (Jeffreys prior, see Chapter 3); that is, it is fully compatible with the sample information and lets the data dominate the posterior. In addition, we are assuming independent priors for $\mu$ and $\sigma$. Then, multiplying the prior by the likelihood,

$\begin{align} \pi(\mu,\sigma|\mathbf{y})&\propto \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\}\\ &= \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N ((y_i-\bar{y}) - (\mu-\bar{y}))^2\right\}\\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times \sigma^{-N}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\bar{y})^2\right\}\\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times \sigma^{-(\alpha_n+1)}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}, \end{align}$

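where $\bar{y}=\frac{\sum_{i=1}^N y_i}{N}$, $\alpha_n=N-1$ and $\hat{\sigma}^2=\frac{\sum_{i=1}^N (y_i-\bar{y})^2}{N-1}$. The third line uses $\sum_{i=1}^N (y_i-\bar{y})=0$, so the cross-product term vanishes and $\sum_{i=1}^N (y_i-\mu)^2=\sum_{i=1}^N (y_i-\bar{y})^2+N(\mu-\bar{y})^2$.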
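The algebra in the derivation can be checked numerically. The following minimal Python sketch (standard library only) evaluates the log of the first and of the last line of the posterior kernel at a few $(\mu,\sigma)$ points. The sample `y` is an arbitrary made-up one, and the function names are hypothetical helpers, not part of the text; if the factorization is right, the two logs differ only by rounding error.

```python
import math

# Arbitrary made-up sample, used only for illustration.
y = [4.1, 5.3, 6.2, 3.8, 5.0, 4.7, 6.9, 5.5, 4.4, 5.9]

def sufficient_stats(y):
    """Return N, y_bar, alpha_n = N - 1 and sigma_hat^2 as defined in the text."""
    n = len(y)
    ybar = sum(y) / n
    alpha_n = n - 1
    sigma2_hat = sum((yi - ybar) ** 2 for yi in y) / alpha_n
    return n, ybar, alpha_n, sigma2_hat

def log_kernel_first_line(mu, sigma, y):
    """log[(1/sigma) * (sigma^2)^(-N/2) * exp(-sum (y_i - mu)^2 / (2 sigma^2))]."""
    n = len(y)
    ss = sum((yi - mu) ** 2 for yi in y)
    return -math.log(sigma) - n * math.log(sigma) - ss / (2 * sigma ** 2)

def log_kernel_last_line(mu, sigma, y):
    """log of the normal kernel in mu times the inverted gamma kernel in sigma."""
    n, ybar, alpha_n, sigma2_hat = sufficient_stats(y)
    normal_part = -math.log(sigma) - n * (mu - ybar) ** 2 / (2 * sigma ** 2)
    ig_part = -(alpha_n + 1) * math.log(sigma) - alpha_n * sigma2_hat / (2 * sigma ** 2)
    return normal_part + ig_part

for mu, sigma in [(4.0, 0.5), (5.18, 1.0), (7.0, 3.0)]:
    diff = log_kernel_first_line(mu, sigma, y) - log_kernel_last_line(mu, sigma, y)
    print(mu, sigma, diff)  # zero up to rounding error
```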

The first factor in the last expression is the kernel of a normal density, $\mu|\sigma,\mathbf{y}\sim N(\bar{y},\sigma^2/N)$. The second factor is the kernel of an inverted gamma density (Zellner 1996, p. 371), $\sigma|\mathbf{y}\sim IG(\alpha_n,\hat{\sigma}^2)$. Therefore, $\pi(\mu|\sigma,\mathbf{y})=(2\pi\sigma^2/N)^{-1/2}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}$ and $\pi(\sigma|\mathbf{y})=\frac{2}{\Gamma(\alpha_n/2)}\left(\frac{\alpha_n\hat{\sigma}^2}{2}\right)^{\alpha_n/2}\frac{1}{\sigma^{\alpha_n+1}}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}$.
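As a sanity check on the normalizing constant of $\pi(\sigma|\mathbf{y})$, this minimal sketch (standard library only, with a hypothetical Simpson-rule helper and an arbitrary made-up sample) integrates the density numerically; the total mass should be close to 1. It also compares the numerical $\mathbb{E}[\sigma^2|\mathbf{y}]$ with $\alpha_n\hat{\sigma}^2/(\alpha_n-2)$, a standard property of this inverted gamma density that holds for $\alpha_n>2$.

```python
import math

# Arbitrary made-up sample, used only for illustration.
y = [4.1, 5.3, 6.2, 3.8, 5.0, 4.7, 6.9, 5.5, 4.4, 5.9]
n = len(y)
ybar = sum(y) / n
alpha_n = n - 1
sigma2_hat = sum((yi - ybar) ** 2 for yi in y) / alpha_n

def ig_sigma_pdf(sigma):
    """pi(sigma|y) = 2/Gamma(a/2) * (a s^2/2)^(a/2) * sigma^-(a+1) * exp(-a s^2/(2 sigma^2))."""
    log_const = (math.log(2.0) - math.lgamma(alpha_n / 2)
                 + (alpha_n / 2) * math.log(alpha_n * sigma2_hat / 2))
    return math.exp(log_const - (alpha_n + 1) * math.log(sigma)
                    - alpha_n * sigma2_hat / (2 * sigma ** 2))

def simpson(f, a, b, m=20000):
    """Composite Simpson's rule on [a, b]; m must be even."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# For this sample the density is negligible outside (1e-6, 50).
mass = simpson(ig_sigma_pdf, 1e-6, 50.0)
second_moment = simpson(lambda s: s ** 2 * ig_sigma_pdf(s), 1e-6, 50.0)
print(mass)  # close to 1
print(second_moment, alpha_n * sigma2_hat / (alpha_n - 2))  # close to each other
```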

Observe that $\mathbb{E}[\mu|\sigma,\mathbf{y}]=\bar{y}$, which is also the maximum likelihood (Frequentist) point estimate of $\mu$ in this setting. In addition, treating $\sigma$ as known, the Frequentist $100(1-\alpha)\%$ confidence interval and the Bayesian $100(1-\alpha)\%$ credible interval have exactly the same form, $\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$, where $z_{\alpha/2}$ is the $\alpha/2$ quantile of the standard normal distribution. However, the interpretations are totally different. The confidence interval has a probabilistic interpretation under the sampling variability of $\bar{Y}$: in repeated sampling, $100(1-\alpha)\%$ of the intervals $\bar{Y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$ would include $\mu$. Given an observed realization of $\bar{Y}$, say $\bar{y}$, however, the interval $\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$ either includes $\mu$ or it does not, so the probability that it includes $\mu$ is 1 or 0; that is why we speak of $100(1-\alpha)\%$ confidence rather than probability. On the other hand, $\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$ has a direct probabilistic interpretation in the Bayesian framework: given the data, there is a $100(1-\alpha)\%$ probability that $\mu$ lies in this interval.
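To make the repeated-sampling interpretation concrete, the sketch below (standard library only) draws many samples from $N(\mu,\sigma^2)$ with hypothetical true values $\mu=5$, $\sigma=2$ and $N=20$. For each sample it builds $\bar{y}\pm |z_{\alpha/2}|\sigma/\sqrt{N}$ with $\sigma$ treated as known, and it records the share of intervals that contain $\mu$. That share should be close to $1-\alpha$, while any single interval either contains $\mu$ or does not. The function names are illustrative only.

```python
import math
import random
from statistics import NormalDist

def known_sigma_interval(y, sigma, alpha=0.05):
    """y_bar -/+ |z_{alpha/2}| * sigma / sqrt(N), treating sigma as known."""
    n = len(y)
    ybar = sum(y) / n
    z = abs(NormalDist().inv_cdf(alpha / 2))
    half = z * sigma / math.sqrt(n)
    return ybar - half, ybar + half

def coverage(mu, sigma, n, reps=20000, alpha=0.05, seed=123):
    """Share of intervals, over repeated samples, that contain the fixed true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = known_sigma_interval(sample, sigma, alpha)
        hits += lo <= mu <= hi
    return hits / reps

# Hypothetical "true" parameters, known only inside the simulation.
print(coverage(mu=5.0, sigma=2.0, n=20))  # close to 0.95
```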

To obtain the marginal posterior density of $\mu$, we integrate $\sigma$ out of the joint posterior:

$\begin{align} \pi(\mu|\mathbf{y})&=\int_{0}^{\infty} \pi(\mu,\sigma|\mathbf{y}) d\sigma\\ &\propto \int_{0}^{\infty} \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\} d\sigma\\ &= \int_{0}^{\infty} \left(\frac{1}{\sigma}\right)^{N+1} \exp\left\{-\frac{N}{2\sigma^2}\frac{\sum_{i=1}^N (y_i-\mu)^2}{N}\right\} d\sigma\\ &=\left[\frac{2}{\Gamma(N/2)}\left(\frac{N\sum_{i=1}^N (y_i-\mu)^2}{2N}\right)^{N/2}\right]^{-1}\\ &\propto \left[\sum_{i=1}^N (y_i-\mu)^2\right]^{-N/2}\\ &=\left[\sum_{i=1}^N ((y_i-\bar{y})-(\mu-\bar{y}))^2\right]^{-N/2}\\ &=[\alpha_n\hat{\sigma}^2+N(\mu-\bar{y})^2]^{-N/2}\\ &\propto \left[1+\frac{1}{\alpha_n}\left(\frac{\mu-\bar{y}}{\hat{\sigma}/\sqrt{N}}\right)^2\right]^{-(\alpha_n+1)/2} \end{align}$

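The fourth line follows because the integrand is the kernel of an inverted gamma density with $N$ degrees of freedom and scale parameter $\sum_{i=1}^N (y_i-\mu)^2/N$, so the integral equals the reciprocal of that density's normalizing constant.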
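The claim that integrating $\sigma$ out leaves a Student's t kernel can also be checked numerically. This minimal sketch (standard library only; helper names are hypothetical and the sample is made up) integrates the joint kernel over $\sigma$ with Simpson's rule at several values of $\mu$ and divides the result by the kernel in the last line of the derivation. If the derivation is right, the ratios equal the same constant for every $\mu$.

```python
import math

# Arbitrary made-up sample, used only for illustration.
y = [4.1, 5.3, 6.2, 3.8, 5.0, 4.7, 6.9, 5.5, 4.4, 5.9]

def joint_kernel(mu, sigma, y):
    """Unnormalized pi(mu, sigma|y) = sigma^-(N+1) * exp(-sum (y_i - mu)^2 / (2 sigma^2))."""
    n = len(y)
    ss = sum((yi - mu) ** 2 for yi in y)
    return math.exp(-(n + 1) * math.log(sigma) - ss / (2 * sigma ** 2))

def marginal_mu_numeric(mu, y, lo=1e-3, hi=60.0, m=20000):
    """Integrate sigma out with composite Simpson's rule (m must be even)."""
    h = (hi - lo) / m
    total = joint_kernel(mu, lo, y) + joint_kernel(mu, hi, y)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * joint_kernel(mu, lo + i * h, y)
    return total * h / 3

def t_kernel(mu, y):
    """Last line of the derivation: Student's t kernel with alpha_n = N - 1 d.o.f."""
    n = len(y)
    ybar = sum(y) / n
    alpha_n = n - 1
    sigma2_hat = sum((yi - ybar) ** 2 for yi in y) / alpha_n
    z = (mu - ybar) / math.sqrt(sigma2_hat / n)
    return (1 + z * z / alpha_n) ** (-(alpha_n + 1) / 2)

ratios = [marginal_mu_numeric(mu, y) / t_kernel(mu, y) for mu in (3.0, 4.5, 5.18, 6.0, 7.5)]
print(max(ratios) / min(ratios))  # about 1: the two differ only by a constant factor
```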

The last expression is the kernel of a Student's t density function with $\alpha_n=N-1$ degrees of freedom, location $\bar{y}$ and scale $\hat{\sigma}/\sqrt{N}$. Hence its expected value is $\bar{y}$ (for $\alpha_n>1$) and its variance is $\frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n-2}\right)$ (for $\alpha_n>2$). Then, $\mu|\mathbf{y}\sim t\left(\bar{y},\frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n-2}\right),\alpha_n\right)$.
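The mean and variance of $\mu|\mathbf{y}$ can be checked by simulation. A natural route, sketched here for an arbitrary made-up sample, is composition sampling. First draw $\sigma$ from $\pi(\sigma|\mathbf{y})$, then draw $\mu$ from $\pi(\mu|\sigma,\mathbf{y})$; the resulting $\mu$ draws come from the marginal $\pi(\mu|\mathbf{y})$. To draw $\sigma$, the sketch relies on the fact that $\alpha_n\hat{\sigma}^2/\sigma^2\sim\chi^2_{\alpha_n}$ under this inverted gamma density. The seed and number of draws are arbitrary choices.

```python
import math
import random

# Arbitrary made-up sample, used only for illustration.
y = [4.1, 5.3, 6.2, 3.8, 5.0, 4.7, 6.9, 5.5, 4.4, 5.9]
n = len(y)
ybar = sum(y) / n
alpha_n = n - 1
sigma2_hat = sum((yi - ybar) ** 2 for yi in y) / alpha_n

rng = random.Random(2024)
draws = []
for _ in range(100000):
    # sigma|y ~ IG(alpha_n, sigma2_hat): alpha_n * sigma2_hat / sigma^2 is chi-square(alpha_n),
    # and a chi-square(k) variable is Gamma(shape=k/2, scale=2).
    chi2 = rng.gammavariate(alpha_n / 2, 2.0)
    sigma = math.sqrt(alpha_n * sigma2_hat / chi2)
    # mu|sigma, y ~ N(y_bar, sigma^2 / N)
    draws.append(rng.gauss(ybar, sigma / math.sqrt(n)))

mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (len(draws) - 1)
exact_var = sigma2_hat / n * alpha_n / (alpha_n - 2)
print(mc_mean, ybar)      # Monte Carlo vs. exact posterior mean
print(mc_var, exact_var)  # Monte Carlo vs. exact posterior variance
```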

Observe that the $100(1-\alpha)\%$ confidence interval and the $100(1-\alpha)\%$ credible interval have exactly the same expression, $\bar{y}\pm |t_{\alpha/2}^{\alpha_n}|\frac{\hat{\sigma}}{\sqrt{N}}$, where $t_{\alpha/2}^{\alpha_n}$ is the $\alpha/2$ quantile of a Student's t distribution with $\alpha_n$ degrees of freedom. But again, the interpretations are totally different.
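The following sketch (standard library only) puts the two readings of the t-based interval side by side. The standard library has no Student's t quantile function, so it inverts a numerically integrated t CDF by bisection. This is a hypothetical helper that is adequate only for illustration; a statistics library would normally be used instead. The sketch then reports two quantities. The first is the posterior probability that $\mu$ lies in the interval computed from one made-up sample, estimated by composition sampling. The second is the frequentist coverage of the same formula over repeated samples from a hypothetical $N(5, 2^2)$ population with $N=10$. Both should be close to $1-\alpha=0.95$, but they answer different questions.

```python
import math
import random

def t_pdf(x, df):
    """Standard Student's t density with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, m=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry around 0."""
    a = abs(x)
    h = a / m
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection; assumes it lies below 50."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(y, q):
    """y_bar -/+ q * sigma_hat / sqrt(N), with q = |t_{alpha/2}^{alpha_n}|."""
    n = len(y)
    ybar = sum(y) / n
    sigma_hat = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    half = q * sigma_hat / math.sqrt(n)
    return ybar - half, ybar + half

def posterior_mass(y, lo, hi, draws=50000, seed=11):
    """Bayesian reading: P(lo <= mu <= hi | y), estimated by composition sampling."""
    n = len(y)
    ybar = sum(y) / n
    alpha_n = n - 1
    sigma2_hat = sum((yi - ybar) ** 2 for yi in y) / alpha_n
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        sigma = math.sqrt(alpha_n * sigma2_hat / rng.gammavariate(alpha_n / 2, 2.0))
        inside += lo <= rng.gauss(ybar, sigma / math.sqrt(n)) <= hi
    return inside / draws

def coverage(mu, sigma, n, q, reps=5000, seed=3):
    """Frequentist reading: share of intervals that contain the fixed true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = t_interval([rng.gauss(mu, sigma) for _ in range(n)], q)
        hits += lo <= mu <= hi
    return hits / reps

alpha = 0.05
y = [4.1, 5.3, 6.2, 3.8, 5.0, 4.7, 6.9, 5.5, 4.4, 5.9]  # made-up sample
q = t_quantile(1 - alpha / 2, len(y) - 1)  # about 2.262 for 9 d.o.f.
lo, hi = t_interval(y, q)
print(lo, hi)
print(posterior_mass(y, lo, hi))               # close to 0.95
print(coverage(mu=5.0, sigma=2.0, n=10, q=q))  # close to 0.95
```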

The mathematical similarity between the Frequentist and Bayesian expressions in this example is due to the use of a non-informative improper prior.