## 2.7 A simple working example

We illustrate some conceptual differences between the Bayesian and Frequentist approaches to statistical inference using a random sample $$\mathbf{y}=[y_1,y_2,\dots,y_N]$$, where $$y_i\stackrel{iid}{\sim} N(\mu, \sigma^2)$$, $$i=1,2,\dots,N$$.

In particular, we set $$\pi(\mu,\sigma)=\pi(\mu)\pi(\sigma)\propto \frac{1}{\sigma}$$. This is a standard non-informative improper prior (Jeffreys prior, see Chapter 3), that is, a prior that is perfectly compatible with the sample information. In addition, we are assuming independent priors for $$\mu$$ and $$\sigma$$. Then,

\begin{align} \pi(\mu,\sigma|\mathbf{y})&\propto \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\}\\ &= \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N ((y_i-\bar{y}) - (\mu-\bar{y}))^2\right\}\\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-N}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\bar{y})^2\right\}\\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-(\alpha_n+1)}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}, \end{align}

where $$\bar{y}=\frac{\sum_{i=1}^N y_i}{N}$$, $$\alpha_n=N-1$$ and $$\hat{\sigma}^2=\frac{\sum_{i=1}^N (y_i-\bar{y})^2}{N-1}$$.
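
The step that separates the terms in $$\mu$$ from the terms in $$\sigma$$ relies on the following decomposition of the sum of squares, where the cross term vanishes because $$\sum_{i=1}^N (y_i-\bar{y})=0$$:

\begin{align} \sum_{i=1}^N (y_i-\mu)^2 &= \sum_{i=1}^N ((y_i-\bar{y})-(\mu-\bar{y}))^2\\ &= \sum_{i=1}^N (y_i-\bar{y})^2 - 2(\mu-\bar{y})\sum_{i=1}^N (y_i-\bar{y}) + N(\mu-\bar{y})^2\\ &= \alpha_n\hat{\sigma}^2 + N(\mu-\bar{y})^2. \end{align}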

The first term in the last expression is the kernel of a normal density, $$\mu|\sigma,\mathbf{y}\sim N(\bar{y},\sigma^2/N)$$. The second term is the kernel of an inverted gamma density (Zellner 1996, p. 371), $$\sigma|\mathbf{y}\sim IG(\alpha_n,\hat{\sigma}^2)$$. Therefore, $$\pi(\mu|\sigma,\mathbf{y})=(2\pi\sigma^2/N)^{-1/2}\exp\left\{\frac{-N}{2\sigma^2}(\mu-\bar{y})^2\right\}$$ and $$\pi(\sigma|\mathbf{y})=\frac{2}{\Gamma(\alpha_n/2)}\left(\frac{\alpha_n\hat{\sigma}^2}{2}\right)^{\alpha_n/2}\frac{1}{\sigma^{\alpha_n+1}}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}$$.
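
The factorization into $$\pi(\mu|\sigma,\mathbf{y})$$ and $$\pi(\sigma|\mathbf{y})$$ suggests a simple way to simulate from the joint posterior: draw $$\sigma$$ first, then draw $$\mu$$ given that $$\sigma$$. The following is a minimal sketch using only the Python standard library. It relies on the fact that, under the inverted gamma density above, $$\alpha_n\hat{\sigma}^2/\sigma^2$$ follows a chi-square distribution with $$\alpha_n$$ degrees of freedom. The function name `sample_joint_posterior` and the data vector are hypothetical and serve only as an illustration.

```python
import math
import random
import statistics


def sample_joint_posterior(y, n_draws=10000, seed=0):
    """Draw (mu, sigma) pairs from pi(mu, sigma | y) by composition."""
    rng = random.Random(seed)
    n = len(y)
    y_bar = statistics.fmean(y)
    alpha_n = n - 1
    sigma2_hat = statistics.variance(y)  # sample variance, divisor N - 1

    draws = []
    for _ in range(n_draws):
        # sigma | y: alpha_n * sigma2_hat / sigma^2 follows a chi-square(alpha_n),
        # and a chi-square(alpha_n) is a gamma with shape alpha_n / 2 and scale 2.
        chi2 = rng.gammavariate(alpha_n / 2.0, 2.0)
        sigma = math.sqrt(alpha_n * sigma2_hat / chi2)
        # mu | sigma, y ~ N(y_bar, sigma^2 / N)
        mu = rng.gauss(y_bar, sigma / math.sqrt(n))
        draws.append((mu, sigma))
    return draws


# Hypothetical data, used only for illustration.
y = [4.9, 5.6, 4.2, 6.1, 5.0, 5.8, 4.4, 5.3, 5.9, 4.8]
draws = sample_joint_posterior(y)
posterior_mean_mu = statistics.fmean([mu for mu, _ in draws])
print(f"sample mean: {statistics.fmean(y):.3f}")
print(f"Monte Carlo posterior mean of mu: {posterior_mean_mu:.3f}")
```

Up to Monte Carlo error, the average of the simulated $$\mu$$ values should be close to $$\bar{y}$$.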

Observe that $$\mathbb{E}[\mu|\sigma,\mathbf{y}]=\bar{y}$$, which is also the maximum likelihood (Frequentist) point estimate of $$\mu$$ in this setting. In addition, for a given $$\sigma$$, the Frequentist $$100(1-\alpha)\%$$ confidence interval and the Bayesian $$100(1-\alpha)\%$$ credible interval have exactly the same form, $$\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$$, where $$z_{\alpha/2}$$ is the $$\alpha/2$$ quantile of a standard normal distribution. However, the interpretations are totally different. The confidence interval has a probabilistic interpretation under sampling variability of $$\bar{Y}$$: in repeated sampling, $$100(1-\alpha)\%$$ of the intervals $$\bar{Y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$$ would include $$\mu$$. Given an observed realization of $$\bar{Y}$$, say $$\bar{y}$$, the probability that $$\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$$ includes $$\mu$$ is either 1 or 0, which is why we speak of a $$100(1-\alpha)\%$$ confidence interval rather than a probability interval. On the other hand, $$\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}$$ has a simple probabilistic interpretation in the Bayesian framework: conditional on $$\sigma$$ and the observed data, there is a $$100(1-\alpha)\%$$ probability that $$\mu$$ lies in this interval.
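
The repeated-sampling interpretation of the confidence interval can be illustrated by simulation. The sketch below assumes $$\sigma$$ is known and uses the standard error $$\sigma/\sqrt{N}$$ implied by the conditional variance $$\sigma^2/N$$. The function names `z_interval` and `coverage`, as well as the values of $$\mu$$, $$\sigma$$ and $$N$$, are hypothetical choices made only for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(y, sigma, alpha=0.05):
    """Interval y_bar +/- |z_{alpha/2}| * sigma / sqrt(N) for known sigma."""
    n = len(y)
    z = abs(NormalDist().inv_cdf(alpha / 2))
    half_width = z * sigma / math.sqrt(n)
    y_bar = fmean(y)
    return y_bar - half_width, y_bar + half_width


def coverage(mu, sigma, n, alpha=0.05, n_rep=5000, seed=0):
    """Share of simulated samples whose interval contains the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(y, sigma, alpha)
        hits += lo <= mu <= hi
    return hits / n_rep


# Hypothetical population values, used only for illustration.
print(f"empirical coverage: {coverage(mu=2.0, sigma=1.5, n=20):.3f}")
```

The empirical coverage should be close to $$1-\alpha$$ across replications, while each individual interval either contains $$\mu$$ or does not.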

If we want to get the marginal posterior density of $$\mu$$,

\begin{align} \pi(\mu|\mathbf{y})&=\int_{0}^{\infty} \pi(\mu,\sigma|\mathbf{y}) d\sigma\\ &\propto \int_{0}^{\infty} \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\} d\sigma\\ &= \int_{0}^{\infty} \left(\frac{1}{\sigma}\right)^{N+1} \exp\left\{-\frac{N}{2\sigma^2}\frac{\sum_{i=1}^N (y_i-\mu)^2}{N}\right\} d\sigma\\ &=\left[\frac{2}{\Gamma(N/2)}\left(\frac{N\sum_{i=1}^N (y_i-\mu)^2}{2N}\right)^{N/2}\right]^{-1}\\ &\propto \left[\sum_{i=1}^N (y_i-\mu)^2\right]^{-N/2}\\ &=\left[\sum_{i=1}^N ((y_i-\bar{y})-(\mu-\bar{y}))^2\right]^{-N/2}\\ &=[\alpha_n\hat{\sigma}^2+N(\mu-\bar{y})^2]^{-N/2}\\ &\propto \left[1+\frac{1}{\alpha_n}\left(\frac{\mu-\bar{y}}{\hat{\sigma}/\sqrt{N}}\right)^2\right]^{-(\alpha_n+1)/2} \end{align}

The fourth line follows because the integrand is the kernel of an inverted gamma density with $$N$$ degrees of freedom (Zellner 1996, p. 371).

The last expression is the kernel of a Student’s t density function with $$\alpha_n=N-1$$ degrees of freedom, expected value equal to $$\bar{y}$$ (for $$\alpha_n>1$$), and variance $$\frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n-2}\right)$$ (for $$\alpha_n>2$$). Then, $$\mu|\mathbf{y}\sim t\left(\bar{y},\frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n-2}\right),\alpha_n\right)$$, where the second argument is the variance.
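
The integration over $$\sigma$$ can also be checked numerically. The sketch below integrates $$\sigma^{-(N+1)}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\}$$ with a trapezoid rule after the change of variable $$\sigma=e^u$$, and compares the result with the Student's t kernel at several values of $$\mu$$. If the derivation holds, the ratio of the two should not depend on $$\mu$$. The function names, the integration limits, and the data are assumptions made for this illustration; the limits are adequate only for data of roughly unit scale.

```python
import math
from statistics import fmean, variance


def marginal_kernel_numeric(mu, y, u_min=-6.0, u_max=8.0, steps=4000):
    """Integrate sigma^-(N+1) * exp(-S(mu) / (2 sigma^2)) over sigma > 0."""
    n = len(y)
    s = sum((yi - mu) ** 2 for yi in y)
    h = (u_max - u_min) / steps
    total = 0.0
    for k in range(steps + 1):
        u = u_min + k * h
        # With sigma = exp(u) and d sigma = exp(u) du, the integrand
        # becomes exp(-N u - S exp(-2u) / 2).
        f = math.exp(-n * u - 0.5 * s * math.exp(-2.0 * u))
        total += f if 0 < k < steps else 0.5 * f  # trapezoid end weights
    return total * h


def student_t_kernel(mu, y):
    """Kernel [1 + t^2 / alpha_n]^(-(alpha_n + 1) / 2) of the marginal posterior."""
    n = len(y)
    alpha_n = n - 1
    scale = math.sqrt(variance(y) / n)  # sigma_hat / sqrt(N)
    t = (mu - fmean(y)) / scale
    return (1 + t * t / alpha_n) ** (-(alpha_n + 1) / 2)


def kernel_ratios(y, grid):
    """Ratio of the numerical integral to the t kernel at each mu in grid."""
    return [marginal_kernel_numeric(m, y) / student_t_kernel(m, y) for m in grid]


# Hypothetical data, used only for illustration.
y = [4.9, 5.6, 4.2, 6.1, 5.0, 5.8, 4.4, 5.3, 5.9, 4.8]
for m, r in zip([4.5, 5.0, 5.2, 5.5, 6.0], kernel_ratios(y, [4.5, 5.0, 5.2, 5.5, 6.0])):
    print(f"mu = {m:.1f}, ratio = {r:.6f}")
```

A constant ratio across the grid is consistent with the marginal posterior of $$\mu$$ being proportional to the Student's t kernel.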

Observe that the $$100(1-\alpha)\%$$ confidence interval and the $$100(1-\alpha)\%$$ credible interval have exactly the same expression, $$\bar{y}\pm |t_{\alpha/2}^{\alpha_n}|\frac{\hat{\sigma}}{\sqrt{N}}$$, where $$t_{\alpha/2}^{\alpha_n}$$ is the $$\alpha/2$$ quantile of a Student’s t distribution with $$\alpha_n$$ degrees of freedom. But again, the interpretations are totally different.
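
As an illustration, the sketch below computes the interval $$\bar{y}\pm |t_{\alpha/2}^{\alpha_n}|\frac{\hat{\sigma}}{\sqrt{N}}$$ and then approximates, by Monte Carlo, the posterior probability that $$\mu$$ falls inside it. To stay within the Python standard library, the Student's t quantile is obtained by numerically inverting the distribution function; a library routine such as `scipy.stats.t.ppf` would normally replace this step. All function names and the data are hypothetical.

```python
import math
import random
from statistics import fmean, variance


def t_pdf(x, df):
    """Density of a standard Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|], using symmetry around zero."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Invert the CDF by bisection (adequate for illustration)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(y, alpha=0.05):
    """Interval y_bar +/- |t_{alpha/2}| * sigma_hat / sqrt(N)."""
    n = len(y)
    half_width = abs(t_quantile(alpha / 2, n - 1)) * math.sqrt(variance(y) / n)
    y_bar = fmean(y)
    return y_bar - half_width, y_bar + half_width


def posterior_mass(y, lo, hi, n_draws=20000, seed=0):
    """Monte Carlo share of draws from mu | y that fall inside [lo, hi]."""
    rng = random.Random(seed)
    n = len(y)
    df = n - 1
    y_bar = fmean(y)
    scale = math.sqrt(variance(y) / n)
    inside = 0
    for _ in range(n_draws):
        # Standard t draw: standard normal divided by sqrt(chi-square / df)
        t = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        inside += lo <= y_bar + scale * t <= hi
    return inside / n_draws


# Hypothetical data, used only for illustration.
y = [4.9, 5.6, 4.2, 6.1, 5.0, 5.8, 4.4, 5.3, 5.9, 4.8]
lo, hi = t_interval(y)
print(f"95% interval: ({lo:.3f}, {hi:.3f})")
print(f"posterior probability of the interval: {posterior_mass(y, lo, hi):.3f}")
```

The same numerical interval serves both approaches. The Monte Carlo share approximates the Bayesian statement that $$\mu$$ lies in the interval with probability $$1-\alpha$$ given the data, whereas the Frequentist statement refers to coverage over repeated samples.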

The mathematical similarity between the Frequentist and Bayesian expressions in this example is due to the use of a non-informative improper prior.

### References

Zellner, Arnold. 1996. “Introduction to Bayesian Inference in Econometrics.”