2.6 A simple working example
We illustrate some conceptual differences between the Bayesian and Frequentist approaches to statistical inference, given a random sample \(\mathbf{y}=[y_1,y_2,\dots,y_N]\), where \(y_i\stackrel{iid}{\sim} N(\mu, \sigma^2)\), \(i=1,2,\dots,N\).
In particular, we set \(\pi(\mu,\sigma)=\pi(\mu)\pi(\sigma)\propto \frac{1}{\sigma}\). This is a standard non-informative improper prior (Jeffreys prior, see Chapter 3); that is, this prior is perfectly compatible with the sample information. In addition, we are assuming independent priors for \(\mu\) and \(\sigma\). Then,
\[\begin{align} \pi(\mu,\sigma|\mathbf{y})&\propto \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\}\\ &= \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N ((y_i-\bar{y}) - (\mu-\bar{y}))^2\right\}\\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-N}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\bar{y})^2\right\}\\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-(\alpha_n+1)}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}, \end{align}\]
where \(\bar{y}=\frac{\sum_{i=1}^N y_i}{N}\), \(\alpha_n=N-1\) and \(\hat{\sigma}^2=\frac{\sum_{i=1}^N (y_i-\bar{y})^2}{N-1}\).
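The step from the second to the third line of the derivation relies on expanding the sum of squares around \(\bar{y}\). Written out, using only the symbols defined above, the cross term vanishes because \(\sum_{i=1}^N (y_i-\bar{y})=0\):

\[\begin{align} \sum_{i=1}^N (y_i-\mu)^2 &= \sum_{i=1}^N (y_i-\bar{y})^2 - 2(\mu-\bar{y})\sum_{i=1}^N (y_i-\bar{y}) + N(\mu-\bar{y})^2\\ &= \alpha_n\hat{\sigma}^2 + N(\mu-\bar{y})^2. \end{align}\]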
The first term in the last expression is the kernel of a normal density, \(\mu|\sigma,\mathbf{y}\sim N(\bar{y},\sigma^2/N)\). The second term is the kernel of an inverted gamma density (Zellner 1996, p. 371), \(\sigma|\mathbf{y}\sim IG(\alpha_n,\hat{\sigma}^2)\). Therefore, \(\pi(\mu|\sigma,\mathbf{y})=(2\pi\sigma^2/N)^{-1/2}\exp\left\{\frac{-N}{2\sigma^2}(\mu-\bar{y})^2\right\}\) and \(\pi(\sigma|\mathbf{y})=\frac{2}{\Gamma(\alpha_n/2)}\left(\frac{\alpha_n\hat{\sigma}^2}{2}\right)^{\alpha_n/2}\frac{1}{\sigma^{\alpha_n+1}}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}\).
Observe that \(\mathbb{E}[\mu|\sigma,\mathbf{y}]=\bar{y}\), which is also the maximum likelihood (Frequentist) point estimate of \(\mu\) in this setting. In addition, conditional on \(\sigma\), the Frequentist \(100(1-\alpha)\%\) confidence interval and the Bayesian \(100(1-\alpha)\%\) credible interval have exactly the same form, \(\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}\), where \(z_{\alpha/2}\) is the \(\alpha/2\) quantile of the standard normal distribution. However, their interpretations are entirely different. The confidence interval has a probabilistic interpretation under the sampling variability of \(\bar{Y}\): in repeated sampling, \(100(1-\alpha)\%\) of the intervals \(\bar{Y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}\) would include \(\mu\). Given an observed realization of \(\bar{Y}\), say \(\bar{y}\), the probability that \(\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}\) includes \(\mu\) is either 1 or 0, which is why we speak of a \(100(1-\alpha)\%\) confidence interval rather than a probability interval. In the Bayesian framework, on the other hand, \(\bar{y}\pm |z_{\alpha/2}|\frac{\sigma}{\sqrt{N}}\) has a direct probabilistic interpretation: given the data, there is a \(100(1-\alpha)\%\) probability that \(\mu\) lies in this interval.
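The repeated-sampling interpretation of the confidence interval can be illustrated with a small simulation. The following is a minimal sketch, not part of the original example: the values of `mu`, `sigma`, `N`, the number of replications `S`, and the seed are illustrative assumptions, and `sigma` is treated as known.

```r
# Illustrative values (assumptions, not taken from the text)
set.seed(10101)
mu <- 100            # true population mean
sigma <- 3           # known population standard deviation
N <- 50              # sample size
alpha_level <- 0.05  # 1 - confidence level; named to avoid confusion with alpha_n
S <- 10000           # number of repeated samples

# Half-width of the interval y_bar +/- |z_{alpha/2}| * sigma / sqrt(N)
half_width <- abs(qnorm(alpha_level / 2)) * sigma / sqrt(N)

# Draw S samples of size N and keep each sample mean
y_bars <- replicate(S, mean(rnorm(N, mean = mu, sd = sigma)))

# For each realized interval, mu is either inside (TRUE) or outside (FALSE)
covers <- (y_bars - half_width <= mu) & (mu <= y_bars + half_width)

# Share of intervals that include mu across repeated samples
coverage <- mean(covers)
coverage
```

Each entry of `covers` is either `TRUE` or `FALSE`, which mirrors the statement that a realized interval includes \(\mu\) with probability 1 or 0, while `coverage` should be close to the nominal level across repetitions.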
To obtain the marginal posterior density of \(\mu\), we integrate \(\sigma\) out of the joint posterior,
\[\begin{align} \pi(\mu|\mathbf{y})&=\int_{0}^{\infty} \pi(\mu,\sigma|\mathbf{y}) d\sigma\\ &\propto \int_{0}^{\infty} \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\} d\sigma\\ &= \int_{0}^{\infty} \left(\frac{1}{\sigma}\right)^{N+1} \exp\left\{-\frac{N}{2\sigma^2}\frac{\sum_{i=1}^N (y_i-\mu)^2}{N}\right\} d\sigma\\ &=\left[\frac{2}{\Gamma(N/2)}\left(\frac{N\sum_{i=1}^N (y_i-\mu)^2}{2N}\right)^{N/2}\right]^{-1}\\ &\propto \left[\sum_{i=1}^N (y_i-\mu)^2\right]^{-N/2}\\ &=\left[\sum_{i=1}^N ((y_i-\bar{y})-(\mu-\bar{y}))^2\right]^{-N/2}\\ &=[\alpha_n\hat{\sigma}^2+N(\mu-\bar{y})^2]^{-N/2}\\ &\propto \left[1+\frac{1}{\alpha_n}\left(\frac{\mu-\bar{y}}{\hat{\sigma}/\sqrt{N}}\right)^2\right]^{-(\alpha_n+1)/2} \end{align}\]
The fourth line follows because the integrand is the kernel of an inverted gamma density with \(N\) degrees of freedom (Zellner 1996, p. 371).
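For reference, the integral used in the fourth line can be stated in general form. The symbols \(\nu\) and \(s^2\) below are placeholders introduced only for this illustration; the derivation above corresponds to \(\nu=N\) and \(s^2=\sum_{i=1}^N (y_i-\mu)^2/N\). The result follows from the change of variable \(u=\nu s^2/(2\sigma^2)\), which turns the integral into a gamma function:

\[\begin{align} \int_{0}^{\infty} \sigma^{-(\nu+1)}\exp\left\{-\frac{\nu s^2}{2\sigma^2}\right\} d\sigma &= \frac{1}{2}\left(\frac{\nu s^2}{2}\right)^{-\nu/2}\int_{0}^{\infty} u^{\nu/2-1}e^{-u} du\\ &= \left[\frac{2}{\Gamma(\nu/2)}\left(\frac{\nu s^2}{2}\right)^{\nu/2}\right]^{-1}. \end{align}\]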
The last expression is the kernel of a Student’s t density function with \(\alpha_n=N-1\) degrees of freedom, expected value equal to \(\bar{y}\), and variance \(\frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n-2}\right)\), which exists for \(\alpha_n>2\). Then, \(\mu|\mathbf{y}\sim t\left(\bar{y},\frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n-2}\right),\alpha_n\right)\).
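The marginal result can be checked numerically by sampling from the joint posterior in two steps: first \(\sigma|\mathbf{y}\), then \(\mu|\sigma,\mathbf{y}\). The sketch below assumes illustrative values for `N`, `y_bar` and `s2` (standing in for \(\hat{\sigma}^2\)), and uses the fact that, under the inverted gamma density for \(\sigma\) given above, \(\alpha_n\hat{\sigma}^2/\sigma^2\) follows a chi-squared distribution with \(\alpha_n\) degrees of freedom.

```r
# Illustrative values (assumptions)
set.seed(10101)
N <- 50
y_bar <- 102
s2 <- 10             # plays the role of sigma_hat^2
alpha_n <- N - 1
S <- 100000          # number of posterior draws

# Step 1: draw sigma^2 | y using alpha_n * s2 / sigma^2 ~ chi-squared(alpha_n)
sigma2 <- alpha_n * s2 / rchisq(S, df = alpha_n)

# Step 2: draw mu | sigma, y ~ N(y_bar, sigma^2 / N)
mu <- rnorm(S, mean = y_bar, sd = sqrt(sigma2 / N))

# Standardized draws should behave like a Student's t with alpha_n df
t_draws <- (mu - y_bar) / sqrt(s2 / N)
probs <- c(0.025, 0.5, 0.975)
comparison <- rbind(simulated = quantile(t_draws, probs),
                    theoretical = qt(probs, df = alpha_n))
comparison

# Sample variance of mu versus the Student's t variance formula
c(simulated = var(mu), theoretical = (s2 / N) * alpha_n / (alpha_n - 2))
```

The simulated quantiles and variance should be close to their theoretical counterparts, up to Monte Carlo error.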
Observe that the \(100(1-\alpha)\%\) confidence interval and the \(100(1-\alpha)\%\) credible interval have exactly the same expression, \(\bar{y}\pm |t_{\alpha/2}^{\alpha_n}|\frac{\hat{\sigma}}{\sqrt{N}}\), where \(t_{\alpha/2}^{\alpha_n}\) is the \(\alpha/2\) quantile of a Student’s t distribution with \(\alpha_n\) degrees of freedom. But again, the interpretations are entirely different.
The mathematical similarity between the Frequentist and Bayesian expressions in this example is due to using a non-informative improper prior.
Example: Math test
You have a random sample of math scores of size \(N=50\) from a normal distribution, \(Y_i\sim \mathcal{N}(\mu, \sigma^2)\). The sample mean and variance are equal to \(102\) and \(10\), respectively. Assuming an improper prior proportional to \(1/\sigma\),
Obtain the 95% confidence and credible intervals for \(\mu\).
What is the posterior probability that \(\mu > 103\)?
N <- 50 # Sample size
y_bar <- 102 # Sample mean
s2 <- 10 # Sample variance
alpha <- N - 1
serror <- (s2/N)^0.5
LimInf <- y_bar - abs(qt(0.025, alpha)) * serror # Lower bound
LimSup <- y_bar + abs(qt(0.025, alpha)) * serror # Upper bound
paste("The 95% credible and confidence intervals are the same", "(", LimInf, LimSup, ").", "However, their interpretations are totally different", sep = " ")
## [1] "The 95% credible and confidence intervals are the same ( 101.101290632776 102.898709367224 ). However, their interpretations are totally different"
y.cut <- 103
P <- 1-metRology::pt.scaled(y.cut, df = alpha, mean = y_bar, sd = serror) # Probability of mu greater than y.cut
paste("The probability that mu is greater than 103 is", P, sep = " ")
## [1] "The probability that mu is greater than 103 is 0.0149669408866644"
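The same posterior probability can be obtained with base R alone, by standardizing the cutoff and using the upper tail of a standard Student’s t distribution. This is a minimal sketch under the same inputs as the example; the names `alpha_n`, `y_cut`, `P_base` and `P_mc`, the seed, and the number of draws are choices made here for illustration, and the Monte Carlo step is only a sanity check.

```r
# Inputs from the math test example
N <- 50
y_bar <- 102
s2 <- 10
alpha_n <- N - 1          # degrees of freedom
serror <- sqrt(s2 / N)    # scale of the posterior Student's t
y_cut <- 103

# P(mu > y_cut | y): upper tail of a standard t after standardizing the cutoff
P_base <- pt((y_cut - y_bar) / serror, df = alpha_n, lower.tail = FALSE)

# Monte Carlo check: draws from the scaled and shifted Student's t posterior
set.seed(10101)
mu_draws <- y_bar + serror * rt(100000, df = alpha_n)
P_mc <- mean(mu_draws > y_cut)

c(exact = P_base, monte_carlo = P_mc)
```

`P_base` should agree with the value reported above, and `P_mc` should match it up to simulation error.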