2.6 A simple working example
We illustrate some conceptual differences between the Bayesian and Frequentist statistical approaches by performing inference on a random sample $Y = [Y_1, Y_2, \dots, Y_N]$, where $Y_i \stackrel{iid}{\sim} N(\mu, \sigma^2)$ for $i = 1, 2, \dots, N$.
In particular, we assume independent priors for $\mu$ and $\sigma$ and set $\pi(\mu, \sigma) = \pi(\mu)\pi(\sigma) \propto \frac{1}{\sigma}$. This is a standard non-informative improper prior (Jeffreys prior, see Chapter 3); that is, this prior is perfectly compatible with the sample information. The joint posterior distribution is then
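$$
\begin{aligned}
\pi(\mu, \sigma \mid y) &\propto \frac{1}{\sigma} \times (\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - \mu)^2\right\} \\
&= \frac{1}{\sigma} \times (\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^N \left((y_i - \bar{y}) - (\mu - \bar{y})\right)^2\right\} \\
&= \frac{1}{\sigma} \exp\left\{-\frac{N}{2\sigma^2} (\mu - \bar{y})^2\right\} \times (\sigma)^{-N} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - \bar{y})^2\right\} \\
&= \frac{1}{\sigma} \exp\left\{-\frac{N}{2\sigma^2} (\mu - \bar{y})^2\right\} \times (\sigma)^{-(\alpha_n + 1)} \exp\left\{-\frac{\alpha_n \hat{\sigma}^2}{2\sigma^2}\right\},
\end{aligned}
$$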
where $\bar{y} = \frac{\sum_{i=1}^N y_i}{N}$, $\alpha_n = N - 1$, and $\hat{\sigma}^2 = \frac{\sum_{i=1}^N (y_i - \bar{y})^2}{N - 1}$.
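The step that separates the terms in μ from the terms in σ relies on a standard sum-of-squares decomposition. It is spelled out here as a worked equation, since the same identity is used again when deriving the marginal posterior of μ. The cross term vanishes because the deviations from the sample mean sum to zero.

```latex
\begin{aligned}
\sum_{i=1}^N (y_i - \mu)^2
  &= \sum_{i=1}^N \left((y_i - \bar{y}) - (\mu - \bar{y})\right)^2 \\
  &= \sum_{i=1}^N (y_i - \bar{y})^2
     - 2(\mu - \bar{y}) \sum_{i=1}^N (y_i - \bar{y})
     + N(\mu - \bar{y})^2 \\
  &= \alpha_n \hat{\sigma}^2 + N(\mu - \bar{y})^2,
  \qquad \text{since } \sum_{i=1}^N (y_i - \bar{y}) = 0 .
\end{aligned}
```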
The first term in the last expression is the kernel of a normal density, $\mu \mid \sigma, y \sim N(\bar{y}, \sigma^2/N)$. The second term is the kernel of an inverted gamma density (Zellner 1996), $\sigma \mid y \sim IG(\alpha_n, \hat{\sigma}^2)$. Therefore,
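$$
\pi(\mu \mid \sigma, y) = \frac{1}{\sqrt{2\pi\sigma^2/N}} \exp\left\{-\frac{N}{2\sigma^2} (\mu - \bar{y})^2\right\},
$$
and
$$
\pi(\sigma \mid y) = \frac{2}{\Gamma(\alpha_n/2)} \left(\frac{\alpha_n \hat{\sigma}^2}{2}\right)^{\alpha_n/2} \frac{1}{\sigma^{\alpha_n + 1}} \exp\left\{-\frac{\alpha_n \hat{\sigma}^2}{2\sigma^2}\right\}.
$$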
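Because the joint posterior factors into the marginal of σ times the conditional of μ given σ, one way to simulate from it is to draw σ first and then μ given that draw. The following is a minimal sketch of that idea on simulated data. The values of `N`, `mu_true`, and `sigma_true` are hypothetical and not taken from the text. The sketch relies on the fact that, under the inverted gamma form above, the quantity α_n σ̂²/σ² follows a chi-squared distribution with α_n degrees of freedom.

```r
set.seed(10101)

# Hypothetical data: these values are illustrative assumptions, not from the text
N <- 50
mu_true <- 100
sigma_true <- 3
y <- rnorm(N, mean = mu_true, sd = sigma_true)

# Quantities that appear in the posterior
y_bar <- mean(y)
alpha_n <- N - 1
sigma2_hat <- var(y)  # sum((y - y_bar)^2) / (N - 1)

S <- 100000  # number of posterior draws

# Step 1: draw sigma from its marginal posterior, using
# alpha_n * sigma2_hat / sigma^2 ~ chi-squared(alpha_n)
sigma_draws <- sqrt(alpha_n * sigma2_hat / rchisq(S, df = alpha_n))

# Step 2: draw mu from N(y_bar, sigma^2 / N), one draw per sigma draw
mu_draws <- rnorm(S, mean = y_bar, sd = sigma_draws / sqrt(N))

# Posterior means and 95% equal-tailed intervals from the draws
posterior_summary <- rbind(
  mu = c(mean = mean(mu_draws), quantile(mu_draws, c(0.025, 0.975))),
  sigma = c(mean = mean(sigma_draws), quantile(sigma_draws, c(0.025, 0.975)))
)
posterior_summary
```

The draws of μ obtained this way are draws from its marginal posterior, so their mean should be close to the sample mean `y_bar`.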
Observe that $E[\mu \mid \sigma, y] = \bar{y}$; this is also the maximum likelihood (Frequentist) point estimate of $\mu$ in this setting. In addition, conditional on $\sigma$, the Frequentist $100(1-\alpha)\%$ confidence interval and the Bayesian $100(1-\alpha)\%$ credible interval have exactly the same form, $\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}$, where $z_{\alpha/2}$ is the $\alpha/2$ quantile of a standard normal distribution. However, the interpretations are entirely different. The confidence interval has a probabilistic interpretation under sampling variability of $\bar{Y}$: in repeated sampling, $100(1-\alpha)\%$ of the intervals $\bar{Y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}$ would include $\mu$. Given an observed realization of $\bar{Y}$, say $\bar{y}$, the probability that $\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}$ includes $\mu$ is either 1 or 0. This is why we refer to it as a $100(1-\alpha)\%$ confidence interval. On the other hand, $\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}$ has a straightforward probabilistic interpretation in the Bayesian framework: there is a $100(1-\alpha)\%$ probability that $\mu$ lies within this interval.
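The repeated-sampling interpretation of the confidence interval can be illustrated with a small simulation. This is only a sketch under assumed values: `mu_true`, `sigma`, `N`, and `reps` are hypothetical choices, and σ is treated as known, as in the interval above.

```r
set.seed(123)

# Hypothetical settings, chosen only for illustration
mu_true <- 100     # true mean, fixed across repeated samples
sigma <- 3         # standard deviation, treated as known
N <- 50            # sample size
sig_level <- 0.05  # alpha in the text, so the nominal level is 95%
reps <- 10000      # number of repeated samples

# Half-width |z_{alpha/2}| * sigma / sqrt(N); identical for every sample
half_width <- abs(qnorm(sig_level / 2)) * sigma / sqrt(N)

# Sample mean of each repeated sample
y_bars <- replicate(reps, mean(rnorm(N, mean = mu_true, sd = sigma)))

# TRUE when the interval built from a given sample contains mu_true
covers <- (y_bars - half_width <= mu_true) & (mu_true <= y_bars + half_width)

# Share of intervals that contain mu_true across repeated samples
coverage <- mean(covers)
coverage
```

Each entry of `covers` is either `TRUE` or `FALSE`, which mirrors the statement that a realized interval includes μ with probability 1 or 0. Only the long-run share `coverage` is expected to be close to 0.95.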
To obtain the marginal posterior density of $\mu$, we integrate $\sigma$ out of the joint posterior:
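$$
\begin{aligned}
\pi(\mu \mid y) &= \int_0^\infty \pi(\mu, \sigma \mid y) \, d\sigma \\
&\propto \int_0^\infty \frac{1}{\sigma} \times (\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - \mu)^2\right\} d\sigma \\
&= \int_0^\infty \left(\frac{1}{\sigma}\right)^{N+1} \exp\left\{-\frac{N}{2\sigma^2} \frac{\sum_{i=1}^N (y_i - \mu)^2}{N}\right\} d\sigma \\
&= \left[\frac{2}{\Gamma(N/2)} \left(\frac{N \sum_{i=1}^N (y_i - \mu)^2}{2N}\right)^{N/2}\right]^{-1} \\
&\propto \left[\sum_{i=1}^N (y_i - \mu)^2\right]^{-N/2} \\
&= \left[\sum_{i=1}^N \left((y_i - \bar{y}) - (\mu - \bar{y})\right)^2\right]^{-N/2} \\
&= \left[\alpha_n \hat{\sigma}^2 + N(\mu - \bar{y})^2\right]^{-N/2} \\
&\propto \left[1 + \frac{1}{\alpha_n} \left(\frac{\mu - \bar{y}}{\hat{\sigma}/\sqrt{N}}\right)^2\right]^{-(\alpha_n + 1)/2}.
\end{aligned}
$$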
The fourth line arises from the kernel of an inverted gamma density with N degrees of freedom in the integral (Zellner 1996).
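The last expression represents the kernel of a Student's t-distribution with $\alpha_n = N - 1$ degrees of freedom, expected value equal to $\bar{y}$, and variance $\frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n - 2}\right)$. Therefore, $\mu \mid y \sim t\left(\bar{y}, \frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n - 2}\right), \alpha_n\right)$.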
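Observe that a $100(1-\alpha)\%$ confidence interval and a $100(1-\alpha)\%$ credible interval have exactly the same form, $\bar{y} \pm |t^{\alpha_n}_{\alpha/2}| \frac{\hat{\sigma}}{\sqrt{N}}$, where $t^{\alpha_n}_{\alpha/2}$ is the $\alpha/2$ quantile of a Student's t-distribution with $\alpha_n$ degrees of freedom. However, the interpretations are entirely different.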
The mathematical similarity between the Frequentist and Bayesian expressions in this example arises from the use of an improper prior.
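The numerical agreement between the two intervals when σ is unknown can be checked on simulated data. The sketch below assumes a hypothetical sample (`N`, the true mean, and the true standard deviation are illustrative). It builds the credible interval from the Student's t marginal posterior of μ and compares it with the confidence interval reported by base R's `t.test`.

```r
set.seed(2024)

# Hypothetical sample; the true values used here are illustrative only
N <- 30
y <- rnorm(N, mean = 5, sd = 2)

y_bar <- mean(y)
alpha_n <- N - 1     # degrees of freedom
sigma_hat <- sd(y)   # square root of sum((y - y_bar)^2) / (N - 1)
sig_level <- 0.05    # alpha in the text

# Bayesian 95% credible interval from the Student's t marginal posterior of mu
t_quant <- abs(qt(sig_level / 2, df = alpha_n))
credible <- c(y_bar - t_quant * sigma_hat / sqrt(N),
              y_bar + t_quant * sigma_hat / sqrt(N))

# Frequentist 95% confidence interval from the one-sample t-test
confidence <- as.numeric(t.test(y, conf.level = 1 - sig_level)$conf.int)

# The limits coincide up to floating-point error
all.equal(credible, confidence)
```

The limits are numerically identical, so any difference between the two intervals lies in how they are interpreted, not in how they are computed.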
Example: Math test
You have a random sample of math scores of size $N = 50$ from a normal distribution, $Y_i \sim N(\mu, \sigma^2)$. The sample mean and variance are equal to 102 and 10, respectively. Assuming an improper prior proportional to $\frac{1}{\sigma}$, we proceed with the following tasks:
- Compute the 95% confidence and credible intervals for μ.
- Determine the posterior probability that μ>103.
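We use the fact that $\mu \mid y \sim t\left(\bar{y}, \frac{\hat{\sigma}^2}{N}\left(\frac{\alpha_n}{\alpha_n - 2}\right), \alpha_n\right)$, which implies that the confidence and credible intervals for $\mu$ are given by
$$
\bar{y} \pm |t^{\alpha_n}_{\alpha/2}| \frac{\hat{\sigma}}{\sqrt{N}},
$$
where $\bar{y} = 102$, $\hat{\sigma}^2 = 10$, and $\alpha_n = 49$. Thus, the 95% confidence and credible intervals for $\mu$ are the same, namely $(101.1, 102.9)$, and the posterior probability that $\mu > 103$ is approximately 1.5% given the sample information.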
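N <- 50 # Sample size
y_bar <- 102 # Sample mean
s2 <- 10 # Sample variance
alpha <- N - 1 # Degrees of freedom
serror <- (s2/N)^0.5 # Scale of the posterior of mu (standard error of the mean)
# Lower bound
LimInf <- y_bar - abs(qt(0.025, alpha)) * serror
LimInf
## [1] 101.1013
# Upper bound
LimSup <- y_bar + abs(qt(0.025, alpha)) * serror
LimSup
## [1] 102.8987
# Posterior probability that mu > 103 (requires the metRology package)
y.cut <- 103
P <- 1-metRology::pt.scaled(y.cut, df = alpha, mean = y_bar, sd = serror)
P
## [1] 0.01496694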
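The same posterior probability can be cross-checked without the metRology package. The sketch below uses two base R routes: standardizing the cutoff and using the standard Student's t distribution function, and a Monte Carlo approximation based on draws from the location-scale t posterior. The names `dof`, `scale_mu`, and `y_cut`, the seed, and the number of draws are choices made for this illustration.

```r
N <- 50        # Sample size (from the example)
y_bar <- 102   # Sample mean (from the example)
s2 <- 10       # Sample variance (from the example)
dof <- N - 1   # Degrees of freedom, alpha_n in the text
scale_mu <- sqrt(s2 / N)  # Scale of the posterior of mu
y_cut <- 103

# Exact tail probability: standardize the cutoff, then use the standard t CDF
p_exact <- pt((y_cut - y_bar) / scale_mu, df = dof, lower.tail = FALSE)

# Monte Carlo approximation: mu = y_bar + scale_mu * T, with T ~ t(dof)
set.seed(10101)
mu_draws <- y_bar + scale_mu * rt(100000, df = dof)
p_mc <- mean(mu_draws > y_cut)

c(exact = p_exact, monte_carlo = p_mc)
```

The exact value should match the probability reported above, and the Monte Carlo estimate should agree with it up to simulation error.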