Chapter 2 Updating form of the posterior distribution

Assume $y \sim N(X\beta, \sigma^2 V)$ and $P(\beta, \sigma^2) = NIG(\beta, \sigma^2 \mid m_0, M_0, a_0, b_0)$. Then the posterior distribution is given by

$$P(\beta, \sigma^2 \mid y) = NIG(\beta, \sigma^2 \mid M_1 m_1, M_1, a_1, b_1),$$

where

$$\begin{aligned}
M_1 &= \left(M_0^{-1} + X^\top V^{-1} X\right)^{-1}; \quad m_1 = M_0^{-1} m_0 + X^\top V^{-1} y; \\
a_1 &= a_0 + \frac{n}{2}; \quad b_1 = b_0 + \frac{1}{2}\left(m_0^\top M_0^{-1} m_0 + y^\top V^{-1} y - m_1^\top M_1 m_1\right).
\end{aligned}$$
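As a concrete illustration, the update can be computed directly from these formulas. Below is a minimal numpy sketch on a small made-up problem (all dimensions, data, and prior settings are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2                       # illustrative sizes: n observations, p coefficients
X = rng.normal(size=(n, p))       # design matrix
y = rng.normal(size=n)            # response
V = np.eye(n)                     # sigma^2 V is the error covariance
m0, M0 = np.zeros(p), np.eye(p)   # prior mean and covariance scale of beta
a0, b0 = 2.0, 1.0                 # IG prior parameters for sigma^2

Vinv, M0inv = np.linalg.inv(V), np.linalg.inv(M0)
M1 = np.linalg.inv(M0inv + X.T @ Vinv @ X)
m1 = M0inv @ m0 + X.T @ Vinv @ y
a1 = a0 + n / 2
b1 = b0 + 0.5 * (m0 @ M0inv @ m0 + y @ Vinv @ y - m1 @ M1 @ m1)

print("posterior mean M1 m1:", M1 @ m1)
print("a1 =", a1, " b1 =", b1)
```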

We will use two approaches to calculate $M_1$ and derive this updating form.

2.1 Method 1: Sherman-Woodbury-Morrison identity

Theorem 2.1 (Sherman-Woodbury-Morrison identity) We have
$$(A + BDC)^{-1} = A^{-1} - A^{-1} B \left(D^{-1} + C A^{-1} B\right)^{-1} C A^{-1},$$
where $A$ and $D$ are square invertible matrices and $B$ and $C$ are rectangular (square if $A$ and $D$ have the same dimensions) matrices such that the multiplications are well-defined.
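The identity can also be sanity-checked numerically; the following sketch (numpy assumed, random matrices shifted toward the identity so they are safely invertible) verifies that both sides agree:

```python
import numpy as np

rng = np.random.default_rng(1)
nA, nD = 4, 3
A = rng.normal(size=(nA, nA)) + nA * np.eye(nA)  # shifted toward the identity, so safely invertible
D = rng.normal(size=(nD, nD)) + nD * np.eye(nD)
B = rng.normal(size=(nA, nD))
C = rng.normal(size=(nD, nA))

lhs = np.linalg.inv(A + B @ D @ C)
Ainv, Dinv = np.linalg.inv(A), np.linalg.inv(D)
rhs = Ainv - Ainv @ B @ np.linalg.inv(Dinv + C @ Ainv @ B) @ C @ Ainv
print(np.allclose(lhs, rhs))  # True
```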

The Sherman-Woodbury-Morrison identity is easily verified by multiplying the right-hand side by $A + BDC$ and simplifying to reduce it to the identity matrix. Using this formula, we have

$$M_1 = \left(M_0^{-1} + X^\top V^{-1} X\right)^{-1} = M_0 - M_0 X^\top \left(V + X M_0 X^\top\right)^{-1} X M_0 = M_0 - M_0 X^\top Q^{-1} X M_0,$$

where $Q = V + X M_0 X^\top$.

We can show that $M_1 m_1 = m_0 + M_0 X^\top Q^{-1}(y - X m_0)$:


$$\begin{aligned}
M_1 m_1 &= \left(M_0^{-1} + X^\top V^{-1} X\right)^{-1} m_1 \\
&= \left[M_0 - M_0 X^\top \left(V + X M_0 X^\top\right)^{-1} X M_0\right] m_1 \\
&= \left(M_0 - M_0 X^\top Q^{-1} X M_0\right) m_1 \\
&= \left(M_0 - M_0 X^\top Q^{-1} X M_0\right)\left(M_0^{-1} m_0 + X^\top V^{-1} y\right) \\
&= m_0 + M_0 X^\top V^{-1} y - M_0 X^\top Q^{-1} X m_0 - M_0 X^\top Q^{-1} X M_0 X^\top V^{-1} y \\
&= m_0 + M_0 X^\top \left(I - Q^{-1} X M_0 X^\top\right) V^{-1} y - M_0 X^\top Q^{-1} X m_0 \\
&= m_0 + M_0 X^\top Q^{-1}\left(Q - X M_0 X^\top\right) V^{-1} y - M_0 X^\top Q^{-1} X m_0 \quad \left(\text{since } Q = V + X M_0 X^\top\right) \\
&= m_0 + M_0 X^\top Q^{-1} V V^{-1} y - M_0 X^\top Q^{-1} X m_0 \\
&= m_0 + M_0 X^\top Q^{-1} y - M_0 X^\top Q^{-1} X m_0 \\
&= m_0 + M_0 X^\top Q^{-1}\left(y - X m_0\right).
\end{aligned}$$
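The same algebra is easy to confirm numerically; a short sketch (numpy, made-up data) comparing the direct computation of $M_1 m_1$ with the updated form:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3
X, y = rng.normal(size=(n, p)), rng.normal(size=n)
V, m0, M0 = np.eye(n), rng.normal(size=p), np.eye(p)

Vinv, M0inv = np.linalg.inv(V), np.linalg.inv(M0)
Q = V + X @ M0 @ X.T

direct = np.linalg.inv(M0inv + X.T @ Vinv @ X) @ (M0inv @ m0 + X.T @ Vinv @ y)
updated = m0 + M0 @ X.T @ np.linalg.solve(Q, y - X @ m0)
print(np.allclose(direct, updated))  # True
```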

Furthermore, we can show that $m_0^\top M_0^{-1} m_0 + y^\top V^{-1} y - m_1^\top M_1 m_1 = (y - X m_0)^\top Q^{-1} (y - X m_0)$:


$$\begin{aligned}
& m_0^\top M_0^{-1} m_0 + y^\top V^{-1} y - m_1^\top M_1 m_1 \\
&= m_0^\top M_0^{-1} m_0 + y^\top V^{-1} y - m_1^\top \left[m_0 + M_0 X^\top Q^{-1}(y - X m_0)\right] \\
&= m_0^\top M_0^{-1} m_0 + y^\top V^{-1} y - m_1^\top m_0 - m_1^\top M_0 X^\top Q^{-1}(y - X m_0) \\
&= m_0^\top M_0^{-1} m_0 + y^\top V^{-1} y - \left(M_0^{-1} m_0 + X^\top V^{-1} y\right)^\top m_0 - m_1^\top M_0 X^\top Q^{-1}(y - X m_0) \\
&= y^\top V^{-1} y - y^\top V^{-1} X m_0 - m_1^\top M_0 X^\top Q^{-1}(y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - m_1^\top M_0 X^\top Q^{-1}(y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - (M_0 m_1)^\top X^\top Q^{-1}(y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - \left(m_0 + M_0 X^\top V^{-1} y\right)^\top X^\top Q^{-1}(y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - \left(X m_0 + X M_0 X^\top V^{-1} y\right)^\top Q^{-1}(y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - \left(Q^{-1} X m_0 + Q^{-1} \left(X M_0 X^\top\right) V^{-1} y\right)^\top (y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - \left[Q^{-1} X m_0 + Q^{-1} (Q - V) V^{-1} y\right]^\top (y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - \left(Q^{-1} X m_0 + V^{-1} y - Q^{-1} y\right)^\top (y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - \left[V^{-1} y + Q^{-1}(X m_0 - y)\right]^\top (y - X m_0) \\
&= y^\top V^{-1} (y - X m_0) - y^\top V^{-1} (y - X m_0) + (y - X m_0)^\top Q^{-1} (y - X m_0) \\
&= (y - X m_0)^\top Q^{-1} (y - X m_0).
\end{aligned}$$
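As with the previous identity, this simplification can be checked numerically; a brief sketch (numpy, made-up data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 3
X, y = rng.normal(size=(n, p)), rng.normal(size=n)
V, m0, M0 = np.eye(n), rng.normal(size=p), np.eye(p)

Vinv, M0inv = np.linalg.inv(V), np.linalg.inv(M0)
Q = V + X @ M0 @ X.T
M1 = np.linalg.inv(M0inv + X.T @ Vinv @ X)
m1 = M0inv @ m0 + X.T @ Vinv @ y

lhs = m0 @ M0inv @ m0 + y @ Vinv @ y - m1 @ M1 @ m1
rhs = (y - X @ m0) @ np.linalg.solve(Q, y - X @ m0)
print(np.isclose(lhs, rhs))  # True
```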

Thus, we obtain the following updating form of the posterior distribution for Bayesian linear regression:

$$P(\beta, \sigma^2 \mid y) = NIG(\beta, \sigma^2 \mid \tilde{m}_1, \tilde{M}_1, a_1, b_1),$$ where

$$\begin{aligned}
\tilde{m}_1 &= M_1 m_1 = m_0 + M_0 X^\top Q^{-1}(y - X m_0), \\
\tilde{M}_1 &= M_1 = M_0 - M_0 X^\top Q^{-1} X M_0, \\
a_1 &= a_0 + \frac{n}{2}, \\
b_1 &= b_0 + \frac{1}{2}(y - X m_0)^\top Q^{-1}(y - X m_0), \\
Q &= V + X M_0 X^\top.
\end{aligned}$$
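The updating form lends itself to a compact implementation. Below is a minimal sketch (the function name `nig_update` is hypothetical, and numpy is assumed) returning $(\tilde{m}_1, \tilde{M}_1, a_1, b_1)$:

```python
import numpy as np

def nig_update(y, X, V, m0, M0, a0, b0):
    """NIG posterior update in its 'updating' form (M0 and V assumed symmetric).

    Returns (m1_tilde, M1_tilde, a1, b1) so that
    P(beta, sigma^2 | y) = NIG(beta, sigma^2 | m1_tilde, M1_tilde, a1, b1).
    """
    n = y.shape[0]
    Q = V + X @ M0 @ X.T                    # Q = V + X M0 X^T  (n x n)
    resid = y - X @ m0                      # prior predictive residual y - X m0
    Qinv_resid = np.linalg.solve(Q, resid)  # Q^{-1} (y - X m0)
    m1_tilde = m0 + M0 @ X.T @ Qinv_resid
    M1_tilde = M0 - (X @ M0).T @ np.linalg.solve(Q, X @ M0)  # M0 - M0 X^T Q^{-1} X M0
    a1 = a0 + n / 2
    b1 = b0 + 0.5 * resid @ Qinv_resid
    return m1_tilde, M1_tilde, a1, b1
```

Note that every linear solve here involves only the $n \times n$ matrix $Q$, which is convenient when $p$ exceeds $n$; the direct form instead inverts the $p \times p$ matrix $M_0^{-1} + X^\top V^{-1} X$.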

2.2 Method 2: distribution theory

Previously, we obtained the Bayesian linear regression updater using the Sherman-Woodbury-Morrison identity. Here, we derive the same results without resorting to it. The model is given by
$$y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 V); \qquad \beta = m_0 + \omega, \quad \omega \sim N(0, \sigma^2 M_0); \qquad \sigma^2 \sim IG(a_0, b_0).$$

This corresponds to the posterior distribution

$$P(\beta, \sigma^2 \mid y) \propto IG(\sigma^2 \mid a_0, b_0) \times N(\beta \mid m_0, \sigma^2 M_0) \times N(y \mid X\beta, \sigma^2 V).$$

We will derive $P(\sigma^2 \mid y)$ and $P(\beta \mid \sigma^2, y)$ in a form that will reflect updates from the prior to the posterior.

Integrating out $\beta$ from the model is equivalent to substituting its prior expression $\beta = m_0 + \omega$ into the likelihood. Thus, $P(y \mid \sigma^2)$ is derived simply from
$$y = X\beta + \epsilon = X(m_0 + \omega) + \epsilon = X m_0 + X\omega + \epsilon = X m_0 + \eta,$$

where $\eta = X\omega + \epsilon \sim N(0, \sigma^2 Q)$ and $Q = X M_0 X^\top + V$.

Therefore, $y \mid \sigma^2 \sim N(X m_0, \sigma^2 Q)$.
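This marginal can be verified by simulation; a quick Monte Carlo sketch (numpy, illustrative sizes and settings) drawing $y = X m_0 + X\omega + \epsilon$ repeatedly and comparing the empirical covariance with $\sigma^2 Q$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma2 = 3, 2, 1.5
X, m0 = rng.normal(size=(n, p)), rng.normal(size=p)
V, M0 = np.eye(n), np.eye(p)

draws = 200_000
omega = rng.multivariate_normal(np.zeros(p), sigma2 * M0, size=draws)
eps = rng.multivariate_normal(np.zeros(n), sigma2 * V, size=draws)
ys = X @ m0 + omega @ X.T + eps        # rows are draws of y = X m0 + X omega + eps

Q = X @ M0 @ X.T + V
print(np.round(np.cov(ys.T), 2))       # empirical covariance of y
print(np.round(sigma2 * Q, 2))         # theoretical sigma^2 Q; the two should agree
```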

The posterior distribution is given by:
$$\begin{aligned}
P(\sigma^2 \mid y) &\propto P(\sigma^2)\, P(y \mid \sigma^2) = IG(\sigma^2 \mid a_0, b_0) \times N(y \mid X m_0, \sigma^2 Q) \\
&\propto \left(\frac{1}{\sigma^2}\right)^{a_0 + 1} e^{-\frac{b_0}{\sigma^2}} \times \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}} e^{-\frac{1}{2\sigma^2}(y - X m_0)^\top Q^{-1}(y - X m_0)} \\
&= \left(\frac{1}{\sigma^2}\right)^{a_0 + \frac{n}{2} + 1} e^{-\frac{1}{\sigma^2}\left\{b_0 + \frac{1}{2}(y - X m_0)^\top Q^{-1}(y - X m_0)\right\}} \\
&\propto IG(\sigma^2 \mid a_1, b_1),
\end{aligned}$$

where $a_1 = a_0 + \frac{n}{2}$ and $b_1 = b_0 + \frac{1}{2}(y - X m_0)^\top Q^{-1}(y - X m_0)$.
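Given $a_1$ and $b_1$, sampling $\sigma^2$ from this posterior is immediate; a brief sketch assuming scipy is available (scipy's `invgamma` takes shape `a` and `scale` equal to our $b_1$), with illustrative parameter values:

```python
import numpy as np
from scipy.stats import invgamma

a1, b1 = 4.5, 3.2   # illustrative posterior parameters, not from the text
sigma2_draws = invgamma.rvs(a=a1, scale=b1, size=100_000, random_state=0)
print(sigma2_draws.mean())  # empirical mean
print(b1 / (a1 - 1))        # theoretical IG mean b1/(a1 - 1), valid for a1 > 1
```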

Next, we turn to $P(\beta \mid \sigma^2, y)$. Note that
$$\begin{bmatrix} y \\ \beta \end{bmatrix} \Bigm| \sigma^2 \sim N\left(\begin{bmatrix} X m_0 \\ m_0 \end{bmatrix}, \sigma^2 \begin{bmatrix} Q & X M_0 \\ M_0 X^\top & M_0 \end{bmatrix}\right),$$

where we have used the facts

$$E[y \mid \sigma^2] = X m_0; \quad E[\beta \mid \sigma^2] = m_0; \quad \mathrm{Var}(y \mid \sigma^2) = \sigma^2 Q; \quad \mathrm{Var}(\beta \mid \sigma^2) = \sigma^2 M_0;$$

$$\begin{aligned}
\mathrm{Cov}(y, \beta \mid \sigma^2) &= \mathrm{Cov}(X\beta + \epsilon, \beta \mid \sigma^2) \\
&= \mathrm{Cov}(X(m_0 + \omega) + \epsilon,\, m_0 + \omega \mid \sigma^2) \\
&= \mathrm{Cov}(X\omega, \omega \mid \sigma^2) \quad \left(\text{since } m_0 \text{ is constant and } \mathrm{Cov}(\omega, \epsilon) = 0\right) \\
&= \sigma^2 X M_0.
\end{aligned}$$

From the expression for a conditional distribution derived from a multivariate Gaussian, we obtain $\beta \mid \sigma^2, y \sim N(\tilde{m}_1, \sigma^2 \tilde{M}_1)$,

where $\tilde{m}_1 = E[\beta \mid \sigma^2, y] = m_0 + M_0 X^\top Q^{-1}(y - X m_0)$ and $\tilde{M}_1 = M_0 - M_0 X^\top Q^{-1} X M_0$.


Note:

If $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$ with $\Sigma_{21} = \Sigma_{12}^\top$, then $X_2 \mid X_1 \sim N(\mu_{2 \mid 1}, \Sigma_{2 \mid 1})$, where $\mu_{2 \mid 1} = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1}(X_1 - \mu_1)$ and $\Sigma_{2 \mid 1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$.
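Closing the loop, the conditioning formula above reproduces $\tilde{m}_1$ and $\tilde{M}_1$ from Method 1; a numerical sketch (numpy, made-up data) applying it to the joint blocks of $(y, \beta) \mid \sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 5, 2
X, y = rng.normal(size=(n, p)), rng.normal(size=n)
V, m0, M0 = np.eye(n), rng.normal(size=p), np.eye(p)

# Joint blocks of (y, beta) | sigma^2, with the common sigma^2 factor dropped:
# it cancels in the conditional mean and simply scales the conditional covariance.
Q = X @ M0 @ X.T + V    # Sigma_11
S21 = M0 @ X.T          # Sigma_21 = Cov(beta, y | sigma^2) / sigma^2
m1_tilde = m0 + S21 @ np.linalg.solve(Q, y - X @ m0)  # mu_{2|1}
M1_tilde = M0 - S21 @ np.linalg.solve(Q, S21.T)       # Sigma_{2|1}

# Agreement with the direct form from Method 1
Vinv, M0inv = np.linalg.inv(V), np.linalg.inv(M0)
M1 = np.linalg.inv(M0inv + X.T @ Vinv @ X)
print(np.allclose(m1_tilde, M1 @ (M0inv @ m0 + X.T @ Vinv @ y)))  # True
print(np.allclose(M1_tilde, M1))                                  # True
```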