4.3 Linear regression: The conjugate normal-normal/inverse gamma model

In this section, we analyze the conjugate normal-normal/inverse gamma model, which is the workhorse in econometrics. In this model, the dependent variable $y_i$ is linearly related to a set of regressors $x_i=(x_{i1},x_{i2},\ldots,x_{iK})^{\top}$, that is, $y_i=\beta_1x_{i1}+\beta_2x_{i2}+\ldots+\beta_Kx_{iK}+\mu_i=x_i^{\top}\beta+\mu_i$, where $\beta=(\beta_1,\beta_2,\ldots,\beta_K)^{\top}$ and $\mu_i\stackrel{iid}{\sim}N(0,\sigma^2)$ is a stochastic error that is independent of the regressors, $x_i\perp\mu_i$.

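Defining $y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_N\end{bmatrix}$, $X=\begin{bmatrix}x_{11} & x_{12} & \ldots & x_{1K}\\ x_{21} & x_{22} & \ldots & x_{2K}\\ \vdots & \vdots & \ddots & \vdots\\ x_{N1} & x_{N2} & \ldots & x_{NK}\end{bmatrix}$ and $\mu=\begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_N\end{bmatrix}$, we can write the model in matrix form: $y=X\beta+\mu$, where $\mu\sim N(0,\sigma^2I)$, which implies that $y\sim N(X\beta,\sigma^2I)$. Then, the likelihood function is

\begin{align*}
p(y|\beta,\sigma^2,X)&=(2\pi\sigma^2)^{-\frac{N}{2}}\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)^{\top}(y-X\beta)\right\}\\
&\propto(\sigma^2)^{-\frac{N}{2}}\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)^{\top}(y-X\beta)\right\}.
\end{align*}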

The conjugate priors for the parameters are $\beta|\sigma^2\sim N(\beta_0,\sigma^2B_0)$ and $\sigma^2\sim IG(\alpha_0/2,\delta_0/2)$.

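Then, the posterior distribution is

\begin{align*}
\pi(\beta,\sigma^2|y,X)&\propto(\sigma^2)^{-\frac{N}{2}}\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)^{\top}(y-X\beta)\right\}\\
&\quad\times(\sigma^2)^{-\frac{K}{2}}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\beta_0)^{\top}B_0^{-1}(\beta-\beta_0)\right\}\\
&\quad\times\left(\frac{1}{\sigma^2}\right)^{\frac{\alpha_0}{2}+1}\exp\left\{-\frac{\delta_0}{2\sigma^2}\right\}\\
&\propto(\sigma^2)^{-\frac{K}{2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\beta^{\top}(B_0^{-1}+X^{\top}X)\beta-2\beta^{\top}(B_0^{-1}\beta_0+X^{\top}X\hat{\beta})\right]\right\}\\
&\quad\times\left(\frac{1}{\sigma^2}\right)^{\frac{\alpha_0+N}{2}+1}\exp\left\{-\frac{\delta_0+y^{\top}y+\beta_0^{\top}B_0^{-1}\beta_0}{2\sigma^2}\right\},
\end{align*}

where $\hat{\beta}=(X^{\top}X)^{-1}X^{\top}y$ is the maximum likelihood estimator, so that $X^{\top}y=X^{\top}X\hat{\beta}$.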

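Adding and subtracting $\beta_n^{\top}B_n^{-1}\beta_n$ to complete the square, where $B_n=(B_0^{-1}+X^{\top}X)^{-1}$ and $\beta_n=B_n(B_0^{-1}\beta_0+X^{\top}X\hat{\beta})$,

\begin{align*}
\pi(\beta,\sigma^2|y,X)&\propto(\sigma^2)^{-\frac{K}{2}}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\beta_n)^{\top}B_n^{-1}(\beta-\beta_n)\right\}\\
&\quad\times(\sigma^2)^{-\left(\frac{\alpha_n}{2}+1\right)}\exp\left\{-\frac{\delta_n}{2\sigma^2}\right\}.
\end{align*}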

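The first expression is the kernel of a normal density function, $\beta|\sigma^2,y,X\sim N(\beta_n,\sigma^2B_n)$. The second expression is the kernel of an inverse gamma density, $\sigma^2|y,X\sim IG(\alpha_n/2,\delta_n/2)$, where $\alpha_n=\alpha_0+N$ and $\delta_n=\delta_0+y^{\top}y+\beta_0^{\top}B_0^{-1}\beta_0-\beta_n^{\top}B_n^{-1}\beta_n$.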
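The posterior updating formulas for $\beta_n$, $B_n$, $\alpha_n$ and $\delta_n$ can be sketched numerically as follows. This is a minimal illustration rather than a reference implementation: the function name `posterior_params`, the simulated data and the hyperparameter values are assumptions made only for this example.

```python
import numpy as np

def posterior_params(y, X, beta0, B0, alpha0, delta0):
    """Return (beta_n, B_n, alpha_n, delta_n) for the conjugate model."""
    B0_inv = np.linalg.inv(B0)
    Bn_inv = B0_inv + X.T @ X                 # B_n^{-1} = B_0^{-1} + X'X
    Bn = np.linalg.inv(Bn_inv)
    # X'X beta_hat equals X'y, so the MLE does not have to be formed.
    beta_n = Bn @ (B0_inv @ beta0 + X.T @ y)
    alpha_n = alpha0 + len(y)
    delta_n = (delta0 + y @ y + beta0 @ B0_inv @ beta0
               - beta_n @ Bn_inv @ beta_n)
    return beta_n, Bn, alpha_n, delta_n

# Simulated data (illustrative values only).
rng = np.random.default_rng(0)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.7, size=N)

# A vague prior: large prior covariance and small alpha0, delta0.
beta_n, Bn, alpha_n, delta_n = posterior_params(
    y, X, beta0=np.zeros(K), B0=100.0 * np.eye(K), alpha0=0.002, delta0=0.002)
print("beta_n :", beta_n)
print("alpha_n:", alpha_n, " delta_n:", delta_n)
```

With a prior this vague, $\beta_n$ should be very close to the least squares estimate computed from the same data.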

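Taking into account that

\begin{align*}
\beta_n&=(B_0^{-1}+X^{\top}X)^{-1}(B_0^{-1}\beta_0+X^{\top}X\hat{\beta})\\
&=(B_0^{-1}+X^{\top}X)^{-1}B_0^{-1}\beta_0+(B_0^{-1}+X^{\top}X)^{-1}X^{\top}X\hat{\beta},
\end{align*}

where $(B_0^{-1}+X^{\top}X)^{-1}B_0^{-1}=I_K-(B_0^{-1}+X^{\top}X)^{-1}X^{\top}X$ (Smith 1973). Setting $W=(B_0^{-1}+X^{\top}X)^{-1}X^{\top}X$, we have $\beta_n=(I_K-W)\beta_0+W\hat{\beta}$, that is, the posterior mean of $\beta$ is a weighted average between the sample and prior information, where the weights depend on the precision of each piece of information. Observe that when the prior covariance matrix is highly vague (non-informative), such that $B_0^{-1}\rightarrow 0_K$, we obtain $W\rightarrow I_K$, such that $\beta_n\rightarrow\hat{\beta}$, that is, the posterior mean location parameter converges to the maximum likelihood estimator.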
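The weighted-average representation of the posterior mean can be checked numerically. The sketch below uses simulated data and prior covariances of the form $cI_K$ for several values of $c$. The helper `posterior_mean` and all numerical values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 2
X = rng.normal(size=(N, K))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=N)
beta0 = np.array([0.5, 0.5])                       # prior mean (illustrative)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # maximum likelihood estimator

def posterior_mean(B0):
    """Return beta_n = (I - W) beta0 + W beta_hat together with W."""
    B0_inv = np.linalg.inv(B0)
    # W = (B0^{-1} + X'X)^{-1} X'X
    W = np.linalg.solve(B0_inv + X.T @ X, X.T @ X)
    return (np.eye(K) - W) @ beta0 + W @ beta_hat, W

# From a tight prior (small c) to a vague prior (large c).
for c in (0.01, 1.0, 1e6):
    beta_n, W = posterior_mean(c * np.eye(K))
    print(c, beta_n, np.linalg.norm(beta_n - beta_hat))
```

As $c$ grows, the printed distance between $\beta_n$ and $\hat{\beta}$ should shrink toward zero, in line with $W\rightarrow I_K$.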

In addition, the posterior conditional covariance matrix of the location parameters can be written as $\sigma^2(B_0^{-1}+X^{\top}X)^{-1}=\sigma^2(X^{\top}X)^{-1}-\sigma^2\left((X^{\top}X)^{-1}(B_0+(X^{\top}X)^{-1})^{-1}(X^{\top}X)^{-1}\right)$, where the subtracted matrix is positive semi-definite.$^{17}$ Given that $\sigma^2(X^{\top}X)^{-1}$ is the covariance matrix of the maximum likelihood estimator, we observe that prior information reduces estimation uncertainty.
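A quick numerical check of this covariance decomposition is sketched below. The design matrix and the prior covariance $B_0$ are arbitrary values chosen for illustration, and $\sigma^2$ is omitted because it multiplies both sides.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
B0 = np.diag([0.5, 2.0, 10.0])            # illustrative prior covariance
XtX_inv = np.linalg.inv(X.T @ X)

# Left-hand side: (B0^{-1} + X'X)^{-1}
lhs = np.linalg.inv(np.linalg.inv(B0) + X.T @ X)
# Matrix subtracted from (X'X)^{-1} on the right-hand side
reduction = XtX_inv @ np.linalg.inv(B0 + XtX_inv) @ XtX_inv

print(np.allclose(lhs, XtX_inv - reduction))
# Eigenvalues of the symmetrized reduction; non-negative values indicate
# a positive semi-definite matrix.
print(np.linalg.eigvalsh((reduction + reduction.T) / 2))
```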

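Now, we calculate the posterior marginal distribution of $\beta$,

\begin{align*}
\pi(\beta|y,X)&=\int_0^{\infty}\pi(\beta,\sigma^2|y,X)d\sigma^2\\
&\propto\int_0^{\infty}\left(\frac{1}{\sigma^2}\right)^{\frac{\alpha_n+K}{2}+1}\exp\left\{-\frac{s}{2\sigma^2}\right\}d\sigma^2,
\end{align*}

where $s=\delta_n+(\beta-\beta_n)^{\top}B_n^{-1}(\beta-\beta_n)$. Then we can write

\begin{align*}
\pi(\beta|y,X)&\propto\int_0^{\infty}\left(\frac{1}{\sigma^2}\right)^{\frac{\alpha_n+K}{2}+1}\exp\left\{-\frac{s}{2\sigma^2}\right\}d\sigma^2\\
&=\frac{\Gamma((\alpha_n+K)/2)}{(s/2)^{(\alpha_n+K)/2}}\int_0^{\infty}\frac{(s/2)^{(\alpha_n+K)/2}}{\Gamma((\alpha_n+K)/2)}(\sigma^2)^{-(\alpha_n+K)/2-1}\exp\left\{-\frac{s}{2\sigma^2}\right\}d\sigma^2.
\end{align*}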

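The right term is the integral of the probability density function of an inverse gamma distribution with parameters $\nu=(\alpha_n+K)/2$ and $\tau=s/2$. Since we are integrating over the whole support of $\sigma^2$, the integral is equal to 1, and therefore

\begin{align*}
\pi(\beta|y,X)&\propto\frac{\Gamma((\alpha_n+K)/2)}{(s/2)^{(\alpha_n+K)/2}}\\
&\propto s^{-(\alpha_n+K)/2}\\
&=\left[\delta_n+(\beta-\beta_n)^{\top}B_n^{-1}(\beta-\beta_n)\right]^{-(\alpha_n+K)/2}\\
&=\left[1+\frac{(\beta-\beta_n)^{\top}\left(\frac{\delta_n}{\alpha_n}B_n\right)^{-1}(\beta-\beta_n)}{\alpha_n}\right]^{-(\alpha_n+K)/2}(\delta_n)^{-(\alpha_n+K)/2}\\
&\propto\left[1+\frac{(\beta-\beta_n)^{\top}H_n^{-1}(\beta-\beta_n)}{\alpha_n}\right]^{-(\alpha_n+K)/2},
\end{align*}

where $H_n=\frac{\delta_n}{\alpha_n}B_n$. This last expression is the kernel of a multivariate Student's t distribution for $\beta$, $\beta|y,X\sim t_K(\alpha_n,\beta_n,H_n)$.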

Observe that as we have incorporated the uncertainty of the variance, the posterior for $\beta$ changes from a normal to a Student's t distribution, which has heavier tails.
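One way to see the marginal Student's t result is by composition sampling: draw $\sigma^2$ from its inverse gamma posterior, then draw $\beta$ from the conditional normal, and compare the sample covariance of the $\beta$ draws with $\frac{\alpha_n}{\alpha_n-2}H_n$, the covariance of a multivariate t with $\alpha_n>2$ degrees of freedom. The posterior quantities below are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical posterior quantities (not estimated from any data set).
alpha_n, delta_n = 12.0, 30.0
beta_n = np.array([1.0, -2.0])
Bn = np.array([[0.20, 0.05],
               [0.05, 0.10]])

S = 200_000
# sigma^2 ~ IG(alpha_n/2, delta_n/2): reciprocal of a gamma draw with
# shape alpha_n/2 and rate delta_n/2 (NumPy takes scale = 1/rate).
sigma2 = 1.0 / rng.gamma(shape=alpha_n / 2, scale=2.0 / delta_n, size=S)
# beta | sigma^2 ~ N(beta_n, sigma^2 B_n), using the Cholesky factor of B_n.
L = np.linalg.cholesky(Bn)
beta = beta_n + np.sqrt(sigma2)[:, None] * (rng.standard_normal((S, 2)) @ L.T)

Hn = delta_n / alpha_n * Bn
print(np.cov(beta.T))                   # Monte Carlo covariance of the draws
print(alpha_n / (alpha_n - 2) * Hn)     # covariance implied by t_K(alpha_n, beta_n, H_n)
```

The two printed matrices should agree up to Monte Carlo error.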

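The marginal likelihood of this model is

\begin{align*}
p(y)=\int_0^{\infty}\int_{R^K}\pi(\beta|\sigma^2,B_0,\beta_0)\pi(\sigma^2|\alpha_0/2,\delta_0/2)p(y|\beta,\sigma^2,X)d\beta d\sigma^2.
\end{align*}

Taking into account that $(y-X\beta)^{\top}(y-X\beta)+(\beta-\beta_0)^{\top}B_0^{-1}(\beta-\beta_0)=(\beta-\beta_n)^{\top}B_n^{-1}(\beta-\beta_n)+m$, where $m=y^{\top}y+\beta_0^{\top}B_0^{-1}\beta_0-\beta_n^{\top}B_n^{-1}\beta_n$, we have that

\begin{align*}
p(y)&=\int_0^{\infty}\pi(\sigma^2)\frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left\{-\frac{m}{2\sigma^2}\right\}\frac{1}{(2\pi\sigma^2)^{K/2}|B_0|^{1/2}}\int_{R^K}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\beta_n)^{\top}B_n^{-1}(\beta-\beta_n)\right\}d\beta d\sigma^2\\
&=\int_0^{\infty}\pi(\sigma^2)\frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left\{-\frac{m}{2\sigma^2}\right\}\frac{|B_n|^{1/2}}{|B_0|^{1/2}}d\sigma^2\\
&=\frac{1}{(2\pi)^{N/2}}\frac{(\delta_0/2)^{\alpha_0/2}}{\Gamma(\alpha_0/2)}\frac{|B_n|^{1/2}}{|B_0|^{1/2}}\int_0^{\infty}\left(\frac{1}{\sigma^2}\right)^{\frac{\alpha_0+N}{2}+1}\exp\left\{-\frac{\delta_0+m}{2\sigma^2}\right\}d\sigma^2\\
&=\frac{1}{\pi^{N/2}}\frac{\delta_0^{\alpha_0/2}}{\delta_n^{\alpha_n/2}}\frac{|B_n|^{1/2}}{|B_0|^{1/2}}\frac{\Gamma(\alpha_n/2)}{\Gamma(\alpha_0/2)},
\end{align*}

where the last equality uses $\delta_0+m=\delta_n$ and $\alpha_0+N=\alpha_n$.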
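For this conjugate model the marginal likelihood has the closed form $p(y)=\pi^{-N/2}\frac{\delta_0^{\alpha_0/2}}{\delta_n^{\alpha_n/2}}\frac{|B_n|^{1/2}}{|B_0|^{1/2}}\frac{\Gamma(\alpha_n/2)}{\Gamma(\alpha_0/2)}$, which is usually evaluated on the log scale to avoid numerical overflow. The sketch below does so and uses the result to compare two specifications. The function name `log_marginal_likelihood`, the simulated data and the hyperparameters are assumptions for illustration only.

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal_likelihood(y, X, beta0, B0, alpha0, delta0):
    """Log of p(y) for the conjugate normal-normal/inverse gamma model."""
    N = len(y)
    B0_inv = np.linalg.inv(B0)
    Bn_inv = B0_inv + X.T @ X
    beta_n = np.linalg.solve(Bn_inv, B0_inv @ beta0 + X.T @ y)
    alpha_n = alpha0 + N
    delta_n = delta0 + y @ y + beta0 @ B0_inv @ beta0 - beta_n @ Bn_inv @ beta_n
    logdet_B0 = np.linalg.slogdet(B0)[1]
    logdet_Bn = -np.linalg.slogdet(Bn_inv)[1]      # log|B_n| = -log|B_n^{-1}|
    return (-0.5 * N * log(pi)
            + 0.5 * alpha0 * log(delta0) - 0.5 * alpha_n * log(delta_n)
            + 0.5 * (logdet_Bn - logdet_B0)
            + lgamma(alpha_n / 2) - lgamma(alpha0 / 2))

# Simulated data with a relevant regressor (illustrative values only).
rng = np.random.default_rng(4)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.8]) + rng.normal(size=N)

prior = dict(alpha0=4.0, delta0=4.0)               # illustrative hyperparameters
lml_full = log_marginal_likelihood(y, X, np.zeros(2), np.eye(2), **prior)
lml_null = log_marginal_likelihood(y, X[:, :1], np.zeros(1), np.eye(1), **prior)
print("log Bayes factor (full vs. intercept only):", lml_full - lml_null)
```

A positive log Bayes factor favors the specification that includes the regressor. Note that the marginal likelihood is sensitive to the prior hyperparameters, so the values used here should not be read as recommendations.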


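The posterior predictive is equal to

\begin{align*}
\pi(Y_0|y)&=\int_0^{\infty}\int_{R^K}p(Y_0|\beta,\sigma^2,y)\pi(\beta|\sigma^2,y)\pi(\sigma^2|y)d\beta d\sigma^2\\
&=\int_0^{\infty}\int_{R^K}p(Y_0|\beta,\sigma^2)\pi(\beta|\sigma^2,y)\pi(\sigma^2|y)d\beta d\sigma^2,
\end{align*}

where we take into account independence between $Y_0$ and $Y$ given the parameters. Let $X_0$ be the $N_0\times K$ matrix of regressors associated with $Y_0$. Then,

\begin{align*}
\pi(Y_0|y)&=\int_0^{\infty}\int_{R^K}\left\{(2\pi\sigma^2)^{-\frac{N_0}{2}}\exp\left\{-\frac{1}{2\sigma^2}(Y_0-X_0\beta)^{\top}(Y_0-X_0\beta)\right\}\right.\\
&\quad\times(2\pi\sigma^2)^{-\frac{K}{2}}|B_n|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\beta_n)^{\top}B_n^{-1}(\beta-\beta_n)\right\}\\
&\quad\left.\times\frac{(\delta_n/2)^{\alpha_n/2}}{\Gamma(\alpha_n/2)}\left(\frac{1}{\sigma^2}\right)^{\frac{\alpha_n}{2}+1}\exp\left\{-\frac{\delta_n}{2\sigma^2}\right\}\right\}d\beta d\sigma^2.
\end{align*}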

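Setting $M=(X_0^{\top}X_0+B_n^{-1})$ and $\beta_*=M^{-1}(B_n^{-1}\beta_n+X_0^{\top}Y_0)$, we have

\begin{align*}
(Y_0-X_0\beta)^{\top}(Y_0-X_0\beta)+(\beta-\beta_n)^{\top}B_n^{-1}(\beta-\beta_n)=(\beta-\beta_*)^{\top}M(\beta-\beta_*)+\beta_n^{\top}B_n^{-1}\beta_n+Y_0^{\top}Y_0-\beta_*^{\top}M\beta_*.
\end{align*}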


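Thus,

\begin{align*}
\pi(Y_0|y)&\propto\int_0^{\infty}\left\{\left(\frac{1}{\sigma^2}\right)^{\frac{K+N_0+\alpha_n}{2}+1}\exp\left\{-\frac{1}{2\sigma^2}\left(\beta_n^{\top}B_n^{-1}\beta_n+Y_0^{\top}Y_0-\beta_*^{\top}M\beta_*+\delta_n\right)\right\}\right.\\
&\quad\left.\times\int_{R^K}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\beta_*)^{\top}M(\beta-\beta_*)\right\}d\beta\right\}d\sigma^2,
\end{align*}

where the term in the second integral is the kernel of a multivariate normal density with mean $\beta_*$ and covariance matrix $\sigma^2M^{-1}$, so that it integrates to a term proportional to $(\sigma^2)^{K/2}$. Then,

\begin{align*}
\pi(Y_0|y)\propto\int_0^{\infty}\left(\frac{1}{\sigma^2}\right)^{\frac{N_0+\alpha_n}{2}+1}\exp\left\{-\frac{1}{2\sigma^2}\left(\beta_n^{\top}B_n^{-1}\beta_n+Y_0^{\top}Y_0-\beta_*^{\top}M\beta_*+\delta_n\right)\right\}d\sigma^2,
\end{align*}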

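which is the kernel of an inverse gamma density. Thus,

\begin{align*}
\pi(Y_0|y)\propto\left[\frac{\beta_n^{\top}B_n^{-1}\beta_n+Y_0^{\top}Y_0-\beta_*^{\top}M\beta_*+\delta_n}{2}\right]^{-\frac{\alpha_n+N_0}{2}}.
\end{align*}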

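Setting $C^{-1}=I_{N_0}+X_0B_nX_0^{\top}$ such that $C=I_{N_0}-X_0(B_n^{-1}+X_0^{\top}X_0)^{-1}X_0^{\top}=I_{N_0}-X_0M^{-1}X_0^{\top}$,$^{18}$ and $\beta_{**}=C^{-1}X_0M^{-1}B_n^{-1}\beta_n$, then

\begin{align*}
\beta_n^{\top}B_n^{-1}\beta_n+Y_0^{\top}Y_0-\beta_*^{\top}M\beta_*&=\beta_n^{\top}B_n^{-1}\beta_n+Y_0^{\top}Y_0-(\beta_n^{\top}B_n^{-1}+Y_0^{\top}X_0)M^{-1}(B_n^{-1}\beta_n+X_0^{\top}Y_0)\\
&=\beta_n^{\top}(B_n^{-1}-B_n^{-1}M^{-1}B_n^{-1})\beta_n+Y_0^{\top}CY_0\\
&\quad-2Y_0^{\top}CC^{-1}X_0M^{-1}B_n^{-1}\beta_n+\beta_{**}^{\top}C\beta_{**}-\beta_{**}^{\top}C\beta_{**}\\
&=\beta_n^{\top}(B_n^{-1}-B_n^{-1}M^{-1}B_n^{-1})\beta_n+(Y_0-\beta_{**})^{\top}C(Y_0-\beta_{**})-\beta_{**}^{\top}C\beta_{**},
\end{align*}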

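where $\beta_n^{\top}(B_n^{-1}-B_n^{-1}M^{-1}B_n^{-1})\beta_n=\beta_{**}^{\top}C\beta_{**}$ and $\beta_{**}=X_0\beta_n$ (see Exercise 6). Therefore,

\begin{align*}
\pi(Y_0|y)&\propto\left[\frac{(Y_0-X_0\beta_n)^{\top}C(Y_0-X_0\beta_n)+\delta_n}{2}\right]^{-\frac{\alpha_n+N_0}{2}}\\
&\propto\left[\frac{(Y_0-X_0\beta_n)^{\top}\left(\frac{\delta_nC^{-1}}{\alpha_n}\right)^{-1}(Y_0-X_0\beta_n)}{\alpha_n}+1\right]^{-\frac{\alpha_n+N_0}{2}}.
\end{align*}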

Then, the posterior predictive is a multivariate Student's t, $Y_0|y\sim t\left(X_0\beta_n,\frac{\delta_n(I_{N_0}+X_0B_nX_0^{\top})}{\alpha_n},\alpha_n\right)$.
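The location $X_0\beta_n$ and scale matrix $\frac{\delta_n(I_{N_0}+X_0B_nX_0^{\top})}{\alpha_n}$ of the posterior predictive can be checked by simulation: draw $\sigma^2$, then $\beta|\sigma^2$, then $Y_0|\beta,\sigma^2$, and compare the moments of the draws with those of the Student's t. The posterior quantities and the matrix `X0` below are hypothetical values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical posterior quantities and new regressors (illustration only).
alpha_n, delta_n = 20.0, 18.0
beta_n = np.array([0.5, 1.5])
Bn = np.array([[0.04, 0.01],
               [0.01, 0.09]])
X0 = np.array([[1.0, 0.0],
               [1.0, 2.0],
               [1.0, -1.0]])
N0 = X0.shape[0]

# Analytical location and scale of the predictive Student's t.
mean = X0 @ beta_n
scale = delta_n * (np.eye(N0) + X0 @ Bn @ X0.T) / alpha_n

# Composition sampling: sigma^2, then beta | sigma^2, then Y0 | beta, sigma^2.
S = 200_000
sigma2 = 1.0 / rng.gamma(alpha_n / 2, 2.0 / delta_n, size=S)   # IG(alpha_n/2, delta_n/2)
sd = np.sqrt(sigma2)[:, None]
beta = beta_n + sd * (rng.standard_normal((S, 2)) @ np.linalg.cholesky(Bn).T)
Y0 = beta @ X0.T + sd * rng.standard_normal((S, N0))

print(Y0.mean(axis=0), mean)
print(np.cov(Y0.T))
print(alpha_n / (alpha_n - 2) * scale)   # covariance of the predictive t (alpha_n > 2)
```

The simulated mean and covariance should match the analytical values up to Monte Carlo error. Composition sampling of this kind is also a convenient way to obtain predictive draws without evaluating the multivariate t density.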

  17. A particular case of the Woodbury matrix identity.

  18. Using the result $(A+BDC)^{-1}=A^{-1}-A^{-1}B(D^{-1}+CA^{-1}B)^{-1}CA^{-1}$.