4.4 Multivariate linear regression: The conjugate normal-normal/inverse Wishart model
Let’s study the multivariate regression setting where there are $M$ $N$-dimensional vectors $y_m$, $m=1,2,\dots,M$, such that $y_m = X\beta_m + \mu_m$, where $X$ is the set of common regressors and $\mu_m$ is the $N$-dimensional vector of stochastic errors for each equation, such that $U = [\mu_1 \ \mu_2 \ \dots \ \mu_M] \sim MN_{N,M}(0, I_N, \Sigma)$, that is, a matrix variate normal distribution where $\Sigma$ is the covariance matrix of each $i$-th row of $U$, $i=1,2,\dots,N$, and we assume independence between the rows. Then, $\operatorname{vec}(U) \sim N_{N\times M}(0, \Sigma \otimes I_N)$.$^{19}$
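The vec identity above can be checked numerically. The sketch below (with hypothetical dimensions and a hypothetical $\Sigma$) builds matrix variate normal draws with row covariance $I_N$ and column covariance $\Sigma$ as $U = Z L^{\top}$, where $Z$ has iid standard normal entries and $\Sigma = LL^{\top}$ is the Cholesky factorization, then compares the sample covariance of $\operatorname{vec}(U)$ against $\Sigma \otimes I_N$:

```python
import numpy as np

# Hypothetical sizes and column covariance Sigma for illustration
rng = np.random.default_rng(0)
N, M, draws = 3, 2, 50_000

Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
L = np.linalg.cholesky(Sigma)       # Sigma = L L'

# Many draws of U ~ MN_{N,M}(0, I_N, Sigma), shape (draws, N, M)
U = rng.standard_normal((draws, N, M)) @ L.T

# vec stacks the columns of U (column-major order)
vecU = U.transpose(0, 2, 1).reshape(draws, N * M)

emp_cov = np.cov(vecU, rowvar=False)   # (NM) x (NM) sample covariance
theory = np.kron(Sigma, np.eye(N))     # Sigma kron I_N
print(np.max(np.abs(emp_cov - theory)))  # small Monte Carlo error
```

The discrepancy shrinks at the usual $O(1/\sqrt{\text{draws}})$ Monte Carlo rate.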
This framework can be written in matrix form:
$$
\underbrace{\begin{bmatrix} y_{11} & y_{12} & \dots & y_{1M} \\ y_{21} & y_{22} & \dots & y_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ y_{N1} & y_{N2} & \dots & y_{NM} \end{bmatrix}}_{Y}
=
\underbrace{\begin{bmatrix} x_{11} & x_{12} & \dots & x_{1K} \\ x_{21} & x_{22} & \dots & x_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \dots & x_{NK} \end{bmatrix}}_{X}
\underbrace{\begin{bmatrix} \beta_{11} & \beta_{12} & \dots & \beta_{1M} \\ \beta_{21} & \beta_{22} & \dots & \beta_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{K1} & \beta_{K2} & \dots & \beta_{KM} \end{bmatrix}}_{B}
+
\underbrace{\begin{bmatrix} \mu_{11} & \mu_{12} & \dots & \mu_{1M} \\ \mu_{21} & \mu_{22} & \dots & \mu_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{N1} & \mu_{N2} & \dots & \mu_{NM} \end{bmatrix}}_{U}
$$
Therefore, $Y \sim N_{N\times M}(XB, \Sigma \otimes I_N)$,$^{20}$
$$
\begin{aligned}
p(Y \mid B, \Sigma) &\propto |\Sigma|^{-N/2} \exp\left\{-\frac{1}{2}\operatorname{tr}\left[(Y - XB)^{\top}(Y - XB)\Sigma^{-1}\right]\right\} \\
&= |\Sigma|^{-N/2} \exp\left\{-\frac{1}{2}\operatorname{tr}\left[\left(S + (B - \hat{B})^{\top} X^{\top} X (B - \hat{B})\right)\Sigma^{-1}\right]\right\},
\end{aligned}
$$
where $S = (Y - X\hat{B})^{\top}(Y - X\hat{B})$ and $\hat{B} = (X^{\top}X)^{-1}X^{\top}Y$ (see Exercise 7).
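A minimal sketch, with simulated hypothetical data, of the two quantities entering the likelihood: $\hat{B} = (X^{\top}X)^{-1}X^{\top}Y$, the column-by-column OLS estimator, and the $M\times M$ residual cross-product matrix $S$:

```python
import numpy as np

# Hypothetical design: N observations, K regressors, M equations
rng = np.random.default_rng(1)
N, K, M = 500, 3, 2

X = rng.standard_normal((N, K))
B_true = np.array([[ 1.0, -0.5],
                   [ 0.3,  0.8],
                   [-1.2,  0.1]])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])
U = rng.standard_normal((N, M)) @ np.linalg.cholesky(Sigma).T
Y = X @ B_true + U

# Solve the normal equations rather than forming an explicit inverse
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # K x M, equals (X'X)^{-1} X'Y
resid = Y - X @ B_hat
S = resid.T @ resid                         # M x M residual cross-product
print(B_hat)
```

Note that $S/(N-K)$ is the usual unbiased frequentist estimator of $\Sigma$, which is a useful sanity check on the simulation.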
The conjugate prior for this model is $\pi(B, \Sigma) = \pi(B \mid \Sigma)\pi(\Sigma)$, where $B \mid \Sigma \sim N_{K\times M}(B_0, \Sigma \otimes V_0)$ and $\Sigma \sim IW(\Psi_0, \alpha_0)$, that is,
$$
\pi(B, \Sigma) \propto |\Sigma|^{-K/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[(B - B_0)^{\top}V_0^{-1}(B - B_0)\Sigma^{-1}\right]\right\} \times |\Sigma|^{-(\alpha_0 + M + 1)/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\Psi_0\Sigma^{-1}\right]\right\}.
$$
The posterior distribution is given by
$$
\begin{aligned}
\pi(B, \Sigma \mid Y, X) &\propto p(Y \mid B, \Sigma, X)\pi(B \mid \Sigma)\pi(\Sigma) \\
&\propto |\Sigma|^{-\frac{N+K+\alpha_0+M+1}{2}} \exp\left\{-\frac{1}{2}\operatorname{tr}\left[\left(\Psi_0 + S + (B - B_0)^{\top}V_0^{-1}(B - B_0) + (B - \hat{B})^{\top}X^{\top}X(B - \hat{B})\right)\Sigma^{-1}\right]\right\}.
\end{aligned}
$$

Completing the square in $B$ and collecting the remaining terms in the bracket yields

$$
\Psi_0 + S + (B - B_0)^{\top}V_0^{-1}(B - B_0) + (B - \hat{B})^{\top}X^{\top}X(B - \hat{B}) = (B - B_n)^{\top}V_n^{-1}(B - B_n) + \Psi_n,
$$

where

$$
\begin{aligned}
B_n &= (V_0^{-1} + X^{\top}X)^{-1}(V_0^{-1}B_0 + X^{\top}Y) = (V_0^{-1} + X^{\top}X)^{-1}(V_0^{-1}B_0 + X^{\top}X\hat{B}), \\
V_n &= (V_0^{-1} + X^{\top}X)^{-1}, \\
\Psi_n &= \Psi_0 + S + B_0^{\top}V_0^{-1}B_0 + \hat{B}^{\top}X^{\top}X\hat{B} - B_n^{\top}V_n^{-1}B_n.
\end{aligned}
$$

Thus, the posterior distribution can be written as

$$
\pi(B, \Sigma \mid Y, X) \propto |\Sigma|^{-K/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[(B - B_n)^{\top}V_n^{-1}(B - B_n)\Sigma^{-1}\right]\right\} \times |\Sigma|^{-\frac{N+\alpha_0+M+1}{2}}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\Psi_n\Sigma^{-1}\right]\right\}.
$$

That is, $\pi(B, \Sigma \mid Y, X) = \pi(B \mid \Sigma, Y, X)\pi(\Sigma \mid Y, X)$, where $B \mid \Sigma, Y, X \sim N_{K\times M}(B_n, \Sigma \otimes V_n)$ and $\Sigma \mid Y, X \sim IW(\Psi_n, \alpha_n)$, with $\alpha_n = N + \alpha_0$.
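The updating formulas above can be sketched in code. The snippet below (with hypothetical simulated data and hypothetical prior values) computes $B_n$, $V_n$, $\Psi_n$, $\alpha_n$, and takes one joint posterior draw: $\Sigma$ from $IW(\Psi_n, \alpha_n)$ via a Wishart draw on the inverse scale, then $\operatorname{vec}(B) \mid \Sigma$ from $N(\operatorname{vec}(B_n), \Sigma \otimes V_n)$:

```python
import numpy as np

# Hypothetical data-generating process
rng = np.random.default_rng(2)
N, K, M = 200, 2, 2

X = rng.standard_normal((N, K))
B_true = np.array([[0.5, -1.0],
                   [1.5,  0.2]])
Y = X @ B_true + rng.standard_normal((N, M))

# Hypothetical prior hyperparameters: B0, V0, Psi0, alpha0
B0 = np.zeros((K, M))
V0_inv = np.eye(K) / 100.0        # diffuse prior on B: V0 = 100 I_K
Psi0 = np.eye(M)
alpha0 = M + 2

# Posterior hyperparameters, matching the formulas in the text
Vn = np.linalg.inv(V0_inv + X.T @ X)
Bn = Vn @ (V0_inv @ B0 + X.T @ Y)
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
S = (Y - X @ B_hat).T @ (Y - X @ B_hat)
Psin = (Psi0 + S + B0.T @ V0_inv @ B0
        + B_hat.T @ (X.T @ X) @ B_hat
        - Bn.T @ (V0_inv + X.T @ X) @ Bn)   # Vn^{-1} = V0^{-1} + X'X
alphan = N + alpha0

# Sigma ~ IW(Psin, alphan): if W ~ W_M(alphan, Psin^{-1}), then
# W^{-1} ~ IW(Psin, alphan)
Lw = np.linalg.cholesky(np.linalg.inv(Psin))
Z = rng.standard_normal((alphan, M)) @ Lw.T  # rows ~ N(0, Psin^{-1})
Sigma_draw = np.linalg.inv(Z.T @ Z)

# vec(B) | Sigma ~ N(vec(Bn), Sigma kron Vn), vec in column-major order
vecB = rng.multivariate_normal(Bn.flatten(order="F"),
                               np.kron(Sigma_draw, Vn))
B_draw = vecB.reshape((K, M), order="F")
print(Bn)
```

With a nearly diffuse prior, $B_n$ is close to $\hat{B}$; in practice one would repeat the last block many times to build a Monte Carlo sample from the joint posterior.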
The marginal posterior for B is …
The marginal likelihood is …
The predictive density is …