Chapter 2 Updating form of the posterior distribution

Assume \(y \sim N\left( X \beta, \sigma^{2} V\right)\) with prior \(P(\beta, \sigma^{2}) = NIG \left(\beta, \sigma^{2} \mid m_{0}, M_{0}, a_{0}, b_{0}\right)\). The posterior distribution is then given by

\[\begin{align} P\left(\beta, \sigma^{2} \mid y\right) = NIG\left(\beta, \sigma^{2} \mid M_{1}m_{1}, M_{1}, a_{1}, b_{1}\right) \; \end{align}\]

where

\[\begin{align} M_{1} &= (M_{0}^{-1}+X^{\top} V^{-1} X)^{-1} \;; \\ m_{1}&=M_{0}^{-1} m_{0}+X^{\top} V^{-1} y \;; \\ a_{1}&=a_{0}+\frac{n}{2} \;; \\ b_{1}&= b_{0}+\frac{1}{2}\left(m_{0}^{\top} M_{0}^{-1} m_{0}+y^{\top} V^{-1} y-m_{1}^{\top} M_{1} m_{1}\right)\;. \end{align}\]

Here \(n\) denotes the number of observations, i.e., the dimension of \(y\).
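As a quick illustration, here is a minimal NumPy sketch of this direct update. The function name `nig_posterior_direct` is ours, not from any particular library; it simply transcribes the formulas above.

```python
import numpy as np

def nig_posterior_direct(y, X, V, m0, M0, a0, b0):
    """Return the NIG posterior parameters (M1 m1, M1, a1, b1),
    computed directly from the updating formulas above."""
    n, p = X.shape
    Vinv_X = np.linalg.solve(V, X)    # V^{-1} X
    Vinv_y = np.linalg.solve(V, y)    # V^{-1} y
    M0inv = np.linalg.inv(M0)
    M1 = np.linalg.inv(M0inv + X.T @ Vinv_X)   # (M0^{-1} + X' V^{-1} X)^{-1}
    m1 = M0inv @ m0 + X.T @ Vinv_y             # M0^{-1} m0 + X' V^{-1} y
    a1 = a0 + n / 2.0                          # shape update uses n, not p
    b1 = b0 + 0.5 * (m0 @ M0inv @ m0 + y @ Vinv_y - m1 @ M1 @ m1)
    return M1 @ m1, M1, a1, b1
```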

We will derive this updating form, and in particular \(M_1\), in two ways.

2.1 Method 1: Sherman-Morrison-Woodbury identity

Theorem 2.1 (Sherman-Morrison-Woodbury identity) We have \[\begin{equation}\label{ShermanWoodburyMorrison} \left(A + BDC\right)^{-1} = A^{-1} - A^{-1}B\left(D^{-1}+CA^{-1}B\right)^{-1}CA^{-1} \;, \end{equation}\] where \(A\) and \(D\) are invertible square matrices and \(B\) and \(C\) are rectangular matrices (square if \(A\) and \(D\) have the same dimensions) such that the products above are well-defined.
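The identity is also easy to check numerically. The following sketch verifies it on randomly generated matrices; the dimensions and the symmetric-positive-definite construction of \(A\) and \(D\) are arbitrary choices made only to guarantee invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
A0 = rng.standard_normal((n, n))
A = A0 @ A0.T + n * np.eye(n)     # SPD, hence invertible
D0 = rng.standard_normal((k, k))
D = D0 @ D0.T + k * np.eye(k)     # SPD, hence invertible
B = rng.standard_normal((n, k))
C = rng.standard_normal((k, n))

lhs = np.linalg.inv(A + B @ D @ C)
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ B @ np.linalg.inv(np.linalg.inv(D) + C @ Ainv @ B) @ C @ Ainv
assert np.allclose(lhs, rhs)      # both sides agree to machine precision
```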

The Sherman-Morrison-Woodbury identity is easily verified by multiplying the right-hand side by \(A + BDC\) and simplifying the result to the identity matrix. Applying the identity with \(A = M_0^{-1}\), \(B = X^{\top}\), \(D = V^{-1}\), and \(C = X\), we obtain

\[ \begin{aligned} M_1 & = (M_{0}^{-1} + X^{\top}V^{-1}X)^{-1} \\ & = M_0-M_0 X^{\top}\left(V+X M_0 X^{\top}\right)^{-1} X M_0 \\ & = M_0-M_0 X^{\top} Q^{-1} X M_0 \end{aligned} \]

where \(Q = V + X M_0 X^{\top}\).
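Numerically, the two expressions for \(M_1\) agree. Here is a quick spot-check under an illustrative setup (identity matrices for \(V\) and \(M_0\) are chosen purely for simplicity):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 2
X = rng.standard_normal((n, p))
V, M0 = np.eye(n), np.eye(p)
Q = V + X @ M0 @ X.T

direct = np.linalg.inv(np.linalg.inv(M0) + X.T @ np.linalg.solve(V, X))
smw = M0 - M0 @ X.T @ np.linalg.solve(Q, X @ M0)   # Sherman-Morrison-Woodbury form
assert np.allclose(direct, smw)
```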

We can show that \[ \begin{align} M_1 m_1 & =m_0+M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \;. \end{align} \]


\[\begin{align} M_1 m_1 & = \left(M_0^{-1}+X^{\top} V^{-1} X\right)^{-1} m_1 \\ & = [M_0-M_0 X^{\top}\left(V+X M_0 X^{\top}\right)^{-1} X M_0]m_1 \\ & = (M_0-M_0 X^{\top} Q^{-1} X M_0) m_1 \\ & = (M_0-M_0 X^{\top} Q^{-1} X M_0)(M_0^{-1} m_0+X^{\top} V^{-1} y) \\ & = m_0+M_0 X^{\top} V^{-1} y-M_0 X^{\top} Q^{-1} X m_0 - M_0 X^{\top} Q^{-1} X M_0 X^{\top} V^{-1} y \\ & = m_0+M_0 X^{\top}\left(I-Q^{-1} X M_0 X^{\top}\right) V^{-1} y - M_0 X^{\top} Q^{-1} X m_0 \\ & = m_0+M_0 X^{\top} Q^{-1}\left(Q-X M_0 X^{\top}\right)V^{-1} y - M_0 X^{\top} Q^{-1} X m_0 \\ & \left(\text { since } Q=V+X M_0 X^{\top}\right) \\ & = m_0+M_0 X^{\top} Q^{-1}(V) V^{-1} y-M_0 X^{\top} Q^{-1} X m_0 \\ & = m_0+M_0 X^{\top} Q^{-1} y-M_0 X^{\top} Q^{-1} X m_0 \\ & = m_0+M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ \end{align}\]
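A numerical spot-check of this identity for the posterior mean, under the same illustrative setup as before:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.standard_normal((n, p))
m0 = rng.standard_normal(p)
y = rng.standard_normal(n)
V, M0 = np.eye(n), np.eye(p)       # any SPD choices work; identity for simplicity
Q = V + X @ M0 @ X.T

# M1 m1 computed directly ...
direct = np.linalg.solve(np.linalg.inv(M0) + X.T @ np.linalg.solve(V, X),
                         np.linalg.solve(M0, m0) + X.T @ np.linalg.solve(V, y))
# ... and via the shortcut m0 + M0 X' Q^{-1} (y - X m0)
shortcut = m0 + M0 @ X.T @ np.linalg.solve(Q, y - X @ m0)
assert np.allclose(direct, shortcut)
```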

Furthermore, the term appearing in \(b_1\) simplifies to \[ \begin{align} m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} M_1 m_1 & = \left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \;. \end{align} \]


\[\begin{align} m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} M_1 m_1 & = m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} [m_0+M_0 X^{\top} Q^{-1} (y - X m_0)] \\ & = m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y-m_1^{\top} m_0 - m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ & = m_0^{\top} M_0^{-1} m_0+y^{\top} V^{-1} y -m_0^{\top}\left(M_0^{-1} m_0+X^{\top} V^{-1} y\right) \\ & \qquad \qquad \qquad - m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ & = y^{\top} V^{-1} y-y^{\top} V^{-1} X m_0 - m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ & = y^{\top} V^{-1}\left(y-X m_0 \right)-\underbrace{m_1^{\top} M_0 X^{\top} Q^{-1}\left(y-X m_0\right)}_{\text {simplify this term from left to right}} \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left(M_0 m_1\right)^{\top} X^{\top} Q^{-1}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left(m_0+M_0 X^{\top} V^{-1} y\right)^{\top} X^{\top} Q^{-1}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left(X m_0+X M_0 X^{\top} V^{-1} y\right)^{\top} Q^{-1}\left(y-X m_0\right)\\ & =y^{\top} V^{-1}\left(y-X m_0\right) -\left(Q^{-1} X m_0+Q^{-1}\left(X M_0 X^{\top}\right)V^{-1} y\right)^{\top}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-\left[Q^{-1} X m_0+Q^{-1}(Q-V) V^{-1} y\right]^{\top}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right) -\left(Q^{-1} X m_0+V^{-1} y- Q^{-1} y \right)^{\top}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right) -\left[V^{-1} y+Q^{-1}\left(X m_0-y\right)\right]^{\top}\left(y-X m_0\right) \\ & =y^{\top} V^{-1}\left(y-X m_0\right)-y^{\top} V^{-1}\left(y-X m_0\right) +\left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \\ & =\left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \;. \end{align}\]
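A numerical spot-check of this quadratic-form identity, again under an illustrative setup:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
X = rng.standard_normal((n, p))
m0 = rng.standard_normal(p)
y = rng.standard_normal(n)
V, M0 = np.eye(n), np.eye(p)
Q = V + X @ M0 @ X.T

M0inv = np.linalg.inv(M0)
m1 = M0inv @ m0 + X.T @ np.linalg.solve(V, y)
M1 = np.linalg.inv(M0inv + X.T @ np.linalg.solve(V, X))
lhs = m0 @ M0inv @ m0 + y @ np.linalg.solve(V, y) - m1 @ M1 @ m1
r = y - X @ m0
rhs = r @ np.linalg.solve(Q, r)    # (y - X m0)' Q^{-1} (y - X m0)
assert np.isclose(lhs, rhs)
```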

Thus, we obtain the following updating form of the posterior distribution for Bayesian linear regression:

\[ \begin{aligned} P\left(\beta, \sigma^{2} \mid y\right) = NIG\left(\beta, \sigma^{2} \mid \tilde{m}_1, \tilde{M}_1, a_{1}, b_{1}\right) \; \end{aligned} \] where

\[ \begin{aligned} \tilde{m}_1 & =M_1 m_1=m_0+M_0 X^{\top} Q^{-1}\left(y-X m_0\right) \\ \tilde{M}_1 & =M_1=M_0-M_0 X^{\top} Q^{-1} X M_0 \\ a_1 & =a_0+\frac{n}{2} \\ b_1 & =b_0+\frac{1}{2}\left(y-X m_0\right)^{\top} Q^{-1}\left(y-X m_0\right) \\ Q & =V+X M_0 X^{\top} \end{aligned} \]
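Putting the pieces together, a minimal sketch of the updater in this \(Q\)-based form might look as follows; the name `nig_posterior_update` is illustrative, and the routine should agree with the direct computation `nig_posterior_direct` given earlier.

```python
import numpy as np

def nig_posterior_update(y, X, V, m0, M0, a0, b0):
    """One-shot NIG posterior update using Q = V + X M0 X'."""
    n = y.shape[0]
    Q = V + X @ M0 @ X.T
    r = y - X @ m0                       # prior predictive residual
    Qinv_r = np.linalg.solve(Q, r)       # Q^{-1} (y - X m0)
    m1_tilde = m0 + M0 @ X.T @ Qinv_r    # posterior mean of beta
    M1_tilde = M0 - M0 @ X.T @ np.linalg.solve(Q, X @ M0)
    a1 = a0 + n / 2.0
    b1 = b0 + 0.5 * r @ Qinv_r
    return m1_tilde, M1_tilde, a1, b1
```

One practical attraction of this form: when \(p\) is large relative to \(n\), all inversions involve the \(n \times n\) matrix \(Q\), so we never need to form or invert the \(p \times p\) matrix \(M_0^{-1}+X^{\top}V^{-1}X\).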

2.2 Method 2: distribution theory

Previously, we obtained the Bayesian linear regression updater using the Sherman-Morrison-Woodbury identity. Here, we derive the same results without resorting to it. The model is given by \[\begin{align} & y=X \beta+\epsilon , \quad \epsilon \sim N\left(0, \sigma^2 V\right) ; \\ & \beta=m_0+\omega , \quad \omega \sim N\left(0, \sigma^2 M_0\right) ; \\ & \sigma^2 \sim I G\left(a_0, b_0\right) \;. \end{align}\]

This corresponds to the posterior distribution

\[\begin{align} P\left(\beta, \sigma^2 \mid y\right) \propto I G\left(\sigma^2 \mid a_0, b_0\right) & \times N\left(\beta \mid m_0, \sigma^2 M_0\right) \times N\left(y \mid X \beta, \sigma^2 V\right) \;. \end{align}\]

We will derive \(P\left(\sigma^2 \mid y\right)\) and \(P\left(\beta \mid \sigma^2, y\right)\) in a form that will reflect updates from the prior to the posterior.

Integrating out \(\beta\) from the model is equivalent to substituting the prior model for \(\beta\) into the likelihood. Thus, \(P\left(y \mid \sigma^2\right)\) follows simply from \[\begin{align} y &=X \beta+\epsilon \\ &=X\left(m_0+\omega\right)+\epsilon \\ &=X m_0 + X \omega + \epsilon \\ & =X m_0+ \eta \;, \\ \end{align}\]

where \[\begin{align} & \eta = X \omega + \epsilon \sim N\left(0, \sigma^2Q\right) \; ; \\ & Q=X M_0 X^{\top}+V \; . \\ \end{align}\]

Therefore, \[\begin{align} y \mid \sigma^2 \sim N\left(X m_0, \sigma^2 Q\right) \; .\\ \end{align}\]
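As a sanity check, one can simulate \(\beta\) from its prior and \(y\) from the likelihood, then compare the empirical moments of \(y\) with \(X m_0\) and \(\sigma^2 Q\). The following Monte Carlo sketch (sample size, dimensions, and \(\sigma^2\) chosen arbitrarily) does exactly that.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2, N = 3, 2, 1.5, 200_000
X = rng.standard_normal((n, p))
m0 = rng.standard_normal(p)
V, M0 = np.eye(n), np.eye(p)
Q = V + X @ M0 @ X.T

omega = rng.multivariate_normal(np.zeros(p), sigma2 * M0, size=N)
eps = rng.multivariate_normal(np.zeros(n), sigma2 * V, size=N)
ys = (m0 + omega) @ X.T + eps    # each row is one draw of y = X(m0 + omega) + eps

print(np.max(np.abs(ys.mean(axis=0) - X @ m0)))                 # close to 0
print(np.max(np.abs(np.cov(ys, rowvar=False) - sigma2 * Q)))    # close to 0
```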

The posterior distribution of \(\sigma^2\) is then given by \[\begin{align} P\left(\sigma^2 \mid y\right) & \propto P\left(\sigma^2\right) P\left(y \mid \sigma^2\right) \\ & =I G\left(\sigma^2 \mid a_0, b_0\right) \times N\left(y \mid X m_0, \sigma^2 Q\right) \\ & \propto\left(\frac{1}{\sigma^2}\right)^{a_0+1} e^{-\frac{b_0}{\sigma^2}} \times\left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}} e^{-\frac{1}{2 \sigma^2}\left(y-Xm_0\right)^{\top} Q^{-1}\left(y-Xm_0\right)} \\ & =\left(\frac{1}{\sigma^2}\right)^{a_0+\frac{n}{2}+1} e^{-\frac{1}{\sigma^2}\left\{b_0+\frac{1}{2}\left(y-Xm_0\right)^{\top} Q^{-1}\left(y-Xm_0\right)\right\}} \\ & \propto IG \left(\sigma^2 \mid a_1, b_1\right) \; , \end{align}\]

where \[\begin{align} & a_1 = a_0 + \frac{n}{2} \; ; \\ & b_1 = b_0 + \frac{1}{2} \left(y-Xm_0\right)^{\top} Q^{-1} \left(y-Xm_0\right) \; . \end{align}\]

Next, we turn to \(P\left(\beta \mid \sigma^2, y\right)\). Note that \[ \left[\begin{array}{l} y \\ \beta \end{array}\right] \mid \sigma^2 \sim N\left(\left[\begin{array}{l} Xm_0 \\ m_0 \end{array}\right], \quad \sigma^2 \left[\begin{array}{cc} Q & X M_0 \\ M_0 X^{\top} & M_0 \end{array}\right]\right) \; . \]

where we have used the facts

\[ \begin{aligned} & E[y \mid \sigma^2] = Xm_0 \; ;\\ & E[\beta \mid \sigma^2] = m_0 \; ; \\ & \operatorname{Var}\left(y \mid \sigma^2\right)=\sigma^2 Q \; ; \\ & \operatorname{Var}\left(\beta \mid \sigma^2\right)=\sigma^2 M_0 \; ; \\ \end{aligned} \]

\[ \begin{aligned} \operatorname{Cov}\left(y, \beta \mid \sigma^2\right) &= \operatorname{Cov}\left(X \beta+\epsilon, \beta \mid \sigma^2\right) \\ & =\operatorname{Cov}\left(X\left(m_0+\omega\right)+\epsilon, m_0+\omega \mid \sigma^2\right) \\ & =\operatorname{Cov}\left(X \omega, \omega \mid \sigma^2\right) \\ & \quad \text {(Since } m_0 \text { is constant and } \operatorname{Cov}(\omega, \epsilon)=0) \\ & =\sigma^2 X M_0 \; . \end{aligned} \]

From the formula for the conditional distribution of a multivariate Gaussian (stated in the note below), we obtain \[ \beta \mid \sigma^2, y \sim N\left(\tilde{m}_1, \sigma^2 \tilde{M}_1\right) \;, \]

where \[\begin{align} & \tilde{m}_1=E\left[\beta \mid \sigma^2, y\right]=m_0+M_0 X^{\top} Q^{-1}\left(y-X{m_0}\right) \; ;\\ & \tilde{M}_1=M_0-M_0 X^{\top} Q^{-1} X M_0 \; . \\ \end{align}\]


Note:

\[\begin{align} & \left[\begin{array}{l} X_1 \\ X_2 \end{array}\right] \sim N\left(\left[\begin{array}{l} \mu_1 \\ \mu_2 \end{array}\right],\left[\begin{array}{ll} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array}\right]\right) \text { with } \Sigma_{21} = \Sigma_{12}^{\top} \;, \\ & \Rightarrow X_2 \mid X_1 \sim N\left(\mu_{2 \cdot 1}, \Sigma_{2 \cdot 1}\right) \;, \\ & \text {where } \mu_{2 \cdot 1}= \mu_2+\Sigma_{21} \Sigma_{11}^{-1}\left(X_1-\mu_1\right) \text { and } \Sigma_{2 \cdot 1}=\Sigma_{22}-\Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \;. \end{align}\]
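The sketch below implements this conditional-Gaussian formula as a generic helper (`gaussian_conditional`, an illustrative name) and applies it to the joint distribution of \((y, \beta)\) above with \(\sigma^2 = 1\), recovering \(\tilde{m}_1\) and \(\tilde{M}_1\).

```python
import numpy as np

def gaussian_conditional(mu1, mu2, S11, S12, S22, x1):
    """Parameters of X2 | X1 = x1 for a joint Gaussian with blocks
    Sigma11 = S11, Sigma12 = S12, Sigma22 = S22 (Sigma21 = S12')."""
    mu = mu2 + S12.T @ np.linalg.solve(S11, x1 - mu1)   # mu_{2.1}
    S = S22 - S12.T @ np.linalg.solve(S11, S12)         # Sigma_{2.1}
    return mu, S

rng = np.random.default_rng(4)
n, p = 4, 2
X = rng.standard_normal((n, p))
m0, y = rng.standard_normal(p), rng.standard_normal(n)
V, M0 = np.eye(n), np.eye(p)
Q = V + X @ M0 @ X.T

# X1 = y, X2 = beta: Sigma11 = Q, Sigma12 = X M0, Sigma22 = M0 (sigma^2 = 1)
m1_tilde, M1_tilde = gaussian_conditional(X @ m0, m0, Q, X @ M0, M0, y)
assert np.allclose(m1_tilde, m0 + M0 @ X.T @ np.linalg.solve(Q, y - X @ m0))
assert np.allclose(M1_tilde, M0 - M0 @ X.T @ np.linalg.solve(Q, X @ M0))
```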