Chapter 2 Single tree model

2.1 Model specification

Define the following model. Suppose the observations in a tree node are \[y_{ij}, \quad i = 1,\ldots,n_j, \; j = 1, \ldots, m,\] where \(y_{ij}\) is observation \(i\) in group \(j\). Each group may have a different number of observations \(n_j\) (so, for example, \(n_1\) is the number of observations in group 1), there are \(m\) groups, and the total number of observations is \(n = \sum_{j=1}^m n_j\).

Then, for each tree node, suppose we have the likelihood: \[y_{ij} \sim N(\mu_j, \tau^{-1})\]

so each group \(j\) has its own mean \(\mu_j\), while the precision \(\tau\) is shared across all groups.

We then have a hierarchical prior distribution:

\[\mu_j \sim N(\mu, k_1 (\tau^{-1}))\]

where \(k_1\) will be taken as a constant for simplicity, and the hyper-parameter prior distributions are:

\[\mu \sim N(0, \tau_{\mu}^{-1}), \quad \text{where } \tau_{\mu}^{-1} = k_2 \tau^{-1}\] \[\tau \sim Ga(\alpha, \beta)\]

where the values \(k_1, k_2, \alpha, \beta\) are all fixed.
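As a concrete example, the following is a minimal simulation sketch of this model (the variable names and the particular hyper-parameter values are illustrative, not from the original text):

```python
# Minimal simulation sketch of the single-tree-node model above.
# All names and hyper-parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(42)

m = 3                              # number of groups
n_j = np.array([10, 15, 20])       # observations per group
k1, k2, alpha, beta = 1.0, 1.0, 2.0, 1.0

tau = rng.gamma(shape=alpha, scale=1.0 / beta)    # tau ~ Ga(alpha, beta)
mu = rng.normal(0.0, np.sqrt(k2 / tau))           # mu ~ N(0, k2 / tau)
mu_j = rng.normal(mu, np.sqrt(k1 / tau), size=m)  # mu_j ~ N(mu, k1 / tau)

groups = np.repeat(np.arange(m), n_j)             # group index of each obs.
y = rng.normal(mu_j[groups], np.sqrt(1.0 / tau))  # y_ij ~ N(mu_j, 1 / tau)
```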

2.2 Maths

  • The likelihood of each tree node will be:

\[\begin{equation} L = \prod_{j = 1}^{m} \prod_{i = 1}^{n_j} p(y_{ij} | \mu_{j}, \tau) \\ L \propto \tau^{n/2} \exp \Big\{ -\frac{\tau}{2} \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \Big\} \end{equation}\]

with prior distributions:

  • \(\mu_j | \mu, \tau, k_1\)

\[\begin{equation} p(\mu_1, \dots, \mu_m | \mu, \tau) \propto (\tau/k_1)^{m/2} \exp\Big\{ - \frac{\tau}{2k_1} \sum_{j = 1}^{m} (\mu_j - \mu)^2 \Big\} \end{equation}\]

  • \(\tau | \alpha, \beta\)

\[p(\tau | \alpha, \beta) \propto \tau^{\alpha - 1} \exp\{ - \beta \tau \}\]

  • \(\mu | k_2, \tau\)

\[\begin{equation} p(\mu | k_2, \tau) \propto (\tau/k_2)^{1/2} \exp\Big\{ - \frac{\tau}{2 k_2} \mu^2 \Big\} \end{equation}\]

and the joint distribution is:

  • \(p(\tau, \mu_1, \dots, \mu_m, \mu | y, k_1, k_2, \alpha, \beta)\)

\[\begin{equation} p(\tau, \mu_1, \dots, \mu_m, \mu | y, k_1, k_2, \alpha, \beta) \propto \tau^{\alpha - 1} \exp\{ - \beta \tau \} \times \\ (\tau/k_1)^{m/2} \exp\Big\{ - \frac{\tau}{2k_1} \sum_{j = 1}^{m} (\mu_j - \mu)^2 \Big\} \\ \times (\tau/k_2)^{1/2} \exp\Big\{ - \frac{\tau}{2 k_2} \mu^2 \Big\} \times \tau^{n/2} \exp \Big\{ -\frac{\tau}{2} \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \Big\} \end{equation}\]
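For later checking of the Gibbs updates, the log of this unnormalised joint density can be evaluated directly; below is a sketch (illustrative names, continuing the simulation sketch above, where `groups[i]` gives the group of observation `i`):

```python
# Log of the unnormalised joint density p(tau, mu_1..m, mu | y, ...),
# term by term as in the displayed equation (a sketch; names illustrative).
import numpy as np

def log_joint(y, groups, mu_j, mu, tau, k1, k2, alpha, beta):
    n, m = len(y), len(mu_j)
    return ((alpha - 1) * np.log(tau) - beta * tau                # tau prior
            + 0.5 * m * np.log(tau / k1)
            - (tau / (2 * k1)) * np.sum((mu_j - mu) ** 2)         # mu_j prior
            + 0.5 * np.log(tau / k2)
            - (tau / (2 * k2)) * mu ** 2                          # mu prior
            + 0.5 * n * np.log(tau)
            - (tau / 2) * np.sum((y - mu_j[groups]) ** 2))        # likelihood
```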

2.3 Posteriors

2.4 Posterior for \(\tau\)

  • \(p(\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2)\)

\[\begin{equation} p(\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2) \propto \tau^{\alpha - 1} \exp\{ - \beta \tau \} \times \\ (\tau/k_1)^{m/2} \exp \Big\{ -\frac{\tau}{2k_1} \sum_{j = 1}^{m} (\mu_{j} - \mu)^2 \Big\} \times \\ \tau^{n/2} \exp \Big\{ -\frac{\tau}{2} \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \Big\} \times (\tau/k_2)^{1/2} \exp\Big\{ - \frac{\tau}{2 k_2} \mu^2 \Big\} \\ \propto \tau^{(n+m+1)/2 + \alpha - 1 } \exp \Big\{ - \tau \Big( \frac{\sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2}{2} + \beta + \frac{\sum_{j = 1}^{m}(\mu_{j} - \mu)^2}{2k_1} + \frac{\mu^2}{2k_2}\Big) \Big\} \end{equation}\]

So \(\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2 \sim \\ \text{Gamma}\Big((n+m+1)/2 + \alpha, \; \frac{\sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2}{2} + \beta + \frac{\sum_{j = 1}^{m}(\mu_{j} - \mu)^2}{2k_1} + \frac{\mu^2}{2k_2}\Big)\)
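A draw from this full conditional might be implemented as below (a sketch with the same illustrative names; note that NumPy's gamma sampler is parameterised by scale, the inverse of the rate used above):

```python
# Gibbs draw of tau from its Gamma full conditional (sketch).
import numpy as np

def draw_tau(y, groups, mu_j, mu, k1, k2, alpha, beta, rng):
    n, m = len(y), len(mu_j)
    shape = (n + m + 1) / 2 + alpha
    rate = (beta
            + 0.5 * np.sum((y - mu_j[groups]) ** 2)
            + np.sum((mu_j - mu) ** 2) / (2 * k1)
            + mu ** 2 / (2 * k2))
    return rng.gamma(shape, 1.0 / rate)   # scale = 1 / rate
```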

2.5 Posterior for \(\mu_j\)

\[\begin{equation} Q = (\tau/k_1) \sum_{j=1}^{m} (\mu_j - \mu)^2 + \tau \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \\ Q = \tau \sum_{j = 1}^{m} \Big[ \Big(n_j + \frac{1}{k_1}\Big) \mu_j^2 - 2 \mu_j \Big( \frac{\mu}{k_1} + \bar y_j n_j \Big) \Big] + \text{const} \\ Q = \tau \sum_{j = 1}^{m} \Big(n_j + \frac{1}{k_1}\Big) \Big(\mu_j - \frac{\mu/k_1 + \bar y_j n_j}{n_j + 1/k_1}\Big)^2 + \text{const} \end{equation}\] where \(\bar y_j = n_j^{-1} \sum_{i = 1}^{n_j} y_{ij}\). So for each \(\mu_j\)

\[\mu_j | \mu, y, \tau, k_1 \sim N(\frac{\mu/k_1 + \bar y_j n_j}{n_j + 1/k_1}, ((n_j + \frac{1}{k_1}) \tau )^{-1})\]
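The corresponding Gibbs update for all \(\mu_j\) at once might look like this sketch (illustrative names as before):

```python
# Gibbs draw of mu_1, ..., mu_m from their independent Normal
# full conditionals (sketch; vectorised over groups).
import numpy as np

def draw_mu_j(y, groups, mu, tau, k1, m, rng):
    n_j = np.bincount(groups, minlength=m)                       # group sizes
    ybar_j = np.bincount(groups, weights=y, minlength=m) / n_j   # group means
    mean = (mu / k1 + n_j * ybar_j) / (n_j + 1.0 / k1)
    prec = (n_j + 1.0 / k1) * tau                                # precisions
    return rng.normal(mean, 1.0 / np.sqrt(prec))
```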

2.6 Posterior for \(\mu\)

Similarly, for \(\mu\) we have:

\[\begin{equation} Q = \frac{\tau}{k_1} \sum_{j = 1}^{m} (\mu_j - \mu)^2 + \frac{\tau}{k_2} \mu^2 \\ Q = \frac{\tau}{k_1} \sum_{j=1}^{m} (\mu_j^{2} - 2 \mu \mu_j + \mu^2) + \frac{\tau}{k_2} \mu^2 \\ Q = \tau\Big(\frac{m}{k_1} + \frac{1}{k_2}\Big) \mu^2 - \frac{2\tau}{k_1} \mu \bar \mu m + \text{const} \\ Q = \tau\Big(\frac{m}{k_1} + \frac{1}{k_2}\Big)\Big(\mu - \frac{(1/k_1) \bar \mu m}{\frac{m}{k_1} + \frac{1}{k_2}}\Big)^2 + \text{const} \end{equation}\]

where \(\bar \mu = m^{-1} \sum_{j=1}^{m} \mu_j\); note that \(\tau\) cancels in the mean but not in the precision. So for \(\mu\) we have the conditional distribution:

\[\mu | \mu_1, \dots, \mu_{m}, \tau, k_1, k_2 \sim N\Big( \frac{(1/k_1) \bar \mu m}{\frac{m}{k_1} + \frac{1}{k_2}}, \Big(\tau\Big(\frac{m}{k_1} + \frac{1}{k_2}\Big)\Big)^{-1}\Big)\]
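A matching sketch of the Gibbs update for \(\mu\) (again with illustrative names):

```python
# Gibbs draw of mu from its Normal full conditional (sketch).
# tau cancels in the mean but remains in the precision.
import numpy as np

def draw_mu(mu_j, tau, k1, k2, rng):
    m = len(mu_j)
    mean = (np.sum(mu_j) / k1) / (m / k1 + 1.0 / k2)
    prec = tau * (m / k1 + 1.0 / k2)
    return rng.normal(mean, 1.0 / np.sqrt(prec))
```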

2.7 A second posterior, with \(\mu_j\) marginalised out

The following is what is used in the code.

Assuming the \(\mu_j\) have been marginalised out (see the next section for the details), so that \[ \mathbf{y} | \mu, \tau, k_1 \sim MVN(\mu \mathbf{1}, \tau^{-1} \boldsymbol{\Psi}), \qquad \boldsymbol{\Psi} = k_1 \mathbf{M}\mathbf{M}^{T} + \mathbf{I},\] with prior \(\mu \sim N(0, k_2 \tau^{-1})\) as before, we can write

\[\begin{equation} p(\mu | y, \tau, k_1, k_2) \propto \exp \Big\{ -\frac{\tau}{2}(\mathbf{y} - \mu \mathbf{1})^{T} \Psi^{-1} (\mathbf{y} - \mu \mathbf{1}) \Big\} \times \exp \Big\{ -\frac{\tau}{2 k_2} \mu^{2} \Big\} \\ \propto \exp \Big\{ -\frac{\tau}{2}\big(\mu^{2} (\mathbf{1}^{T} \Psi^{-1} \mathbf{1} + 1/k_2) - 2 \mu \mathbf{1}^{T} \Psi^{-1} \mathbf{y}\big) \Big\} \\ \propto \exp \Big\{ -\frac{\tau}{2}( \mathbf{1}^{T} \Psi^{-1} \mathbf{1} + 1/k_2)\Big( \mu - \frac{\mathbf{1}^{T} \Psi^{-1} \mathbf{y}}{\mathbf{1}^{T} \Psi^{-1} \mathbf{1} + 1/k_2}\Big)^2 \Big\} \end{equation}\]

\[\mu | y, \tau, k_1, k_2 \sim N\Big(\frac{\mathbf{1}^{T}\Psi^{-1}\mathbf{y}}{\mathbf{1}^{T}\Psi^{-1}\mathbf{1} + k_2^{-1}}, \big((\mathbf{1}^{T}\Psi^{-1}\mathbf{1} + k_2^{-1}) \tau \big)^{-1}\Big)\]
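A sketch of this marginalised draw of \(\mu\), using a dense solve against \(\boldsymbol{\Psi}\) for clarity (illustrative names; \(M\) is the \(n \times m\) allocation matrix):

```python
# Draw of mu with the mu_j integrated out:
# y | mu, tau ~ MVN(mu * 1, Psi / tau), Psi = k1 M M^T + I (sketch).
import numpy as np

def draw_mu_marginal(y, M, tau, k1, k2, rng):
    n = len(y)
    Psi = k1 * (M @ M.T) + np.eye(n)
    Psi_inv_1 = np.linalg.solve(Psi, np.ones(n))   # Psi^{-1} 1
    a = np.ones(n) @ Psi_inv_1 + 1.0 / k2          # 1' Psi^{-1} 1 + 1/k2
    b = y @ Psi_inv_1                              # 1' Psi^{-1} y (Psi symmetric)
    return rng.normal(b / a, 1.0 / np.sqrt(a * tau))
```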

2.8 Marginal Distributions for y

Suppose we have the outcome variable: \[\mathbf{y} \sim MVN_n(\mu \mathbf{1}_n, \tau^{-1} (k_1 \mathbf{MM}^T + \mathbf{I}))\]

with: \[\mu \sim N(\mu_\mu, \tau_\mu^{-1})\]

And define \(\mathbf{\Psi} = k_1 \mathbf{MM}^T + \mathbf{I}\)

Then, as a ‘trick’ to estimate the mean and variance of \(\mathbf{y}\), we can write:

\[\mathbf{y} = \mu \mathbf{1}_n + \left[ \tau^{-1} \Psi \right]^{1/2} \mathbf{z}\]

where \(\mathbf{z} \sim MVN(0, \mathbf{I})\) is a standard multivariate normal. Then we have:

\[E(\mathbf{y}) = \mu_\mu \mathbf{1}_n + 0 = \mu_\mu \mathbf{1}_n \\ Var(\mathbf{y}) = Var(\mu \mathbf{1}_n) + Var\big(\left[\tau^{-1} \Psi \right]^{1/2} \mathbf{z}\big) = \tau_{\mu}^{-1} \mathbf{1}_n \mathbf{1}^T_n + \tau^{-1} \mathbf{\Psi}\]

Now letting \(\tau_\mu^{-1} = k_2 \tau^{-1}\), we get:

\[\mathbf{y} \sim MVN(\mu_\mu \mathbf{1}, \tau^{-1} \left[ k_2 \mathbf{1}_n \mathbf{1}^T_n + \mathbf{\Psi} \right]) \equiv MVN(\mathbf{W}_0, \tau^{-1} \mathbf{W}_1), \quad \text{where } \mathbf{W}_0 = \mu_\mu \mathbf{1}_n \text{ and } \mathbf{W}_1 = k_2 \mathbf{1}_n \mathbf{1}^T_n + \mathbf{\Psi}\]

We now want to marginalise this over \(\tau \sim Ga(\alpha, \beta)\); the integrand below contains the kernel of a Gamma distribution, namely:

\[ Ga\Big(n/2 + \alpha, \beta + \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2}\Big) \]

…so we get:

\[\pi(\mathbf{y} | \mu_\mu, k_1, k_2) \propto \int (2\pi)^{-n/2} \tau^{n/2} | \mathbf{W}_1 |^{-1/2} \exp \left[ -\frac{\tau}{2} (\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)\right] \tau^{\alpha - 1} \exp(-\beta \tau) \, d\tau\]

This becomes:

\[ \propto (2\pi)^{-n/2} | \mathbf{W}_1 |^{-1/2} \Gamma \left( \frac{n}{2} + \alpha \right) \left[ \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2} + \beta\right]^{-\left(\frac{n}{2} + \alpha \right)}\]

(For examples of the evaluation of this marginal distribution, see )

2.8.1 Log version of the marginal

This equation on the log scale gives us (up to an additive constant):

\[\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k_1, k_2, \mu_\mu, \alpha, \beta)) &=& -\frac{n}{2} \log(2\pi) -\frac{1}{2} \log(|\mathbf{W}_{1}|) + \log(\Gamma(n/2 + \alpha)) \\ &-& (n/2 + \alpha) \log \Big( \beta + \frac{(\mathbf{y} - \mathbf{W}_{0})^T \mathbf{W}_{1}^{-1} (\mathbf{y} - \mathbf{W}_{0})}{2}\Big) \end{eqnarray*}\]

And the same, when written for \(j = 1,\dots, b\) nodes of a tree, would look like:

\[\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y_{j}} | N_j, k_1, k_2, \mu_\mu, \alpha, \beta)) &=& \sum_{j = 1}^{b} -\frac{N_j}{2} \log(2\pi) -\frac{1}{2} \log(|\mathbf{W}_{1,j}|) + \log(\Gamma(N_j/2 + \alpha)) \\ &-& (N_j/2 + \alpha) \log \Big(\beta + \frac{(\mathbf{y}_j - \mathbf{W}_{0,j})^T \mathbf{W}_{1,j}^{-1} (\mathbf{y}_j - \mathbf{W}_{0,j})}{2} \Big) \end{eqnarray*}\]
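This summed per-node quantity is what the tree sampler evaluates. Below is a sketch of one node's contribution, assuming \(\mathbf{W}_{0,j} = \mu_\mu \mathbf{1}\) and \(\mathbf{W}_{1,j} = k_2 \mathbf{1}\mathbf{1}^T + \boldsymbol{\Psi}_j\) with \(\boldsymbol{\Psi}_j = k_1 \mathbf{M}_j \mathbf{M}_j^T + \mathbf{I}\) (illustrative names; \(\mathbf{M}_j\) is the allocation matrix restricted to the observations in node \(j\)):

```python
# Log marginal likelihood of one tree node (sketch).
import numpy as np
from scipy.special import gammaln

def log_marginal_node(y_j, M_j, mu_mu, k1, k2, alpha, beta):
    N_j = len(y_j)
    Psi_j = k1 * (M_j @ M_j.T) + np.eye(N_j)
    W1 = k2 * np.ones((N_j, N_j)) + Psi_j          # W_{1,j}
    resid = y_j - mu_mu                            # y_j - W_{0,j}
    quad = resid @ np.linalg.solve(W1, resid)
    _, logdet = np.linalg.slogdet(W1)
    return (-0.5 * N_j * np.log(2 * np.pi) - 0.5 * logdet
            + gammaln(N_j / 2 + alpha)
            - (N_j / 2 + alpha) * np.log(beta + 0.5 * quad))

# The tree-level value is the sum of log_marginal_node over the b nodes.
```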

2.8.2 (Old) Marginal Distributions

This is only present here for the record.

\[y_{ij} \sim N(\mu_j, \tau^{-1})\] \[\mu_j \sim N(\mu, k\tau^{-1})\] \[\mu \sim N(\mu_\mu,\tau_\mu^{-1})\] \[\tau \sim Ga(\alpha, \beta)\]

with \(N = \sum_{j = 1}^{m} n_j\). Define \(\mathbf{M}\) to be an \(N\times m\) binary matrix which allocates each observation to a group.

Writing things out in matrix format:

\[\begin{equation} \boldsymbol{y} = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_m m} \end{bmatrix} \end{equation}\]

\[\begin{equation} \boldsymbol{M} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix} \end{equation}\]

\[\begin{equation} \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} \end{equation}\]

then \(\boldsymbol{y} \sim MVN_{N}( \boldsymbol{M \lambda}, \tau^{-1} \boldsymbol{I})\) and \(\boldsymbol{\lambda} \sim MVN_{m}( \mu \boldsymbol{1}, k \tau^{-1} \boldsymbol{I})\).
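A small sketch of how \(\boldsymbol{M}\) can be built from group labels, matching the example above (illustrative names):

```python
# One-hot allocation matrix M: row i has a single 1 in the column of
# observation i's group, so M @ 1 = 1 (sketch).
import numpy as np

groups = np.array([1, 0, 0, 2])           # group index of each observation
m = 3
M = np.zeros((len(groups), m))
M[np.arange(len(groups)), groups] = 1.0

# M @ M.T has entry 1 where two observations share a group (not I in
# general), and M.T @ M = diag(n_1, ..., n_m).
```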

\[\begin{equation} E[\boldsymbol{y} | \mu, \tau] = E_\lambda E_y[\boldsymbol{y} | \boldsymbol{\lambda}, \mu, \tau] = \boldsymbol{M} E[\boldsymbol{\lambda}] = \mu \boldsymbol{M1} \\ Var[\boldsymbol{y} | \mu, \tau] = Var[\boldsymbol{M} \boldsymbol{\lambda}] + \tau^{-1} \boldsymbol{I} = \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I} \end{equation}\]

so \[\boldsymbol{y} | \mu, \tau, k, \tau_{\mu} \sim MVN_{N}(\mu \boldsymbol{M1} , \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I})\]

Since each row of \(\boldsymbol{M}\) contains exactly one 1, \(\boldsymbol{M1} = \boldsymbol{1}_N\), so the mean simplifies; note, however, that \(\boldsymbol{M} \boldsymbol{M}^{T} \neq \boldsymbol{I}\) in general (its \((i, i')\) entry is 1 exactly when observations \(i\) and \(i'\) share a group), so the covariance does not collapse:

\[\boldsymbol{y} | \mu, \tau, k \sim MVN_{N}(\mu \boldsymbol{1}_N , \tau^{-1} (k \boldsymbol{M} \boldsymbol{M}^{T} + \boldsymbol{I}) )\]

We now use this as the starting point and integrate out \(\mu\) and \(\tau\). The expression we end up with should be a function of \(\boldsymbol{M}\), \(k\), \(\tau_\mu\), \(\mu_\mu\), \(\alpha\), and \(\beta\).

To start, define \(\boldsymbol{\Psi} = k \boldsymbol{M} \boldsymbol{M}^{T} + \boldsymbol{I}\) so that \(\boldsymbol{y}|\ldots \sim MVN(\mu \boldsymbol{1}_N, \tau^{-1} \boldsymbol{\Psi})\). Then we obtain:

\[\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &\propto& \int \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \exp \left[ -\frac{\tau_\mu}{2} (\mu - \mu_\mu)^2 \right]\\ &\times& \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{\tau}{2} (\boldsymbol{y} - \mu \boldsymbol{1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{1}) \right] d\mu \, d\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \, \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \\ &\times& \left( \int \exp \left[ -\frac{1}{2} [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{1})] \right] d\mu \right) d\tau \\ \end{eqnarray*}\]

The inner expression can be rewritten as:

\[\begin{eqnarray*} Q &=& \tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{1}) \\ &=& \mu^2(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) - 2\mu (\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1}) + \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} \\ &=& \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}} \\ &+& (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}\right)^2 \\ \end{eqnarray*}\]

which, as a function of \(\mu\), is the kernel of a \(\text{Normal}\Big(\frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}, (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{-1}\Big)\) density, so it can be plugged back into our equation:

\[\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &\propto& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \, \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right] \\ &\times& \left( \int \exp \left[ -\frac{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}{2} \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\right)^2 \right] d\mu \right) d\tau \\ &\propto& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \, \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right] \frac{1}{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \, d\tau \\ \end{eqnarray*}\]

Now, replacing \(\tau_{\mu} = k \tau\), we have:

\[\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &\propto& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \, (k \tau)^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{\tau}{2}\{ k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \frac{(k \tau \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1})^2}{k \tau + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}} \right] \frac{1}{(k \tau + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}} \, d\tau \\ &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}} \int \tau^{N/2 + \alpha - 1} \\ &\times& \exp \left[ -\tau \left\{ \beta + \frac{1}{2}\Big(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \Big) \right\} \right] d\tau \\ \end{eqnarray*}\]

where the integrand can be seen as the kernel of a \(\text{Gamma}\Big(N/2 + \alpha, \; \beta + \frac{1}{2}\big(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \big)\Big)\) distribution,

and:

\[\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &\propto& |\Psi|^{-1/2} k^{1/2} \frac{\Gamma(N/2 + \alpha)}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \\ &\times& \Big(\beta + \frac{1}{2}\big(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\big)\Big)^{-(N/2 + \alpha)}\\ \end{eqnarray*}\]

This equation on the log scale gives us (up to an additive constant):

\[\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta)) &=& -\frac{1}{2} \log(|\boldsymbol{\Psi}|) + \frac{\log(k)}{2} - \frac{1}{2} \log(k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}) + \log(\Gamma(N/2 + \alpha)) \\ &-& (N/2 + \alpha) \log \Big( \beta + \frac{1}{2}\big(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\big)\Big) \end{eqnarray*}\]

And the same, when written for \(j = 1,\dots, b\) nodes of a tree, would look like:

\[\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y_{j}} | k_{j}, \mu_{\mu_{j}}, \alpha, \beta)) &=& \sum_{j = 1}^{b} -\frac{1}{2} \log(|\boldsymbol{\Psi_{j}}|) + \frac{\log(k_{j})}{2} - \frac{1}{2} \log(k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi_{j}}^{-1}\boldsymbol{1}) + \log(\Gamma(N_{j}/2 + \alpha)) \\ &-& (N_{j}/2 + \alpha) \log \Big(\beta + \frac{1}{2}\big(k_{j} \mu_{\mu_{j}}^2 + \boldsymbol{y_{j}}^T \boldsymbol{\Psi_{j}}^{-1} \boldsymbol{y_{j}} - \frac{(k_{j} \mu_{\mu_{j}} + \boldsymbol{y_{j}}^T \boldsymbol{\Psi_{j}}^{-1}\boldsymbol{1})^2}{k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi_{j}}^{-1}\boldsymbol{1}}\big)\Big) \end{eqnarray*}\]

2.9 Algorithm


Algorithm type: Metropolis-within-Gibbs for a hierarchical Bayesian (single) tree

Reason: We have closed-form full conditionals for most parameters, but not for the tree structure

Data: Target variable \(y\), groups \(j = 1,\dots,m\), set of features X

Result: Posterior distributions for all parameters


Initialisation;

Hyper-parameter values for \(\alpha, \beta, k_1, k_2\);

Number of groups \(m\);

Number of observations \(N =\sum_{j = 1}^{m} n_j\);

Number of iterations I;

  • define \(\mu_{\mu} = 0\), \(\mu^{0}\), \(\tau^{0}\), and \(\mu_j^{0}, j = 1,\dots, m\) as the initial parameter values

  • for i from 1 to I do:

    • propose a new candidate tree \(T^{\text{new}}\) by growing, pruning, changing, or swapping a node

    • set \(l_{\text{new}}\) = log full conditional for the new (candidate) tree

\(l_{\text{new}} = \sum_{l = 1}^{b_{\text{new}}} -\frac{1}{2} \log(|\boldsymbol{W}_{1,l}|) + \log(\Gamma(N_l/2 + \alpha))\) \(-(N_l/2 + \alpha) \log\Big(\beta + \frac{(\mathbf{y}_l - \mathbf{W}_{0,l})^T \mathbf{W}_{1,l}^{-1} (\mathbf{y}_l - \mathbf{W}_{0,l})}{2} \Big)\)
  • set \(l_{\text{old}}\) = log full conditional for the previous tree
\(l_{\text{old}} = \sum_{l = 1}^{b_{\text{old}}} -\frac{1}{2} \log(|\boldsymbol{W}_{1,l}|) + \log(\Gamma(N_l/2 + \alpha))\) \(-(N_l/2 + \alpha) \log\Big(\beta + \frac{(\mathbf{y}_l - \mathbf{W}_{0,l})^T \mathbf{W}_{1,l}^{-1} (\mathbf{y}_l - \mathbf{W}_{0,l})}{2} \Big)\)
  • set \(a = \exp(l_{\text{new}} - l_{\text{old}})\)
  • generate \(u \sim U[0, 1]\)
  • if \(u < a\) then:
    • set \(T = T^{\text{new}}\)
  • end

  • sample \(\mu\) from the posterior \(N\Big(\frac{(1/k_1) \bar \mu m}{\frac{1}{k_2} + \frac{m}{k_1}}, \big(\tau(\frac{1}{k_2} + \frac{m}{k_1})\big)^{-1}\Big)\), where \(\bar \mu = m^{-1} \sum_{j=1}^{m} \mu_j\)

  • for j in 1:m do:
    • sample \(\mu_j\) from the posterior \(N(\frac{\mu/k_1 + \bar y_j n_j}{n_j + 1/k_1}, ((n_j + \frac{1}{k_1})\tau)^{-1})\)
  • end

  • sample \(\tau\) from the posterior \(\text{Gamma}\Big(N/2 + \alpha, \beta + \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2}\Big)\)

  • end
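Putting the pieces together, one iteration of this Metropolis-within-Gibbs scheme might look like the sketch below, reusing the draw functions from the earlier sketches; `propose_tree` and `tree_log_marginal` are hypothetical helpers standing in for the grow/prune/change/swap move and the summed per-node log marginal above:

```python
# One Metropolis-within-Gibbs iteration (sketch; propose_tree and
# tree_log_marginal are hypothetical helpers, not from the original text).
import numpy as np

def one_iteration(tree, y, groups, mu, mu_j, tau, k1, k2, alpha, beta, rng):
    # Metropolis step on the tree structure
    tree_new = propose_tree(tree, rng)
    l_new = tree_log_marginal(tree_new, y, k1, k2, alpha, beta)
    l_old = tree_log_marginal(tree, y, k1, k2, alpha, beta)
    if np.log(rng.uniform()) < l_new - l_old:   # accept with prob. min(1, a)
        tree = tree_new

    # Gibbs steps using the closed-form full conditionals
    mu = draw_mu(mu_j, tau, k1, k2, rng)
    mu_j = draw_mu_j(y, groups, mu, tau, k1, len(mu_j), rng)
    tau = draw_tau(y, groups, mu_j, mu, k1, k2, alpha, beta, rng)
    return tree, mu, mu_j, tau
```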