# Chapter 2 Single tree model

## 2.1 Model specification

Define the following model. Suppose we observe, at a tree node, $y_{ij}, i = 1,\ldots,n_j, \; j = 1,\ldots, m$, where $$y_{ij}$$ is observation $$i$$ in group $$j$$. The number of observations $$n_j$$ may differ between groups; for example, $$n_1$$ is the number of observations in group 1, and so on. There are $$m$$ groups, and the total number of observations is $$n = \sum_{j=1}^m n_j$$.

Then, for each tree node, suppose we have the likelihood: $y_{ij} \sim N(\mu_j, \tau^{-1})$

so each group has an overall mean $$\mu_j$$, with an overall precision term $$\tau$$.

We then have a hierarchical prior distribution:

$\mu_j \sim N(\mu, k_1 (\tau^{-1}))$

where $$k_1$$ will be taken as a constant for simplicity, and the hyper-parameter prior distributions are:

$\mu \sim N(0, \tau_{\mu}^{-1})$, where $\tau_{\mu}^{-1} = k_2 (\tau^{-1})$, and $\tau \sim Ga(\alpha, \beta)$

Where the values $$k_1, k_2, \alpha, \beta$$ are all fixed.
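As a quick sanity check, the generative model above can be simulated directly. A minimal sketch in Python/numpy (the variable names, hyper-parameter values, and group sizes here are our own illustrative choices):

```python
import numpy as np

# Hypothetical fixed values for m, k_1, k_2, alpha, beta and the group sizes n_j
rng = np.random.default_rng(42)
m, k1, k2, alpha, beta = 3, 1.0, 2.0, 2.0, 1.0
n_j = np.array([10, 15, 20])
n = int(n_j.sum())                    # total number of observations

tau = rng.gamma(alpha, 1.0 / beta)    # tau ~ Ga(alpha, beta); numpy takes a scale
mu = rng.normal(0.0, np.sqrt(k2 / tau))           # mu ~ N(0, k2 * tau^{-1})
mu_j = rng.normal(mu, np.sqrt(k1 / tau), size=m)  # mu_j ~ N(mu, k1 * tau^{-1})

# y_ij ~ N(mu_j, tau^{-1}): repeat each group mean n_j[j] times
y = rng.normal(np.repeat(mu_j, n_j), np.sqrt(1.0 / tau))
```

Note that numpy's gamma sampler is parameterised by shape and scale, so the rate $$\beta$$ enters as $$1/\beta$$.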

## 2.2 Maths

• The likelihood of each tree node will be:

$$L = \prod_{j = 1}^{m} \prod_{i = 1}^{n_j} p(y_{ij} | \mu_{j}, \tau) \\ L \propto \tau^{n/2} \exp \{ -\frac{\tau}{2} \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \}$$

with prior distributions:

• $$\mu_j | \mu, \tau, k_1$$

$$p(\mu_1, \dots, \mu_m | \mu, \tau) \propto (\tau/k_1)^{m/2} \exp\{ - \frac{\tau}{2k_1} \sum_{j = 1}^{m} (\mu_j - \mu)^2 \}$$

• $$\tau | \alpha, \beta$$

$p(\tau | \alpha, \beta) \propto \tau^{\alpha - 1} \exp\{ - \beta \tau \}$

• $$\mu | k_2, \tau$$

$$p(\mu | k_2, \tau) \propto (\tau/k_2)^{1/2} \exp\{ - \frac{ \tau}{2 k_2} \mu^2 \}$$

and the joint distribution is:

• $$p(\tau, \mu_1, \dots, \mu_m, \mu| y, k_1, k_2, \alpha, \beta)$$

$$p(\tau, \mu_1, \dots, \mu_m, \mu| y, k_1, k_2, \alpha, \beta) \propto \tau^{\alpha - 1} \exp\{ - \beta \tau \} \times \\ (\tau/k_1)^{m/2} \exp\{ - \frac{\tau}{2k_1} \sum_{j = 1}^{m} (\mu_j - \mu)^2 \} \\ \times (\tau/k_2)^{1/2} \exp\{ - \frac{ \tau}{2 k_2} \mu^2 \} \times \tau^{n/2} \exp \{ -\frac{\tau}{2} \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \}$$

## 2.4 Posterior for $$\tau$$

• $$p(\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2)$$

$$p(\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2) \propto \tau^{\alpha - 1} \exp\{ - \beta \tau \} \times \\ (\tau/k_1)^{m/2} \exp \{ -\frac{\tau}{2k_1} \sum_{j = 1}^{m} (\mu_{j} - \mu)^2 \} \times \\ \tau^{n/2} \exp \{ -\frac{\tau}{2} \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \} \times (\tau/k_2)^{1/2} \exp\{ - \frac{ \tau}{2 k_2} \mu^2 \} \\ \propto \tau^{(n+m+1)/2 + \alpha - 1 } \exp \{ - \tau \Big( \frac{\sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2}{2} + \beta + \frac{\sum_{j = 1}^{m}(\mu_{j} - \mu)^2}{2k_1} + \frac{\mu^2}{2k_2}\Big) \}$$

So $$\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2 \sim \\ \text{Gamma}\Big((n+m+1)/2 + \alpha, \; \frac{\sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2}{2} + \beta + \frac{\sum_{j = 1}^{m}(\mu_{j} - \mu)^2}{2k_1} + \frac{\mu^2}{2k_2}\Big)$$
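In a Gibbs sampler this conditional can be drawn directly. A minimal sketch in Python/numpy (function and argument names are ours; `y_groups` is assumed to be a list of per-group observation arrays):

```python
import numpy as np

def sample_tau(y_groups, mu_j, mu, alpha, beta, k1, k2, rng):
    """Draw tau from its Gamma((n+m+1)/2 + alpha, rate) full conditional."""
    m = len(y_groups)
    n = sum(len(y) for y in y_groups)
    sse = sum(np.sum((y - mj) ** 2) for y, mj in zip(y_groups, mu_j))
    rate = beta + sse / 2 + np.sum((mu_j - mu) ** 2) / (2 * k1) + mu ** 2 / (2 * k2)
    shape = (n + m + 1) / 2 + alpha
    return rng.gamma(shape, 1.0 / rate)   # numpy parameterises Gamma by scale
```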

## 2.5 Posterior for $$\mu_j$$

$$Q = (\tau/k_1) \sum_{j=1}^{m} (\mu_j - \mu)^2 + \tau \sum_{j = 1}^{m} \sum_{i = 1}^{n_j} (y_{ij} - \mu_j)^2 \\ Q = \tau \sum_{j = 1}^{m} \Big[ n_j \mu_j^2 + \frac{\mu_j^2}{k_1} - \frac{2 \mu \mu_j}{k_1} - 2 \bar y_j n_j \mu_j \Big] + \text{const} \\ Q \propto \tau \sum_{j = 1}^{m} \Big[ (n_j + \frac{1}{k_1}) \mu_j^2 - 2 \mu_j (\frac{\mu}{k_1} + \bar y_j n_j ) \Big] \\ Q \propto \tau \sum_{j = 1}^{m} (n_j + \frac{1}{k_1})\Big(\mu_j - \frac{\mu/k_1 + \bar y_j n_j}{n_j + 1/k_1}\Big)^2$$ So for each $$\mu_j$$

$\mu_j | \mu, y, \tau, k_1 \sim N(\frac{\mu/k_1 + \bar y_j n_j}{n_j + 1/k_1}, ((n_j + \frac{1}{k_1}) \tau )^{-1})$
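The corresponding Gibbs update draws each $$\mu_j$$ independently from this normal full conditional. A minimal sketch in Python/numpy (names are ours; `y_groups` is assumed to be a list of per-group observation arrays):

```python
import numpy as np

def sample_mu_j(y_groups, mu, tau, k1, rng):
    """Draw each mu_j from its normal full conditional."""
    draws = np.empty(len(y_groups))
    for j, y in enumerate(y_groups):
        n_j = len(y)
        prec = (n_j + 1.0 / k1) * tau                  # conditional precision
        mean = (mu / k1 + y.sum()) / (n_j + 1.0 / k1)  # ybar_j * n_j = sum(y)
        draws[j] = rng.normal(mean, np.sqrt(1.0 / prec))
    return draws
```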

## 2.6 Posterior for $$\mu$$

Similarly, for $$\mu$$ we have:

$$Q = \frac{\tau}{k_1} \sum_{j = 1}^{m} (\mu_j - \mu)^2 + \frac{ \tau}{k_2} \mu^2 \\ Q = \frac{\tau}{k_1} \sum_{j=1}^{m} (\mu_j^{2} - 2 \mu \mu_j + \mu^2) + \frac{ \tau}{k_2} \mu^2 \\ Q \propto (\frac{\tau}{k_2} + \frac{\tau}{k_1} m ) \mu^2 - \frac{2\tau}{k_1} \sum_{j=1}^{m} \mu \mu_j \\ Q \propto (\frac{\tau}{k_2} + \frac{\tau}{k_1} m ) \mu^2 - \frac{2\tau}{k_1} \mu \bar \mu m \\ Q \propto (\tau(\frac{m}{k_1} + \frac{1}{k_2}))(\mu - \frac{(\tau/k_1) \bar \mu m}{\tau(\frac{m}{k_1} + \frac{1}{k_2})})^2$$

So for $$\mu$$ we have that the conditional distribution:

$\mu | \mu_1, \dots, \mu_{m}, \mu_{\mu}, k_1, k_2, \tau \sim N( \frac{(\tau/k_1) \bar \mu m}{\tau(\frac{m}{k_1} + \frac{1}{k_2})}, (\tau(\frac{m}{k_1} + \frac{1}{k_2}))^{-1})$
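This update is a single normal draw per iteration. A minimal sketch in Python/numpy (names are ours):

```python
import numpy as np

def sample_mu(mu_j, tau, k1, k2, rng):
    """Draw mu from its normal full conditional."""
    m = len(mu_j)
    prec = tau * (m / k1 + 1.0 / k2)          # conditional precision
    mean = (tau / k1) * mu_j.sum() / prec     # (tau/k1) * m * mu_bar / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))
```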

## 2.7 A second posterior, with $$\mu_j$$ marginalised out

The following is what is used in the code.

Assuming $\mathbf{y} | \mu, \tau, k_1 \sim MVN(\mu \mathbf{1}, \tau^{-1} \mathbf{\Psi})$, where $\mathbf{\Psi} = k_1 \mathbf{M}\mathbf{M}^{T} + \mathbf{I}$ (the model with the $$\mu_j$$ integrated out), together with the prior $\mu \sim N(0, k_2 \tau^{-1})$,

we can write

$$p(\mu | y, \tau, k_1, k_2) \propto \exp \{ -\frac{\tau}{2}(\mathbf{y} - \mu \mathbf{1})^{T} \Psi^{-1} (\mathbf{y} - \mu \mathbf{1}) \} \times \exp \{ -\frac{\tau}{2 k_2} \mu^{2} \} \\ \propto \exp \{ -\frac{\tau}{2}(\mu^{2} (\mathbf{1}^{T} \Psi^{-1} \mathbf{1} + 1/k_2) - 2 \mu \mathbf{1}^{T} \Psi^{-1} \mathbf{y}) \} \\ \propto \exp \{ -\frac{\tau}{2} ( \mathbf{1}^{T} \Psi^{-1} \mathbf{1} + 1/k_2)\Big( \mu - \frac{\mathbf{1}^{T} \Psi^{-1} \mathbf{y}}{\mathbf{1}^{T} \Psi^{-1} \mathbf{1} + 1/k_2}\Big)^{2} \}$$

$\mu | y, \tau, k_1, k_2 \sim N(\frac{\mathbf{1}^{T}\Psi^{-1}\mathbf{y}}{\mathbf{1}^{T}\Psi^{-1}\mathbf{1} + k_2^{-1}}, ((\mathbf{1}^{T}\Psi^{-1}\mathbf{1} + k_2^{-1}) \tau )^{-1})$

## 2.8 Marginal Distributions for y

Suppose we have the outcome variable: $\mathbf{y} \sim MVN_n(\mu \mathbf{1}_n, \tau^{-1} (k_1 \mathbf{MM}^T + \mathbf{I}))$

with: $\mu \sim N(\mu_\mu, \tau_\mu^{-1})$

And define $$\mathbf{\Psi} = k_1 \mathbf{MM}^T + \mathbf{I}$$

Then, as a ‘trick’ to estimate the mean and variance of $$\mathbf{y}$$, we can write:

$\mathbf{y} = \mu \mathbf{1}_n + \left[ \tau^{-1} \Psi \right]^{1/2} \mathbf{z}$

where $$\mathbf{z} \sim MVN(0, \mathbf{I})$$ is a standard multivariate normal. Then we have:

$E(\mathbf{y}) = \mu_\mu \mathbf{1}_n + 0 = \mu_\mu \mathbf{1}_n \\ Var(\mathbf{y}) = Var(\mu \mathbf{1}_n) + Var(\left[\tau^{-1} \Psi \right]^{1/2} \mathbf{z}) = \tau_{\mu}^{-1} \mathbf{1}_n \mathbf{1}^T_n + \tau^{-1} \mathbf{\Psi}$

Now, letting $$\tau_\mu^{-1} = k_2 \tau^{-1}$$, we get:

$\mathbf{y} \sim MVN(\mu_\mu \mathbf{1}, \tau^{-1} \left[ k_2 \mathbf{1}_n \mathbf{1}^T_n + \mathbf{\Psi} \right]) \equiv MVN(W_0, \tau^{-1} W_1)$
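The location-scale 'trick' above also gives a direct way to simulate $$\mathbf{y}$$: draw a standard normal $$\mathbf{z}$$ and transform it. A minimal sketch in Python/numpy, using a Cholesky factor as the matrix square root (the sizes and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_j = np.array([2, 3])                 # hypothetical group sizes
n, m = int(n_j.sum()), len(n_j)
k1, tau, mu = 1.5, 2.0, 0.5

# M allocates each observation to its group (N x m binary matrix)
M = np.zeros((n, m))
M[np.arange(n), np.repeat(np.arange(m), n_j)] = 1.0

Psi = k1 * M @ M.T + np.eye(n)
L = np.linalg.cholesky(Psi / tau)      # a matrix square root of tau^{-1} Psi
z = rng.standard_normal(n)
y = mu * np.ones(n) + L @ z            # one draw of y | mu, tau
```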

We now want to marginalise this over $$\tau \sim Ga(\alpha, \beta)$$. Inside the integral, the terms involving $$\tau$$ form the kernel of a Gamma distribution with parameters:

$Ga\Big(n/2 + \alpha, \beta + \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2}\Big)$

…so we get:

$\pi(\mathbf{y} | \mu_\mu, k_1, k_2) = \int (2\pi)^{-n/2} \tau^{n/2} | \mathbf{W}_1 |^{-1/2} \exp \left[ -\frac{\tau}{2} (\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)\right] \tau^{\alpha - 1} \exp(-\beta \tau) \partial \tau$

This becomes:

$= (2\pi)^{-n/2} | \mathbf{W}_1 |^{-1/2} \Gamma \left( \frac{n}{2} + \alpha \right) \left[ \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2} + \beta\right]^{-\left(\frac{n}{2} + \alpha \right)}$

(For examples of the evaluation of this marginal distribution, see )

### 2.8.1 Log version of the marginal

This equation in log-scale gives us:

$\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k_1, k_2, \mu, \alpha, \beta)) &=& -\frac{N}{2} \log(2\pi) -\frac{1}{2} \log(|\boldsymbol{\mathbf{W}}_{1}|) + \log(\Gamma(N/2 + \alpha)) \\ &-& (N/2 + \alpha)\left[ \log \Big( \beta + \frac{(\mathbf{y} - \mathbf{W}_{0})^T \mathbf{W}_{1}^{-1} (\mathbf{y} - \mathbf{W}_{0})}{2}\Big) \right] \end{eqnarray*}$

And the same, when written for $$j = 1,\dots, b$$ nodes of a tree, would look like:

$\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y_{j}} | N_j, k_1, k_2, \mu, \alpha, \beta)) &=& \sum_{j = 1}^{b} -\frac{N_j}{2} \log(2\pi) -\frac{1}{2} \log(|\boldsymbol{\mathbf{W}}_{1,j}|) + \log(\Gamma(N_j/2 + \alpha)) \\ &-& (N_j/2 + \alpha)\left[ \log \Big(\beta + \frac{(\mathbf{y}_j - \mathbf{W}_{0,j})^T \mathbf{W}_{1,j}^{-1} (\mathbf{y}_j - \mathbf{W}_{0,j})}{2} \Big) \right] \end{eqnarray*}$
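The per-node log marginal above is what a tree-sampling implementation would evaluate. A minimal sketch for a single node in Python/numpy, taking $$\mu_\mu = 0$$ so that $$\mathbf{W}_0 = \mathbf{0}$$ (function and argument names are ours):

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal(y, M, k1, k2, alpha, beta):
    """log pi(y | k1, k2, alpha, beta) for one node, with mu_mu = 0."""
    N = len(y)
    # W1 = k2 * 1 1^T + k1 * M M^T + I
    W1 = k2 * np.ones((N, N)) + k1 * M @ M.T + np.eye(N)
    _, logdet = np.linalg.slogdet(W1)
    quad = y @ np.linalg.solve(W1, y)      # (y - W0)^T W1^{-1} (y - W0)
    return (-N / 2 * log(2 * pi) - 0.5 * logdet
            + lgamma(N / 2 + alpha)
            - (N / 2 + alpha) * log(beta + quad / 2))
```

Using `slogdet` and `solve` avoids forming the explicit inverse of $$\mathbf{W}_1$$.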

### 2.8.2 (Old) Marginal Distributions

This is only present here for the record.

$y_{ij} \sim N(\mu_j, \tau^{-1})$ $\mu_j \sim N(\mu, k\tau^{-1})$ $\mu \sim N(\mu_\mu,\tau_\mu^{-1})$ $\tau \sim Ga(\alpha, \beta)$

with $$N = \sum_{j = 1}^{m} n_j$$. Define $$\mathbf{M}$$ to be an $$N\times m$$ binary matrix which allocates each observation to a group.

Writing things out in matrix format:

$$\boldsymbol{y} = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_m m} \end{bmatrix}$$

$$\boldsymbol{M} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix}$$

$$\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix}$$

then $$\boldsymbol{y} \sim MVN_{N}( \boldsymbol{M \lambda}, \tau^{-1} \boldsymbol{I})$$ and $$\boldsymbol{\lambda} \sim MVN_{m}( \mu \boldsymbol{1}, k \tau^{-1} \boldsymbol{I})$$.

$$E[\boldsymbol{y} | \mu, \tau] = E_\lambda [E_y[\boldsymbol{y} | \boldsymbol{\lambda}, \mu, \tau]] = \boldsymbol{M} E[\boldsymbol{\lambda}] = \mu \boldsymbol{M1} \\ Var[\boldsymbol{y} | \mu, \tau] = Var[\boldsymbol{M} \boldsymbol{\lambda}] + \tau^{-1} \boldsymbol{I} = \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I}$$

so $\boldsymbol{y} | \mu, \tau, k, \tau_{\mu} \sim MVN_{N}(\mu \boldsymbol{M1} , \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I})$

We now use this as the starting point and integrate out $$\mu$$ and $$\tau$$. The equation we end up with should be a function of $$M$$, $$k$$, $$\tau_\mu$$, $$\alpha$$, and $$\beta$$.

To start, define $$\mathbf{\Psi} = k \boldsymbol{M}\boldsymbol{M}^T + \boldsymbol{I}$$, so that $$\boldsymbol{y}|\ldots \sim MVN(\mu \boldsymbol{M1}, \tau^{-1} \boldsymbol{\Psi})$$. Then we obtain:

$\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &=& \int \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \exp \left[ -\frac{\tau_\mu}{2} (\mu - \mu_\mu)^2 \right]\\ &\times& \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{\tau}{2} \left\{ (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1}) \right\} \right] \partial\mu \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2}\partial\tau \\ &\times& \int \exp \left[ -\frac{1}{2} [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \right] \partial\mu \\ \end{eqnarray*}$

The inner expression can be rewritten as:

$\begin{eqnarray*} Q &=& [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \\ &=& \mu^2(\tau_{\mu} + \tau (\boldsymbol{M1})^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{M1}) - 2\mu (\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}) + \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} \\ &=& \mu^2(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) - 2\mu (\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}) + \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}, \text{ since } \boldsymbol{M1} = \boldsymbol{1} \\ &=& \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} + (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}\right)^2 \\ &-& \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}\\ \end{eqnarray*}$

that can be be plugged back into our equation as a $$\text{Normal}(\frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})}, (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{-1})$$:

$\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \\ &\times& \int \exp \left[ -\frac{1}{2} [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \right] \partial\mu \thinspace \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \left \{ \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right \} \right] \\ &\times& \int \exp \left[ -\frac{1}{2} (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}) \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\right)^2 \right] \frac{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}}{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \thinspace \partial\mu \thinspace \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \left \{ \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi} ^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}} \right \} \right] \frac{1}{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \thinspace \partial\tau \\ \end{eqnarray*}$

(the inner integral over $$\mu$$ is a normal kernel and contributes only a constant, which we drop).

Now, replacing $$\tau_{\mu} = k \tau$$, we have:

$\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times (k \tau)^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ k \tau\mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \left \{ \frac{(k \tau\mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{k \tau + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}} \right \} \right] \frac{1}{(k \tau+ \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}} \thinspace \partial\tau \\ &=& |\Psi|^{-1/2} k^{1/2} \int \tau^{\alpha - 1} \tau^{1/2} \tau^{N/2} \exp [ -\beta \tau] \times \exp \left[ -\frac{\tau}{2}\{ k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{\tau}{2} \left \{ \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right \} \right] \frac{1}{(\tau (k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}))^{1/2}} \thinspace \partial\tau \\ &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}} \int \tau^{\alpha - 1} \tau^{N/2} \\ &\times& \exp [ -\beta \tau] \times \exp \left[ -\frac{\tau}{2}\{ k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \} \right] \thinspace \partial\tau \\ &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}} \int \tau^{N/2 + \alpha - 1} \\ &\times& \exp \left[ -\tau \{ \beta + \frac{1}{2}(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} ) \} \right] \thinspace \partial\tau \\ \end{eqnarray*}$

where the integrand is the kernel of a $$\text{Gamma}(N/2 + \alpha, \; \beta + \frac{1}{2}(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} ))$$ density

and:

$\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \int \tau^{N/2 + \alpha - 1} \\ &\times& \exp [ -\tau \{ \beta + \frac{1}{2}(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} ) \}] \\ &\times& \frac{ (\beta + \frac{1}{2}(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}))^{N/2 + \alpha} }{ (\beta + \frac{1}{2}(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}))^{N/2 + \alpha} } \partial\tau \\ &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \Gamma(N/2 + \alpha) \\ &\times& (\beta + \frac{1}{2}(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}))^{-(N/2 + \alpha)}\\ \end{eqnarray*}$

This equation in log-scale gives us:

$\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta)) &=& -\frac{1}{2} \log(|\boldsymbol{\Psi}|) + \frac{\log(k)}{2} - \frac{1}{2}\log(k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}) + \log(\Gamma(N/2 + \alpha)) \\ &-& (N/2 + \alpha) \log \Big( \beta + \frac{1}{2}(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}) \Big) \end{eqnarray*}$

And the same, when written for $$j = 1,\dots, b$$ nodes of a tree, would look like:

$\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y_{j}} | k_{j}, \mu_{\mu_{j}}, \alpha, \beta)) &=& \sum_{j = 1}^{b} -\frac{1}{2} \log(|\boldsymbol{\Psi_{j}}|) + \frac{\log(k_{j})}{2} - \frac{1}{2}\log(k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi_{j}}^{-1}\boldsymbol{1}) + \log(\Gamma(N_{j}/2 + \alpha)) \\ &-& (N_{j}/2 + \alpha) \log \Big( \beta + \frac{1}{2}(k_{j} \mu_{\mu_{j}}^2 + \boldsymbol{y_{j}}^T \boldsymbol{\Psi_{j}}^{-1} \boldsymbol{y_{j}} \\ &-& \frac{(k_{j} \mu_{\mu_{j}} + \boldsymbol{y_{j}}^T \boldsymbol{\Psi_{j}}^{-1}\boldsymbol{M_{j} 1})^2}{k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi_{j}}^{-1}\boldsymbol{1}}) \Big) \end{eqnarray*}$

## 2.9 Algorithm

Algorithm type: Metropolis-within-Gibbs for a hierarchical Bayesian (single) tree

Reason: we have closed-form full conditionals for most parameters, but not for the tree structure

Data: target variable $$y$$, groups $$j = 1,\dots,m$$, set of features $$X$$

Result: Posterior distributions for all parameters

Initialisation;

Hyper-parameter values for $$\alpha, \beta, k_1, k_2$$;

Number of groups $$m$$;

Number of observations $$N =\sum_{j = 1}^{m} n_j$$;

Number of iterations I;

• define $$\mu_{\mu} = 0$$, $$\mu^{0}$$, $$\tau^{0}$$, and $$\mu_j^{0}, j = 1,\dots, m$$ as the initial parameter values

• for i from 1 to I do:

• propose a new tree $$T^{\text{new}}$$ by either growing, pruning, changing or swapping a node

• set $$l_{\text{new}}$$ = log full conditional for the new (candidate) tree

$$l_{\text{new}} = \sum_{l = 1}^{b_{\text{new}}} -\frac{1}{2} \log(|\boldsymbol{W}_{1,l}|) + \log(\Gamma(N_l/2 + \alpha))$$ $$-(N_l/2 + \alpha)\left[ \log\Big(\beta + \frac{(\mathbf{y}_l - \mathbf{W}_{0,l})^T \mathbf{W}_{1,l}^{-1} (\mathbf{y}_l - \mathbf{W}_{0,l})}{2} \Big) \right]$$
• set $$l_{\text{old}}$$ = log full conditional for the previous tree
$$l_{\text{old}} = \sum_{l = 1}^{b_{\text{old}}} -\frac{1}{2} \log(|\boldsymbol{W}_{1,l}|) + \log(\Gamma(N_l/2 + \alpha))$$ $$-(N_l/2 + \alpha)\left[ \log\Big(\beta + \frac{(\mathbf{y}_l - \mathbf{W}_{0,l})^T \mathbf{W}_{1,l}^{-1} (\mathbf{y}_l - \mathbf{W}_{0,l})}{2} \Big) \right]$$
• set $$a = \exp(l_{\text{new}} - l_{\text{old}})$$
• generate $$u \sim U[0, 1]$$
• if $$u < a$$ then:
• set $$T = T^{\text{new}}$$
• end

• sample $$\mu$$ from the posterior $$N(\frac{(\tau/k_1) \bar \mu m}{\tau(\frac{1}{k_2} + \frac{m}{k_1})}, (\tau(\frac{1}{k_2} + \frac{m}{k_1}) )^{-1})$$ (this depends on the current $$\mu_j$$ values through $$\bar \mu$$)

• for j in 1:m do:
• sample $$\mu_j$$ from the posterior $$N(\frac{\mu/k_1 + \bar y_j n_j}{n_j + 1/k_1}, ((n_j + \frac{1}{k_1})\tau)^{-1})$$
• end

• sample $$\tau$$ from the posterior $$\text{Gamma}\Big(n/2 + \alpha, \beta + \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2}\Big)$$

• end
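The accept/reject step for the tree move works on the log scale for numerical stability. A minimal sketch (names are ours; `l_new` and `l_old` would come from the log full conditionals above):

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_accept(l_new, l_old, rng):
    """Accept the candidate tree with probability min(1, exp(l_new - l_old))."""
    a = np.exp(min(0.0, l_new - l_old))  # cap the ratio at 1 to avoid overflow
    return rng.uniform() < a

# A candidate with a higher log full conditional is always accepted;
# a much worse one is accepted only occasionally.
always = mh_accept(0.0, -5.0, rng)
rarely = sum(mh_accept(-10.0, -2.0, rng) for _ in range(1000))
```

Capping the log-ratio at zero before exponentiating avoids overflow when the candidate is much better than the current tree.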