Chapter 2 Single tree model

2.1 Model specification

Define the following model. Suppose the observations at a tree node are y_{ij}, i = 1,\dots,n_j, j = 1,\dots,m, where y_{ij} is observation i in group j. There are different numbers of observations n_j in each group; for example, n_1 is the number of observations in group 1, and so on. There are m groups in total, and the total number of observations is n = \sum_{j=1}^{m} n_j.

Then, for each tree node, suppose we have the likelihood: y_{ij} \sim N(\mu_j, \tau^{-1})

so each group j has its own mean \mu_j, and all observations share a common precision term \tau.

We then have a hierarchical prior distribution:

\mu_j \sim N(\mu, k_1 \tau^{-1})

where k_1 will be taken as a constant for simplicity, and the hyper-parameter prior distributions are:

\mu \sim N(0, \tau_\mu^{-1}), \quad \text{with } \tau_\mu^{-1} = k_2 \tau^{-1}, \qquad \tau \sim Ga(\alpha, \beta)

where the values k_1, k_2, \alpha, \beta are all fixed.
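For intuition, here is a minimal sketch simulating one node's data top-down from this specification; the hyper-parameter values and group sizes are illustrative, not from the text:

```python
# Sketch: simulate data from the single-tree node model above.
import numpy as np

rng = np.random.default_rng(42)
k1, k2, alpha, beta = 1.0, 2.0, 1.0, 1.0    # fixed hyper-parameters (illustrative)
n_j = np.array([10, 15, 20])                # observations per group, m = 3
m = len(n_j)

tau = rng.gamma(shape=alpha, scale=1.0 / beta)      # tau ~ Ga(alpha, beta)
mu = rng.normal(0.0, np.sqrt(k2 / tau))             # mu ~ N(0, k2 * tau^{-1})
mu_j = rng.normal(mu, np.sqrt(k1 / tau), size=m)    # mu_j ~ N(mu, k1 * tau^{-1})

# y_ij ~ N(mu_j, tau^{-1}), stacked into one vector with group labels
groups = np.repeat(np.arange(m), n_j)
y = rng.normal(mu_j[groups], np.sqrt(1.0 / tau))
```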

2.2 Maths

  • The likelihood of each tree node will be:

L = \prod_{j=1}^{m} \prod_{i=1}^{n_j} p(y_{ij} | \mu_j, \tau), \qquad L \propto \tau^{n/2} \exp\left\{ -\frac{\tau}{2} \sum_{j=1}^{m} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2 \right\}

with prior distributions:

  • \mu_j | \mu, \tau, k_1:

p(\mu_1,\dots,\mu_m | \mu, \tau) \propto (\tau/k_1)^{m/2} \exp\left\{ -\frac{\tau}{2 k_1} \sum_{j=1}^{m} (\mu_j - \mu)^2 \right\}

  • \tau | \alpha, \beta:

p(\tau | \alpha, \beta) \propto \tau^{\alpha - 1} \exp\{-\beta\tau\}

  • \mu | k_2, \tau (recall \tau_\mu^{-1} = k_2 \tau^{-1}):

p(\mu | k_2, \tau) \propto (\tau/k_2)^{1/2} \exp\left\{ -\frac{\tau}{2 k_2} \mu^2 \right\}

and their joint distribution is:

  • p(\tau, \mu_1, \dots, \mu_m, \mu | y, k_1, k_2, \alpha, \beta)

p(\tau, \mu_1, \dots, \mu_m, \mu | y, k_1, k_2, \alpha, \beta) \propto \tau^{\alpha - 1} \exp\{-\beta\tau\} \times (\tau/k_1)^{m/2} \exp\left\{ -\frac{\tau}{2 k_1} \sum_{j=1}^{m}(\mu_j - \mu)^2 \right\} \times (\tau/k_2)^{1/2} \exp\left\{ -\frac{\tau}{2 k_2}\mu^2 \right\} \times \tau^{n/2} \exp\left\{ -\frac{\tau}{2} \sum_{j=1}^{m}\sum_{i=1}^{n_j}(y_{ij} - \mu_j)^2 \right\}

2.3 Posteriors

2.4 Posterior for τ

  • p(\tau | \mu_1,\dots,\mu_m, \mu, y, \alpha, \beta, k_1, k_2)

p(\tau | \mu_1,\dots,\mu_m, \mu, y, \alpha, \beta, k_1, k_2) \propto \tau^{\alpha-1}\exp\{-\beta\tau\} \times (\tau/k_1)^{m/2}\exp\left\{-\frac{\tau}{2 k_1}\sum_{j=1}^{m}(\mu_j-\mu)^2\right\} \times \tau^{n/2}\exp\left\{-\frac{\tau}{2}\sum_{j=1}^{m}\sum_{i=1}^{n_j}(y_{ij}-\mu_j)^2\right\} \times (\tau/k_2)^{1/2}\exp\left\{-\frac{\tau}{2 k_2}\mu^2\right\} \propto \tau^{(n+m+1)/2+\alpha-1}\exp\left\{-\tau\left(\frac{\sum_{j=1}^{m}\sum_{i=1}^{n_j}(y_{ij}-\mu_j)^2}{2}+\beta+\frac{\sum_{j=1}^{m}(\mu_j-\mu)^2}{2 k_1}+\frac{\mu^2}{2 k_2}\right)\right\}

So \tau | \mu_1,\dots,\mu_m, \mu, y, \alpha, \beta, k_1, k_2 \sim Ga\left(\frac{n+m+1}{2}+\alpha,\ \frac{\sum_{j=1}^{m}\sum_{i=1}^{n_j}(y_{ij}-\mu_j)^2}{2}+\beta+\frac{\sum_{j=1}^{m}(\mu_j-\mu)^2}{2 k_1}+\frac{\mu^2}{2 k_2}\right)
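A minimal sketch of the corresponding Gibbs draw; the function name draw_tau is ours, and y and groups are as in the Section 2.1 sketch:

```python
# Sketch: one Gibbs draw of tau from the Gamma full conditional above.
import numpy as np

def draw_tau(rng, y, groups, mu_j, mu, k1, k2, alpha, beta):
    n, m = len(y), len(mu_j)
    shape = (n + m + 1) / 2 + alpha
    rate = (beta
            + 0.5 * np.sum((y - mu_j[groups]) ** 2)    # data term
            + np.sum((mu_j - mu) ** 2) / (2 * k1)      # group-mean term
            + mu ** 2 / (2 * k2))                      # overall-mean term
    return rng.gamma(shape=shape, scale=1.0 / rate)    # numpy uses scale = 1/rate
```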

2.5 Posterior for \mu_j

Completing the square in each \mu_j (writing \bar{y}_j = n_j^{-1}\sum_{i=1}^{n_j} y_{ij}; P is a constant that rescales k_1, and P = 1 recovers the prior of Section 2.1):

Q = \frac{\tau}{(k_1/P)} \sum_{j=1}^{m}(\mu_j - \mu)^2 + \tau \sum_{j=1}^{m}\sum_{i=1}^{n_j}(y_{ij} - \mu_j)^2

Q = \tau\left[\sum_{j=1}^{m}\left( n_j \mu_j^2 + \frac{\mu_j^2}{(k_1/P)}\right) - \sum_{j=1}^{m}\left( \frac{2\mu\mu_j}{(k_1/P)} + 2\bar{y}_j n_j \mu_j\right)\right] + \text{const}

Q \propto \tau\left[\sum_{j=1}^{m}\left(n_j + \frac{1}{(k_1/P)}\right)\mu_j^2 - 2\mu_j\left(\frac{\mu}{(k_1/P)} + \bar{y}_j n_j\right)\right]

Q \propto \tau\left[\sum_{j=1}^{m}\left(n_j + \frac{1}{(k_1/P)}\right)\left(\mu_j - \frac{\mu/(k_1/P) + \bar{y}_j n_j}{n_j + 1/(k_1/P)}\right)^2\right]

So for each \mu_j:

\mu_j | \mu, y, \tau, k_1 \sim N\left(\frac{\mu/(k_1/P) + \bar{y}_j n_j}{n_j + 1/(k_1/P)},\ \left(\left(n_j + \frac{1}{(k_1/P)}\right)\tau\right)^{-1}\right)
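A matching sketch for the \mu_j draws, vectorised over groups (P = 1 gives the single-tree case; the function name is ours):

```python
# Sketch: Gibbs draws of each mu_j from the Normal conditional above.
import numpy as np

def draw_mu_j(rng, y, groups, mu, tau, k1, P=1):
    m = groups.max() + 1
    n_j = np.bincount(groups, minlength=m)                    # group sizes
    ybar_j = np.bincount(groups, weights=y, minlength=m) / n_j
    prec = (n_j + P / k1) * tau                               # conditional precision
    mean = (mu * P / k1 + ybar_j * n_j) / (n_j + P / k1)      # 1/(k1/P) = P/k1
    return rng.normal(mean, 1.0 / np.sqrt(prec))
```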

2.6 Posterior for \mu

Similarly, for \mu we have (with \bar{\mu} = m^{-1}\sum_{j=1}^{m}\mu_j):

Q = \frac{\tau}{(k_1/P)}\sum_{j=1}^{m}(\mu_j - \mu)^2 + \frac{\tau}{k_2}\mu^2

Q = \frac{\tau}{(k_1/P)}\sum_{j=1}^{m}(\mu_j^2 - 2\mu\mu_j + \mu^2) + \frac{\tau}{k_2}\mu^2

Q \propto \left(\frac{\tau}{k_2} + \frac{\tau m}{(k_1/P)}\right)\mu^2 - \frac{2\tau}{(k_1/P)}\mu\sum_{j=1}^{m}\mu_j

Q \propto \left(\frac{\tau}{k_2} + \frac{\tau m}{(k_1/P)}\right)\mu^2 - \frac{2\tau}{(k_1/P)}\mu\bar{\mu}m

Q \propto \tau\left(\frac{m}{(k_1/P)} + \frac{1}{k_2}\right)\left(\mu - \frac{(\tau m/(k_1/P))\bar{\mu}}{\tau\left(\frac{m}{(k_1/P)} + \frac{1}{k_2}\right)}\right)^2

So for \mu we have the conditional distribution:

\mu | \mu_1,\dots,\mu_m, k_1, k_2, \tau \sim N\left(\frac{(\tau m/(k_1/P))\bar{\mu}}{\tau\left(\frac{m}{(k_1/P)} + \frac{1}{k_2}\right)},\ \left(\tau\left(\frac{m}{(k_1/P)} + \frac{1}{k_2}\right)\right)^{-1}\right)
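And a sketch of the corresponding draw for \mu (function name ours; note the \tau in the mean cancels algebraically, but we keep the form above):

```python
# Sketch: Gibbs draw of mu from the Normal conditional above (P = 1 here).
import numpy as np

def draw_mu(rng, mu_j, tau, k1, k2, P=1):
    m = len(mu_j)
    prec = tau * (m * P / k1 + 1.0 / k2)          # conditional precision
    mean = tau * (P / k1) * mu_j.sum() / prec     # (tau m/(k1/P)) mu_bar / prec
    return rng.normal(mean, 1.0 / np.sqrt(prec))
```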

2.7 A second posterior, with \mu_j marginalised out

The following is what is used in the code.

Assuming \boldsymbol{y} | \mu, \tau, k_1 \sim MVN(\mu\boldsymbol{1}, \tau^{-1}\boldsymbol{\Psi}) with \boldsymbol{\Psi} = k_1\boldsymbol{M}\boldsymbol{M}^T + \boldsymbol{I} (so that, marginally over \mu, \boldsymbol{y} | \tau, k_1, k_2 \sim MVN(\boldsymbol{0}, \tau^{-1}[(k_1\boldsymbol{M}\boldsymbol{M}^T + \boldsymbol{I}) + k_2\boldsymbol{1}\boldsymbol{1}^T])),

we can write

p(\mu | \boldsymbol{y}, \tau, k_1, k_2) \propto \exp\left\{-\frac{\tau}{2}(\boldsymbol{y} - \mu\boldsymbol{1})^T\boldsymbol{\Psi}^{-1}(\boldsymbol{y} - \mu\boldsymbol{1})\right\} \times \exp\left\{-\frac{\tau}{2 k_2}\mu^2\right\} \propto \exp\left\{-\frac{\tau}{2}\left(\mu^2(\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{1} + 1/k_2) - 2\mu\,\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{y}\right)\right\} \propto \exp\left\{-\frac{\tau}{2}(\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{1} + 1/k_2)\left(\mu - \frac{\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{y}}{\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{1} + 1/k_2}\right)^2\right\}

so that

\mu | \boldsymbol{y}, \tau, k_1, k_2 \sim N\left(\frac{\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{y}}{\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{1} + k_2^{-1}},\ \left((\boldsymbol{1}^T\boldsymbol{\Psi}^{-1}\boldsymbol{1} + k_2^{-1})\tau\right)^{-1}\right)
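A sketch of this draw that solves linear systems in \boldsymbol{\Psi} instead of forming \boldsymbol{\Psi}^{-1} explicitly (the function name and interface are ours):

```python
# Sketch: draw mu with the mu_j marginalised out, using Psi = k1 M M^T + I.
import numpy as np

def draw_mu_marginal(rng, y, M, tau, k1, k2):
    n = len(y)
    Psi = k1 * M @ M.T + np.eye(n)
    ones = np.ones(n)
    Psi_inv_one = np.linalg.solve(Psi, ones)   # Psi^{-1} 1
    a = ones @ Psi_inv_one + 1.0 / k2          # 1' Psi^{-1} 1 + 1/k2
    b = Psi_inv_one @ y                        # 1' Psi^{-1} y
    return rng.normal(b / a, 1.0 / np.sqrt(a * tau))
```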

2.8 Marginal Distributions for y

Suppose we have the outcome variable: \boldsymbol{y} \sim MVN_n(\mu\boldsymbol{1}_n, \tau^{-1}(k_1\boldsymbol{M}\boldsymbol{M}^T + \boldsymbol{I}))

with: \mu \sim N(\mu_\mu, \tau_\mu^{-1})

And define \boldsymbol{\Psi} = k_1\boldsymbol{M}\boldsymbol{M}^T + \boldsymbol{I}.

Then, as a ‘trick’ to estimate the mean and variance of y, we can write:

\boldsymbol{y} = \mu\boldsymbol{1}_n + [\tau^{-1}\boldsymbol{\Psi}]^{1/2}\boldsymbol{z}

where \boldsymbol{z} \sim MVN(\boldsymbol{0}, \boldsymbol{I}) is a standard multivariate normal. Then we have:

E(\boldsymbol{y}) = \mu_\mu\boldsymbol{1}_n + 0 = \mu_\mu\boldsymbol{1}_n, \qquad Var(\boldsymbol{y}) = Var(\mu\boldsymbol{1}_n) + Var([\tau^{-1}\boldsymbol{\Psi}]^{1/2}\boldsymbol{z}) = \tau_\mu^{-1}\boldsymbol{1}_n\boldsymbol{1}_n^T + \tau^{-1}\boldsymbol{\Psi}

Now letting \tau_\mu^{-1} = k_2\tau^{-1}, we get:

\boldsymbol{y} \sim MVN(\mu_\mu\boldsymbol{1}, \tau^{-1}[k_2\boldsymbol{1}_n\boldsymbol{1}_n^T + \boldsymbol{\Psi}]) \equiv MVN(\boldsymbol{W}_0, \tau^{-1}\boldsymbol{W}_1), \quad \text{where } \boldsymbol{W}_0 = \mu_\mu\boldsymbol{1}_n,\ \boldsymbol{W}_1 = k_2\boldsymbol{1}_n\boldsymbol{1}_n^T + \boldsymbol{\Psi}
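As a sanity check, the 'trick' can be verified by simulation: draw many (\mu, \boldsymbol{z}) pairs, form \boldsymbol{y}, and compare the sample moments against \boldsymbol{W}_0 and \tau^{-1}\boldsymbol{W}_1. All values below are illustrative:

```python
# Sketch: check E(y) = mu_mu * 1 and Var(y) = tau^{-1} W_1 via the
# reparameterisation y = mu 1 + [tau^{-1} Psi]^{1/2} z.
import numpy as np

rng = np.random.default_rng(1)
k1, k2, tau, mu_mu = 1.0, 2.0, 1.5, 0.0
groups = np.repeat([0, 1, 2], [5, 5, 5])
n = len(groups)
M = np.eye(3)[groups]                      # N x m allocation matrix
Psi = k1 * M @ M.T + np.eye(n)
L = np.linalg.cholesky(Psi / tau)          # a valid square root of tau^{-1} Psi

mu = rng.normal(mu_mu, np.sqrt(k2 / tau), size=100_000)   # mu ~ N(mu_mu, k2/tau)
z = rng.standard_normal((100_000, n))
y = mu[:, None] + z @ L.T

W1 = k2 * np.ones((n, n)) + Psi
print(np.allclose(y.mean(axis=0), mu_mu, atol=0.05))      # ~ mu_mu * 1
print(np.allclose(np.cov(y.T), W1 / tau, atol=0.1))       # ~ tau^{-1} W_1
```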

We now want to marginalise this over \tau \sim Ga(\alpha, \beta); under the integral we will recognise the kernel of a

Ga\left(n/2 + \alpha,\ \beta + \frac{(\boldsymbol{y} - \boldsymbol{W}_0)^T\boldsymbol{W}_1^{-1}(\boldsymbol{y} - \boldsymbol{W}_0)}{2}\right)

distribution.

…so we get:

\pi(\boldsymbol{y} | \mu_\mu, k_1, k_2, \alpha, \beta) = \int (2\pi)^{-n/2}\tau^{n/2}|\boldsymbol{W}_1|^{-1/2}\exp\left[-\frac{\tau}{2}(\boldsymbol{y} - \boldsymbol{W}_0)^T\boldsymbol{W}_1^{-1}(\boldsymbol{y} - \boldsymbol{W}_0)\right]\tau^{\alpha-1}\exp(-\beta\tau)\,\partial\tau

This becomes:

= (2\pi)^{-n/2}|\boldsymbol{W}_1|^{-1/2}\,\Gamma\left(\frac{n}{2}+\alpha\right)\left[\beta + \frac{(\boldsymbol{y} - \boldsymbol{W}_0)^T\boldsymbol{W}_1^{-1}(\boldsymbol{y} - \boldsymbol{W}_0)}{2}\right]^{-(n/2+\alpha)}


2.8.1 Log version of the marginal

This equation in log-scale gives us:

\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k_1, k_2, \mu, \alpha, \beta)) &=& -\frac{N}{2} \log(2\pi) -\frac{1}{2} \log(|\boldsymbol{\mathbf{W}}_{1}|) + \log(\Gamma(N/2 + \alpha)) \\ &-& (N/2 + \alpha)\left[ \log \Big( \beta + \frac{(\mathbf{y} - \mathbf{W}_{0})^T \mathbf{W}_{1}^{-1} (\mathbf{y} - \mathbf{W}_{0})}{2}\Big) \right] \end{eqnarray*}

And the same, when written for j = 1,\dots, b nodes of a tree, would look like:

\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y_{j}} | N_j, k_1, k_2, \mu, \alpha, \beta)) &=& \sum_{j = 1}^{b} -\frac{N_j}{2} \log(2\pi) - \frac{1}{2} \log(|\boldsymbol{\mathbf{W}}_{1,j}|) + \log(\Gamma(N_j/2 + \alpha)) \\ &-& (N_j/2 + \alpha)\left[ \log \Big(\beta + \frac{(\mathbf{y}_j - \mathbf{W}_{0,j})^T \mathbf{W}_{1,j}^{-1} (\mathbf{y}_j - \mathbf{W}_{0,j})}{2} \Big) \right] \end{eqnarray*}
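Since this is the quantity used in the code, here is a minimal sketch of the single-node term; the function name log_marginal and the Cholesky-based evaluation are ours:

```python
# Sketch: the log marginal above, evaluated via a Cholesky factor of W_1.
import numpy as np
from scipy.special import gammaln

def log_marginal(y, M, k1, k2, mu_mu, alpha, beta):
    n = len(y)
    Psi = k1 * M @ M.T + np.eye(n)
    W1 = k2 * np.ones((n, n)) + Psi
    L = np.linalg.cholesky(W1)
    logdet_W1 = 2.0 * np.sum(np.log(np.diag(L)))   # log |W_1|
    r = np.linalg.solve(W1, y - mu_mu)             # W_1^{-1} (y - W_0)
    quad = (y - mu_mu) @ r
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * logdet_W1
            + gammaln(n / 2 + alpha)
            - (n / 2 + alpha) * np.log(beta + quad / 2))
```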

2.8.2 (Old) Marginal Distributions

This is only present here for the record.

y_{ij} \sim N(\mu_j, \tau^{-1}), \quad \mu_j \sim N(\mu, k\tau^{-1}), \quad \mu \sim N(\mu_\mu, \tau_\mu^{-1}), \quad \tau \sim Ga(\alpha, \beta)

with N = \sum_{j = 1}^{m} n_j. Define \mathbf{M} to be an N\times m binary matrix which allocates each observation to a group.

Writing things out in matrix format:

\begin{equation} \boldsymbol{y} = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_m m} \end{bmatrix} \end{equation}

\begin{equation} \boldsymbol{M} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix} \end{equation}

\begin{equation} \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} \end{equation}

then \boldsymbol{y} \sim MVN_{N}( \boldsymbol{M \lambda}, \tau^{-1} \boldsymbol{I}) and \boldsymbol{\lambda} \sim MVN_{m}( \mu \boldsymbol{1}, k \tau^{-1} \boldsymbol{I}).
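For concreteness, the allocation matrix can be built from integer group labels by one-hot encoding; a minimal sketch that reproduces the example \boldsymbol{M} above (the labels are illustrative):

```python
# Sketch: building the N x m allocation matrix M from group labels.
import numpy as np

groups = np.array([1, 0, 0, 2])        # label of each observation
m = groups.max() + 1
M = np.eye(m, dtype=int)[groups]       # one-hot rows: M[i, j] = 1 iff obs i is in group j
print(M)
# [[0 1 0]
#  [1 0 0]
#  [1 0 0]
#  [0 0 1]]
```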

\begin{equation} E[\boldsymbol{y} | \mu, \tau] = E_\lambda E_y[\boldsymbol{y} | \boldsymbol{\lambda}, \mu, \tau] = \boldsymbol{M} E[\boldsymbol{\lambda}] = \mu \boldsymbol{M1} \\ Var[\boldsymbol{y} | \mu, \tau] = Var[\boldsymbol{M} \boldsymbol{\lambda}] + \tau^{-1} \boldsymbol{I} = \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I} \end{equation}

so \boldsymbol{y} | \mu, \tau, k, \tau_{\mu} \sim MVN_{N}(\mu \boldsymbol{M1} , \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I})

or, factoring out \tau^{-1},

\boldsymbol{y} | \mu, \tau, k, \tau_{\mu} \sim MVN_{N}(\mu \boldsymbol{M1}, \tau^{-1} (k \boldsymbol{M} \boldsymbol{M}^{T} + \boldsymbol{I}))

(note that \boldsymbol{M}\boldsymbol{M}^{T} is a block matrix of ones, not the identity, so it does not simplify further).

We now use this as the starting point and integrate out \mu and \tau. The equation we end up with should be a function of \boldsymbol{M}, k, \tau_\mu, \mu_\mu, \alpha, and \beta.

To start, define \boldsymbol{\Psi} = k \boldsymbol{M}\boldsymbol{M}^{T} + \boldsymbol{I} so that \boldsymbol{y}|\ldots \sim MVN(\mu \boldsymbol{M1}, \tau^{-1} \boldsymbol{\Psi}). Then we obtain:

\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &=& \int \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \exp \left[ -\frac{\tau_\mu}{2} (\mu - \mu_\mu)^2 \right]\\ &\times& \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{\tau}{2} \left\{ (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1}) \right\} \right] \partial\mu \thinspace \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \\ &\times& \left( \int \exp \left[ -\frac{1}{2} [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \right] \partial\mu \right) \partial\tau \\ \end{eqnarray*}

The inner expression can be rewritten as:

\begin{eqnarray*} Q &=& [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \\ &=& \mu^2(\tau_{\mu} + \tau (\boldsymbol{M1})^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{M1}) - 2\mu (\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}) + \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} \\ &=& \mu^2(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) - 2\mu (\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}) + \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}, \text{ since } \boldsymbol{M1} = \boldsymbol{1} \\ &=& \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} + (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}\right)^2 \\ &-& \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})}\\ \end{eqnarray*}

so that the inner integral is the kernel of a \text{Normal}\left(\frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}, (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{-1}\right) density, which can be plugged back into our equation:

\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \\ &\times& \int \exp \left[ -\frac{1}{2} [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \right] \partial\mu \thinspace \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ +\frac{1}{2} \left \{ \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right \} \right] \\ &\times& \underbrace{\int \exp \left[ -\frac{1}{2} (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}) \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\right)^2 \right] \partial\mu}_{\propto \thinspace (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{-1/2}} \thinspace \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ +\frac{1}{2} \left \{ \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi} ^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}} \right \} \right] \frac{1}{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \thinspace \partial\tau \\ \end{eqnarray*}

Now, replacing \tau_{\mu} = k \tau, we have:

\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times (k \tau)^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{\tau}{2}\{ k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ +\frac{1}{2} \left \{ \frac{\tau^2 (k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{\tau (k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})} \right \} \right] \frac{1}{(\tau (k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}))^{1/2}} \thinspace \partial\tau \\ &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}} \int \tau^{N/2 + \alpha - 1} \\ &\times& \exp \left[ -\tau \left\{ \beta + \frac{1}{2}\left(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right) \right\}\right] \thinspace \partial\tau \\ \end{eqnarray*}

where the integrand is the kernel of a \text{Gamma}\left(N/2 + \alpha,\ \beta + \frac{1}{2}\left(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right)\right) distribution,

and:

\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \Gamma(N/2 + \alpha) \\ &\times& \left(\beta + \frac{1}{2}\left(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\right)\right)^{-(N/2 + \alpha)}\\ \end{eqnarray*}

This equation in log-scale gives us:

\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta)) &=& -\frac{1}{2} \log(|\boldsymbol{\Psi}|) + \frac{\log(k)}{2} - \frac{1}{2}\log(k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}) + \log(\Gamma(N/2 + \alpha)) \\ &-& (N/2 + \alpha) \log \left( \beta + \frac{1}{2}\left(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\right) \right) \end{eqnarray*}

And the same, when written for j = 1,\dots, b nodes of a tree, would look like:

\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y_{j}} | k_{j}, \mu_{\mu_{j}}, \alpha, \beta)) &=& \sum_{j = 1}^{b} -\frac{1}{2} \log(|\boldsymbol{\Psi_{j}}|) + \frac{\log(k_{j})}{2} - \frac{1}{2}\log(k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi_{j}}^{-1}\boldsymbol{1}) + \log(\Gamma(N_{j}/2 + \alpha)) \\ &-& (N_{j}/2 + \alpha) \log \Big( \beta + \frac{1}{2}\big(k_{j} \mu_{\mu_{j}}^2 + \boldsymbol{y_{j}}^T \boldsymbol{\Psi_{j}}^{-1} \boldsymbol{y_{j}} \\ &-& \frac{(k_{j} \mu_{\mu_{j}} + \boldsymbol{y_{j}}^T \boldsymbol{\Psi_{j}}^{-1}\boldsymbol{M_{j} 1})^2}{k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi_{j}}^{-1}\boldsymbol{1}}\big)\Big) \end{eqnarray*}

2.9 Algorithm


Algorithm type: Metropolis-within-Gibbs for a hierarchical Bayesian (single) tree

Reason: we have closed-form posteriors for most parameters, but not for the tree structure

Data: Target variable y, groups j = 1,\dots,m, set of features X

Result: Posterior distributions for all parameters


Initialisation;

Hyper-parameters values for \alpha, \beta, k_1, k_2;

Number of groups m;

Number of observations N =\sum_{j = 1}^{m} n_j;

Number of iterations I;

  • define \mu_{\mu} = 0, \mu^{0}, \tau^{0}, and \mu_j^{0}, j = 1,\dots, m as the initial parameter values

  • for i from 1 to I do:

    • propose a new tree T^{\text{new}} by growing, pruning, changing, or swapping a node of the current tree

    • set l_{\text{new}} = log full conditional for the new (candidate) tree

l_{\text{new}} = \sum_{l = 1}^{b_{\text{new}}} -\frac{1}{2} \log(|\boldsymbol{W}_{1,l}|) + \log(\Gamma(N_l/2 + \alpha)) -(N_l/2 + \alpha) \log\Big(\beta + \frac{(\mathbf{y}_l - \mathbf{W}_{0,l})^T \mathbf{W}_{1,l}^{-1} (\mathbf{y}_l - \mathbf{W}_{0,l})}{2} \Big)
  • set l_{\text{old}} = log full conditional for the previous tree
l_{\text{old}} = \sum_{l = 1}^{b_{\text{old}}} -\frac{1}{2} \log(|\boldsymbol{W}_{1,l}|) + \log(\Gamma(N_l/2 + \alpha)) -(N_l/2 + \alpha) \log\Big(\beta + \frac{(\mathbf{y}_l - \mathbf{W}_{0,l})^T \mathbf{W}_{1,l}^{-1} (\mathbf{y}_l - \mathbf{W}_{0,l})}{2} \Big)
  • set a = \exp(l_{\text{new}} - l_{\text{old}})

  • generate u \sim U[0, 1]

  • if u < a then:

    • set T = T^{\text{new}}
  • end

  • sample \mu from the posterior N(\frac{(\tau/k_1) \bar \mu m}{\tau(\frac{1}{k_2} + \frac{m}{k_1})}, (\tau(\frac{1}{k_2} + \frac{m}{k_1}) )^{-1}) (this draw uses \bar\mu, the mean of the current \mu_j values)

  • for j in 1:m do:

    • sample \mu_j from the posterior N(\frac{\mu/k_1 + \bar y_j n_j}{n_j + 1/k_1}, ((n_j + \frac{1}{k_1})\tau)^{-1})
  • end

  • sample \tau from the posterior \text{Gamma}\Big(n/2 + \alpha, \beta + \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2}\Big)

  • end
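Putting the pieces together, the sketch below implements one iteration of this scheme with the tree reduced to a flat grouping of observations. Here propose_groups is a hypothetical stand-in for the grow/prune/change/swap moves, log_marginal and the draw_* functions are the earlier sketches, and \tau is drawn from the Section 2.4 conditional rather than the marginalised Gamma in the box:

```python
# Sketch: one iteration of the Metropolis-within-Gibbs sampler above.
import numpy as np

def tree_log_cond(y, groups, k1, k2, mu_mu, alpha, beta):
    # Sum the per-node log marginals over the b terminal nodes (Section 2.8.1).
    total = 0.0
    for j in np.unique(groups):
        y_j = y[groups == j]
        M_j = np.ones((len(y_j), 1))   # a terminal node holds a single group here
        total += log_marginal(y_j, M_j, k1, k2, mu_mu, alpha, beta)
    return total

def one_iteration(rng, y, groups, state, k1, k2, alpha, beta, mu_mu=0.0):
    # Metropolis step on the tree structure
    new_groups = propose_groups(rng, groups)      # hypothetical proposal move
    l_new = tree_log_cond(y, new_groups, k1, k2, mu_mu, alpha, beta)
    l_old = tree_log_cond(y, groups, k1, k2, mu_mu, alpha, beta)
    if np.log(rng.uniform()) < l_new - l_old:     # accept with probability a
        groups = new_groups
    # Gibbs steps, in the order of the algorithm box
    state["mu"] = draw_mu(rng, state["mu_j"], state["tau"], k1, k2)
    state["mu_j"] = draw_mu_j(rng, y, groups, state["mu"], state["tau"], k1)
    state["tau"] = draw_tau(rng, y, groups, state["mu_j"], state["mu"],
                            k1, k2, alpha, beta)
    return groups, state
```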