Chapter 2 Single tree model
2.1 Model specification
Define the following model. Suppose the observations at the tree nodes are y_{ij}, i = 1, \dots, n_j, j = 1, \dots, m, where y_{ij} is observation i in group j. Each group has its own number of observations n_j (so, for example, n_1 is the number of observations in group 1), there are m groups, and the total number of observations is n = \sum_{j = 1}^{m} n_j.
Then, for each tree node, suppose we have the likelihood:
y_{ij} \sim N(\mu_j, \tau^{-1})
so each group j has its own mean \mu_j, and all groups share a common precision \tau.
We then have a hierarchical prior distribution:
\mu_j \sim N(\mu, k_1 \tau^{-1})
where k1 will be taken as a constant for simplicity, and the hyper-parameter prior distributions are:
\mu \sim N(0, \tau_\mu^{-1} = k_2 \tau^{-1}), \qquad \tau \sim \text{Ga}(\alpha, \beta)
where the values k_1, k_2, \alpha, \beta are all fixed.
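To make the specification concrete, here is a minimal simulation sketch (the variable names and hyper-parameter values are our own choices, not from the text) that draws one data set from this model:

```python
# Minimal simulation sketch for the model above (assumed names/values):
# y_ij ~ N(mu_j, tau^{-1}), mu_j ~ N(mu, k1 tau^{-1}),
# mu ~ N(0, k2 tau^{-1}), tau ~ Ga(alpha, beta).
import numpy as np

rng = np.random.default_rng(42)

k1, k2, alpha, beta = 2.0, 5.0, 1.0, 1.0    # fixed hyper-parameters
n_j = np.array([10, 15, 20])                # observations per group
m = len(n_j)                                # number of groups

tau = rng.gamma(alpha, 1.0 / beta)          # numpy uses scale = 1/rate
mu = rng.normal(0.0, np.sqrt(k2 / tau))     # overall mean
mu_j = rng.normal(mu, np.sqrt(k1 / tau), size=m)  # group means

groups = np.repeat(np.arange(m), n_j)       # group label of each y_ij
y = rng.normal(mu_j[groups], np.sqrt(1.0 / tau))
```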
2.2 Maths
- The likelihood over the tree nodes is:
L = \prod_{j=1}^{m} \prod_{i=1}^{n_j} p(y_{ij} | \mu_j, \tau) \propto \tau^{n/2} \exp\left\{ -\frac{\tau}{2} \sum_{j=1}^{m} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2 \right\}
with prior distributions:
- \mu_j | \mu, \tau, k_1:
p(\mu_1, \dots, \mu_m | \mu, \tau) \propto (\tau/k_1)^{m/2} \exp\left\{ -\frac{\tau}{2 k_1} \sum_{j=1}^{m} (\mu_j - \mu)^2 \right\}
- \tau | \alpha, \beta:
p(\tau | \alpha, \beta) \propto \tau^{\alpha - 1} \exp\{ -\beta \tau \}
- \mu | \tau_\mu^{-1} = k_2 \tau^{-1}:
p(\mu | k_2, \tau) \propto (\tau/k_2)^{1/2} \exp\left\{ -\frac{\tau}{2 k_2} \mu^2 \right\}
and their joint distribution is:
- p(\tau, \mu_1, \dots, \mu_m, \mu | y, k_1, k_2, \alpha, \beta):
\begin{eqnarray*} p(\tau, \mu_1, \dots, \mu_m, \mu | y, k_1, k_2, \alpha, \beta) &\propto& \tau^{\alpha - 1} \exp\{ -\beta \tau \} \times (\tau/k_1)^{m/2} \exp\left\{ -\frac{\tau}{2 k_1} \sum_{j=1}^{m} (\mu_j - \mu)^2 \right\} \\ &\times& (\tau/k_2)^{1/2} \exp\left\{ -\frac{\tau}{2 k_2} \mu^2 \right\} \times \tau^{n/2} \exp\left\{ -\frac{\tau}{2} \sum_{j=1}^{m} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2 \right\} \end{eqnarray*}
2.4 Posterior for τ
- p(\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2):
\begin{eqnarray*} p(\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2) &\propto& \tau^{\alpha - 1} \exp\{ -\beta \tau \} \times (\tau/k_1)^{m/2} \exp\left\{ -\frac{\tau}{2 k_1} \sum_{j=1}^{m} (\mu_j - \mu)^2 \right\} \\ &\times& \tau^{n/2} \exp\left\{ -\frac{\tau}{2} \sum_{j=1}^{m} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2 \right\} \times (\tau/k_2)^{1/2} \exp\left\{ -\frac{\tau}{2 k_2} \mu^2 \right\} \\ &\propto& \tau^{(n + m + 1)/2 + \alpha - 1} \exp\left\{ -\tau \left( \frac{\sum_{j=1}^{m} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2}{2} + \beta + \frac{\sum_{j=1}^{m} (\mu_j - \mu)^2}{2 k_1} + \frac{\mu^2}{2 k_2} \right) \right\} \end{eqnarray*}
So
\tau | \mu_1, \dots, \mu_m, \mu, y, \alpha, \beta, k_1, k_2 \sim \text{Gamma}\left( \frac{n + m + 1}{2} + \alpha, \; \frac{\sum_{j=1}^{m} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2}{2} + \beta + \frac{\sum_{j=1}^{m} (\mu_j - \mu)^2}{2 k_1} + \frac{\mu^2}{2 k_2} \right)
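As a sketch (function and variable names are ours, matching the simulation above), a Gibbs draw from this full conditional looks like:

```python
import numpy as np

def sample_tau(y, groups, mu_j, mu, alpha, beta, k1, k2, rng):
    """Draw tau from the full conditional derived above."""
    n, m = len(y), len(mu_j)
    resid = y - mu_j[groups]                 # y_ij - mu_j
    shape = (n + m + 1) / 2 + alpha
    rate = (np.sum(resid**2) / 2 + beta
            + np.sum((mu_j - mu)**2) / (2 * k1)
            + mu**2 / (2 * k2))
    return rng.gamma(shape, 1.0 / rate)      # numpy parameterises by scale = 1/rate
```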
2.5 Posterior for μj
Collecting into Q the terms in the exponent of the joint distribution that involve \mu_j, and allowing the prior scale k_1 to be divided by a constant P (P = 1 recovers the prior of Section 2.1):
\begin{eqnarray*} Q &=& \frac{\tau}{k_1/P} \sum_{j=1}^{m} (\mu_j - \mu)^2 + \tau \sum_{j=1}^{m} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2 \\ &\propto& \tau \left[ \sum_{j=1}^{m} \left( n_j \mu_j^2 + \frac{\mu_j^2}{k_1/P} \right) - \sum_{j=1}^{m} \left( \frac{2 \mu \mu_j}{k_1/P} + 2 \bar{y}_j n_j \mu_j \right) \right] \\ &=& \tau \left[ \sum_{j=1}^{m} \left( n_j + \frac{1}{k_1/P} \right) \mu_j^2 - 2 \mu_j \left( \frac{\mu}{k_1/P} + \bar{y}_j n_j \right) \right] \\ &\propto& \tau \left[ \sum_{j=1}^{m} \left( n_j + \frac{1}{k_1/P} \right) \left( \mu_j - \frac{\mu/(k_1/P) + \bar{y}_j n_j}{n_j + 1/(k_1/P)} \right)^2 \right] \end{eqnarray*}
So, for each \mu_j:
\mu_j | \mu, y, \tau, k_1 \sim N\left( \frac{\mu/(k_1/P) + \bar{y}_j n_j}{n_j + 1/(k_1/P)}, \; \left( \left( n_j + \frac{1}{k_1/P} \right) \tau \right)^{-1} \right)
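A corresponding sketch of the \mu_j draws, vectorised over groups (names are ours; P = 1 recovers the version used in the algorithm of Section 2.9):

```python
import numpy as np

def sample_mu_j(y, groups, mu, tau, k1, rng, P=1):
    """Draw each mu_j from its full conditional; groups holds labels 0..m-1."""
    m = groups.max() + 1
    n_j = np.bincount(groups, minlength=m)               # group sizes
    ybar_j = np.bincount(groups, weights=y, minlength=m) / n_j
    prec = (n_j + P / k1) * tau                          # posterior precisions
    mean = (mu * P / k1 + ybar_j * n_j) / (n_j + P / k1)
    return rng.normal(mean, 1.0 / np.sqrt(prec))
```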
2.6 Posterior for μ
Similarly, for μ we have:
\begin{eqnarray*} Q &=& \frac{\tau}{k_1/P} \sum_{j=1}^{m} (\mu_j - \mu)^2 + \frac{\tau}{k_2} \mu^2 \\ &=& \frac{\tau}{k_1/P} \sum_{j=1}^{m} (\mu_j^2 - 2 \mu \mu_j + \mu^2) + \frac{\tau}{k_2} \mu^2 \\ &\propto& \left( \frac{\tau}{k_2} + \frac{\tau m}{k_1/P} \right) \mu^2 - \frac{2 \tau}{k_1/P} \sum_{j=1}^{m} \mu \mu_j \\ &=& \left( \frac{\tau}{k_2} + \frac{\tau m}{k_1/P} \right) \mu^2 - \frac{2 \tau}{k_1/P} \mu \bar{\mu} m \\ &\propto& \tau \left( \frac{m}{k_1/P} + \frac{1}{k_2} \right) \left( \mu - \frac{(\tau m/(k_1/P)) \bar{\mu}}{\tau \left( \frac{m}{k_1/P} + \frac{1}{k_2} \right)} \right)^2 \end{eqnarray*}
where \bar{\mu} = \frac{1}{m} \sum_{j=1}^{m} \mu_j.
So the conditional distribution of \mu is:
\mu | \mu_1, \dots, \mu_m, \mu_\mu, k_1, k_2, \tau \sim N\left( \frac{(\tau m/(k_1/P)) \bar{\mu}}{\tau \left( \frac{m}{k_1/P} + \frac{1}{k_2} \right)}, \; \left( \tau \left( \frac{m}{k_1/P} + \frac{1}{k_2} \right) \right)^{-1} \right)
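And a matching sketch for the \mu draw, with the prior mean fixed at \mu_\mu = 0 as in the specification (again, names are ours):

```python
import numpy as np

def sample_mu(mu_j, tau, k1, k2, rng, P=1):
    """Draw mu from its full conditional given the group means mu_j."""
    m = len(mu_j)
    prec = tau * (m * P / k1 + 1.0 / k2)         # posterior precision
    mean = tau * (P / k1) * np.sum(mu_j) / prec  # = (tau m/(k1/P)) mu_bar / prec
    return rng.normal(mean, 1.0 / np.sqrt(prec))
```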
2.7 A second posterior, with μj marginalised out
The following is what is used in the code.
Assuming \boldsymbol{y} | \tau, k_1, k_2 \sim N(\boldsymbol{0}, \tau^{-1}[(k_1 \boldsymbol{M} \boldsymbol{M}^T + \boldsymbol{I}) + k_2 \boldsymbol{1}\boldsymbol{1}^T]), and writing \boldsymbol{\Psi} = k_1 \boldsymbol{M} \boldsymbol{M}^T + \boldsymbol{I} so that \boldsymbol{y} | \mu, \tau \sim MVN(\mu \boldsymbol{1}, \tau^{-1} \boldsymbol{\Psi}), we can write:
\begin{eqnarray*} p(\mu | \boldsymbol{y}, \tau, k_1, k_2) &\propto& \exp\left\{ -\frac{\tau}{2} (\boldsymbol{y} - \mu \boldsymbol{1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{1}) \right\} \times \exp\left\{ -\frac{\tau}{2 k_2} \mu^2 \right\} \\ &\propto& \exp\left\{ -\frac{\tau}{2} \left( \mu^2 (\boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1} + 1/k_2) - 2 \mu \boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} \right) \right\} \\ &\propto& \exp\left\{ -\frac{\tau}{2} (\boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1} + 1/k_2) \left( \mu - \frac{\boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}}{\boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1} + 1/k_2} \right)^2 \right\} \end{eqnarray*}
\mu | \boldsymbol{y}, \tau, k_1, k_2 \sim N\left( \frac{\boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}}{\boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1} + k_2^{-1}}, \; \left( (\boldsymbol{1}^T \boldsymbol{\Psi}^{-1} \boldsymbol{1} + k_2^{-1}) \tau \right)^{-1} \right)
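A sketch of this draw in code, building \boldsymbol{\Psi} explicitly from the allocation matrix \boldsymbol{M} (fine for illustration; a real implementation would exploit the block structure of \boldsymbol{\Psi} rather than solving a dense n \times n system):

```python
import numpy as np

def sample_mu_marginal(y, groups, tau, k1, k2, rng):
    """Draw mu with the group means marginalised out: Psi = k1 M M^T + I."""
    n, m = len(y), groups.max() + 1
    M = np.zeros((n, m))
    M[np.arange(n), groups] = 1.0
    Psi = k1 * M @ M.T + np.eye(n)
    Psi_inv_1 = np.linalg.solve(Psi, np.ones(n))   # Psi^{-1} 1
    denom = Psi_inv_1.sum() + 1.0 / k2             # 1^T Psi^{-1} 1 + 1/k2
    mean = (Psi_inv_1 @ y) / denom
    return rng.normal(mean, 1.0 / np.sqrt(denom * tau))
```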
2.8 Marginal Distributions for y
Suppose we have the outcome variable:
\boldsymbol{y} \sim MVN_n(\mu \boldsymbol{1}_n, \tau^{-1}(k_1 \boldsymbol{M} \boldsymbol{M}^T + \boldsymbol{I}))
with:
\mu \sim N(\mu_\mu, \tau_\mu^{-1})
and define \boldsymbol{\Psi} = k_1 \boldsymbol{M} \boldsymbol{M}^T + \boldsymbol{I}.
Then, as a ‘trick’ to estimate the mean and variance of \boldsymbol{y}, we can write:
\boldsymbol{y} = \mu \boldsymbol{1}_n + [\tau^{-1} \boldsymbol{\Psi}]^{1/2} \boldsymbol{z}
where \boldsymbol{z} \sim MVN(\boldsymbol{0}, \boldsymbol{I}) is a standard multivariate normal. Then we have:
\begin{eqnarray*} E(\boldsymbol{y}) &=& \mu_\mu \boldsymbol{1}_n + \boldsymbol{0} = \mu_\mu \boldsymbol{1}_n \\ Var(\boldsymbol{y}) &=& Var(\mu \boldsymbol{1}_n) + Var([\tau^{-1} \boldsymbol{\Psi}]^{1/2} \boldsymbol{z}) = \tau_\mu^{-1} \boldsymbol{1}_n \boldsymbol{1}_n^T + \tau^{-1} \boldsymbol{\Psi} \end{eqnarray*}
Now, letting \tau_\mu^{-1} = k_2 \tau^{-1}, we get:
\boldsymbol{y} \sim MVN(\mu_\mu \boldsymbol{1}, \tau^{-1}[k_2 \boldsymbol{1}_n \boldsymbol{1}_n^T + \boldsymbol{\Psi}]) \equiv MVN(\boldsymbol{W}_0, \tau^{-1} \boldsymbol{W}_1)
We now want to marginalise this over \tau \sim Ga(\alpha, \beta); inside the integral below we will recognise the kernel of a
Ga\left( n/2 + \alpha, \; \beta + \frac{(\boldsymbol{y} - \boldsymbol{W}_0)^T \boldsymbol{W}_1^{-1} (\boldsymbol{y} - \boldsymbol{W}_0)}{2} \right)
distribution, which integrates out in closed form
…so we get:
\pi(\boldsymbol{y} | \mu_\mu, k_1, k_2) = \int (2\pi)^{-n/2} \tau^{n/2} |\boldsymbol{W}_1|^{-1/2} \exp\left[ -\frac{\tau}{2} (\boldsymbol{y} - \boldsymbol{W}_0)^T \boldsymbol{W}_1^{-1} (\boldsymbol{y} - \boldsymbol{W}_0) \right] \frac{\beta^\alpha}{\Gamma(\alpha)} \tau^{\alpha - 1} \exp(-\beta \tau) \, \partial\tau
Using \int_0^\infty \tau^{a - 1} \exp(-b \tau) \, \partial\tau = \Gamma(a) b^{-a}, this becomes:
= (2\pi)^{-n/2} |\boldsymbol{W}_1|^{-1/2} \frac{\beta^\alpha}{\Gamma(\alpha)} \, \Gamma\left(\frac{n}{2} + \alpha\right) \left[ \frac{(\boldsymbol{y} - \boldsymbol{W}_0)^T \boldsymbol{W}_1^{-1} (\boldsymbol{y} - \boldsymbol{W}_0)}{2} + \beta \right]^{-(n/2 + \alpha)}
(For examples of the evaluation of this marginal distribution, see )
2.8.1 Log version of the marginal
This equation in log-scale gives us:
\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k_1, k_2, \mu_\mu, \alpha, \beta)) &=& -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|\boldsymbol{W}_{1}|) + \alpha \log(\beta) - \log(\Gamma(\alpha)) + \log(\Gamma(n/2 + \alpha)) \\ &-& (n/2 + \alpha) \log \Big( \beta + \frac{(\boldsymbol{y} - \boldsymbol{W}_{0})^T \boldsymbol{W}_{1}^{-1} (\boldsymbol{y} - \boldsymbol{W}_{0})}{2} \Big) \end{eqnarray*}
And the same, when written for j = 1,\dots, b nodes of a tree, would look like:
\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y}_{j} | N_j, k_1, k_2, \mu_\mu, \alpha, \beta)) &=& \sum_{j = 1}^{b} \Big[ -\frac{N_j}{2} \log(2\pi) - \frac{1}{2} \log(|\boldsymbol{W}_{1,j}|) + \alpha \log(\beta) - \log(\Gamma(\alpha)) + \log(\Gamma(N_j/2 + \alpha)) \\ &-& (N_j/2 + \alpha) \log \Big( \beta + \frac{(\boldsymbol{y}_j - \boldsymbol{W}_{0,j})^T \boldsymbol{W}_{1,j}^{-1} (\boldsymbol{y}_j - \boldsymbol{W}_{0,j})}{2} \Big) \Big] \end{eqnarray*}
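As a sketch, the per-node log marginal can be evaluated directly from the definitions of \boldsymbol{W}_0 and \boldsymbol{W}_1 above (the helper names are ours; building the dense matrix is O(n^3) and only sensible for illustration). Summing the returned value over the b terminal nodes gives the tree-level quantity:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(y, groups, mu_mu, k1, k2, alpha, beta):
    """log pi(y | k1, k2, mu_mu, alpha, beta) for the observations in one node."""
    n, m = len(y), groups.max() + 1
    M = np.zeros((n, m))
    M[np.arange(n), groups] = 1.0                  # allocation matrix
    Psi = k1 * M @ M.T + np.eye(n)                 # Psi = k1 M M^T + I
    W1 = k2 * np.ones((n, n)) + Psi                # W1 = k2 1 1^T + Psi
    r = y - mu_mu                                  # y - W0, with W0 = mu_mu 1
    quad = r @ np.linalg.solve(W1, r)              # (y - W0)^T W1^{-1} (y - W0)
    _, logdet = np.linalg.slogdet(W1)
    return (-n / 2 * np.log(2 * np.pi) - logdet / 2
            + alpha * np.log(beta) - gammaln(alpha)
            + gammaln(n / 2 + alpha)
            - (n / 2 + alpha) * np.log(beta + quad / 2))
```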
2.8.2 (Old) Marginal Distributions
This is only present here for the record.
\begin{eqnarray*} y_{ij} &\sim& N(\mu_j, \tau^{-1}) \\ \mu_j &\sim& N(\mu, k \tau^{-1}) \\ \mu &\sim& N(\mu_\mu, \tau_\mu^{-1}) \\ \tau &\sim& Ga(\alpha, \beta) \end{eqnarray*}
with N = \sum_{j = 1}^{m} n_j. Define \mathbf{M} to be an N\times m binary matrix which allocates each observation to a group.
Writing things out in matrix format:
\begin{equation} \boldsymbol{y} = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_m m} \end{bmatrix} \end{equation}
\begin{equation} \boldsymbol{M} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix} \end{equation}
\begin{equation} \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} \end{equation}
then \boldsymbol{y} \sim MVN_{N}( \boldsymbol{M \lambda}, \tau^{-1} \boldsymbol{I}) and \boldsymbol{\lambda} \sim MVN_{m}( \mu \boldsymbol{1}, k \tau^{-1} \boldsymbol{I}).
\begin{eqnarray*} E[\boldsymbol{y} | \mu, \tau] &=& E_\lambda E_y[\boldsymbol{y} | \boldsymbol{\lambda}, \mu, \tau] = \boldsymbol{M} E[\boldsymbol{\lambda}] = \mu \boldsymbol{M1} \\ Var[\boldsymbol{y} | \mu, \tau] &=& Var[\boldsymbol{M} \boldsymbol{\lambda}] + \tau^{-1} \boldsymbol{I} = \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I} \end{eqnarray*}
so \boldsymbol{y} | \mu, \tau, k, \tau_{\mu} \sim MVN_{N}(\mu \boldsymbol{M1}, \boldsymbol{M} \boldsymbol{M}^{T}(k\tau^{-1}) + \tau^{-1} \boldsymbol{I}) = MVN_{N}(\mu \boldsymbol{M1}, \tau^{-1}(k \boldsymbol{M} \boldsymbol{M}^{T} + \boldsymbol{I}))
We now use this as the starting point and integrate out \mu and \tau. The equation we end up with should be a function of M, k, \tau_\mu, \alpha, and \beta.
To start, define \boldsymbol{\Psi} = k \boldsymbol{M} \boldsymbol{M}^{T} + \boldsymbol{I}, so that \boldsymbol{y} | \ldots \sim MVN(\mu \boldsymbol{M1}, \tau^{-1} \boldsymbol{\Psi}). Then we obtain:
\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &=& \int \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \exp \left[ -\frac{\tau_\mu}{2} (\mu - \mu_\mu)^2 \right]\\ &\times& \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{\tau}{2} \left\{ (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1}) \right\} \right] \partial\mu \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2}\partial\tau \\ &\times& \int \exp \left[ -\frac{1}{2} [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \right] \partial\mu \\ \end{eqnarray*}
The inner expression can be rewritten as:
\begin{eqnarray*} Q &=& [\tau_\mu (\mu - \mu_\mu)^2 + \tau (\boldsymbol{y} - \mu \boldsymbol{M1})^T \boldsymbol{\Psi}^{-1} (\boldsymbol{y} - \mu \boldsymbol{M1})] \\ &=& \mu^2(\tau_{\mu} + \tau (\boldsymbol{M1})^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{M1}) - 2\mu (\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}) + \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} \\ &=& \mu^2(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) - 2\mu (\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}) + \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} \\ &=& \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} + (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}) \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}\right)^2 \\ &-& \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}} \end{eqnarray*}
(using \boldsymbol{M1} = \boldsymbol{1}, since each row of \boldsymbol{M} contains exactly one 1; note that the completed-square remainder enters with a minus sign). The quadratic in \mu is the kernel of a \text{Normal}\left(\frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}}, \; (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1})^{-1}\right) density, which can be plugged back into our equation:
\begin{eqnarray*} \pi(\boldsymbol{y} | k, \tau_\mu, \mu_\mu, \alpha, \beta) &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \left \{ \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right \} \right] \\ &\times& \int \exp \left[ -\frac{1}{2} (\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}) \left(\mu - \frac{\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1}}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\right)^2 \right] \partial\mu \thinspace \partial\tau \\ &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times \tau_\mu^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{1}{2}\{ \tau_{\mu} \mu_{\mu}^2 + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \left \{ \frac{(\tau_{\mu} \mu_{\mu} + \tau \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right \} \right] \frac{1}{(\tau_{\mu} + \tau \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \thinspace \partial\tau \end{eqnarray*}
Now, replacing \tau_{\mu} = k \tau, we have:
\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &=& \int \tau^{\alpha - 1} \exp [ -\beta \tau] \times (k \tau)^{1/2} \tau^{N/2} |\Psi|^{-1/2} \exp \left[ -\frac{\tau}{2}\{ k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y}\} \right] \\ &\times& \exp \left[ \frac{1}{2} \left \{ \frac{\tau (k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}} \right \} \right] \frac{1}{(\tau (k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1} \boldsymbol{1}))^{1/2}} \thinspace \partial\tau \\ &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1} \boldsymbol{1})^{1/2}} \int \tau^{N/2 + \alpha - 1} \\ &\times& \exp \left[ -\tau \left\{ \beta + \frac{1}{2} \left( k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right) \right\} \right] \thinspace \partial\tau \end{eqnarray*}
where the integrand is the kernel of a \text{Gamma}\left(N/2 + \alpha, \; \beta + \frac{1}{2}\left(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right)\right) distribution,
and:
\begin{eqnarray*} \pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta) &=& |\Psi|^{-1/2} k^{1/2} \frac{1}{(k + \boldsymbol{1}^{T} \boldsymbol{\Psi}^{-1}\boldsymbol{1})^{1/2}} \, \Gamma(N/2 + \alpha) \\ &\times& \left(\beta + \frac{1}{2}\left(k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}}\right)\right)^{-(N/2 + \alpha)} \end{eqnarray*}
This equation in log-scale gives us:
\begin{eqnarray*} \log(\pi(\boldsymbol{y} | k, \mu_\mu, \alpha, \beta)) &=& -\frac{1}{2} \log(|\boldsymbol{\Psi}|) + \frac{\log(k)}{2} - \frac{1}{2} \log(k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}) + \log(\Gamma(N/2 + \alpha)) \\ &-& (N/2 + \alpha) \log \left( \beta + \frac{1}{2} \left( k \mu_{\mu}^2 + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1} \boldsymbol{y} - \frac{(k \mu_{\mu} + \boldsymbol{y}^T \boldsymbol{\Psi}^{-1}\boldsymbol{M1})^2}{k + \boldsymbol{1}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{1}} \right) \right) \end{eqnarray*}
And the same, when written for j = 1,\dots, b nodes of a tree, would look like:
\begin{eqnarray*} \sum_{j = 1}^{b} \log(\pi(\boldsymbol{y}_{j} | k_{j}, \mu_{\mu_{j}}, \alpha, \beta)) &=& \sum_{j = 1}^{b} \Big[ -\frac{1}{2} \log(|\boldsymbol{\Psi}_{j}|) + \frac{\log(k_{j})}{2} - \frac{1}{2} \log(k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi}_{j}^{-1}\boldsymbol{1}) + \log(\Gamma(N_{j}/2 + \alpha)) \\ &-& (N_{j}/2 + \alpha) \log \Big( \beta + \frac{1}{2} \Big( k_{j} \mu_{\mu_{j}}^2 + \boldsymbol{y}_{j}^T \boldsymbol{\Psi}_{j}^{-1} \boldsymbol{y}_{j} - \frac{(k_{j} \mu_{\mu_{j}} + \boldsymbol{y}_{j}^T \boldsymbol{\Psi}_{j}^{-1}\boldsymbol{M}_{j} \boldsymbol{1})^2}{k_{j} + \boldsymbol{1}^{T}\boldsymbol{\Psi}_{j}^{-1}\boldsymbol{1}} \Big) \Big) \Big] \end{eqnarray*}
2.9 Algorithm
Algorithm type: Metropolis-within-Gibbs for a hierarchical Bayesian (single) tree
Reason: we have closed-form full conditionals for most parameters, but not for the tree structure
Data: target variable y, groups j = 1,\dots,m, set of features X
Result: posterior distributions for all parameters
Initialisation:
- hyper-parameter values for \alpha, \beta, k_1, k_2;
- number of groups m;
- number of observations N = \sum_{j = 1}^{m} n_j;
- number of iterations I;
- define \mu_{\mu} = 0, and \mu^{0}, \tau^{0}, \mu_j^{0}, j = 1,\dots, m as the initial parameter values.
for i from 1 to I do:
- propose a new tree T^{\text{new}} by growing, pruning, changing, or swapping a node of the current tree;
- set l_{\text{new}} = the log full conditional of the new (candidate) tree;
- set l_{\text{old}} = the log full conditional of the previous tree;
- set a = \exp(l_{\text{new}} - l_{\text{old}});
- generate u \sim U[0, 1];
- if u < a, set T = T^{\text{new}};
- sample \mu from the posterior N\left(\frac{(\tau/k_1) \bar{\mu} m}{\tau(\frac{1}{k_2} + \frac{m}{k_1})}, \left(\tau\left(\frac{1}{k_2} + \frac{m}{k_1}\right)\right)^{-1}\right), which depends on the current \bar{\mu};
- for j in 1,\dots,m: sample \mu_j from the posterior N\left(\frac{\mu/k_1 + \bar{y}_j n_j}{n_j + 1/k_1}, \left(\left(n_j + \frac{1}{k_1}\right)\tau\right)^{-1}\right);
- sample \tau from the posterior \text{Gamma}\Big(N/2 + \alpha, \beta + \frac{(\mathbf{y} - \mathbf{W}_0)^T \mathbf{W}_1^{-1} (\mathbf{y} - \mathbf{W}_0)}{2}\Big);
end
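Putting the pieces together, a skeleton of this sampler might look as follows. Here initial_tree, propose_tree, log_tree_conditional, and tree.allocations are hypothetical stand-ins for the tree machinery, and the three conditional samplers are the sketches from the earlier sections:

```python
import numpy as np

def run_sampler(y, X, n_iter, alpha, beta, k1, k2, rng):
    """Metropolis-within-Gibbs skeleton for the single-tree model."""
    tree = initial_tree(X)                    # hypothetical helper
    groups = tree.allocations(X)              # group/node label per observation
    m = groups.max() + 1
    mu, tau = 0.0, 1.0                        # initial values; mu_mu = 0
    mu_j = np.zeros(m)                        # initial group means
    for _ in range(n_iter):
        # Metropolis step for the tree structure
        tree_new = propose_tree(tree, rng)    # grow / prune / change / swap
        a = np.exp(log_tree_conditional(tree_new, y)
                   - log_tree_conditional(tree, y))
        if rng.uniform() < a:
            tree = tree_new
            groups = tree.allocations(X)
            m = groups.max() + 1
            mu_j = np.resize(mu_j, m)         # keep shapes consistent
        # Gibbs steps, using the full conditionals derived above
        mu = sample_mu(mu_j, tau, k1, k2, rng)
        mu_j = sample_mu_j(y, groups, mu, tau, k1, rng)
        tau = sample_tau(y, groups, mu_j, mu, alpha, beta, k1, k2, rng)
        # store draws of (tree, mu, mu_j, tau) here as required
```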