10.4 Calculating the marginal likelihood
The BIC is an asymptotic approximation of the marginal likelihood, and consequently, it can be used to approximate Bayes factors. However, this approximation has limitations in applications with moderate and small sample sizes (Alan E. Gelfand and Dey 1994). Therefore, other methods are available to calculate Bayes factors when there is no analytical solution for the marginal likelihood.
Observe that calculating the Bayes factor of each model with respect to a reference model (\mathcal{M}_0) is enough to obtain the posterior model probabilities,
\begin{align*} \pi(\mathcal{M}_j |\boldsymbol{y})&=\frac{p(\boldsymbol{y} | \mathcal{M}_j)\pi(\mathcal{M}_j)}{\sum_{l=1}^{M}p(\boldsymbol{y} | \mathcal{M}_l)\pi(\mathcal{M}_l)}\\ &=\frac{p(\boldsymbol{y} | \mathcal{M}_j)\pi(\mathcal{M}_j)/p(\boldsymbol{y} | \mathcal{M}_0)}{\sum_{l=1}^{M}p(\boldsymbol{y} | \mathcal{M}_l)\pi(\mathcal{M}_l)/p(\boldsymbol{y} | \mathcal{M}_0)}\\ &=\frac{BF_{j0}\times\pi(\mathcal{M}_j)}{\sum_{l=1}^{M}BF_{l0}\times\pi(\mathcal{M}_l)}. \end{align*}
Thus, \pi(\mathcal{M}_j |\boldsymbol{y})=\frac{BF_{j0}}{\sum_{l=1}^{M}BF_{l0}} assuming equal prior model probabilities.
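For instance, the following lines show this calculation in R for three hypothetical Bayes factors with respect to a reference model (the values are illustrative only, not taken from any example in this chapter):
# Posterior model probabilities from Bayes factors with respect to M0
# The Bayes factors below are hypothetical values for illustration
BF <- c(1, 3.5, 0.2)                 # BF_10, BF_20, BF_30
PriorProbs <- rep(1/3, 3)            # equal prior model probabilities
PostProbs <- BF*PriorProbs/sum(BF*PriorProbs)
PostProbs                            # posterior model probabilities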
In addition, it has been established in many settings that the Bayes factor is consistent: the probability of identifying the true data generating process converges to 1 as the sample size increases to infinity. When the true process is not among the models under consideration, the Bayes factor asymptotically identifies the model that minimizes the Kullback-Leibler divergence with respect to the data generating process (Siddhartha Chib and Kuffner 2016; Stephen G. Walker 2004b; Stephen G. Walker 2004a).60
10.4.1 Savage-Dickey density ratio
The Savage-Dickey density ratio is a way to calculate the Bayes factor when comparing nested models under particular priors (James M. Dickey 1971; Verdinelli and Wasserman 1995). In particular, consider the parameter space \boldsymbol{\theta}=(\boldsymbol{\omega}^{\top}, \boldsymbol{\psi}^{\top})^{\top}\in \boldsymbol{\Theta}=\boldsymbol{\Omega}\times \boldsymbol{\Psi}, where we wish to test the null hypothesis H_0:\boldsymbol{\omega}=\boldsymbol{\omega}_0 (model \mathcal{M}_1) versus H_1:\boldsymbol{\omega}\neq \boldsymbol{\omega}_0 (model \mathcal{M}_2). If \pi(\boldsymbol{\psi}|\boldsymbol{\omega}_0,\mathcal{M}_2)=\pi(\boldsymbol{\psi}|\mathcal{M}_1),61 then the Bayes factor comparing \mathcal{M}_1 versus \mathcal{M}_2 is
\begin{equation} BF_{12}=\frac{\pi(\boldsymbol{\omega}=\boldsymbol{\omega}_0|\boldsymbol{y},\mathcal{M}_2)}{\pi(\boldsymbol{\omega}=\boldsymbol{\omega}_0|\mathcal{M}_2)}, \tag{10.1} \end{equation} where \pi(\boldsymbol{\omega}=\boldsymbol{\omega}_0|\boldsymbol{y},\mathcal{M}_2) and \pi(\boldsymbol{\omega}=\boldsymbol{\omega}_0|\mathcal{M}_2) are the posterior and prior densities of \boldsymbol{\omega} under \mathcal{M}_2 evaluated at \boldsymbol{\omega}_0 (see Verdinelli and Wasserman (1995)).
Equation (10.1) is called the Savage-Dickey density ratio. A nice feature is that it requires only estimation of model \mathcal{M}_2 and evaluation of its prior and posterior densities at \boldsymbol{\omega}_0; there is no need to evaluate the marginal likelihood (G. M. Koop 2003).
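As a minimal sketch of the idea, consider a toy setting (not the book's example): y_i\sim N(\theta,1) with prior \theta\sim N(0,\tau^2), testing H_0:\theta=0. Both the prior and posterior densities of \theta are available in closed form, so we can verify the Savage-Dickey ratio against the analytical Bayes factor:
# Minimal sketch of the Savage-Dickey density ratio (toy setting, for illustration)
# y_i ~ N(theta, 1), prior theta ~ N(0, tau2); test H0: theta = 0
set.seed(10)
n <- 50; tau2 <- 1
y <- rnorm(n, mean = 0.3, sd = 1)
PostVar <- 1/(n + 1/tau2); PostMean <- PostVar*sum(y)
# Posterior over prior density of theta at 0 under the unrestricted model
BF12 <- dnorm(0, PostMean, sqrt(PostVar))/dnorm(0, 0, sqrt(tau2))
# Check against the analytical marginal likelihoods: under M2, y ~ N(0, I + tau2*J)
LogMarM2 <- mvtnorm::dmvnorm(y, sigma = diag(n) + tau2*matrix(1, n, n), log = TRUE)
LogMarM1 <- sum(dnorm(y, 0, 1, log = TRUE))
BF12; exp(LogMarM1 - LogMarM2)   # the two values coincide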
10.4.2 Chib’s methods
Other popular methods to calculate the marginal likelihood are given by Siddhartha Chib (1995) and Siddhartha Chib and Jeliazkov (2001). The former is an algorithm to calculate the marginal likelihood from the posterior draws of the Gibbs sampling algorithm, and the latter calculates it from the posterior draws of the Metropolis-Hastings algorithm.
The point of departure in Siddhartha Chib (1995) is the identity \begin{align*} \pi(\boldsymbol{\theta}^*|\boldsymbol{y},\mathcal{M}_m)=\frac{p(\boldsymbol{y}|\boldsymbol{\theta}^*,\mathcal{M}_m)\times\pi(\boldsymbol{\theta}^*|\mathcal{M}_m)}{p(\boldsymbol{y}|\mathcal{M}_m)}, \end{align*} where \boldsymbol{\theta}^* is a particular value of \boldsymbol{\theta} with high posterior density, for instance, the mode. This implies that \begin{align*} p(\boldsymbol{y}|\mathcal{M}_m)=\frac{p(\boldsymbol{y}|\boldsymbol{\theta}^*,\mathcal{M}_m)\times\pi(\boldsymbol{\theta}^*|\mathcal{M}_m)}{\pi(\boldsymbol{\theta}^*|\boldsymbol{y},\mathcal{M}_m)}. \end{align*} The numerator of this expression is easy to calculate. The critical point is the denominator, since we know \pi(\boldsymbol{\theta}^*|\boldsymbol{y},\mathcal{M}_m) only up to a normalizing constant; however, we can approximate it from the posterior draws. Assume that \boldsymbol{\theta}=[\boldsymbol{\theta}^{\top}_1 \ \boldsymbol{\theta}^{\top}_2]^{\top}; then \pi(\boldsymbol{\theta}^*|\boldsymbol{y},\mathcal{M}_m)=\pi(\boldsymbol{\theta}^*_1|\boldsymbol{\theta}^*_2,\boldsymbol{y},\mathcal{M}_m)\times \pi(\boldsymbol{\theta}^*_2|\boldsymbol{y},\mathcal{M}_m). The first factor is available because the Gibbs sampling algorithm uses the full posterior conditional distributions. The second is
\begin{align*} \pi(\boldsymbol{\theta}^*_2|\boldsymbol{y},\mathcal{M}_m)&=\int_{\boldsymbol{\Theta}_1}\pi(\boldsymbol{\theta}_1,\boldsymbol{\theta}^*_2|\boldsymbol{y},\mathcal{M}_m)d\boldsymbol{\theta}_1\\ &=\int_{\boldsymbol{\Theta}_1}\pi(\boldsymbol{\theta}^*_2|\boldsymbol{\theta}_1,\boldsymbol{y},\mathcal{M}_m)\pi(\boldsymbol{\theta}_1|\boldsymbol{y},\mathcal{M}_m)d\boldsymbol{\theta}_1\\ &\approx \frac{1}{S}\sum_{s=1}^S \pi(\boldsymbol{\theta}^*_2|\boldsymbol{\theta}^{(s)}_1,\boldsymbol{y},\mathcal{M}_m), \end{align*}
where \boldsymbol{\theta}^{(s)}_1 are the posterior draws of \boldsymbol{\theta}_1 from the Gibbs sampling algorithm.
The generalization to more blocks can be seen in Siddhartha Chib (1995) and Greenberg (2012). In addition, the extension to the Metropolis-Hastings algorithm can be seen in Siddhartha Chib and Jeliazkov (2001), and Greenberg (2012).
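To fix ideas, the following is a minimal sketch of Chib's method in an assumed toy semi-conjugate model (not the book's example): y_i\sim N(\mu,\sigma^2) with independent priors \mu\sim N(0,100) and \sigma^2\sim IG(2,2). Here a two-block Gibbs sampler applies, \pi(\mu^*|\sigma^{2*},\boldsymbol{y}) is available in closed form, and \pi(\sigma^{2*}|\boldsymbol{y}) is approximated by averaging its full conditional density over the Gibbs draws of \mu:
# Minimal sketch of Chib (1995) in a toy semi-conjugate model (for illustration)
# y_i ~ N(mu, sig2), mu ~ N(0, 100), sig2 ~ IG(2, 2); two-block Gibbs sampler
set.seed(10)
n <- 200; y <- rnorm(n, 1, 2)
m0 <- 0; v0 <- 100; a0 <- 4; d0 <- 4
S <- 5000; muDraws <- sig2Draws <- numeric(S)
mu <- mean(y); sig2 <- var(y)
for(s in 1:S){
  vn <- 1/(n/sig2 + 1/v0); mn <- vn*(sum(y)/sig2 + m0/v0)
  mu <- rnorm(1, mn, sqrt(vn))                                 # mu | sig2, y
  sig2 <- 1/rgamma(1, (a0 + n)/2, (d0 + sum((y - mu)^2))/2)    # sig2 | mu, y
  muDraws[s] <- mu; sig2Draws[s] <- sig2
}
muStar <- mean(muDraws); sig2Star <- mean(sig2Draws)  # high posterior density point
# pi(sig2*|y): average the inverse-gamma full conditional over the Gibbs draws of mu
LogPostSig2 <- log(mean(sapply(muDraws, function(m){
  dgamma(1/sig2Star, (a0 + n)/2, (d0 + sum((y - m)^2))/2)/sig2Star^2
})))
# pi(mu*|sig2*, y) is available in closed form
vnStar <- 1/(n/sig2Star + 1/v0); mnStar <- vnStar*(sum(y)/sig2Star + m0/v0)
LogPostMu <- dnorm(muStar, mnStar, sqrt(vnStar), log = TRUE)
LogLik <- sum(dnorm(y, muStar, sqrt(sig2Star), log = TRUE))
LogPrior <- dnorm(muStar, m0, sqrt(v0), log = TRUE) +
  dgamma(1/sig2Star, a0/2, d0/2, log = TRUE) - 2*log(sig2Star)  # IG density via gamma
LogMarLikChib <- LogLik + LogPrior - (LogPostMu + LogPostSig2)
LogMarLikChib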
10.4.3 Gelfand-Dey method
We can use the Gelfand-Dey method (Alan E. Gelfand and Dey 1994) when we want to compare non-nested models, when the Savage-Dickey density ratio is hard to calculate, or when Chib's methods are difficult to implement. The Gelfand-Dey method is very general and can be used in virtually any model (G. M. Koop 2003).
Given a probability density function q(\boldsymbol{\theta}) whose support is contained in \boldsymbol{\Theta}, then \begin{align*} \mathbb{E}\left[\frac{q(\boldsymbol{\theta})}{\pi(\boldsymbol{\theta}|\mathcal{M}_m)p(\boldsymbol{y}|\boldsymbol{\theta},\mathcal{M}_m)}\biggr\rvert \boldsymbol{y},\mathcal{M}_m\right]&=\frac{1}{p(\boldsymbol{y}|\mathcal{M}_m)}, \end{align*}
where the expected value is with respect to the posterior distribution given the model \mathcal{M}_m (see Exercise 12).
The critical point is to select a good q(\boldsymbol{\theta}). John Geweke (1999) recommends using a truncated multivariate normal density with mean \hat{\boldsymbol{\theta}} and covariance \hat{\boldsymbol{\Sigma}} equal to the posterior mean and covariance of \boldsymbol{\theta}. The truncation region is \hat{\boldsymbol{\Theta}}=\left\{\boldsymbol{\theta}:(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})^{\top}\hat{\boldsymbol{\Sigma}}^{-1}(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})\leq \chi_{1-\alpha}^2(K)\right\}, where \chi_{1-\alpha}^2(K) is the (1-\alpha) quantile of the Chi-squared distribution with K degrees of freedom, and K is the dimension of \boldsymbol{\theta}. We can pick small values of \alpha, for instance, \alpha=0.01.
Observe that \begin{align*} \mathbb{E}\left[\frac{q(\boldsymbol{\theta})}{\pi(\boldsymbol{\theta}|\mathcal{M}_m)p(\boldsymbol{y}|\boldsymbol{\theta},\mathcal{M}_m)}\biggr\rvert \boldsymbol{y},\mathcal{M}_m\right]&\approx \frac{1}{S}\sum_{s=1}^S \left[\frac{q(\boldsymbol{\theta}^{(s)})}{\pi(\boldsymbol{\theta}^{(s)}|\mathcal{M}_m)p(\boldsymbol{y}|\boldsymbol{\theta}^{(s)},\mathcal{M}_m)}\right], \end{align*} where \boldsymbol{\theta}^{(s)} are draws from the posterior distribution; since q(\boldsymbol{\theta})=0 outside the truncation region \hat{\boldsymbol{\Theta}}, only draws that fall inside the region contribute to the sum.
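The following is a minimal sketch of the Gelfand-Dey estimator with Geweke's truncated normal q(\boldsymbol{\theta}) in the same toy setting as the Savage-Dickey sketch above. Since the estimator averages the ratios rather than their logarithms, we stabilize the average of exponentials with the usual max trick before taking logs:
# Minimal sketch of the Gelfand-Dey estimator with Geweke's truncated normal q
# Toy setting for illustration: y_i ~ N(theta, 1), theta ~ N(0, tau2)
set.seed(10)
n <- 50; tau2 <- 1; y <- rnorm(n, mean = 0.3, sd = 1)
PostVar <- 1/(n + 1/tau2); PostMean <- PostVar*sum(y)
S <- 10000; thetas <- rnorm(S, PostMean, sqrt(PostVar))  # posterior draws
alpha <- 0.01  # keep draws inside the (1 - alpha) high density region
keep <- (thetas - mean(thetas))^2/var(thetas) <= qchisq(1 - alpha, 1)
Logq <- dnorm(thetas[keep], mean(thetas), sd(thetas), log = TRUE) - log(1 - alpha)
LogPrior <- dnorm(thetas[keep], 0, sqrt(tau2), log = TRUE)
LogLik <- sapply(thetas[keep], function(t){sum(dnorm(y, t, 1, log = TRUE))})
LogRatios <- Logq - LogPrior - LogLik  # draws of log[q/(prior x likelihood)]
# Average the ratios (not the logs), dividing by S since q = 0 outside the region
cmax <- max(LogRatios)
LogMarLikGD <- -(cmax + log(sum(exp(LogRatios - cmax))/S))
# Compare with the analytical log marginal likelihood
LogMarLikGD
mvtnorm::dmvnorm(y, sigma = diag(n) + tau2*matrix(1, n, n), log = TRUE)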
Observe that we can calculate the marginal likelihoods of the models in Chapters 6, 7, 8 and 9 using Chib's methods and the Gelfand-Dey method.
Example: Simulation exercise
Let’s check the performance of the Savage-Dickey density ratio, Chib’s method and the Gelfand-Dey method to calculate the Bayes factor in a setting where we can obtain the analytical solution for the marginal likelihood. In particular, we will consider the Gaussian linear model with a conjugate prior (see Section 3.3).
Assume that the data generating process is given by
y_{i} = 0.7 + 0.3x_{i1} + 0.7x_{i2} - 0.2x_{i3} + 0.2x_{i4} + \mu_i,
where x_{i1} \sim B(0.3), x_{ik} \sim N(0,1), for k = 2, \dots, 4, and \mu_i \sim N(0, 1), for i = 1, 2, \dots, 500. Let us set H_0: \beta_5 = 0 (model \mathcal{M}_1) versus H_1: \beta_5 \neq 0 (model \mathcal{M}_2).
We assume that \boldsymbol{\beta}_{m0} = \boldsymbol{0}_{m0}, \boldsymbol{B}_{m0} = 0.5 \boldsymbol{I}_{m}, \alpha_0 = \delta_0 = 4. The dimensions of \boldsymbol{0}_{m0} and \boldsymbol{I}_m are 4 for model \mathcal{M}_1 and 5 for model \mathcal{M}_2. In addition, we assume equal prior probabilities for both models.
We know from Section 3.3 that the marginal likelihood is \begin{align*} p(\boldsymbol{y}|\mathcal{M}_m)&=\frac{1}{\pi^{N/2}}\frac{\delta_{m0}^{\alpha_{m0}/2}}{\delta_{mn}^{\alpha_{mn}/2}}\frac{|\boldsymbol{B}_{mn}|^{1/2}}{|\boldsymbol{B}_{m0}|^{1/2}}\frac{\Gamma(\alpha_{mn}/2)}{\Gamma(\alpha_{m0}/2)}, \end{align*} where \boldsymbol{B}_{mn} = (\boldsymbol{B}_{m0}^{-1} + \boldsymbol{X}_m^{\top}\boldsymbol{X}_m)^{-1}, \boldsymbol{\beta}_{mn} = \boldsymbol{B}_{mn}(\boldsymbol{B}_{m0}^{-1}\boldsymbol{\beta}_{m0} + \boldsymbol{X}_m^{\top}\boldsymbol{X}_m\hat{\boldsymbol{\beta}}_m), \alpha_{mn}=\alpha_{m0}+N, and \delta_{mn}=\delta_{m0}+(\boldsymbol{y}-\boldsymbol{X}_m\hat{\boldsymbol{\beta}}_m)^{\top}(\boldsymbol{y}-\boldsymbol{X}_m\hat{\boldsymbol{\beta}}_m)+(\hat{\boldsymbol{\beta}}_m-\boldsymbol{\beta}_{m0})^{\top}((\boldsymbol{X}_m^{\top}\boldsymbol{X}_m)^{-1}+\boldsymbol{B}_{m0})^{-1}(\hat{\boldsymbol{\beta}}_m-\boldsymbol{\beta}_{m0}), with m=1,2 indexing the models.
The log marginal likelihoods for models \mathcal{M}_1 and \mathcal{M}_2 are -751.72 and -740.79, respectively. This implies 2\times\log(BF_{21})=21.85, which is very strong evidence against model \mathcal{M}_1.
We have different ways to calculate the Bayes factor using the Savage-Dickey density ratio in this example because the marginal prior and marginal posterior distributions of \beta_5 have analytical solutions. Alternatively, we can use draws of \sigma^2 to evaluate the conditional prior and conditional posterior densities at \beta_5=0 and average over these draws. We show the latter approach in the following code, as it is more general than using analytical solutions, which are not always available.
We know that the conditional posterior distribution of \beta_5 is N(\beta_{5n}, \sigma^2 B_{55,n}), where \beta_{5n} is the fifth element of \boldsymbol{\beta}_n, and B_{55,n} is the (5,5) element of \boldsymbol{B}_n. Then, \begin{align*} \pi(\beta_5=0|\boldsymbol{y}, \mathcal{M}_2) &= \int_{\mathbb{R}^+} \pi(\beta_5=0|\boldsymbol{y}, \sigma^2) \pi(\sigma^2|\boldsymbol{y}) \, d\sigma^2 \\ &\approx \frac{1}{S} \sum_{s=1}^S \pi(\beta_5=0|\boldsymbol{y}, \sigma^{2(s)}), \end{align*}
where \sigma^{2(s)} are draws from the posterior distribution of \sigma^2.
We can follow the same logic to obtain an approximation to \pi(\beta_5=0|\mathcal{M}_2) by sampling draws from the prior distribution of \sigma^2.
We obtain 2 \times \log(BF_{21}) = 21.85 using the Savage-Dickey density ratio, which is the same value as the analytic solution using the marginal likelihoods.
We calculate the log marginal likelihood using Chib's method taking into account that \begin{align*} \log(p(\boldsymbol{y}|\mathcal{M}_m))&=\log(p(\boldsymbol{y}|\boldsymbol{\theta}^*,\mathcal{M}_m))+\log(\pi(\boldsymbol{\theta}^*|\mathcal{M}_m))-\log(\pi(\boldsymbol{\theta}^*|\boldsymbol{y},\mathcal{M}_m)), \end{align*} where p(\boldsymbol{y}|\boldsymbol{\theta}^*,\mathcal{M}_m) is the value of a normal density with mean \boldsymbol{X}_m\boldsymbol{\beta}_{m}^* and variance \sigma^{2*}_m\boldsymbol{I}_N evaluated at \boldsymbol{y}. In addition, \log(\pi(\boldsymbol{\theta}^*|\mathcal{M}_m))=\log(\pi(\boldsymbol{\beta}_m^*|\sigma^{2*}_m))+\log(\pi(\sigma^{2*}_m)), where the first term is the density of a normal with mean \boldsymbol{\beta}_{m0} and variance matrix \sigma^{2*}_m\boldsymbol{B}_{m0} evaluated at \boldsymbol{\beta}_m^*, and the second term is the density of an inverse-gamma with parameters \alpha_{m0}/2 and \delta_{m0}/2 evaluated at \sigma^{2*}_m. Finally, the third term on the right-hand side is \log(\pi(\boldsymbol{\theta}^*|\boldsymbol{y},\mathcal{M}_m))=\log(\pi(\boldsymbol{\beta}_m^*|\sigma^{2*}_m,\boldsymbol{y}))+\log(\pi(\sigma^{2*}_m|\boldsymbol{y})), where the first term is the density of a normal with mean \boldsymbol{\beta}_{mn} and variance matrix \sigma^{2*}_m\boldsymbol{B}_{mn} evaluated at \boldsymbol{\beta}_m^*, and the second term is the density of an inverse-gamma with parameters \alpha_{mn}/2 and \delta_{mn}/2 evaluated at \sigma^{2*}_m. We use the modes of the posterior draws of \boldsymbol{\beta}_m and \sigma^2_m as the reference values \boldsymbol{\theta}^*.
We get the same value, up to two decimals, for the log marginal likelihoods of the restricted and unrestricted models using Chib's method and the analytical expression. Thus, 2\times\log(BF_{21})=21.85, that is, very strong evidence against model \mathcal{M}_1.
We calculate the log marginal likelihood using the Gelfand-Dey method taking into account that \begin{align*} \log\left[\frac{q(\boldsymbol{\theta}^{(s)})}{\pi(\boldsymbol{\theta}^{(s)}|\mathcal{M}_m)p(\boldsymbol{y}|\boldsymbol{\theta}^{(s)},\mathcal{M}_m)}\right]&=\log(q(\boldsymbol{\theta}^{(s)}))-\log(\pi(\boldsymbol{\theta}^{(s)}|\mathcal{M}_m))-\log(p(\boldsymbol{y}|\boldsymbol{\theta}^{(s)},\mathcal{M}_m)), \end{align*} where q(\boldsymbol{\theta}^{(s)}) is the truncated multivariate normal density of Subsection 10.4.3 evaluated at \boldsymbol{\theta}^{(s)}=[\boldsymbol{\beta}^{(s)\top} \ \sigma^{2(s)}]^{\top}, the s-th posterior draw, keeping only draws that satisfy the truncation restriction. In addition, \log(\pi(\boldsymbol{\theta}^{(s)}|\mathcal{M}_m))=\log(\pi(\boldsymbol{\beta}_m^{(s)}|\sigma^{2(s)}_m))+\log(\pi(\sigma^{2(s)}_m)), where the first term is the density of a normal with mean \boldsymbol{\beta}_{m0} and variance matrix \sigma^{2(s)}_m\boldsymbol{B}_{m0} evaluated at \boldsymbol{\beta}_m^{(s)}, and the second term is the density of an inverse-gamma with parameters \alpha_{m0}/2 and \delta_{m0}/2 evaluated at \sigma^{2(s)}_m. The third term, p(\boldsymbol{y}|\boldsymbol{\theta}^{(s)},\mathcal{M}_m), is the value of a normal density with mean \boldsymbol{X}_m\boldsymbol{\beta}_{m}^{(s)} and variance \sigma^{2(s)}_m\boldsymbol{I}_N evaluated at \boldsymbol{y}. Note that the Gelfand-Dey estimator averages the ratios themselves; the code below averages the logs instead, a convenient approximation whose bias is small here because the log ratios vary little across draws.
The log marginal likelihoods of the restricted and unrestricted models using the Gelfand-Dey method are -751.79 and -740.89, respectively. This implies 2\times \log(BF_{21})=21.81, which is very strong evidence in favor of the unrestricted model.
We see in this example that these methods give very good approximations to the true marginal likelihoods, although the Savage-Dickey density ratio and Chib's method performed slightly better than the Gelfand-Dey method. In addition, the computational demand of the Gelfand-Dey method is by far the largest, since it requires evaluating the likelihood, the prior, and q(\boldsymbol{\theta}) at every posterior draw, whereas the other methods need far fewer density evaluations. However, we should keep in mind that the Gelfand-Dey method is more general.
The following code shows how to do all these calculations.
rm(list = ls()); set.seed(010101)
# Simulate the data generating process
N <- 500; K <- 5; K2 <- 3
B <- c(0.7, 0.3, 0.7, -0.2, 0.2)
X1 <- rbinom(N, 1, 0.3)                # binary regressor
X2 <- matrix(rnorm(K2*N), N, K2)       # Gaussian regressors
X <- cbind(1, X1, X2)
Y <- X%*%B + rnorm(N, 0, sd = 1)
# Hyperparameters
d0 <- 4; a0 <- 4
b0 <- rep(0, K); cOpt <- 0.5
an <- N + a0; B0 <- cOpt*diag(K)
Bn <- solve(solve(B0)+t(X)%*%X); bhat <- solve(t(X)%*%X)%*%t(X)%*%Y
bn <- Bn%*%(solve(B0)%*%b0+t(X)%*%X%*%bhat)
dn <- as.numeric(d0 + t(Y-X%*%bhat)%*%(Y-X%*%bhat)+t(bhat - b0)%*%solve(solve(t(X)%*%X)+B0)%*%(bhat - b0))
S <- 10000  # number of posterior draws
LogMarLikLM <- function(X, c0){
  K <- dim(X)[2]
  N <- dim(X)[1]
  # Hyperparameters
  B0 <- c0*diag(K)
  b0 <- rep(0, K)
  # Posterior parameters
  bhat <- solve(t(X)%*%X)%*%t(X)%*%Y
  # Force this matrix to be symmetric
  Bn <- as.matrix(Matrix::forceSymmetric(solve(solve(B0) + t(X)%*%X)))
  bn <- Bn%*%(solve(B0)%*%b0 + t(X)%*%X%*%bhat)
  dn <- as.numeric(d0 + t(Y)%*%Y + t(b0)%*%solve(B0)%*%b0 - t(bn)%*%solve(Bn)%*%bn)
  an <- a0 + N
  # Log marginal likelihood
  logpy <- (N/2)*log(1/pi) + (a0/2)*log(d0) - (an/2)*log(dn) +
    0.5*log(det(Bn)/det(B0)) + lgamma(an/2) - lgamma(a0/2)
  return(-logpy)  # negative log marginal likelihood
}
LogMarM2 <- -LogMarLikLM(X = X, c0 = cOpt)
LogMarM1 <- -LogMarLikLM(X = X[,1:4], c0 = cOpt)
BF12 <- exp(LogMarM1-LogMarM2)
BF12; 1/BF12; 2*log(1/BF12)
## [1] 1.79514e-05
## [1] 55705.95
## [1] 21.85568
# Savage-Dickey density ratio
# Posterior evaluation
Brest <- 0
sig2P <- invgamma::rinvgamma(S, shape = an/2, rate = dn/2)
PostRestCom <- mean(sapply(sig2P, function(x){dnorm(Brest, mean = bn[5], sd = (x*Bn[5,5])^0.5, log = FALSE)}))
# Prior evaluation
sig2 <- invgamma::rinvgamma(S, shape = a0/2, rate = d0/2)
PriorRestCom <- mean(sapply(sig2, function(x){dnorm(Brest, mean = 0, sd = (x*cOpt)^0.5, log = FALSE)}))
# Bayes factor
BF12SD <- PostRestCom/PriorRestCom
2*log(1/BF12SD)
## [1] 21.85275
# Chib's method
sig2Post <- MCMCpack::rinvgamma(S,an/2,dn/2)
BetasGibbs <- sapply(1:S, function(s){MASS::mvrnorm(n = 1, mu = bn, Sigma = sig2Post[s]*Bn)})
# Mode function for continuous data
mode_continuous <- function(x){
  density_est <- density(x)
  mode_value <- density_est$x[which.max(density_est$y)]
  return(mode_value)
}
# Unrestricted model
BetasMode <- apply(BetasGibbs, 1, mode_continuous)
Sigma2Mode <- mode_continuous(sig2Post)
VarModel <- Sigma2Mode*diag(N)
MeanModel <- X%*%BetasMode
LogLik <- mvtnorm::dmvnorm(c(Y), mean = MeanModel, sigma = VarModel, log = TRUE, checkSymmetry = TRUE)
LogPrior <- mvtnorm::dmvnorm(BetasMode, mean = rep(0, K), sigma = Sigma2Mode*cOpt*diag(K), log = TRUE, checkSymmetry = TRUE)+log(MCMCpack::dinvgamma(Sigma2Mode, a0/2, d0/2))
LogPost1 <- mvtnorm::dmvnorm(BetasMode, mean = bn, sigma = Sigma2Mode*Bn, log = TRUE, checkSymmetry = TRUE)
LogPost2 <- log(MCMCpack::dinvgamma(Sigma2Mode, an/2, dn/2))
LogMarLikChib <- LogLik + LogPrior -(LogPost1 + LogPost2)
# Restricted model
anRest <- N + a0; XRest <- X[,-5]
KRest <- dim(XRest)[2]; B0Rest <- cOpt*diag(KRest)
BnRest <- solve(solve(B0Rest)+t(XRest)%*%XRest)
bhatRest <- solve(t(XRest)%*%XRest)%*%t(XRest)%*%Y
b0Rest <- rep(0, KRest)
bnRest <- BnRest%*%(solve(B0Rest)%*%b0Rest+t(XRest)%*%XRest%*%bhatRest)
dnRest <- as.numeric(d0 + t(Y-XRest%*%bhatRest)%*%(Y-XRest%*%bhatRest)+t(bhatRest - b0Rest)%*%solve(solve(t(XRest)%*%XRest)+B0Rest)%*%(bhatRest - b0Rest))
sig2PostRest <- MCMCpack::rinvgamma(S,anRest/2,dnRest/2)
BetasGibbsRest <- sapply(1:S, function(s){MASS::mvrnorm(n = 1, mu = bnRest, Sigma = sig2PostRest[s]*BnRest)})
BetasModeRest <- apply(BetasGibbsRest, 1, mode_continuous)
Sigma2ModeRest <- mode_continuous(sig2PostRest)
VarModelRest <- Sigma2ModeRest*diag(N)
MeanModelRest <- XRest%*%BetasModeRest
LogLikRest <- mvtnorm::dmvnorm(c(Y), mean = MeanModelRest, sigma = VarModelRest, log = TRUE, checkSymmetry = TRUE)
LogPriorRest <- mvtnorm::dmvnorm(BetasModeRest, mean = rep(0, KRest), sigma = Sigma2ModeRest*cOpt*diag(KRest), log = TRUE, checkSymmetry = TRUE)+log(MCMCpack::dinvgamma(Sigma2ModeRest, a0/2, d0/2))
LogPost1Rest <- mvtnorm::dmvnorm(BetasModeRest, mean = bnRest, sigma = Sigma2ModeRest*BnRest, log = TRUE, checkSymmetry = TRUE)
LogPost2Rest <- log(MCMCpack::dinvgamma(Sigma2ModeRest, anRest/2, dnRest/2))
LogMarLikChibRest <- LogLikRest + LogPriorRest -(LogPost1Rest + LogPost2Rest)
BFChibs <- exp(LogMarLikChibRest-LogMarLikChib)
BFChibs; 1/BFChibs; 2*log(1/BFChibs)
## [1] 1.79514e-05
## [1] 55705.95
## [1] 21.85568
# Gelfand-Dey method
GDmarglik <- function(ids, X, Betas, MeanThetas, VarThetas, sig2Post){
  K <- dim(X)[2]; Thetas <- c(Betas[ids,], sig2Post[ids])
  # Log of the truncated normal q: log density minus log(1 - alpha)
  Lognom <- mvtnorm::dmvnorm(Thetas, mean = MeanThetas, sigma = VarThetas, log = TRUE, checkSymmetry = TRUE) - log(1 - alpha)
  # Log prior: normal for beta given sigma^2, inverse-gamma for sigma^2
  Logden1 <- mvtnorm::dmvnorm(Betas[ids,], mean = rep(0, K), sigma = sig2Post[ids]*cOpt*diag(K), log = TRUE, checkSymmetry = TRUE) + log(MCMCpack::dinvgamma(sig2Post[ids], a0/2, d0/2))
  # Log likelihood
  VarModel <- sig2Post[ids]*diag(N)
  MeanModel <- X%*%Betas[ids,]
  Logden2 <- mvtnorm::dmvnorm(c(Y), mean = MeanModel, sigma = VarModel, log = TRUE, checkSymmetry = TRUE)
  LogGDid <- Lognom - Logden1 - Logden2
  return(LogGDid)
}
# Joint posterior draws by composition: sigma^2 from its marginal, then beta | sigma^2
sig2Post <- MCMCpack::rinvgamma(S, an/2, dn/2)
Betas <- t(sapply(1:S, function(s){MASS::mvrnorm(n = 1, mu = bn, Sigma = sig2Post[s]*Bn)}))
Thetas <- cbind(Betas, sig2Post)
MeanThetas <- colMeans(Thetas); VarThetas <- var(Thetas)
iVarThetas <- solve(VarThetas)
ChiSQ <- sapply(1:S, function(s){(Thetas[s,]-MeanThetas)%*%iVarThetas%*%(Thetas[s,]-MeanThetas)})
alpha <- 0.01; criticalval <- qchisq(1-alpha, K + 1)
idGoodThetas <- which(ChiSQ <= criticalval)
InvMargLik2 <- sapply(idGoodThetas, function(s){
  GDmarglik(ids = s, X = X, Betas = Betas, MeanThetas = MeanThetas,
            VarThetas = VarThetas, sig2Post = sig2Post)
})
mean(InvMargLik2)
## [1] 740.8927
# Restricted model (its posterior parameters were already computed above)
sig2PostRest <- MCMCpack::rinvgamma(S, anRest/2, dnRest/2)
BetasRest <- t(sapply(1:S, function(s){MASS::mvrnorm(n = 1, mu = bnRest, Sigma = sig2PostRest[s]*BnRest)}))
ThetasRest <- cbind(BetasRest, sig2PostRest)
MeanThetasRest <- colMeans(ThetasRest)
VarThetasRest <- var(ThetasRest)
iVarThetasRest <- solve(VarThetasRest)
ChiSQRest <- sapply(1:S, function(s){(ThetasRest[s,]-MeanThetasRest)%*%iVarThetasRest%*%(ThetasRest[s,]-MeanThetasRest)})
idGoodThetasRest <- which(ChiSQRest <= criticalval)
InvMargLik1 <- sapply(idGoodThetasRest, function(s){
  GDmarglik(ids = s, X = XRest, Betas = BetasRest, MeanThetas = MeanThetasRest,
            VarThetas = VarThetasRest, sig2Post = sig2PostRest)
})
summary(coda::mcmc(InvMargLik1))
##
## Iterations = 1:9951
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 9951
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 7.518e+02 1.275e-01 1.278e-03 1.278e-03
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 751.6 751.7 751.8 751.9 752.0
mean(InvMargLik1)
## [1] 751.7977
BF12GD <- exp(mean(InvMargLik2) - mean(InvMargLik1))
BF12GD; 1/BF12GD; 2*log(1/BF12GD)
## [1] 1.836687e-05
## [1] 54445.85
## [1] 21.80992
Johnson and Rossell (2012) highlight the important distinction between pairwise consistency and model selection consistency. The latter requires the consistency of a sequence of pairwise nested comparisons.↩︎
Note that a sufficient condition for this assumption is to assume the same prior for the parameters that are the same in each model. Verdinelli and Wasserman (1995) incorporate a correction factor when this assumption is not satisfied.↩︎