Chapter 13 Bayesian Meta-Analysis
After delving into rather advanced extensions of Meta-Analysis, such as Network Meta-Analysis and Multilevel Meta-Analysis, let us now take one step back and look at “conventional” meta-analytical models again, but this time from another angle. In this chapter, we will deal with Bayesian Meta-Analysis. In its essence, Bayesian Meta-Analysis aims to do the same thing as the meta-analytic techniques we covered before: to pool effect sizes (usually of different studies) into one overall effect. The difference, however, is that Bayesian Meta-Analysis uses a different statistical approach to achieve this. Luckily, if you have worked through the previous chapters already, you already learned about the key components of Bayesian Meta-Analysis models, which we now only have to put together.
The idea behind Bayesian Meta-Analysis
The model we apply in Bayesian Meta-Analysis is a so-called Bayesian Hierarchical Model (Röver 2017; Higgins, Thompson, and Spiegelhalter 2009). In the chapter on Multilevel Meta-Analysis, we already covered that every meta-analytical model inherently possesses a multilevel, and thus “hierarchical”, structure. On the first level, we have the individual participants. The data on this level usually reaches us meta-analysts in the form of calculated effect sizes \(\hat\theta_k\) of each study \(k\). On the second level, we assume that these participants are nested within studies, and that the true effect sizes \(\theta_k\) of different studies in our meta-analysis follow their own distribution, with a mean \(\mu\) (the pooled “true” effect we want to estimate) and variance \(\tau^2\), the between-study heterogeneity.
Let us formalize this a little more: on the first level, we assume that the observed effect size \(\hat\theta_k\) we find reported in a study is an estimate of the “true” effect \(\theta_k\) in this study. The observed effect \(\hat\theta_k\) deviates from \(\theta_k\) due to the sampling error \(\epsilon_k\). This is because we assume that \(\hat\theta_k\) was drawn (sampled) from the (normal) distribution of all effects in study \(k\). The distribution of effects in \(k\) has a mean of \(\theta_k\), the “true” effect of the study, and a variance \(\sigma^2\). In the second step, we assume that the true effect sizes \(\theta_k\) themselves are only samples of an overarching distribution of true effect sizes. The mean of this distribution \(\mu\) is the pooled effect size we want to estimate; the effects \(\theta_k\) deviate from \(\mu\) because this overarching distribution also has a variance \(\tau^2\), the between-study heterogeneity. Taken together, the formula then looks like this:
\[ \hat\theta_k \sim \mathcal{N}(\theta_k,\sigma_k^2) \] \[ \theta_k \sim \mathcal{N}(\mu,\tau^2) \]
Here, we use \(\mathcal{N}\) to indicate that distributions from which effects were sampled follow a normal distribution. Some may argue that this is a unnecessarily strict assumption for the second equation (Higgins, Thompson, and Spiegelhalter 2009), but the formulation as shown here is the one that can be found most of the time. As covered before, the fixed-effect model is simply a special case of this model in which we assume that \(\tau^2 = 0\), meaning that there is no between-study heterogeneity, and that all studies share one single true effect size (i.e., that for all studies \(k\), \(\theta_k = \mu\)). We can also simplify this formula a little by using the marginal form (Röver 2017):
\[ \hat\theta_k | \mu, \tau, \sigma_k \sim \mathcal{N}(\mu,\sigma_k^2 + \tau^2)\]
You may have already detected that these formulae look a lot like the ones we defined before when discussing multilevel models. Indeed, there is nothing particularly bayesian about this formulation; the same model is assumed in the previous meta-analytic techniques we have covered. The make it bayesian, we add the following term to our model (Williams, Rast, and Bürkner 2018):
\[(\mu, \tau^2) \sim p(.)\] \[ \tau^2 > 0 \]
The first line is particularly important, as it defines prior distributions for the parameters \(\mu\) and \(\tau^2\). This allows us to specify a priori how we assume that the true pooled effect size \(\mu\) and between-study heterogeneity \(\tau^2\) will look like, and how certain we are about this. The second part constrains the between-study heterogeneity to be larger than zero. The formulation of the first equation does not specify which exact prior distribution we could or should assume for \(\mu\) and \(\tau^2\); we will cover reasonable, specific priors for our meta-analysis models in more detail later.
In the chapter on network meta-analysis, we already covered the methodology through which bayesian models estimate the true pooled effect we want to calculate. To recap, this involves Markov Chain Monte Carlo based sampling procedures, such as the Gibbs Sampler. In the brms
package we will be using in this chapter, the No-U-Turn Sampler, or NUTS (Hoffman and Gelman 2014), is used.
In the previous chapters, we introduced the meta
and metafor
packages, which use non-bayesian, or frequentist statistics to calculate meta-analyses. You may therefore be wondering why one should use bayesian methods, given that we already possess such powerful implementations using “conventional” approaches. However, Bayesian Meta-Analysis has some distinct advantages (Williams, Rast, and Bürkner 2018; McNeish 2016; Chung et al. 2013) :
- Bayesian methods allow to directly model the uncertainty when estimating \(\tau^2\), and have been found to be superior in estimating the between-study heterogeneity and pooled effect, particularly when the number of included studies is small (which is very often the case in real-world applications).
- Bayesian methods produce full posterior distributions for both \(\mu\) and \(\tau^2\), which allows to calculate actual probabilities that \(\mu\) or \(\tau^2\) is smaller or larger than some specific value. This is in contrast to frequentist methods, where only calculate confidence intervals. Yet, (95%) confidence intervals only state that if data sampling were repeated many, many times, the true value of a population parameter (such as \(\mu\) or \(\tau^2\)) will fall into the range of the confidence interval in 95% of the samples.
- Bayesian methods allow us to integrate prior knowledge and assumptions when calculating meta-analyses.
Setting the priors
Before, we have set up the Bayesian Hierarchical Model we can use to pool effects in a Bayesian Meta-Analysis. As part of the model, we have to specify the priors for \(\mu\) and \(\tau^2\) we want to use. Particularly when the number of studies is small, priors can have a considerable impact on the resulting estimates, so our selection should be reasonable. It has been argued that a good approach is to use weakly informative priors (Williams, Rast, and Bürkner 2018). Weaky informative priors can be contrasted with non-informative priors. Non-informative priors usually encompass uniform distributions, through which we represent that all values are equally likely to be true. Weakly informative priors, on the other hand, include distributions which represent that we do indeed have some confidence that some values are more credible than others, while still not making any overly specific statements about the exact true value of the parameter. This should intuitively make sense, and is also applicable to meta-analysis. In most applied cases, it seems reasonable to assume that the true effect size we want to estimate must lie somewhere between, for example, Cohen’s \(d=-2.0\) and \(d=2.0\), but will unlikely be hovering around \(d=50\). A good starting point for our \(\mu\) prior may therefore be a normal distribution with mean \(0\) and variance \(1\). This means that we grant a 95% prior probability that the true pooled effect size \(\mu\) lies between \(d=-2.0\) and \(d=2.0\):
\[ \mu \sim \mathcal{N}(0,1)\]
The next prior we have to specify is the one for \(\tau^2\). This is a little more difficult, since we know that \(\tau^2\) should in any case be non-negative, but can very likely be close to zero. A recommended distribution for this case, and one which is often used for variances such as \(\tau^2\), is the Half-Cauchy prior. The Half-Cauchy distribution is a special case of a Cauchy distribution which is only defined for one “half” (i.e., the positive side) of the distribution. The Half-Cauchy distribution is defined by two parameters, the location parameter \(x_0\), specifying the peak of the distribution on the x-axis, and \(s\), the scaling parameter, specifying how heavy-tailed (i.e., how much the distribution “spreads” out to higher values) the distribution is. It can be formalized as \(\mathcal{HC}(x_0,s)\).
The illustration below shows the Half-Cauchy distribution for varying values of \(s\), with the value of \(x_0\) fixed at 0.
The Half-Cauchy distribution has rather heavy tails, which makes it particularly useful as a prior distribution for \(\tau\). The heavy-tailedness ensures that we still give very high values of \(\tau\) some probability, while at the same time assuming that lower values are more likely. In most applied cases, we can assume that \(\tau\) (the square root of \(\tau^2\)) lies somewhere around \(0.3\), or is at least remotely of this magnitude. As a specification of the Half-Cauchy prior, we may therefore set \(s=0.3\), because this ensures that a value of less than \(\tau=0.3\) has a 50% probability (Williams, Rast, and Bürkner 2018). We can confirm this by using the Half-Cauchy distribution function as implemented in the phcauchy
function of the extraDistr
package.
library(extraDistr)
phcauchy(0.3, sigma = 0.3)
## [1] 0.5
However, this is already a rather specific assumption about the true value of \(\tau\). A more conservative approach, which we will follow here, is to set \(s\) to 0.5, which makes the distribution flatter. In general, it is advised to always conduct sensitivity analyses with different prior specifications to check if they affect the results substantially. Using \(s=0.5\) as our specification of the Half-Cauchy distribution, we can formalize our \(\tau\) prior like this:
\[ \tau \sim \mathcal{HC}(0,0.5)\]
We can now assemble the formulae of the hierarchical model and our prior specifications. This leads us to the complete model we want to use for our Bayesian Meta-Analysis:
\[ \hat\theta_k \sim \mathcal{N}(\theta_k,\sigma_k^2) \]
\[ \theta_k \sim \mathcal{N}(\mu,\tau^2) \] \[ \mu \sim \mathcal{N}(0,1)\] \[ \tau \sim \mathcal{HC}(0,0.5)\]
References
Röver, Christian. 2017. “Bayesian Random-Effects Meta-Analysis Using the Bayesmeta R Package.” arXiv Preprint arXiv:1711.08683.
Higgins, Julian PT, Simon G Thompson, and David J Spiegelhalter. 2009. “A Re-Evaluation of Random-Effects Meta-Analysis.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 172 (1). Wiley Online Library: 137–59.
Williams, Donald R, Philippe Rast, and Paul-Christian Bürkner. 2018. “Bayesian Meta-Analysis with Weakly Informative Prior Distributions.” PsyArXiv.
Hoffman, Matthew D, and Andrew Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15 (1): 1593–1623.
McNeish, Daniel. 2016. “On Using Bayesian Methods to Address Small Sample Problems.” Structural Equation Modeling: A Multidisciplinary Journal 23 (5). Taylor & Francis: 750–73.
Chung, Yeojin, Sophia Rabe-Hesketh, Vincent Dorie, Andrew Gelman, and Jingchen Liu. 2013. “A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models.” Psychometrika 78 (4). Springer: 685–709.