4 Pooling Effect Sizes
A long and winding road already lies behind us. Fortunately, we have now reached the core part of every meta-analysis: the pooling of effect sizes. We hope that you were able to resist the temptation of starting directly with this chapter. We have already discussed various topics in this book, including the definition of research questions, guidelines for searching, selecting, and extracting study data, as well as how to prepare our effect sizes.
Thorough preparation is a key ingredient of a good meta-analysis, and will be immensely helpful in the steps that are about to follow. We can assure you that the time you spent working through the previous chapters was well invested.
There are many packages which allow us to pool effect sizes in R. Here, we will focus on functions of the {meta} package, which we already installed in Chapter 2.2. This package is very user-friendly and provides us with nearly all important meta-analysis results using just a few lines of code. In the previous chapter, we covered that effect sizes come in different "flavors", depending on the outcome of interest. The {meta} package contains specialized meta-analysis functions for each of these effect size metrics. All of the functions also follow nearly the same structure.
Thus, once we have a basic understanding of how {meta} works, coding meta-analyses becomes straightforward, no matter which effect size we are focusing on. In this chapter, we will cover the general structure of the {meta} package. And of course, we will also explore the meta-analysis functions of the package in greater detail using hands-on examples.
The {meta} package allows us to tweak many details about the way effect sizes are pooled. As we previously mentioned, meta-analysis comes with many "researcher degrees of freedom". There is a myriad of choices concerning the statistical techniques and approaches we can apply, and whether one method is better than another often depends on the context.
Before we begin with our analyses in R, we therefore have to get a basic understanding of the statistical assumptions of meta-analysis, and the maths behind it. Importantly, we will also discuss the "idea" behind meta-analysis. In statistics, this "idea" translates to a model, and we will have a look at what the meta-analytic model looks like.
As we will see, the nature of meta-analysis requires us to make a fundamental decision right away: we have to assume either a fixed-effect model or a random-effects model. Knowledge of the concept behind meta-analytic pooling is needed to make an informed decision about which of these two models, along with other analytic specifications, is more appropriate in which context.
4.1 The Fixed-Effect and Random-Effects Model
Before we specify the meta-analytic model, we should first clarify what a statistical model actually is. Statistics is full of "models", and it is likely that you have heard the term in this context before. There are "linear models", "generalized linear models", "mixture models", "Gaussian additive models", "structural equation models", and so on.
The ubiquity of models in statistics indicates how important this concept is. In one way or the other, models build the basis of virtually all parts of our statistical toolbox. There is a model behind \(t\)-tests, ANOVAs, and regression. Every hypothesis test has its corresponding statistical model.
When defining a statistical model, we start with the information that is already given to us. This is, quite literally, our data^{17}. In meta-analyses, the data are effect sizes that were observed in the included studies. Our model is used to describe the process through which these observed data were generated.
The data are seen as the product of a black box, and our model aims to illuminate what is going on inside that black box.
Typically, a statistical model is like a special type of theory. Models try to explain the mechanisms that generated our observed data, especially when those mechanisms themselves cannot be directly observed. They are an imitation of life, using a mathematical formula to describe processes in the world around us in an idealized way.
This explanatory character of models is deeply ingrained in modern statistics, and meta-analysis is no exception. The conceptualization of models as a vehicle for explanation is the hallmark of a statistical "culture" to which, as Breiman (2001) famously estimated, 98% of all statisticians adhere.
By specifying a statistical model, we try to find an approximate representation of the "reality" behind our data. We want a mathematical formula that explains how we can find the true effect size underlying all of our studies, based on their observed results. As we learned in Chapter 1.1, one of the ultimate goals of meta-analysis is to find one numerical value that characterizes our studies as a whole, even though the observed effect sizes vary from study to study. A meta-analysis model must therefore explain why, and by how much, observed study results differ, even though there is only one overall effect.
There are two models which try to answer exactly this question: the fixed-effect model and the random-effects model. Although the two are based on different assumptions, there is still a strong link between them, as we will soon see.
4.1.1 The Fixed-Effect Model
The fixed-effect model assumes that all effect sizes stem from a single, homogeneous population. It states that all studies share the same true effect size. This true effect is the overall effect size we want to calculate in our meta-analysis, denoted with \(\theta\).
According to the fixed-effect model, the only reason why a study \(k\)'s observed effect size \(\hat\theta_k\) deviates from \(\theta\) is its sampling error \(\epsilon_k\). The fixed-effect model tells us that the process generating studies' different effect sizes, the content of the "black box", is simple: all studies are estimators of the same true effect size. Yet, because every study can only draw a somewhat bigger or smaller sample from the infinitely large study population, results are burdened by sampling error. This sampling error causes the observed effect to deviate from the overall, true effect.
We can describe the relationship like this (Borenstein et al. 2011, chap. 11):
\[\begin{equation} \hat\theta_k = \theta + \epsilon_k \tag{4.1} \end{equation}\]
To the alert reader, this formula may seem oddly similar to the one in Chapter 3.1. You are not mistaken. In the previous formula, we defined that an observed effect size \(\hat\theta_k\) of some study \(k\) is an estimator of that study's true effect size \(\theta_k\), burdened by the study's sampling error \(\epsilon_k\).
There is only a tiny, but insightful difference between the previous formula and the one of the fixed-effect model. In the formula of the fixed-effect model, the true effect size is not symbolized by \(\theta_k\), but by \(\theta\); the subscript \(k\) is dropped.
Previously, we only made statements about the true effect size of one individual study \(k\). The fixed-effect model goes one step further. It tells us that if we find the true effect size of study \(k\), this effect size is not only true for \(k\) specifically, but for all studies in our meta-analysis. A study's true effect size \(\theta_k\) and the overall, pooled effect size \(\theta\) are identical.
The formula of the fixed-effect model tells us that there is only one reason why observed effect sizes \(\hat\theta_k\) deviate from the true overall effect: the sampling error \(\epsilon_k\). In Chapter 3.1, we already discussed that there is a link between the sampling error and the sample size of a study. All things being equal, as the sample size becomes larger, the sampling error becomes smaller. We also learned that the sampling error can be represented numerically by the standard error, which also grows smaller when the sample size increases.
Although we do not know the true overall effect size of our studies, we can exploit this relationship to arrive at the best possible estimate of the true overall effect, \(\hat\theta\). We know that a smaller standard error corresponds with a smaller sampling error; therefore, studies with a small standard error should be better estimators of the true overall effect than studies with a large standard error.
We can illustrate this with a simulation. Using the `rnorm` function we already used before, we simulated a selection of studies in which the true overall effect is \(\theta = 0\). We took several samples but varied the sample size, so that the standard error differs between the "observed" effects. The results of the simulation can be found in Figure 4.1.
The results of the simulation show an interesting pattern. We see that effect sizes with a small sampling error are tightly packed around the true effect size \(\theta = 0\). As the standard error on the y-axis^{19} increases, the dispersion of effect sizes becomes larger and larger, and the observed effects deviate more and more from the true effect.
This behavior can be predicted by the formula of the fixedeffect model. We know that studies with a smaller standard error have a smaller sampling error, and their estimate of the overall effect size is therefore more likely to be closer to the truth.
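A minimal sketch of such a simulation in R (the standard errors and number of draws below are invented for illustration and will not reproduce Figure 4.1 exactly):

```r
# Simulate "observed" effects around a true overall effect of theta = 0.
# Each block of 50 studies has a different standard error.
set.seed(123)

theta <- 0                                    # true overall effect
se <- rep(c(0.05, 0.1, 0.2, 0.4), each = 50)  # small to large standard errors
theta_hat <- rnorm(length(se), mean = theta, sd = se)

# The dispersion of the observed effects grows with the standard error
tapply(theta_hat, se, sd)
```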
We have seen that, while all observed effect sizes are estimators of the true effect, some are better than others. When we pool the effects in our meta-analysis, we should therefore give effect sizes with a higher precision (i.e. a smaller standard error) a greater weight. If we want to calculate the pooled effect size under the fixed-effect model, we therefore simply use a weighted average of all studies.
To calculate the weight \(w_k\) for each study \(k\), we can use the standard error, which we square to obtain the variance \(s^2_k\) of each effect size. Since a lower variance indicates higher precision, the inverse of the variance is used to determine the weight of each study.
\[\begin{equation} w_k = \frac{1}{s^2_k} \tag{4.2} \end{equation}\]
Once we know the weights, we can calculate the weighted average, our estimate of the true pooled effect \(\hat\theta\). We only have to multiply each study's effect size \(\hat\theta_k\) with its corresponding weight \(w_k\), sum the results across all \(K\) studies in our meta-analysis, and then divide by the sum of all the individual weights.
\[\begin{equation} \hat\theta = \frac{\sum^{K}_{k=1} \hat\theta_kw_k}{\sum^{K}_{k=1} w_k} \tag{4.3} \end{equation}\]
This method is the most common approach to calculate average effects in meta-analyses. Because we use the inverse of the variance, it is often called inverse-variance weighting or simply inverse-variance meta-analysis.
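As a toy illustration of formulas (4.2) and (4.3), imagine two hypothetical studies, one small and imprecise, one larger and more precise (all numbers invented):

```r
# Observed effect sizes and standard errors of two hypothetical studies
theta_k <- c(0.3, 0.5)
se_k <- c(0.1, 0.2)

# Inverse-variance weights (formula 4.2): 1/0.1^2 = 100 and 1/0.2^2 = 25
w_k <- 1 / se_k^2

# Weighted average (formula 4.3): the pooled effect lies much closer to
# the more precise study: (0.3*100 + 0.5*25) / 125 = 0.34
theta_pooled <- sum(theta_k * w_k) / sum(w_k)
theta_pooled
```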
For binary effect size data, there are alternative methods to calculate the weighted average, including the Mantel-Haenszel, Peto, or the sample size weighting method by Bakbergenuly (2020). We will discuss these methods in Chapter 4.2.3.1.
The {meta} package makes it very easy to perform a fixed-effect meta-analysis. Before we do that, however, let us try out the inverse-variance pooling "manually" in R. In our example, we will use the `SuicidePrevention` data set, which we already imported in Chapter 2.4.
The `SuicidePrevention` data set contains raw effect size data, meaning that we have to calculate the effect sizes first. In this example, we calculate the small-sample adjusted standardized mean difference (Hedges' \(g\)). To do this, we use the `esc_mean_sd` function in the {esc} package (Chapter 3.3.1.2). The function has an additional argument, `es.type`, through which we can specify that the small-sample correction should be performed (by setting `es.type = "g"`; Chapter 3.4.1).
Since the release of R version 4.2.1, we additionally have to plug our call to `esc_mean_sd` into the `pmap_dfr` function so that a standardized mean difference is calculated for each row in our data set:
# Load dmetar, esc and tidyverse (for pipe)
library(dmetar)
library(esc)
library(tidyverse)
# Load data set from dmetar
data(SuicidePrevention)
# Calculate Hedges' g and the Standard Error
# - We save the study names in "study".
# - We use the pmap_dfr function to calculate the effect size
#   for each row.
SP_calc <- pmap_dfr(SuicidePrevention,
                    function(mean.e, sd.e, n.e, mean.c,
                             sd.c, n.c, author, ...){
                      esc_mean_sd(grp1m = mean.e,
                                  grp1sd = sd.e,
                                  grp1n = n.e,
                                  grp2m = mean.c,
                                  grp2sd = sd.c,
                                  grp2n = n.c,
                                  study = author,
                                  es.type = "g") %>%
                        as.data.frame()})
# Let us catch a glimpse of the data
# The data set contains Hedges' g ("es") and standard error ("se")
glimpse(SP_calc)
## Rows: 9
## Columns: 9
## $ study <chr> "Berry et al.", "DeVries et …
## $ es <dbl> -0.14279447, -0.60770928, -0…
## $ weight <dbl> 46.09784, 34.77314, 14.97625…
## $ sample.size <dbl> 185, 146, 60, 129, 100, 220,…
## $ se <dbl> 0.1472854, 0.1695813, 0.2584…
## $ var <dbl> 0.02169299, 0.02875783, 0.06…
## $ ci.lo <dbl> -0.4314686, -0.9400826, -0.6…
## $ ci.hi <dbl> 0.145879624, -0.275335960, 0…
## $ measure <chr> "g", "g", "g", "g", "g", "g"…
Next, we use these results to apply the formula of the fixed-effect model:
# Calculate the inverse-variance weights for each study
SP_calc$w <- 1/SP_calc$se^2
# Then, we use the weights to calculate the pooled effect
pooled_effect <- sum(SP_calc$w*SP_calc$es)/sum(SP_calc$w)
pooled_effect
## [1] -0.2311121
The results of our calculations reveal that the pooled effect size, assuming a fixed-effect model, is \(g \approx\) -0.23.
4.1.2 The Random-Effects Model
As we have seen, the fixed-effect model is one way to conceptualize the genesis of our meta-analysis data, and how effects can be pooled. However, the important question is: does this approach adequately reflect reality?
The fixed-effect model assumes that all our studies are part of a homogeneous population, and that the only cause for differences in observed effects is the sampling error of studies. If we were to calculate the effect size of each study without sampling error, all true effect sizes would be absolutely the same.
Subjecting this notion to a quick reality check, we see that the assumptions of the fixed-effect model might be too simplistic in many real-world applications. It is simply unrealistic that studies in a meta-analysis are always completely homogeneous. Studies will very often differ, even if only in subtle ways. The outcome of interest may have been measured in different ways. Maybe the type of treatment was not exactly the same, or treatments differed in their intensity and length. The target population of the studies may not have been exactly identical, or maybe there were differences in the control groups that were used.
It is likely that the studies in your meta-analysis will not only vary on one of these aspects, but on several at the same time. If this is true, we can anticipate considerable between-study heterogeneity in the true effects.
All of this casts the validity of the fixed-effect model into doubt. If some studies used different types of a treatment, for example, it seems perfectly normal that one format is more effective than the other. It would be far-fetched to assume that these differences are only noise, produced by the studies' sampling error.
Quite the opposite: there may be countless reasons why real differences exist in the true effect sizes of studies. The random-effects model addresses this concern. It provides us with a model that often reflects the reality behind our data much better.
In the random-effects model, we want to account for the fact that effect sizes show more variance than when they are drawn from a single homogeneous population (L. V. Hedges and Vevea 1998). Therefore, we assume that the effects of individual studies do not only deviate due to sampling error, but that there is another source of variance.
This additional variance component is introduced by the fact that studies do not stem from one single population. Instead, each study is seen as an independent draw from a "universe" of populations.
Let us see how the random-effects model can be expressed in a formula. Similar to the fixed-effect model, the random-effects model starts by assuming that an observed effect size \(\hat\theta_k\) is an estimator of the study's true effect size \(\theta_k\), burdened by sampling error \(\epsilon_k\):
\[\begin{equation} \hat\theta_k = \theta_k + \epsilon_k \tag{4.4} \end{equation}\]
The fact that we use \(\theta_k\) instead of \(\theta\) already points to an important difference. The random-effects model only assumes that \(\theta_k\) is the true effect size of one single study \(k\). It stipulates that there is a second source of error, denoted by \(\zeta_k\). This second source of error is introduced by the fact that even the true effect size \(\theta_k\) of study \(k\) is only part of an overarching distribution of true effect sizes with mean \(\mu\).
\[\begin{equation} \theta_k = \mu + \zeta_k \tag{4.5} \end{equation}\]
The random-effects model tells us that there is a hierarchy of two processes happening inside our black box (S. G. Thompson, Turner, and Warn 2001): the observed effect sizes of a study deviate from their true value because of the sampling error. But even the true effect sizes are only a draw from a universe of true effects, whose mean \(\mu\) we want to estimate as the pooled effect of our meta-analysis.
By plugging the second formula into the first one (i.e. replacing \(\theta_k\) with its definition in the second formula), we can express the random-effects model in one line (Borenstein et al. 2011, chap. 12):
\[\begin{equation} \hat\theta_k = \mu + \zeta_k + \epsilon_k \tag{4.6} \end{equation}\]
This formula makes it clear that our observed effect size deviates from the pooled effect \(\mu\) because of two error terms, \(\zeta_k\) and \(\epsilon_k\). This relationship is visualized in Figure 4.2.
A crucial assumption of the random-effects model is that the size of \(\zeta_k\) is independent of \(k\). Put differently, we assume that there is nothing which indicates a priori that \(\zeta_k\) in one study is higher than in another. We presuppose that the size of \(\zeta_k\) is a product of chance, and chance alone.
This is known as the exchangeability assumption of the random-effects model (Julian Higgins, Thompson, and Spiegelhalter 2009; Lunn et al. 2012, chap. 10.1). All true effect sizes are assumed to be exchangeable insofar as we have nothing that could tell us how big \(\zeta_k\) will be in some study \(k\) before seeing the data.
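The two formulas can be turned into a small simulation. The parameter values below (\(\mu\), \(\tau\), the standard error) are invented for illustration; the point is that the observed effects vary by roughly \(\tau^2\) plus the sampling variance:

```r
# Simulate the hierarchical process of the random-effects model.
set.seed(42)

mu <- 0.4     # mean of the distribution of true effects
tau <- 0.2    # standard deviation of the true effects
se_k <- 0.15  # within-study standard error (identical across studies here)
K <- 10000    # many "studies", so the pattern is clearly visible

theta_k <- rnorm(K, mean = mu, sd = tau)           # theta_k = mu + zeta_k
theta_hat_k <- rnorm(K, mean = theta_k, sd = se_k) # theta_hat_k = theta_k + epsilon_k

# The variance of the observed effects is approximately
# tau^2 + se^2 = 0.04 + 0.0225 = 0.0625
var(theta_hat_k)
```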
Which Model Should I Use?
In practice, it is very uncommon to find a selection of studies that is perfectly homogeneous. This is true even when we follow best practices, and try to make the scope of our analysis as precise as possible through our PICO (Chapter 1.4.1).
In many fields, including medicine and the social sciences, it is therefore conventional to always use a random-effects model, since some degree of between-study heterogeneity can virtually always be anticipated. A fixed-effect model may only be used when we could not detect any between-study heterogeneity (we will discuss how this is done in Chapter 5) and when we have very good reasons to assume that the true effect is fixed. This may be the case when, for example, only exact replications of a study are considered, or when we meta-analyze subsets of one big study. Needless to say, this is seldom the case, and applications of the fixed-effect model "in the wild" are rather rare.
Even though it is conventional to use the random-effects model a priori, this approach is not undisputed. The random-effects model pays more attention to small studies when calculating the overall effect of a meta-analysis (Schwarzer, Carpenter, and Rücker 2015, chap. 2.3). Yet, small studies in particular are often fraught with biases (see Chapter 9.2.1). This is why some have argued that the fixed-effect model is (sometimes) preferable (Poole and Greenland 1999; Furukawa, McGuire, and Barbui 2003). Stanley, Doucouliagos, and Ioannidis (2022) make a similar point and argue that, in some disciplines, a so-called "unrestricted weighted least squares" (UWLS) model should be used instead of the random-effects model.
4.1.2.1 Estimators of the Between-Study Heterogeneity
The challenge associated with the random-effects model is that we have to take the error \(\zeta_k\) into account. To do this, we have to estimate the variance of the distribution of true effect sizes. This variance is known as \(\tau^2\), or tau-squared. Once we know the value of \(\tau^2\), we can include the between-study heterogeneity when determining the inverse-variance weight of each effect size.
In the random-effects model, we therefore calculate an adjusted random-effects weight \(w^*_k\) for each observation. The formula looks like this:
\[\begin{equation} w^*_k = \frac{1}{s^2_k+\tau^2} \tag{4.7} \end{equation}\]
Using the adjusted random-effects weights, we then calculate the pooled effect size using the inverse-variance method, just like we did using the fixed-effect model:
\[\begin{equation} \hat\theta = \frac{\sum^{K}_{k=1} \hat\theta_kw^*_k}{\sum^{K}_{k=1} w^*_k} \tag{4.8} \end{equation}\]
There are several methods to estimate \(\tau^2\), most of which are too complicated to do by hand. Luckily, however, these estimators are implemented in the functions of the {meta} package, which does the calculations automatically for us. Here is a list of the most common estimators, and the code by which they are referenced in {meta}:

- The DerSimonian-Laird (`"DL"`) estimator (DerSimonian and Laird 1986).
- The Restricted Maximum Likelihood (`"REML"`) or Maximum Likelihood (`"ML"`) procedures (Viechtbauer 2005).
- The Paule-Mandel (`"PM"`) procedure (Paule and Mandel 1982).
- The Empirical Bayes (`"EB"`) procedure (Sidik and Jonkman 2019), which is practically identical to the Paule-Mandel method.
- The Sidik-Jonkman (`"SJ"`) estimator (Sidik and Jonkman 2005).
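To see how \(\tau^2\) enters the pooling, here is a hedged sketch that first estimates it with the closed-form DerSimonian-Laird estimator and then applies formulas (4.7) and (4.8). All effect sizes and standard errors are invented; in practice, {meta} does all of this automatically:

```r
# Effect sizes and standard errors of five hypothetical studies
theta_k <- c(0.10, 0.30, 0.35, 0.60, 0.45)
se_k <- c(0.15, 0.20, 0.25, 0.20, 0.18)
K <- length(theta_k)

# Fixed-effect weights, pooled effect and Q statistic,
# which the DerSimonian-Laird estimator builds on
w_k <- 1 / se_k^2
theta_fixed <- sum(w_k * theta_k) / sum(w_k)
Q <- sum(w_k * (theta_k - theta_fixed)^2)
C <- sum(w_k) - sum(w_k^2) / sum(w_k)

# DerSimonian-Laird estimate of tau^2 (truncated at zero)
tau2 <- max(0, (Q - (K - 1)) / C)

# Random-effects weights (formula 4.7) and pooled effect (formula 4.8)
w_star <- 1 / (se_k^2 + tau2)
theta_random <- sum(w_star * theta_k) / sum(w_star)
c(tau2 = tau2, pooled = theta_random)
```

Note that the random-effects weights are more balanced than the fixed-effect weights: adding the same \(\tau^2\) to every variance shrinks the relative advantage of the most precise studies.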
It is an ongoing research question which of these estimators performs best for different kinds of data. Whether one approach is better than another often depends on parameters such as the number of studies \(k\), the number of participants \(n\) in each study, how much \(n\) varies from study to study, and how big \(\tau^2\) is. Several studies have analyzed the bias of \(\tau^2\) estimators under these varying scenarios (Veroniki et al. 2016; Viechtbauer 2005; Sidik and Jonkman 2007; Langan et al. 2019).
Arguably, the most frequently used estimator is the one by DerSimonian and Laird. The estimator is implemented in software that has commonly been used by meta-analysts in the past, such as RevMan (a program developed by Cochrane) or Comprehensive Meta-Analysis. It also used to be the default estimator used in {meta}^{20}. Due to this historic legacy, one often finds research papers in which "using a random-effects model" is used synonymously with employing the DerSimonian-Laird estimator.
However, it has been found that this estimator can be biased, particularly when the number of studies is small and heterogeneity is high (Hartung 1999; Hartung and Knapp 2001a, 2001b; Follmann and Proschan 1999; Makambi 2004). This is quite problematic, because it is very common to find meta-analyses with few studies and high heterogeneity.
In an overview paper, Veroniki and colleagues (2016) reviewed evidence on the robustness of various \(\tau^2\) estimators. They recommended the Paule-Mandel method for both binary and continuous effect size data, and the restricted maximum likelihood estimator for continuous outcomes. The restricted maximum likelihood estimator is also the default method used by the {metafor} package.
A more recent simulation study by Langan and colleagues (2019) came to a similar result, but found that the Paule-Mandel estimator may be suboptimal when the sample size of studies varies drastically. Another study by Bakbergenuly and colleagues (2020) found that the Paule-Mandel estimator is well suited especially when the number of studies is small. The Sidik-Jonkman estimator, also known as the model error variance method, is only well suited when \(\tau^2\) is very large (Sidik and Jonkman 2007).
Which Estimator Should I Use?
There are no iron-clad rules about when exactly which estimator should be used. In many cases, there will only be minor differences in the results produced by the various estimators, meaning that you should not worry about this issue too much.
When in doubt, you can always rerun your analyses using different \(\tau^2\) estimators, and see if this changes the interpretation of your results. Here are a few tentative guidelines that you may follow in your own meta-analysis:

- For effect sizes based on continuous outcome data, the restricted maximum likelihood estimator may be used as a first start.
- For binary effect size data, the Paule-Mandel estimator is a good first choice, provided there is no extreme variation in the sample sizes.
- When you have very good reason to believe that the heterogeneity of effects in your sample is very large, and if avoiding false positives has a very high priority, you may use the Sidik-Jonkman estimator.
- If you want others to be able to replicate your results as precisely as possible outside R, the DerSimonian-Laird estimator is the method of choice.

Overall, estimators of \(\tau^2\) fall into two categories. Some, like the DerSimonian-Laird and Sidik-Jonkman estimators, are based on closed-form expressions, meaning that they can be calculated directly using a formula.
The (restricted) maximum likelihood, Paule-Mandel and empirical Bayes estimators find the optimal value of \(\tau^2\) through an iterative algorithm. The latter estimators may therefore sometimes take a little longer to calculate. In most real-world cases, however, these time differences are minuscule at best.
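As an illustration of such an iterative scheme, the Paule-Mandel estimator can be sketched as a root-finding problem: it looks for the \(\tau^2\) at which the generalized \(Q\) statistic, computed with random-effects weights, equals its expected value \(K-1\). The data below are invented, and this is only a simplified sketch of the idea behind `method.tau = "PM"`:

```r
# Hypothetical effect sizes and standard errors
theta_k <- c(0.10, 0.30, 0.35, 0.60, 0.45)
se_k <- c(0.15, 0.20, 0.25, 0.20, 0.18)
K <- length(theta_k)

# Generalized Q statistic for a candidate value of tau^2
q_gen <- function(tau2) {
  w <- 1 / (se_k^2 + tau2)
  mu <- sum(w * theta_k) / sum(w)
  sum(w * (theta_k - mu)^2)
}

# If Q is already below K - 1 at tau^2 = 0, the estimate is truncated to 0;
# otherwise, we solve q_gen(tau2) = K - 1 numerically
if (q_gen(0) <= K - 1) {
  tau2_pm <- 0
} else {
  tau2_pm <- uniroot(function(x) q_gen(x) - (K - 1),
                     interval = c(0, 10), tol = 1e-10)$root
}
tau2_pm
```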
4.1.2.2 Knapp-Hartung Adjustments
In addition to our selection of the \(\tau^2\) estimator, we also have to decide if we want to apply so-called Knapp-Hartung adjustments^{21} (Knapp and Hartung 2003; Sidik and Jonkman 2002). These adjustments affect the way the standard error (and thus the confidence intervals) of our pooled effect size \(\hat\theta\) is calculated.
The Knapp-Hartung adjustment tries to control for the uncertainty in our estimate of the between-study heterogeneity. While significance tests of the pooled effect usually assume a normal distribution (so-called Wald-type tests), the Knapp-Hartung method is based on a \(t\)-distribution. Knapp-Hartung adjustments can only be used in random-effects models, and usually cause the confidence intervals of the pooled effect to become slightly larger.
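A common way to write the Knapp-Hartung variance is as the weighted sum of squared deviations from the pooled effect, divided by \((K-1)\sum w^*_k\); the confidence interval then uses a \(t\)-distribution with \(K-1\) degrees of freedom. The sketch below uses invented values, with \(\tau^2\) treated as already estimated:

```r
# Hypothetical effect sizes, standard errors, and tau^2 estimate
theta_k <- c(0.10, 0.30, 0.35, 0.60, 0.45)
se_k <- c(0.15, 0.20, 0.25, 0.20, 0.18)
tau2 <- 0.01
K <- length(theta_k)

# Random-effects weights and pooled effect
w_star <- 1 / (se_k^2 + tau2)
mu_hat <- sum(w_star * theta_k) / sum(w_star)

# Knapp-Hartung standard error of the pooled effect
se_hk <- sqrt(sum(w_star * (theta_k - mu_hat)^2) /
                ((K - 1) * sum(w_star)))

# t-based confidence interval (heavier tails than the normal distribution)
ci <- mu_hat + c(-1, 1) * qt(0.975, df = K - 1) * se_hk
round(c(pooled = mu_hat, lower = ci[1], upper = ci[2]), 3)
```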
Reporting the Type of Model Used in Your Meta-Analysis
It is highly advised to specify the type of model you used in the methods section of your meta-analysis report. Here is an example:
"As we anticipated considerable between-study heterogeneity, a random-effects model was used to pool effect sizes. The restricted maximum likelihood estimator (Viechtbauer, 2005) was used to calculate the heterogeneity variance \(\tau^2\). We used Knapp-Hartung adjustments (Knapp & Hartung, 2003) to calculate the confidence interval around the pooled effect."
Applying a Knapp-Hartung adjustment is usually sensible. Several studies (IntHout, Ioannidis, and Borm 2014; Langan et al. 2019) showed that these adjustments can reduce the chance of false positives, especially when the number of studies is small.
The use of the Knapp-Hartung adjustment, however, is not uncontroversial. Wiksten and colleagues (2016), for example, argued that the method can cause anti-conservative results in the (rare) cases in which the effects are very homogeneous.
4.2 Effect Size Pooling in R
Time to put what we learned into practice. In the rest of this chapter, we will explore how we can run meta-analyses of different effect sizes directly in R. The {meta} package we will use to do this has a special structure. It contains several meta-analysis functions which are each focused on one type of effect size data. There is a set of parameters which can be specified in the same way across all of these functions; for example, whether we want to apply a fixed-effect or random-effects model, or which \(\tau^2\) estimator should be used. Apart from that, there are function-specific arguments which allow us to tweak details of our meta-analysis that are only relevant for a specific type of data.
Figure 4.3 provides an overview of {meta}'s structure. To determine which function to use, we first have to clarify what kind of effect size data we want to synthesize. The most fundamental distinction is the one between raw and pre-calculated effect size data. We speak of "raw" data when we have all the necessary information needed to calculate the desired effect size stored in our data frame, but have not yet calculated the actual effect size. The `SuicidePrevention` data set we used earlier contains raw data: the mean, standard deviation and sample size of two groups, which is needed to calculate the standardized mean difference.
We call effect size data "pre-calculated", on the other hand, when they already contain the final effect size of each study, as well as the standard error. If we want to use a corrected version of an effect metric (such as Hedges' \(g\), Chapter 3.4.1), it is necessary that this correction has already been applied to pre-calculated effect size data before we start the pooling.
If possible, it is preferable to use raw data in our meta-analysis. This makes it easier for others to understand how we calculated the effect sizes, and to replicate the results. Yet, using raw data is often not possible in practice, because studies often report their results in a different way (Chapter 3.5.1).
This leaves us no other choice than to pre-calculate the desired effect size for each study right away, so that all have the same format. Chapter 17 in the "Helpful Tools" part of this book presents a few formulas which can help you to convert a reported effect size into the desired metric.
The function of choice for pre-calculated effect sizes is `metagen`. Its name stands for generic inverse variance meta-analysis. If we use `metagen` with binary data (e.g. proportions, risk ratios, odds ratios), it is important, as we covered in Chapter 3.3.2, that the effect sizes are log-transformed before the function is used.
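For example, if a study reports a risk ratio together with its 95% confidence interval, the log-transformed effect size and an approximate standard error can be recovered like this (the numbers are invented; Chapter 17 covers such conversions in more detail):

```r
# Hypothetical risk ratio with its 95% confidence interval
rr <- 0.75
ci_lower <- 0.60
ci_upper <- 0.94

# metagen expects the effect on the log scale...
log_rr <- log(rr)

# ...and its standard error, which can be approximated from the width of
# the log-transformed confidence interval (assuming a 95% interval)
se_log_rr <- (log(ci_upper) - log(ci_lower)) / (2 * qnorm(0.975))

round(c(log_rr = log_rr, se = se_log_rr), 3)
```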
When we can resort to raw effect size data, {meta} provides us with a specialized function for each effect size type. We can use the `metamean`, `metacont` and `metacor` functions for means, (standardized) mean differences and correlations, respectively. We can pool (incidence) rates, proportions and incidence rate ratios using the `metarate`, `metaprop` and `metainc` functions. The `metabin` function can be employed when we are dealing with risk or odds ratios.
All meta-analysis functions in {meta} follow the same structure. We have to provide the functions with the (raw or pre-calculated) effect size data, as well as further arguments, which control the specifics of the analysis. There are several core arguments which can be specified in each function:

- `studlab`. This argument associates each effect size with a study label. If we have the name or authors of our studies stored in our data set, we simply have to specify the name of the respective column (e.g. `studlab = author`).
- `sm`. This argument controls the summary measure, the effect size metric we want to use in our meta-analysis. This option is particularly important for functions using raw effect size data. The {meta} package uses codes for different effect size formats, for example `"SMD"` or `"OR"`. The available summary measures are not the same in each function, and we will discuss the most common options in each case in the following sections.
- `fixed`. We need to provide this argument with a logical (`TRUE` or `FALSE`), indicating if a fixed-effect model meta-analysis should be calculated^{22}.
- `random`. In a similar fashion, this argument controls if a random-effects model should be used. If both `fixed` and `random` are set to `TRUE`, both models are calculated and displayed^{23}.
- `method.tau`. This argument defines the \(\tau^2\) estimator. All functions use the codes for different estimators that we already presented in the previous chapter (e.g. for the DerSimonian-Laird method: `method.tau = "DL"`).
- `method.random.ci`. This argument controls how confidence intervals should be calculated when using the random-effects model. When setting `method.random.ci = "HK"`, the Knapp-Hartung adjustment will be applied. In older versions of {meta}, this feature was controlled by setting the `hakn` argument to `TRUE` instead. Using the Knapp-Hartung adjustment is advisable in most cases (see Chapter 4.1.2.2).
- `data`. In this argument, we provide {meta} with the name of our meta-analysis data set.
- `title` (not mandatory). This argument takes a character string with the name of the analysis. While it is not essential to provide input for this argument, it can help us to identify the analysis later on.
There are also a few additional arguments which we will get to know in later chapters. In this guide, we will not be able to discuss all arguments of the {meta} functions: there are more than 100.
Thankfully, most of these arguments are rarely needed or have sensible defaults. When in doubt, you can always run the name of the function, preceded by a question mark (e.g. ?metagen), in the R console; this will open the function documentation.
Default Arguments & Position Matching
For R beginners, it is often helpful to learn about default arguments and position-based matching in functions.
Default arguments are specified by the person who wrote the function. They set a function argument to a predefined value, which is automatically used unless we explicitly provide a different value. In {meta}, many, but not all, arguments have default values.
Default values are displayed in the "usage" section of the function documentation. If a function has defined a default value for an argument, it is not necessary to include it in our function call, unless we are not satisfied with the default behavior.
Arguments without default values always need to be specified in our function call. The {meta} package has a convenience function called gs which we can use to check the default value used for a specific argument. For example, try running gs("method.tau"). If there is no default value, gs will return NULL.
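For example, we can look up the default \(\tau^2\) estimator like this (the value returned may differ depending on your {meta} version):

```r
library(meta)
# Check the default value of the 'method.tau' argument
gs("method.tau")
```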
Another interesting detail about R functions is position matching. Usually, we have to write down the name of an argument and its value in a function call. Through position matching, however, we can leave out the name of the argument, and only have to type in the argument value. We can do this if we specify the argument in the same position in which it appears in the documentation.
Take the sqrt function. A written-out call of this function would be sqrt(x = 4). However, because we know that x, the number, is the first argument, we can simply type in sqrt(4) with the same result.
4.2.1 Pre-Calculated Effect Size Data
Let us begin our tour of meta-analysis functions with metagen. As we learned, this function can be used for pre-calculated effect size data. In our first example, we will use the function to perform a meta-analysis of the ThirdWave data set.
This data set contains studies examining the effect of so-called "third wave" psychotherapies on perceived stress in college students. For each study, the standardized mean difference between a treatment and control group at post-test was calculated, and a small-sample correction was applied. The effect size measure used in this meta-analysis, therefore, is Hedges' \(g\). Let us have a look at the data.
library(tidyverse) # needed for 'glimpse'
library(dmetar)
library(meta)
data(ThirdWave)
glimpse(ThirdWave)
## Rows: 18
## Columns: 8
## $ Author <chr> "Call et al.", "Cavanagh et al.", "DanitzOrsillo"…
## $ TE <dbl> 0.7091362, 0.3548641, 1.7911700, 0.1824552, 0.421…
## $ seTE <dbl> 0.2608202, 0.1963624, 0.3455692, 0.1177874, 0.144…
## $ RiskOfBias <chr> "high", "low", "high", "low", "low", "low", "high…
## $ TypeControlGroup <chr> "WLC", "WLC", "WLC", "no intervention", "informat…
## $ InterventionDuration <chr> "short", "short", "short", "short", "short", "sho…
## $ InterventionType <chr> "mindfulness", "mindfulness", "ACT", "mindfulness…
## $ ModeOfDelivery <chr> "group", "online", "group", "group", "online", "g…
We see that the data set has eight columns, the most important of which are Author, TE and seTE. The TE column contains the \(g\) value of each study, and seTE is the standard error of \(g\). The other columns represent variables describing the subgroup categories that each study falls into. These variables are not relevant for now.
We can now start to think about the type of meta-analysis we want to perform. Looking at the subgroup columns, we see that studies vary at least with respect to their risk of bias, control group, intervention duration, intervention type, and mode of delivery.
This makes it quite clear that some between-study heterogeneity can be expected, and that it makes no sense to assume that all studies have a fixed true effect. We may therefore use the random-effects model for pooling. Given its robust performance in continuous outcome data, we choose the restricted maximum likelihood ("REML") estimator in this example. We will also use the Knapp-Hartung adjustment to reduce the risk of a false positive result.
Now that we have these fundamental questions settled, the specification of our call to metagen becomes fairly straightforward. There are two function-specific arguments which we always have to specify when using the function:

TE. The name of the column in our data set which contains the calculated effect sizes.

seTE. The name of the column in which the standard error of the effect size is stored.
The rest are generic {meta} arguments that we already covered above. Since the analysis deals with standardized mean differences, we also specify sm = "SMD". However, in this example, this has no actual effect on the results, since the effect sizes are already calculated for each study. It will only tell the function to label effect sizes as SMDs in the output.
This gives us all the information we need to set up our first call to metagen. We will store the results of the function in an object called m.gen.
m.gen <- metagen(TE = TE,
seTE = seTE,
studlab = Author,
data = ThirdWave,
sm = "SMD",
fixed = FALSE,
random = TRUE,
method.tau = "REML",
method.random.ci = "HK",
title = "Third Wave Psychotherapies")
Our m.gen object now contains all the meta-analysis results. An easy way to get an overview is to use the summary function^{24}.
summary(m.gen)
## Review: Third Wave Psychotherapies
## SMD 95%-CI %W(random)
## Call et al. 0.7091 [ 0.1979; 1.2203] 5.0
## Cavanagh et al. 0.3549 [-0.0300; 0.7397] 6.3
## DanitzOrsillo 1.7912 [ 1.1139; 2.4685] 3.8
## de Vibe et al. 0.1825 [-0.0484; 0.4133] 7.9
## Frazier et al. 0.4219 [ 0.1380; 0.7057] 7.3
## Frogeli et al. 0.6300 [ 0.2458; 1.0142] 6.3
## Gallego et al. 0.7249 [ 0.2846; 1.1652] 5.7
## Hazlett-Steve… 0.5287 [ 0.1162; 0.9412] 6.0
## Hintz et al. 0.2840 [-0.0453; 0.6133] 6.9
## Kang et al. 1.2751 [ 0.6142; 1.9360] 3.9
## Kuhlmann et al. 0.1036 [-0.2781; 0.4853] 6.3
## Lever Taylor… 0.3884 [-0.0639; 0.8407] 5.6
## Phang et al. 0.5407 [ 0.0619; 1.0196] 5.3
## Rasanen et al. 0.4262 [-0.0794; 0.9317] 5.1
## Ratanasiripong 0.5154 [-0.1731; 1.2039] 3.7
## Shapiro et al. 1.4797 [ 0.8618; 2.0977] 4.2
## Song & Lindquist 0.6126 [ 0.1683; 1.0569] 5.7
## Warnecke et al. 0.6000 [ 0.1120; 1.0880] 5.2
##
## Number of studies combined: k = 18
##
## SMD 95%-CI t p-value
## Random effects model 0.5771 [0.3782; 0.7760] 6.12 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.0820 [0.0295; 0.3533]; tau = 0.2863 [0.1717; 0.5944]
## I^2 = 62.6% [37.9%; 77.5%]; H = 1.64 [1.27; 2.11]
##
## Test of heterogeneity:
## Q d.f. p-value
## 45.50 17 0.0002
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-Profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
Here we go: the results of our first meta-analysis using R. There is a lot to unpack, so let us go through the output step by step.
The first part of the output contains the individual studies, along with their effect sizes and confidence intervals. Since the effects were pre-calculated, there is not much new to be seen here. The %W(random) column contains the weight (in percent) that the random-effects model attributed to each study. We can see that, with 7.9%, the greatest weight in our meta-analysis has been given to the study by de Vibe. The smallest weight has been given to the study by Ratanasiripong. Looking at the confidence interval of this study, we can see why this is the case: the CI around its effect size is extremely wide, meaning that the standard error is very high, and that the study's effect size estimate is therefore not very precise.

Furthermore, the output tells us the total number of studies in our meta-analysis. We see that \(K=\) 18 studies were combined.
The next section provides us with the core result: the pooled effect size. We see that the estimate is \(g \approx\) 0.58 and that the 95% confidence interval ranges from \(g \approx\) 0.38 to 0.78. We are also presented with the results of a test determining if the effect size is significant. This is the case (\(p<\) 0.001). Importantly, we also see the associated test statistic, which is denoted with t. This is because we applied the Knapp-Hartung adjustment, which is based on a \(t\)-distribution.

Underneath, we see results concerning the between-study heterogeneity. We will learn more about some of the results displayed here in later chapters, so let us only focus on \(\tau^2\). Next to tau^2, we see an estimate of the variance in true effects: \(\tau^2\) = 0.08. We see that the confidence interval of tau^2 does not include zero (0.03–0.35), meaning that \(\tau^2\) is significantly greater than zero. All of this indicates that between-study heterogeneity exists in our data and that the random-effects model was a good choice.

The last section provides us with details about the meta-analysis. We see that effects were pooled using the inverse variance method, that the restricted maximum-likelihood estimator was used, and that the Knapp-Hartung adjustment was applied.
We can also access information stored in m.gen directly. Plenty of objects are stored by default in the meta-analysis results produced by {meta}, and a look into the "value" section of the documentation reveals what they mean. We can use the $ operator to print specific results of our analyses. The pooled effect, for example, is stored as TE.random.
m.gen$TE.random
## [1] 0.5771158
Even when we specify fixed = FALSE, {meta}'s functions always also calculate results for the fixed-effect model internally. Thus, we can also access the pooled effect assuming a fixed-effect model.
m.gen$TE.fixed
## [1] 0.4805045
We see that this estimate deviates considerably from the random-effects model result.
When we want to adapt some details of our analyses, the update function can be helpful. This function needs the {meta} object as input, plus the argument we want to change. Let us say that we want to check if results differ substantially if we use the Paule-Mandel instead of the restricted maximum likelihood estimator. We can do that using this code:
m.gen_update <- update(m.gen, method.tau = "PM")
# Get pooled effect
m.gen_update$TE.random
## [1] 0.5873544
# Get tau^2 estimate
m.gen_update$tau2
## [1] 0.1104957
We see that while the pooled effect does not differ much, the Paule-Mandel estimator gives us a somewhat larger approximation of \(\tau^2\).
Lastly, it is always helpful to save the results for later. Objects generated by {meta} can easily be saved as .rda (R data) files, using the save function.
save(m.gen, file = "path/to/my/metaanalysis.rda") # example path
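In a later R session, the saved object can then be restored with the load function, which makes it available again under its original name:

```r
load("path/to/my/metaanalysis.rda") # example path
summary(m.gen)                      # 'm.gen' is available again
```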
4.2.2 (Standardized) Mean Differences
Raw effect size data in the form of means and standard deviations of two groups can be pooled using metacont. This function can be used for both standardized and unstandardized between-group mean differences. These can be obtained by specifying either sm = "SMD" or sm = "MD". Otherwise, there are seven function-specific arguments we have to provide:
n.e. The number of observations in the treatment/experimental group.

mean.e. The mean in the treatment/experimental group.

sd.e. The standard deviation in the treatment/experimental group.

n.c. The number of observations in the control group.

mean.c. The mean in the control group.

sd.c. The standard deviation in the control group.

method.smd. This is only relevant when sm = "SMD". The metacont function allows us to calculate three different types of standardized mean differences. When we set method.smd = "Cohen", the uncorrected standardized mean difference (Cohen's \(d\)) is used as the effect size metric. The two other options are "Hedges" (default and recommended), which calculates Hedges' \(g\), and "Glass", which will calculate Glass' \(\Delta\) (delta). Glass' \(\Delta\) uses the control group standard deviation instead of the pooled standard deviation to standardize the mean difference. This effect size is sometimes used in primary studies when there is more than one treatment group, but it is usually not the preferred metric for meta-analyses.
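To make the differences between these three options more tangible, here is a small sketch that computes each variant by hand from hypothetical two-group summary data (note that metacont uses exact bias-correction formulae for Hedges' \(g\) by default, so its results may differ slightly from the approximate correction used here):

```r
# Hypothetical summary data for one study
mean.e <- 23.1; sd.e <- 4.8; n.e <- 50   # treatment group
mean.c <- 25.4; sd.c <- 5.3; n.c <- 48   # control group

# Pooled standard deviation
sd.pooled <- sqrt(((n.e - 1) * sd.e^2 + (n.c - 1) * sd.c^2) /
                  (n.e + n.c - 2))

d     <- (mean.e - mean.c) / sd.pooled        # Cohen's d (uncorrected)
g     <- d * (1 - 3 / (4 * (n.e + n.c) - 9))  # Hedges' g (approx. correction)
delta <- (mean.e - mean.c) / sd.c             # Glass' delta
round(c(d = d, g = g, delta = delta), 3)
```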
For our example analysis, we will recycle the SuicidePrevention data set we already worked with in Chapters 2.4 and 4.1.1. Not all studies in our sample are absolutely identical, so using a random-effects model is warranted. We will also use the Knapp-Hartung adjustment again, as well as the restricted maximum likelihood estimator for \(\tau^2\). We tell metacont to correct for small-sample bias, producing Hedges' \(g\) as the effect size metric. Results are saved in an object that we name m.cont.
Overall, our code looks like this:
# Make sure meta and dmetar are already loaded
library(meta)
library(dmetar)
# Load dataset from dmetar (or download and open manually)
data(SuicidePrevention)
# Use metacont to pool results.
m.cont <- metacont(n.e = n.e,
mean.e = mean.e,
sd.e = sd.e,
n.c = n.c,
mean.c = mean.c,
sd.c = sd.c,
studlab = author,
data = SuicidePrevention,
sm = "SMD",
method.smd = "Hedges",
fixed = FALSE,
random = TRUE,
method.tau = "REML",
method.random.ci = "HK",
title = "Suicide Prevention")
Let us see what the results are:
summary(m.cont)
## Review: Suicide Prevention
##
## SMD 95%-CI %W(random)
## Berry et al. -0.1428 [-0.4315; 0.1459] 15.6
## DeVries et al. -0.6077 [-0.9402; -0.2752] 12.3
## Fleming et al. -0.1112 [-0.6177; 0.3953] 5.7
## Hunt & Burke -0.1270 [-0.4725; 0.2185] 11.5
## McCarthy et al. -0.3925 [-0.7884; 0.0034] 9.0
## Meijer et al. -0.2676 [-0.5331; -0.0021] 17.9
## Rivera et al. 0.0124 [-0.3454; 0.3703] 10.8
## Watkins et al. -0.2448 [-0.6848; 0.1952] 7.4
## Zaytsev et al. -0.1265 [-0.5062; 0.2533] 9.7
##
## Number of studies: k = 9
## Number of observations: o = 1147 (o.e = 571, o.c = 576)
##
## SMD 95%-CI t p-value
## Random effects model -0.2304 [-0.3734; -0.0874] -3.71 0.0059
##
## Quantifying heterogeneity:
## tau^2 = 0.0044 [0.0000; 0.0924]; tau = 0.0661 [0.0000; 0.3040]
## I^2 = 7.4% [0.0%; 67.4%]; H = 1.04 [1.00; 1.75]
##
## Test of heterogeneity:
## Q d.f. p-value
## 8.64 8 0.3738
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-Profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model (df = 8)
## - Hedges' g (bias corrected standardised mean difference; using exact formulae)
Looking at the output and comparing it to the one we received in Chapter 4.2.1, we already see one of {meta}'s greatest assets: although metagen and metacont are different functions requiring different data types, the structure of the output looks nearly identical. This makes interpreting the results quite easy. We see that the pooled effect according to the random-effects model is \(g=\) -0.23, with the 95% confidence interval ranging from -0.09 to -0.37. The effect is significant (\(p=\) 0.006).
We see that the effect sizes have a negative sign. In the context of our meta-analysis, this represents a favorable outcome, because it means that suicidal ideation was lower in the treatment groups compared to the control groups. To make this clearer to others, we may also consistently reverse the sign of the effect sizes (e.g. write \(g=\) 0.23 instead), so that positive effect sizes always represent "positive" results.
The restricted maximum likelihood method estimated a between-study heterogeneity variance of \(\tau^2\) = 0.004. Looking at tau^2, we see that the confidence interval includes zero, meaning that the variance of true effect sizes is not significantly greater than zero.
In the details section, we are informed that Hedges' \(g\) was used as the effect size metric, just as we requested.
4.2.3 Binary Outcomes
4.2.3.1 Risk & Odds Ratios
The metabin function can be used to pool effect sizes based on binary data, particularly risk and odds ratios. Before we start using the function, we first have to discuss a few particularities of meta-analyses based on these effect sizes.
It is possible to pool binary effect sizes using the generic inverse-variance method we covered in Chapters 4.1.1 and 4.1.2.1. We need to calculate the log-odds ratio or log-risk ratio, as well as the standard error of each effect, and can then use the inverse of the effect size variance to determine the pooling weights.
However, this approach is suboptimal for binary outcome data (Julian Higgins et al. 2019, chap. 10.4.1). When we are dealing with sparse data, meaning that the number of events or the total sample size of a study is small, the calculated standard error may not be a good estimator of the precision of the binary effect size.
4.2.3.1.1 The Mantel-Haenszel Method
The Mantel-Haenszel method (Mantel and Haenszel 1959; Robins, Greenland, and Breslow 1986) is therefore sometimes used as an alternative to calculate the (fixed-effect) weights of studies with binary outcome data. It is also the default approach used in metabin. This method uses the number of events and non-events in the treatment and control group to determine a study's weight. There are different formulas depending on whether we want to calculate the risk or odds ratio.
Risk Ratio:
\[\begin{equation} w_k = \frac{(a_k+b_k) c_k}{n_k} \tag{4.9} \end{equation}\]
Odds Ratio:
\[\begin{equation} w_k = \frac{b_kc_k}{n_k} \tag{4.10} \end{equation}\]
In the formulas, we use the same notation as in Chapter 3.3.2.1, with \(a_k\) being the number of events in the treatment group, \(c_k\) the number of events in the control group, \(b_k\) the number of non-events in the treatment group, \(d_k\) the number of non-events in the control group, and \(n_k\) being the total sample size.
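Expressed in R, formulas (4.9) and (4.10) are straightforward; the 2×2 table below is hypothetical:

```r
# Hypothetical 2x2 table for one study k
a <- 10; b <- 90   # treatment group: events / non-events
c <- 20; d <- 80   # control group:   events / non-events
n <- a + b + c + d # total sample size

w_rr <- (a + b) * c / n   # Mantel-Haenszel weight for the risk ratio (4.9)
w_or <- b * c / n         # Mantel-Haenszel weight for the odds ratio (4.10)
```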
4.2.3.1.2 The Peto Method
A second approach is the Peto method (Yusuf et al. 1985). In its essence, this approach is based on the inverse variance principle we already know. However, it uses a special kind of effect size, the Peto odds ratio, which we will denote with \(\hat\psi_k\).
To calculate \(\hat\psi_k\), we need to know \(O_k\), the observed events in the treatment group, and calculate \(E_k\), the expected number of cases in the treatment group. The difference \(O_k-E_k\) is then divided by the variance \(V_k\) of the difference between \(O_k\) and \(E_k\), resulting in a log-transformed version of \(\hat\psi_k\). Using the same cell notation as before, the formulas to calculate \(E_k\), \(O_k\) and \(V_k\) are the following:
\[\begin{equation} O_k = a_k \tag{4.11} \end{equation}\]
\[\begin{equation} E_k = \frac{(a_k+b_k)(a_k+c_k)}{a_k+b_k+c_k+d_k} \tag{4.12} \end{equation}\]
\[\begin{equation} V_k = \frac{(a_k+b_k)(c_k+d_k)(a_k+c_k)(b_k+d_k)}{{(a_k+b_k+c_k+d_k)}^2(a_k+b_k+c_k+d_k-1)} \tag{4.13} \end{equation}\]
\[\begin{equation} \log\hat\psi_k = \frac{O_k-E_k}{V_k} \tag{4.14} \end{equation}\]
The inverse of the variance of \(\log\hat\psi_k\) is then used as the weight when pooling the effect sizes^{25}.
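Translated into R, formulas (4.11) to (4.14) look like this (again using a hypothetical 2×2 table):

```r
a <- 10; b <- 90   # treatment group: events / non-events
c <- 20; d <- 80   # control group:   events / non-events
n <- a + b + c + d

O <- a                                   # observed events (4.11)
E <- (a + b) * (a + c) / n               # expected events (4.12)
V <- (a + b) * (c + d) * (a + c) *
     (b + d) / (n^2 * (n - 1))           # variance of O - E (4.13)

log_psi <- (O - E) / V                   # log Peto odds ratio (4.14)
exp(log_psi)                             # Peto odds ratio
```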
4.2.3.1.3 The Bakbergenuly-Sample Size Method
Recently, Bakbergenuly and colleagues (2020) proposed another method in which the weight of effects is only determined by a study's sample size, and showed that this approach may be preferable to the one by Mantel and Haenszel. We will call this the sample size method. The formula for this approach is fairly simple. We only need to know the sample sizes \(n_{\text{treat}_k}\) and \(n_{\text{control}_k}\) in the treatment and control group, respectively.
\[\begin{equation} w_k = \frac{n_{\text{treat}_k}n_{\text{control}_k}}{n_{\text{treat}_k} + n_{\text{control}_k} } \tag{4.15} \end{equation}\]
When we implement this pooling method in metabin, the weights and overall effect of the fixed-effect and random-effects model will be identical, since the sample sizes are used as weights in both cases. Only the \(p\)-value and confidence interval of the pooled effect will differ.
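Formula (4.15) is likewise easy to express in R (hypothetical group sizes):

```r
n.treat   <- 150  # treatment group sample size
n.control <- 120  # control group sample size
w <- (n.treat * n.control) / (n.treat + n.control)  # study weight (4.15)
```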
Which Pooling Method Should I Use?
In Chapter 3.3.2.1, we already talked extensively about the problem of zero cells and continuity correction. While both the Peto and sample size method can be used without modification when there are zero cells, it is common to add 0.5 to zero cells when using the Mantel-Haenszel method. This is also the default behavior in metabin.
Using continuity corrections, however, has been discouraged (Efthimiou 2018), as they can lead to biased results. The Mantel-Haenszel method only really requires a continuity correction when one specific cell is zero in all included studies, which is rarely the case. Usually, it is therefore advisable to use the exact Mantel-Haenszel method without continuity corrections by setting MH.exact = TRUE in metabin.
The Peto method also has its limitations. First of all, it can only be used for odds ratios. Simulation studies also showed that the approach only works well when (1) the number of observations in the treatment and control group is similar, (2) when the observed event is rare (<1%), and (3) when the treatment effect is not overly large (Bradburn et al. 2007; Sweeting, Sutton, and Lambert 2004).
The Bakbergenuly-sample size method, lastly, is a fairly new approach, meaning that it is not as well studied as the other two methods.
All in all, it may be advisable in most cases to follow Cochrane's general assessment (Julian Higgins et al. 2019, chap. 10.4), and use the Mantel-Haenszel method (without continuity correction). The Peto method may be used when the odds ratio is the desired effect size metric, and when the event of interest is expected to be rare.
It is also important to keep in mind that these alternative methods are primarily relevant for the fixed-effect model. Pooled estimates using the random-effects model will still essentially be based on the (generic) inverse-variance method. For example, even if the Mantel-Haenszel method is specified in metabin, the pooled random-effects estimate is still derived using the same weighting approach we discussed in Chapter 4.1.2.1. For the Peto method, the inverse-variance approach will be applied using the variance of the Peto log-odds ratio.
4.2.3.1.4 Pooling Binary Effect Sizes in R
There are eight important function-specific arguments in metabin:

event.e. The number of events in the treatment/experimental group.

n.e. The number of observations in the treatment/experimental group.

event.c. The number of events in the control group.

n.c. The number of observations in the control group.

method. The pooling method to be used. This can either be "Inverse" (generic inverse-variance pooling), "MH" (Mantel-Haenszel; default and recommended for fixed-effect models), "Peto" (Peto method), or "SSW" (Bakbergenuly-sample size method; only when sm = "OR").

sm. The summary measure (i.e. effect size metric) to be calculated. We can use "RR" for the risk ratio and "OR" for the odds ratio.

incr. The increment to be added for continuity correction of zero cells. If we specify incr = 0.5, an increment of 0.5 is added. If we set incr = "TACC", the treatment arm continuity correction method is used (see Chapter 3.3.2.1). As mentioned before, it is usually recommended to leave out this argument and not apply continuity corrections.

MH.exact. If method = "MH", we can set this argument to TRUE, indicating that no continuity correction should be applied for the Mantel-Haenszel method.
For our hands-on example, we will use the DepressionMortality data set. This data set is based on a meta-analysis by Cuijpers and Smit (2002), which examined the effect of suffering from depression on all-cause mortality. The data set contains the number of individuals with and without depression, and how many individuals in both groups had died after several years.
Let us have a look at the data set first:
library(dmetar)
library(tidyverse)
library(meta)
data(DepressionMortality)
glimpse(DepressionMortality)
## Rows: 18
## Columns: 6
## $ author <chr> "Aaroma et al., 1994", "Black et al., 1998", "Bruce et al., 19…
## $ event.e <dbl> 25, 65, 5, 26, 32, 1, 24, 15, 15, 173, 37, 41, 29, 61, 15, 21,…
## $ n.e <dbl> 215, 588, 46, 67, 407, 44, 60, 61, 29, 1015, 105, 120, 258, 38…
## $ event.c <dbl> 171, 120, 107, 1168, 269, 87, 200, 437, 227, 250, 66, 9, 24, 3…
## $ n.c <dbl> 3088, 1901, 2479, 3493, 6256, 1520, 882, 2603, 853, 3375, 409,…
## $ country <chr> "Finland", "USA", "USA", "USA", "Sweden", "USA", "Canada", "Ne…
In this example, we will calculate the risk ratio as the effect size metric, as was done by Cuijpers and Smit. We will use a random-effects pooling model, and, since we are dealing with binary outcome data, we will use the Paule-Mandel estimator for \(\tau^2\).
Looking at the data, we see that the sample sizes vary considerably from study to study, a scenario in which the Paule-Mandel method may be slightly biased (see Chapter 4.1.2.1). Keeping this in mind, we can also try out another \(\tau^2\) estimator as a sensitivity analysis to check if the results vary by a lot.
The data set contains no zero cells, so we do not have to worry about continuity correction, and can use the exact Mantel-Haenszel method right away. We save the meta-analysis results in an object called m.bin.
m.bin <- metabin(event.e = event.e,
n.e = n.e,
event.c = event.c,
n.c = n.c,
studlab = author,
data = DepressionMortality,
sm = "RR",
method = "MH",
MH.exact = TRUE,
fixed = TRUE,
random = TRUE,
method.tau = "PM",
method.random.ci = "HK",
title = "Depression and Mortality")
summary(m.bin)
## Review: Depression and Mortality
##
## RR 95%-CI %W(common) %W(random)
## Aaroma et al., 1994 2.0998 [1.4128; 3.1208] 4.6 6.0
## Black et al., 1998 1.7512 [1.3139; 2.3341] 11.6 6.6
## Bruce et al., 1989 2.5183 [1.0785; 5.8802] 0.8 3.7
## Bruce et al., 1994 1.1605 [0.8560; 1.5733] 9.0 6.5
## Enzell et al., 1984 1.8285 [1.2853; 2.6014] 6.8 6.3
## Fredman et al., 1989 0.3971 [0.0566; 2.7861] 1.0 1.2
## Murphy et al., 1987 1.7640 [1.2644; 2.4610] 5.2 6.4
## Penninx et al., 1999 1.4647 [0.9361; 2.2919] 4.1 5.8
## Pulska et al., 1998 1.9436 [1.3441; 2.8107] 3.1 6.2
## Roberts et al., 1990 2.3010 [1.9206; 2.7567] 23.7 7.0
## Saz et al., 1999 2.1837 [1.5533; 3.0700] 5.5 6.3
## Sharma et al., 1998 2.0500 [1.0744; 3.9114] 2.5 4.7
## Takeida et al., 1997 6.9784 [4.1303; 11.7902] 1.5 5.3
## Takeida et al., 1999 5.8124 [3.8816; 8.7035] 3.3 6.0
## Thomas et al., 1992 1.3303 [0.7780; 2.2745] 4.0 5.3
## Thomas et al., 1992 1.7722 [1.1073; 2.8363] 4.1 5.6
## Weissman et al., 1986 1.2500 [0.6678; 2.3398] 2.7 4.8
## Zheng et al., 1997 1.9803 [1.4001; 2.8011] 6.4 6.3
##
## Number of studies: k = 18
## Number of observations: o = 94770 (o.e = 4514, o.c = 90256)
## Number of events: e = 5439
##
## RR 95%-CI z|t p-value
## Common effect model 2.0634 [1.8909; 2.2516] 16.26 < 0.0001
## Random effects model 2.0217 [1.5786; 2.5892] 6.00 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.1865 [0.0739; 0.5568]; tau = 0.4319 [0.2718; 0.7462]
## I^2 = 77.2% [64.3%; 85.4%]; H = 2.09 [1.67; 2.62]
##
## Test of heterogeneity:
## Q d.f. p-value
## 74.49 17 < 0.0001
##
## Details on meta-analytical method:
## - Mantel-Haenszel method (common effect model)
## - Inverse variance method (random effects model)
## - Paule-Mandel estimator for tau^2
## - Q-Profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model (df = 17)
We see that the pooled effect size, based on the random-effects model, is RR \(=\) 2.02. The pooled effect is significant (\(p<\) 0.001), and indicates that suffering from depression doubles the mortality risk. We see that our estimate of the between-study heterogeneity variance is \(\tau^2 \approx\) 0.19. The effect using the fixed-effect model is RR \(=\) 2.06, which is very similar to the random-effects model.
The confidence interval of \(\tau^2\) does not include zero, indicating substantial heterogeneity between studies. Lastly, a look into the details section of the output reveals that the metabin function used the Mantel-Haenszel method for pooling in the fixed-effect model, as intended.
As announced above, let us have a look if the method used to estimate \(\tau^2\) has an impact on the results. Using the update function, we rerun the analysis, but use the restricted maximum likelihood estimator this time.
m.bin_update <- update(m.bin, method.tau = "REML")
Now, let us have a look at the pooled effect again by inspecting TE.random. We have to remember here that meta-analyses of binary outcomes are actually performed using a log-transformed version of the effect size. When presenting the results, metabin just reconverts the effect size metrics to their original form for our convenience. This step is not performed if we inspect elements in our meta-analysis object directly.
To retransform log-transformed effect sizes, we have to exponentiate the value. Exponentiation can be seen as the "antagonist" of log-transforming data, and can be performed in R using the exp function^{26}. Let us put this into practice.
exp(m.bin_update$TE.random)
## [1] 2.02365
We see that the pooled effect using the restricted maximum likelihood estimator is virtually identical. Now, let us see the estimate of \(\tau^2\):
m.bin_update$tau2
## [1] 0.1647315
This value deviates somewhat, but not to a degree that should make us worry about the validity of our initial results.
Our call to metabin would have looked exactly the same if we had decided to pool odds ratios. The only thing we need to change is the sm argument, which has to be set to "OR". Instead of writing down the entire function call one more time, we can use the update function again to calculate the pooled OR.
m.bin_or <- update(m.bin, sm = "OR")
m.bin_or
## Review: Depression and Mortality
##
## [...]
##
## Number of studies combined: k = 18
##
## OR 95%-CI z|t p-value
## Common effect model 2.3249 [2.0950; 2.5801] 15.88 < 0.0001
## Random effects model 2.2901 [1.7512; 2.9949] 6.52 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.2032 [0.0744; 0.6314]; tau = 0.4508 [0.2728; 0.7946]
## I^2 = 72.9% [56.7%; 83.0%]; H = 1.92 [1.52; 2.43]
##
## Test of heterogeneity:
## Q d.f. p-value
## 62.73 17 < 0.0001
##
## Details on meta-analytical method:
## - Mantel-Haenszel method (common effect model)
## - Inverse variance method (random effects model)
## - Paule-Mandel estimator for tau^2
## - Q-Profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model (df = 17)
In the output, we see that the pooled effect using odds ratios is OR = 2.29.
4.2.3.1.5 Pooling Pre-Calculated Binary Effect Sizes
It is sometimes not possible to extract the raw effect size data needed to calculate risk or odds ratios in each study. For example, a primary study may report an odds ratio, but not the data on which this effect size is based. If the authors do not provide us with the original data, this may require us to perform a meta-analysis based on pre-calculated effect size data. As we learned, the function we can use to do this is metagen.
When dealing with binary outcome data, we should be careful if there is no other option than using pre-calculated effect size data. The metagen function always uses the inverse-variance method to pool effect sizes (even for fixed-effect models), and better options such as the Mantel-Haenszel approach cannot be used. However, it is still a viable alternative if everything else fails.
Using the DepressionMortality data set, let us simulate that we are dealing with a pre-calculated effect size meta-analysis. We can extract the TE and seTE objects in m.bin to get the effect size and standard error of each study. We save this information in our DepressionMortality data set.
DepressionMortality$TE <- m.bin$TE
DepressionMortality$seTE <- m.bin$seTE
Now, imagine that there is one effect for which we know the lower and upper bound of the confidence interval, but not the standard error. To simulate such a scenario, we will (1) define the standard error of study 7 (Murphy et al., 1987) as missing (i.e. set its value to NA), (2) define two new empty columns, lower and upper, in our data set, and (3) fill lower and upper with the log-transformed "reported" confidence interval in study 7.
# Set seTE of study 7 to NA
DepressionMortality$seTE[7] <- NA
# Create empty columns 'lower' and 'upper'
DepressionMortality[,"lower"] <- NA
DepressionMortality[,"upper"] <- NA
# Fill in values for 'lower' and 'upper' in study 7
# As always, binary effect sizes need to be log-transformed
DepressionMortality$lower[7] <- log(1.26)
DepressionMortality$upper[7] <- log(2.46)
Now let us have a look at the data we just created.
DepressionMortality[,c("author", "TE", "seTE", "lower", "upper")]
## author TE seTE lower upper
## 1 Aaroma et al., 1994 0.7418 0.20217 NA NA
## 2 Black et al., 1998 0.5603 0.14659 NA NA
## 3 Bruce et al., 1989 0.9235 0.43266 NA NA
## 4 Bruce et al., 1994 0.1488 0.15526 NA NA
## 5 Enzell et al., 1984 0.6035 0.17986 NA NA
## 6 Fredman et al., 1989 0.9236 0.99403 NA NA
## 7 Murphy et al., 1987 0.5675 NA 0.2311 0.9001
## 8 Penninx et al., 1999 0.3816 0.22842 NA NA
## [...]
It is not uncommon to find data sets like this one in practice. It may be possible to calculate the log-risk ratio for most studies, but for a few others, the only information we have is the (log-transformed) risk ratio and its confidence interval.
Fortunately, metagen allows us to pool even such data. We only have to provide the name of the columns containing the lower and upper bound of the confidence interval to the lower and upper arguments. The metagen function will then use this information to weight the effects when the standard error is not available. Our function call looks like this:
m.gen_bin <- metagen(TE = TE,
seTE = seTE,
lower = lower,
upper = upper,
studlab = author,
data = DepressionMortality,
sm = "RR",
method.tau = "PM",
fixed = FALSE,
random = TRUE,
title = "Depression Mortality (Precalculated)")
summary(m.gen_bin)
## Review: Depression Mortality (Precalculated)
##
## [...]
##
## Number of studies combined: k = 18
##
## RR 95%-CI z p-value
## Random effects model 2.0218 [1.6066; 2.5442] 6.00 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.1865 [0.0739; 0.5568]; tau = 0.4319 [0.2718; 0.7462];
## I^2 = 77.2% [64.3%; 85.4%]; H = 2.09 [1.67; 2.62]
##
## [...]
In the output, we see that all \(K=\) 18 studies could be combined in the meta-analysis, meaning that metagen used the information in lower and upper provided for study 7. The output also shows that the results using the inverse variance method are nearly identical to the ones we obtained before, using the raw outcome data.
4.2.3.2 Incidence Rate Ratios
Effect sizes based on incidence rates (i.e. incidence rate ratios, Chapter 3.3.3) can be pooled using the metainc function. The arguments of this function are very similar to metabin:
- event.e: The number of events in the treatment/experimental group.
- time.e: The person-time at risk in the treatment/experimental group.
- event.c: The number of events in the control group.
- time.c: The person-time at risk in the control group.
- method: Like metabin, the default pooling method is the one by Mantel and Haenszel ("MH"), which is typically most relevant for the fixed-effect model. Alternatively, we can also use generic inverse variance pooling ("Inverse").
- sm: The summary measure. We can choose between the incidence rate ratio ("IRR") and the incidence rate difference ("IRD").
- incr: The increment we want to add for the continuity correction of zero cells.
In contrast to metabin, metainc does not use a continuity correction by default. Specifying MH.exact as TRUE is therefore not required. A continuity correction is only performed when we choose the generic inverse variance pooling method (method = "Inverse").
In our hands-on example, we will use the EatingDisorderPrevention data set. This data is based on a meta-analysis which examined the effects of college-based preventive interventions on the incidence of eating disorders (Harrer et al. 2020). The person-time at risk is expressed as person-years in this data set.
As always, let us first have a glimpse at the data:
library(dmetar)
library(tidyverse)
library(meta)
data(EatingDisorderPrevention)
glimpse(EatingDisorderPrevention)
## Rows: 5
## Columns: 5
## $ Author <chr> "Stice et al., 2013", "Stice et al., 2017a", "Stice et al., 20…
## $ event.e <dbl> 6, 22, 6, 8, 22
## $ time.e <dbl> 362, 235, 394, 224, 160
## $ event.c <dbl> 16, 8, 9, 13, 29
## $ time.c <dbl> 356, 74, 215, 221, 159
We use metainc to pool the effect size data, with the incidence rate ratio as the effect size metric. The Mantel-Haenszel method is used, and we employ the Paule-Mandel estimator to calculate the between-study heterogeneity variance.
m.inc <- metainc(event.e = event.e,
time.e = time.e,
event.c = event.c,
time.c = time.c,
studlab = Author,
data = EatingDisorderPrevention,
sm = "IRR",
method = "MH",
fixed = TRUE,
random = TRUE,
method.tau = "PM",
method.random.ci = "HK",
title = "Eating Disorder Prevention")
summary(m.inc)
## Review: Eating Disorder Prevention
##
## IRR 95%-CI %W(common) %W(random)
## Stice et al., 2013 0.3688 [0.1443; 0.9424] 19.6 13.9
## Stice et al., 2017a 0.8660 [0.3855; 1.9450] 14.8 18.7
## Stice et al., 2017b 0.3638 [0.1295; 1.0221] 14.2 11.5
## Taylor et al., 2006 0.6071 [0.2516; 1.4648] 15.9 15.8
## Taylor et al., 2016 0.7539 [0.4332; 1.3121] 35.4 40.0
##
## Number of studies: k = 5
## Number of events: e = 139
##
## IRR 95%-CI z|t p-value
## Common effect model 0.6161 [0.4359; 0.8709] -2.74 0.0061
## Random effects model 0.6223 [0.3955; 0.9791] -2.91 0.0439
##
## Quantifying heterogeneity:
## tau^2 = 0 [0.0000; 1.1300]; tau = 0 [0.0000; 1.0630]
## I^2 = 0.0% [0.0%; 79.2%]; H = 1.00 [1.00; 2.19]
##
## Test of heterogeneity:
## Q d.f. p-value
## 3.34 4 0.5033
##
## Details on meta-analytical method:
## - Mantel-Haenszel method (common effect model)
## - Inverse variance method (random effects model)
## - Paule-Mandel estimator for tau^2
## - Q-Profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model (df = 4)
We see that the pooled effect using the random-effects model is IRR = 0.62. This effect is significant (\(p=\) 0.04), albeit somewhat closer to the conventional significance threshold than in the previous examples. Based on the pooled effect, we can say that the preventive interventions reduced the incidence of eating disorders within one year by 38%. Lastly, we see that the estimate of the heterogeneity variance \(\tau^2\) is zero.
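As a sanity check, we can re-derive this result by hand. The log-incidence rate ratio of each study is log((event.e/time.e)/(event.c/time.c)), with standard error sqrt(1/event.e + 1/event.c); and since \(\tau^2\) was estimated to be zero here, random-effects pooling reduces to plain inverse-variance pooling. A base R sketch, with the data copied from the glimpse output above:

```r
# EatingDisorderPrevention data (see glimpse output above)
event.e <- c(6, 22, 6, 8, 22); time.e <- c(362, 235, 394, 224, 160)
event.c <- c(16, 8, 9, 13, 29); time.c <- c(356, 74, 215, 221, 159)

# Log incidence rate ratios and their standard errors
TE <- log((event.e / time.e) / (event.c / time.c))
seTE <- sqrt(1 / event.e + 1 / event.c)

# Inverse-variance pooling: weighted mean with weights 1/SE^2
w <- 1 / seTE^2
pooled <- exp(sum(w * TE) / sum(w))
round(pooled, 4)
## [1] 0.6223
```

This matches the random-effects estimate up to rounding; the Mantel-Haenszel common-effect result (0.6161) differs slightly because it uses a different weighting scheme.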
4.2.4 Correlations
Correlations can be pooled using the metacor function, which uses the generic inverse variance pooling method. In Chapter 3.2.3.1, we covered that correlations should be Fisher's \(z\)-transformed before pooling. By default, metacor does this transformation automatically for us. It is therefore sufficient to provide the function with the original, untransformed correlations reported in the studies. The metacor function has only two relevant function-specific arguments:
- cor: The (untransformed) correlation coefficient.
- n: The number of observations in the study.
To illustrate metacor's functionality, we will use the HealthWellbeing data set. This data set is based on a large meta-analysis examining the association between health and well-being (Ngamaba, Panagioti, and Armitage 2017).
Let us have a look at the data:
## Rows: 29
## Columns: 5
## $ author <chr> "An, 2008", "Angner, 2013", "Barger, 2009", "Doherty, 2013"…
## $ cor <dbl> 0.620, 0.372, 0.290, 0.333, 0.730, 0.405, 0.292, 0.388, 0.3…
## $ n <dbl> 121, 383, 350000, 1764, 42331, 112, 899, 870, 70, 67, 246, …
## $ population <chr> "general population", "chronic condition", "general populat…
## $ country <chr> "South Korea", "USA", "USA", "Ireland", "Poland", "Australi…
We expect considerable between-study heterogeneity in this meta-analysis, so a random-effects model is employed. The restricted maximum likelihood estimator is used for \(\tau^2\).
m.cor <- metacor(cor = cor,
n = n,
studlab = author,
data = HealthWellbeing,
fixed = FALSE,
random = TRUE,
method.tau = "REML",
method.random.ci = "HK",
title = "Health and Wellbeing")
summary(m.cor)
## Review: Health and Wellbeing
## COR 95%-CI %W(random)
## An, 2008 0.6200 [0.4964; 0.7189] 2.8
## Angner, 2013 0.3720 [0.2823; 0.4552] 3.4
## Barger, 2009 0.2900 [0.2870; 0.2930] 3.8
## Doherty, 2013 0.3330 [0.2908; 0.3739] 3.7
## Dubrovina, 2012 0.7300 [0.7255; 0.7344] 3.8
## Fisher, 2010 0.4050 [0.2373; 0.5493] 2.8
## [...]
##
## Number of studies combined: k = 29
## Number of observations: o = 853794
##
## COR 95%-CI t p-value
## Random effects model 0.3632 [0.3092; 0.4148] 12.81 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.0241 [0.0141; 0.0436]; tau = 0.1554 [0.1186; 0.2088];
## I^2 = 99.8% [99.8%; 99.8%]; H = 24.14 [23.29; 25.03]
##
## Test of heterogeneity:
## Q d.f. p-value
## 16320.87 28 0
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
## - Fisher's z transformation of correlations
We see that the pooled association between health and well-being is \(r=\) 0.36, and that this effect is significant (\(p<\) 0.001). Using Cohen's convention, this can be considered a moderate-sized correlation.
In the output, metacor already reconverted the Fisher's \(z\)-transformed correlations to their original form. A look at the last line of the details section, however, tells us that \(z\)-values have indeed been used to pool the effects. Lastly, we see that the heterogeneity variance estimated for this meta-analysis is significantly larger than zero.
4.2.5 Means
A meta-analysis of means can be conducted using the metamean function. This function uses the generic inverse variance method to pool the data. When using metamean, we have to determine first if we want to perform a meta-analysis of raw or log-transformed means.
In contrast to odds and risk ratios, a log-transformation of means is usually not necessary. However, it is advisable to use the transformation when dealing with means of a non-negative quantity (e.g. height), and when some means are close to zero. This is controlled via the sm argument. If we set sm = "MRAW", the raw means are pooled. The log-transformation is performed when sm = "MLN". The function-specific arguments are:
- n: The number of observations.
- mean: The mean.
- sd: The standard deviation of the mean.
- sm: The type of summary measure to be used for pooling (see above).
For our hands-on example, we will use the BdiScores data set. This data set contains the mean score of the Beck Depression Inventory II (Beck, Steer, and Brown 1996), measured in samples of depression patients participating in psychotherapy and antidepressant trials (Furukawa et al. 2020).
The "BdiScores" Data Set
The BdiScores data set is included directly in the {dmetar} package. If you have installed {dmetar}, and loaded it from your library, running data(BdiScores) automatically saves the data set in your R environment. The data set is then ready to be used. If you do not have {dmetar} installed, you can download the data set as an .rda file from the Internet, save it in your working directory, and then click on it in your RStudio window to import it.
library(dmetar)
library(tidyverse)
library(meta)
data(BdiScores)
# We only need the first four columns
glimpse(BdiScores[,1:4])
## Rows: 6
## Columns: 4
## $ author <chr> "DeRubeis, 2005", "Dimidjian, 2006", "Dozois, 2009", "Lesperanc…
## $ n <dbl> 180, 145, 48, 142, 301, 104
## $ mean <dbl> 32.6, 31.9, 28.6, 30.3, 31.9, 29.8
## $ sd <dbl> 9.4, 7.4, 9.9, 9.1, 9.2, 8.6
Our goal is to calculate the overall mean depression score based on this collection of studies. We will use a random-effects model and the restricted maximum-likelihood estimator to pool the raw means in our data set. We save the results in an object called m.mean.
m.mean <- metamean(n = n,
mean = mean,
sd = sd,
studlab = author,
data = BdiScores,
sm = "MRAW",
fixed = FALSE,
random = TRUE,
method.tau = "REML",
method.random.ci = "HK",
title = "BDIII Scores")
summary(m.mean)
## Review: BDIII Scores
##
## mean 95%-CI %W(random)
## DeRubeis, 2005 32.6000 [31.2268; 33.9732] 18.0
## Dimidjian, 2006 31.9000 [30.6955; 33.1045] 19.4
## Dozois, 2009 28.6000 [25.7993; 31.4007] 9.1
## Lesperance, 2007 30.3000 [28.8033; 31.7967] 17.0
## McBride, 2007 31.9000 [30.8607; 32.9393] 20.7
## Quilty, 2014 29.8000 [28.1472; 31.4528] 15.8
##
## Number of studies: k = 6
## Number of observations: o = 920
##
## mean 95%-CI
## Random effects model 31.1221 [29.6656; 32.5786]
##
## Quantifying heterogeneity:
## tau^2 = 1.0937 [0.0603; 12.9913]; tau = 1.0458 [0.2456; 3.6043]
## I^2 = 64.3% [13.8%; 85.2%]; H = 1.67 [1.08; 2.60]
##
## Test of heterogeneity:
## Q d.f. p-value
## 14.00 5 0.0156
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-Profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model (df = 5)
## - Untransformed (raw) means
The pooled mean assuming a random-effects model is \(m\) = 31.12. We also see that the between-study heterogeneity variance \(\tau^2\) in this meta-analysis is significantly greater than zero.
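The weights metamean assigns to raw means follow from the standard error of the mean, \(SE = SD/\sqrt{n}\). Using the values of the first study, we can verify its confidence interval from the output above in base R:

```r
m <- 32.6  # mean reported by DeRubeis, 2005
s <- 9.4   # standard deviation
n <- 180   # sample size

se <- s / sqrt(n)  # standard error of the mean

# 95% confidence interval of the study mean
ci <- m + c(-1, 1) * qnorm(0.975) * se
round(ci, 4)
## [1] 31.2268 33.9732
```

This reproduces the interval [31.2268; 33.9732] shown for DeRubeis, 2005 in the output.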
4.2.6 Proportions
The metaprop function can be used to pool proportions. In Chapter 3.2.2, we already discussed that it is best to logit-transform proportions before the meta-analysis is performed. The metaprop function does this automatically for us if we specify sm = "PLOGIT". If the raw proportions should be pooled, we can use sm = "PRAW", but remember that this is discouraged.
The default method through which metaprop pools proportions is somewhat special. If we use logit-transformed values, the function does not use the inverse-variance method for pooling, but builds a generalized linear mixed-effects model (GLMM). Essentially, the function fits a logistic regression model to our data, which includes random effects to account for the fact that true effect sizes vary between studies.
You may have heard of the term "mixed-effects model" before. Such models are commonly used in primary studies across many research fields. In Chapters 7 and 8, we will delve into this topic a little deeper by discussing subgroup analysis and meta-regression, which are special applications of mixed-effects models. For now, however, it is sufficient to understand the general idea of what a mixed-effects model is.
Mixed-effects models are regression models which contain both "fixed" and "random" components. The fixed elements are the \(\beta\) weights. A very simple regression model contains two \(\beta\) terms; the intercept \({\beta_0}\), as well as a regression term \({\beta_1}x\). These are used in combination to predict observed data \(y\) through some other quantity \(x\). This prediction will hardly ever be perfect, leaving some random error \(\epsilon_i\). Together, this gives the following formula:
\[\begin{equation} {y}_i = {\beta_0} + {\beta_1}x_i + \epsilon_i \tag{4.16} \end{equation}\]
The crucial point is that the value of the \(\beta\) weights in this equation remains the same for each observation \(i\). The value of \(x\) may vary from observation to observation, but \(\beta_0\) and \(\beta_1\) never do, since they are fixed.
This regression equation can be turned into a mixed-effects model when random effects are added. We denote this random-effect term with \(u_i\). As indicated by the subscript \(i\), the random-effect term can have different values for each observation. The \(u_i\) term is centered around zero and can increase or decrease the estimate produced by the fixed effects:
\[\begin{equation} {y}_i = {\beta_0} + {\beta_1}x_i + u_i + \epsilon_i \tag{4.17} \end{equation}\]
Meta-analysis can be seen as a special type of this model in which there is no \(\beta_1x_i\) term. The model only contains an intercept \(\beta_0\), which corresponds with the overall effect size \(\mu\) in the random-effects model. The \(u_i\) and \(\epsilon_i\) parts correspond with the \(\zeta_k\) and \(\epsilon_k\) error terms in meta-analyses. This makes it clear that meta-analysis is equivalent to a mixed-effects regression model. This mixed-effects model, however, only contains an intercept, as well as a random effect connected to that intercept. Using a binomial logit-link, we can therefore apply a (generalized) logistic mixed-effects model to estimate the pooled effect.
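This correspondence can be illustrated with a small simulation in base R. We generate observed effects according to the intercept-only model \(y_k = \mu + u_k + \epsilon_k\) and check that pooling them recovers \(\mu\). The numbers below are purely illustrative, not taken from any real data set:

```r
set.seed(123)
k   <- 5000  # number of simulated studies
mu  <- 0.5   # true overall effect (the fixed intercept)
tau <- 0.1   # SD of the true effects (the random-effect term u)
se  <- 0.2   # sampling error, identical across studies for simplicity

# Observed effects: y = mu + u + epsilon
y <- mu + rnorm(k, 0, tau) + rnorm(k, 0, se)

# With identical standard errors, inverse-variance pooling
# reduces to a plain mean, which should be close to mu
est <- mean(y)
est
```

The pooled estimate lands close to \(\mu = 0.5\), because both error terms are centered around zero.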
GLMMs can be applied not only to proportions but also to other outcome measures based on binary and count data, such as odds ratios or incidence rate ratios (Stijnen, Hamza, and Özdemir 2010). While GLMMs are not universally recommended for meta-analyses of binary outcome data (Bakbergenuly and Kulinskaya 2018), their use has been advocated for proportions (Schwarzer et al. 2019).
Using GLMMs as part of metaprop has three implications: (1) the output will display no meta-analytic weights for each effect, (2) the \(\tau^2\) estimator can only be set to "ML" (since maximum likelihood is used to estimate the GLMM), and (3) there will be no confidence intervals for our estimate of \(\tau^2\). If this information is required, you may switch to performing an inverse-variance meta-analysis. There are five function-specific arguments for metaprop:
- event: The number of events.
- n: The number of observations.
- method: The pooling method. Can be either a GLMM (method = "GLMM"), or inverse-variance pooling (method = "Inverse").
- incr: The increment to be added for continuity correction in zero cells. This is only relevant when inverse-variance pooling is used.
- sm: The summary measure to be used. It is advised to use logit-transformed proportions by setting sm = "PLOGIT" (default).
For our illustration of the metaprop function, we will use the OpioidMisuse data set. This data is derived from a meta-analysis which examined the 12-month prevalence of prescription opioid misuse among adolescents and young adults in the United States (Jordan et al. 2017).
Let us load the data set and have a look at it:
## Rows: 15
## Columns: 3
## $ author <chr> "Becker, 2008", "Boyd, 2009", "Boyd, 2007", "Cerda, 2014", "Fie…
## $ event <dbl> 2186, 91, 126, 543, 6496, 10850, 86, 668, 843, 647, 11521, 1111…
## $ n <dbl> 21826, 912, 1084, 7646, 55215, 114783, 527, 9403, 11274, 8888, …
We pool the prevalence data using a GLMM and logit-transformed proportions.
m.prop <- metaprop(event = event,
n = n,
studlab = author,
data = OpioidMisuse,
method = "GLMM",
sm = "PLOGIT",
fixed = FALSE,
random = TRUE,
method.random.ci = "HK",
title = "Opioid Misuse")
summary(m.prop)
## Review: Opioid Misuse
## proportion 95%-CI
## Becker, 2008 0.1002 [0.0962; 0.1042]
## Boyd, 2009 0.0998 [0.0811; 0.1211]
## Boyd, 2007 0.1162 [0.0978; 0.1368]
## Cerda, 2014 0.0710 [0.0654; 0.0770]
## Fiellin, 2013 0.1176 [0.1150; 0.1204]
## [...]
##
##
## Number of studies combined: k = 15
## Number of observations: o = 434385
## Number of events: e = 41364
##
## proportion 95%-CI
## Random effects model 0.0944 [0.0836; 0.1066]
##
## Quantifying heterogeneity:
## tau^2 = 0.0558; tau = 0.2362; I^2 = 98.3% [97.9%; 98.7%]; H = 7.74 [6.92; 8.66]
##
## Test of heterogeneity:
## Q d.f. p-value Test
## 838.21 14 < 0.0001 Wald-type
## 826.87 14 < 0.0001 Likelihood-Ratio
##
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Clopper-Pearson confidence interval for individual studies
In the output, we see that the pooled 12-month prevalence of prescription opioid misuse in the selected studies is 9.4%, with the confidence interval ranging from 8.36% to 10.66%.
As described before, the output does not display the individual weight of each effect. In the same vein, we get an estimate of the between-study heterogeneity (\(\tau^2 =\) 0.056), but no confidence interval around it.
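Two pieces of the output are easy to reproduce with base R. The study-level intervals are Clopper-Pearson confidence intervals, which binom.test computes; and the logit transformation and its inverse are available as qlogis and plogis. A sketch for the first study (Becker, 2008):

```r
# Clopper-Pearson 95% CI for Becker, 2008 (2186 events in n = 21826)
ci <- as.numeric(binom.test(2186, 21826)$conf.int)
round(ci, 4)  # compare with [0.0962; 0.1042] in the output above

# Logit transformation used for pooling, and its inverse
p <- 2186 / 21826
z <- qlogis(p)  # log-odds
plogis(z)       # back-transformation recovers the proportion
```

Note that the pooled proportion itself cannot be reproduced this simply, since it is estimated via the random intercept logistic regression model described above.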
\[\tag*{$\blacksquare$}\]
4.3 Questions & Answers
Test your knowledge!
- What is the difference between a fixed-effect model and a random-effects model?
- Can you think of a case in which the results of the fixed- and random-effects model are identical?
- What is \(\tau^2\)? How can it be estimated?
- On which distribution is the Knapp-Hartung adjustment based? What effect does it have?
- What does "inverse-variance" pooling mean? When is this method not the best solution?
- You want to meta-analyze binary outcome data. The number of observations in the study arms is roughly similar, the observed event is very rare, and you do not expect the treatment effect to be large. Which pooling method would you use?
- For which outcome measures can GLMMs be used?
Answers to these questions are listed in Appendix A at the end of this book.
4.4 Summary
In statistics, a model can be seen as a simplified "theory", describing the process through which observed data were generated. There are two alternative models in meta-analysis: the fixed-effect model, and the random-effects model.
While the fixed-effect model assumes that there is one true effect size, the random-effects model states that the true effect sizes vary within meta-analyses. The goal of the random-effects model is therefore to find the mean of the true effect size distribution underlying our data.
The variance of true effect sizes \(\tau^2\), also known as the between-study heterogeneity variance, has to be estimated in random-effects meta-analyses. There are several methods for this, and which one works best depends on the context.
The most common way to calculate a pooled effect size is through the inverse-variance method. However, for binary outcome data, other approaches such as the Mantel-Haenszel method may be preferable.
In the {meta} package, there is a function to perform meta-analyses of pre-calculated effect size data, as well as a suite of functions that can be used for different types of "raw" outcome data.