4.3 Binary outcomes

In some cases, researchers will have to work with binary outcome data (e.g., dead/alive, depressive disorder/no depressive disorder) instead of continuous outcome data. In such a case, you will probably be more interested in outcomes like the pooled Odds Ratio, the Relative Risk (which the Cochrane Handbook advises to use instead of Odds Ratios, because they are easier to interpret), or the Incidence Rate Ratio. There are two common types of binary outcome data:

  • Event rate data. This is data in which we are only dealing with the number of persons experiencing an event in each group, and the total sample size in each group. Effect sizes we can calculate from such data are the Odds Ratio, the Relative Risk or the Risk Difference, among others.
  • Incidence rate data. Event rate data usually does not contain any information on the time span during which events did or did not occur. Given that studies often have drastically different follow-up times (e.g., 8 weeks vs. 2 years), it often makes sense to also take the time interval during which events occured into account. In epidemiology, incidence rates are often used to signify how many events occured within a standardized timeframe (e.g., one year). The corresponding effect size is the incidence rate ratio (IRR), which compares the incidence rate in the intervention group to the one in the control group.

For both event rate data and incidence rate data, there are again two options to pool effect sizes using the meta package:

  • The effect sizes are already calculated. In this case, we can use the metagen function as we did before (see Chapter 4.1 and Chapter 4.2), with a few other specifications. We describe how to use the metagen function for pre-calculated binary outcome data in Chapter 4.3.3.
  • We only have the raw outcome data. If this is the case, we will have to use the meta::metabin() or meta::metainc() function instead. We’ll show you how to do this below.

The Mantel-Haenszel method

In Chapter 4.1, we described the inverse-variance method through which effect sizes are pooled in meta-analyses based on continuous outcome data. For binary outcome data, it is common to use a slightly different approach, the Mantel-Haenszel method (Mantel and Haenszel 1959; Robins, Greenland, and Breslow 1986), to pool results. The formula for this method looks like this:

\[\begin{equation} \hat\theta_{MH} = \frac{\sum\limits_{k=1}^K w_{k} \hat\theta_k}{\sum\limits_{k=1}^K w_{k}} \end{equation}\]

This calculation comes very close the the inverse-variance method, but the calculation of the study weights \(w_k\) is different. The formulae for the weights for different effect sizes, with \(a_k\) being the number of events in the intervention group, \(c_k\) the number of event in the control group, \(b_k\) the number of non-events in the intervention group, \(d_k\) the number of non-events in the control group, and \(n_k\) being the total sample size, look like this:

Odds Ratio

\[w_k = \frac{b_kc_k}{n_k}\]

Risk Ratio \[w_k = \frac{(a_k+b_k)c_k}{n_k}\]

Risk Difference \[w_k = \frac{(a_k+b_k)(c_k+d_k)}{n_k}\]

For binary outcome data, it is generally recommended to use the Mantel-Haenszel method, and it is the default method in RevMan (Schwarzer, Carpenter, and Rücker 2015). It is also the default used for binary outcome data in the meta package.

4.3.1 Event rate data

For meta-analyses of event rate data, we need our data in the following format:

Column Description
Author This signifies the column for the study label (i.e., the first author)
Ee Number of events in the experimental treatment arm
Ne Number of participants in the experimental treatment arm
Ec Number of events in the control arm
Nc Number of participants in the control arm
Subgroup This is the label for one of your subgroup codes. It is not that important how you name it, so you can give it a more informative name (e.g. population). In this column, each study should then be given a subgroup code, which should be exactly the same for each subgroup, including upper/lowercase letters. Of course, you can also include more than one subgroup column with different subgroup codings, but the column name has to be unique

I will use my dataset binarydata, which also has this format.

## tibble [11 × 5] (S3: tbl_df/tbl/data.frame)
##  $ Author: chr [1:11] "Alcorta-Fleischmann" "Craemer" "Eriksson" "Jones" ...
##  $ Ee    : num [1:11] 2 18 6 3 0 8 12 1 7 17 ...
##  $ Ne    : num [1:11] 279 1273 1858 297 300 ...
##  $ Ec    : num [1:11] 1 17 5 6 1 9 12 10 8 21 ...
##  $ Nc    : num [1:11] 70 1287 1852 314 295 ...

The other parameters are like the ones we used in the meta-analyses with continuous outcome data, with two exceptions:

  • sm: As we want to have a pooled effect for binary data, we have to choose another summary measure now. We can choose from “OR” (Odds Ratio), “RR” (Risk Ratio), or RD (Risk Difference), among other things.
  • incr. This lets us define if and how we want a continuity correction to be performed. Such a correction is necessary in cases where one of the cells in your data is zero (e.g., because no one in the intervention arm died). This can be a frequent phenomenon in some contexts, and distorts our effect size estimates. By default, the metabin function adds the value 0.5 in all cells were \(n\) is zero (Gart and Zweifel 1967). This value can be changed using the incr-parameter (e.g., incr=0.1). If your trial arms are very uneven in terms of their total \(n\), we can also use the treatment arm continuity correction (J. Sweeting, J. Sutton, and C. Lambert 2004). This can be done by using incr="TACC".

Here is the code for a meta-analysis with raw binary data

I have decided to run a random-effect-model meta-analysis. I want the summary measure to be the Risk Ratio (RR).

m.bin <- metabin(Ee,
                 data = binarydata,
                 studlab = paste(Author),
                 comb.fixed = FALSE,
                 comb.random = TRUE,
                 method.tau = "SJ",
                 hakn = TRUE,
                 prediction = TRUE,
                 incr = 0.1,
                 sm = "RR")
##                         RR            95%-CI %W(random)
## Alcorta-Fleischmann 0.5018 [0.0462;  5.4551]        3.0
## Craemer             1.0705 [0.5542;  2.0676]       14.6
## Eriksson            1.1961 [0.3657;  3.9124]        8.5
## Jones               0.5286 [0.1334;  2.0945]        7.0
## Knauer              0.0894 [0.0001; 57.7924]        0.5
## Kracauer            0.9076 [0.3512;  2.3453]       10.9
## La Sala             0.9394 [0.4233;  2.0847]       12.7
## Maheux              0.0998 [0.0128;  0.7768]        3.9
## Schmidthauer        0.7241 [0.2674;  1.9609]       10.3
## van der Zee         0.8434 [0.4543;  1.5656]       15.1
## Wang                0.5519 [0.2641;  1.1534]       13.5
## Number of studies combined: k = 11
##                          RR           95%-CI     t p-value
## Random effects model 0.7420 [0.5202; 1.0582] -1.87  0.0905
## Prediction interval         [0.2305; 2.3882]              
## Quantifying heterogeneity:
## tau^2 = 0.2417; H = 1.00 [1.00; 1.36]; I^2 = 0.0% [0.0%; 45.8%]
## Test of heterogeneity:
##     Q d.f. p-value
##  7.34   10  0.6929
## Details on meta-analytical method:
## - Mantel-Haenszel method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Continuity correction of 0.1 in studies with zero cell frequencies

L’Abbé Plots

So-called L’Abbé plots (L’Abbé, Detsky, and O’Rourke 1987) are a good way to visualize data based on event rates. In a L’Abbé plot, the event rate of a study’s intervention group is plotted against the event rate in the control group, and the \(N\) of the study is signified by the size of the bubble in the plot. Despite its simplicity, this plot allows us to check for three important aspects of our meta-analysis with binary outcomes:

  • The overall trend of our meta-analysis. If we are expecting a type of intervention to have a protective effect (i.e., making an adverse outcome such as death or depression onset less likely) the studies should mostly lie in the bottom-right corner of the L’Abbé plot, because the control group event rate should be higher than the intervention group event rate. If there is no effect of the intervention compared to the control group, the event rates are identical and the study is shown on the diagonal of the L’Abbé plot.
  • Heterogeneity of effect sizes. The plot also allows us to eyeball for single studies or groups of studies which contribute to the heterogeneity of the effect we found. It could be the case, for example, that most studies lie in the bottom-right part of the plot as they report positive effects, while a few studies lie in the top-left sector indicated negative effects. Especially if such studies have a small precision (i.e., a small \(N\) of participants, indicated by small bubbles in the plot), they could have distorted our pooled effect and may contribute to the between-study heterogeneity.
  • Heterogeneity of event rates. It may also be the case that some of the heterogeneity in our meta-analysis was introduced by the fact that the event rates “per se” are higher or lower in some studies compared to the others. The L’Abbé plot provides as with this information, as studies with higher event rates will naturally tend towards the top-right corner of the plot.

The results of the metabin function can be easily used to generate L’Abbé plots using the labbe.metabin function included in the meta package. We can specify the following parameters:

Parameter Description
x This signifies our metabin meta-analysis output
bg The background color of the studies
col The line color of the studies
studlab Wether the names of the studies should be printed in the plot (TRUE/FALSE)
col.fixed The color of the dashed line symbolizing the pooled effect of the meta-analysis, if the fixed-effect-model was used
col.random The color of the dashed line symbolizing the pooled effect of the meta-analysis, if the random-effects model was used

For this example, I will use the m.bin output I previously generated using the metabin function.

labbe.metabin(x = m.bin,
              bg = "blue",
              studlab = TRUE,
              col.random = "red")

Works like a charm! We see that the dashed red line signifying the pooled effect estimate of my meta-analysis is running trough the bottom-right sector of my L’Abbé plot, meaning that the overall effect size is positive (i.e., that the intervention has a preventive effect).

However, it also becomes clear that all studies clearly follow this trend: we see that most studies lie tightly in the bottom-left corner of the plot, meaning that these studies had small event rates (i.e., the event in these studies was very rare irrespective of group assignment). We also see that two of our included studies do not fall into this pattern: Schmidthauer and van der Zee. Those two studies have higher event rates, and both favor the intervention group more clearly than the others did.

4.3.2 Incidence rates

To conduct meta-analyses using incidence rate data, so-called person-time data has to be collected or calculated by hand. What is basically needed to calculate person-time data is the number of events and the timeframe during which they occurred. You can find a general introduction into this topic here, and this quick course by the Centers for Disease Control and Prevention gives a hands-on introduction on how person-time data is calculated.

As an example, I will use my IRR.data dataset, in which the person-time (in my case, person-year) is stored in the time.e and time.c column for the experimental and control group, respectively. The event.e and event.c column contains the number of events (in my case, depression onsets) for both groups.

To pool the data, we use the metainc function included in the meta package. We can set the following parameters:

Parameter Function
event.e The number of events in the intervention group
time.e The person-time at risk in the intervention group
event.c The number of events in the control group
time.c The person-time at risk in the control group
data= After =, paste the name of your dataset here
studlab=paste() This tells the function were the labels for each study are stored. If you named the spreadsheet columns as advised, this should be studlab=paste(Author)
comb.fixed= Whether to use a fixed-effects-model
comb.random= Whether to use a random-effects-model. This has to be set to TRUE
method.tau= Which estimator to use for the between-study variance
hakn= Whether to use the Knapp-Hartung-Sidik-Jonkman method
prediction= Whether to print a prediction interval for the effect of future studies based on present evidence
sm= The summary measure we want to calculate. We want to calculate the Incidence Rate Ratio (IRR)
        studlab = paste(Author), 
        data = IRR.data, 
        sm = "IRR",
        method.tau = "DL",
        comb.random = TRUE,
        comb.fixed = FALSE,
        hakn = TRUE)
##                                              IRR           95%-CI
## Stice et al., 2013 (b)                    0.3688 [0.1443; 0.9424]
## Stice et al. 2017a (clinician-led)        0.8312 [0.2030; 3.4038]
## Stice et al. 2017a (peer-led)             0.3536 [0.0680; 1.8393]
## Stice et al. 2017a (online)               1.3888 [0.3687; 5.2314]
## Stice et al. 2017b (non dissonance-based) 0.3638 [0.0844; 1.5678]
## Stice et al. 2017b (dissonance-based)     0.3638 [0.0844; 1.5678]
## Taylor et al., 2006                       0.6184 [0.2604; 1.4686]
## Taylor et al., 2016                       0.7539 [0.4332; 1.3121]
##                                           %W(random)
## Stice et al., 2013 (b)                          14.0
## Stice et al. 2017a (clinician-led)               6.2
## Stice et al. 2017a (peer-led)                    4.5
## Stice et al. 2017a (online)                      7.0
## Stice et al. 2017b (non dissonance-based)        5.8
## Stice et al. 2017b (dissonance-based)            5.8
## Taylor et al., 2006                             16.5
## Taylor et al., 2016                             40.2
## Number of studies combined: k = 8
##                         IRR           95%-CI     t p-value
## Random effects model 0.6157 [0.4349; 0.8716] -3.30  0.0131
## Quantifying heterogeneity:
## tau^2 = 0; H = 1.00 [1.00; 1.44]; I^2 = 0.0% [0.0%; 51.8%]
## Test of heterogeneity:
##     Q d.f. p-value
##  4.71    7  0.6953
## Details on meta-analytical method:
## - Mantel-Haenszel method
## - DerSimonian-Laird estimator for tau^2
## - Hartung-Knapp adjustment for random effects model

4.3.3 Pre-calculated effect sizes

We can also synthesize pre-calculated effect sizes based on binary outcome data using the metagen function. However, there is an important caveat: binary outcome effect sizes are not on a natural scale. For effect sizes such as the Risk Ratio or Odds Ratio, our “zero” for no effect is not 0, but 1 instead, because this indicates that the event rates, or the like, are equal between groups. Furthermore, effect size differences are not directly comparable. A Risk Ratio of \(0.5\), for example, is not simply the “opposite” of a Risk Ratio of \(1.5\). While the first effect size means that the event rate has been halved, the latter does not mean that is has doubled.

To get effect sizes on a natural scale, it is common to log-transform the effect sizes first before they are pooled. While this is done automatically in the metabin and metainc function, we have to do this step ourselves once we use the metagen function. Let us have a look at my bin.metagen dataset:


This dataset contains the Risk Ratio as well as the lower and upper confidence interval of seven studies. To produce the input needed for the metagen function, we first log-transform all the effect size data.

bin.metagen$RR <- log(bin.metagen$RR)
bin.metagen$lower <- log(bin.metagen$lower)
bin.metagen$upper <- log(bin.metagen$upper)

We can calculate the Standard Error seTE like this:

bin.metagen$seTE <- (bin.metagen$upper - bin.metagen$lower)/3.92

We now have everything we need to use the metagen function for our effect size data. To reconvert effect sizes to their original scale, we specificy sm="RR" in the function:

        studlab = Author,
        method.tau = "SJ",
        sm = "RR",
        data = bin.metagen)
##                 RR           95%-CI %W(fixed) %W(random)
## Dorfman     0.7420 [0.5223; 1.0543]       2.4        9.3
## Chieung     0.6993 [0.4828; 1.0129]       2.1        8.6
## Haber       0.8270 [0.6487; 1.0544]       4.9       14.8
## Xu          0.8209 [0.5269; 1.2789]       1.5        6.6
## Fleischmann 0.8193 [0.5927; 1.1326]       2.8       10.4
## Knapp       1.1183 [0.9411; 1.3289]       9.7       20.2
## Hartung     0.9142 [0.8596; 0.9722]      76.6       30.0
## Number of studies combined: k = 7
##                          RR           95%-CI     z p-value
## Fixed effect model   0.9137 [0.8658; 0.9643] -3.28  0.0010
## Random effects model 0.8826 [0.7772; 1.0023] -1.92  0.0544
## Quantifying heterogeneity:
## tau^2 = 0.0131; H = 1.29 [1.00; 1.98]; I^2 = 39.6% [0.0%; 74.6%]
## Test of heterogeneity:
##     Q d.f. p-value
##  9.93    6  0.1277
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2

Please be aware that the metagen function can only use the generic inverse-variance method for pooling, and not the Mantel-Haenszel method. If possible, you should therefore always use the metabin (or metainc) function.


Mantel, Nathan, and William Haenszel. 1959. “Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease.” Journal of the National Cancer Institute 22 (4). Oxford University Press: 719–48.

Robins, James, Sander Greenland, and Norman E Breslow. 1986. “A General Estimator for the Variance of the Mantel-Haenszel Odds Ratio.” In American Journal of Epidemiology. Citeseer.

Schwarzer, Guido, James R Carpenter, and Gerta Rücker. 2015. Meta-Analysis with R. Springer.

Gart, John J, and James R Zweifel. 1967. “On the Bias of Various Estimators of the Logit and Its Variance with Application to Quantal Bioassay.” Biometrika. JSTOR, 181–87.

J. Sweeting, Michael, Alexander J. Sutton, and Paul C. Lambert. 2004. “What to Add to Nothing? Use and Avoidance of Continuity Corrections in Meta-Analysis of Sparse Data.” Statistics in Medicine 23 (9). Wiley Online Library: 1351–75.

L’Abbé, Kristan A, Allan S Detsky, and Keith O’Rourke. 1987. “Meta-Analysis in Clinical Research.” Annals of Internal Medicine 107 (2). Am Coll Physicians: 224–33.