11.1 Frequentist Network Meta-Analysis

In the following, we will describe how to perform a network meta-analysis using the netmeta package (Schwarzer, Carpenter, and Rücker 2015; Rücker et al. 2015). This package allows to estimate network meta-analysis models within a frequentist framework, with its statistical approach derived from graph theoretical methods developed for electrical networks (Rücker 2012).

The frequentist interpretation of probability

Frequentism is a common theoretical approach to interpret the probability of an event \(E\). This approach defines the probability of \(E\) in terms of how frequently \(E\) occurs if we repeat some process (e.g., an experiment) many, many times (Aronow and Miller 2019).

Frequentist ideas are at the core of many statistical procedures quantitative researchers use on a day to day basis, such as significance testing, calculating confidence intervals or interpreting p-values.

11.1.1 The Network Meta-Analysis Model

Let us now describe how the network meta-analysis model underlying the netmeta package is formulated. Let us assume that we have collected effect size data from \(K\) trials. We can then check all our \(K\) trials and then count the total number of treatment comparisons (e.g. some antidepressant vs. placebo) which are included in these trials. This number of pairwise comparisons is denoted by \(m\). We then calculate the effect size \(\hat\theta\) for each of the pairwise comparisons, and then put them together in a vector. You can think of a vector as an object containing a collection of numbers, like a numeric column in an R dataframe is basically a collection of numbers (see Chapter 3.3). In mathematical notation, vectors are usually written in bold. We therefore denote the vector containing all pairwise effect sizes \((\hat\theta_1, \hat\theta_2, ..., \hat\theta_m)\) as \(\mathbf{\hat\theta}\), and the vector containing the respective standard errors as \(\mathbf{s} = s_1, s_2, ..., s_m\). Our model formula then looks like this:

\[\mathbf{\hat\theta} = \mathbf{X}\mathbf{\theta}^{treatment} + \mathbf{\epsilon} \] Using this equation, we can basically imagine that our effect size vector \(\mathbf{\hat\theta}\) was “generated” by the right side of the formula, our model. The first part, \(\mathbf{X}\), is the \(m \times n\) design matrix, in which the columns represent the different treatments \(n\), and the rows the comparisons between treatments, which are signified by \(1\) and \(-1\). The parameter \(\mathbf{\epsilon}\) is a vector of sampling errors \(\epsilon_k\), which are assumed to be randomly drawn from a gaussian normal distribution with mean 0 and variance \(s_i^2\):

\[\epsilon_k \sim \mathcal{N}(0,s_i^2)\] The really important part of the formula, and what we actually want to estimate, is the vector \(\mathbf{\theta}^{treatment}\). This vector contains the “real” effects of the \(n\) treatments abstracted from the treatment comparisons we have in our network, and in the end allows us to compare the effects of different treatments directly.

To exemplify this (see Schwarzer, Carpenter, and Rücker 2015), let us assume we have \(K=5\) studies, each containing one distinct treatment comparison: \(A-B\), \(A-C\), \(A-D\), \(B-C\), and \(B-D\). We then have a vector of (observed) treatment comparisons \(\mathbf{\hat\theta} = (\hat\theta_{1,A,B}, \hat\theta_{2,A,C}, \hat\theta_{3,A,D}, \hat\theta_{4,B,C}, \hat\theta_{5,B,D})^\top\) and our vector containing the four (unknown) “true” effects for each treatment included in our network, \(\mathbf{\theta}^{treatment} = (\theta_A, \theta_B, \theta_C, \theta_D)^\top\). If we plug these parameters into our model formula, we get the following equation:

\[\mathbf{\hat\theta} = \mathbf{X}\mathbf{\theta}^{treatment} + \mathbf{\epsilon}\] \[\begin{align} \left( \begin{array}{c} \hat\theta_{1,A,B} \\ \hat\theta_{2,A,C} \\ \hat\theta_{3,A,D} \\ \hat\theta_{4,B,C} \\ \hat\theta_{5,B,D} \\ \end{array} \right) = \left( \begin{array}{cccc} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{array} \right) \left( \begin{array}{c} \theta_{A} \\ \theta_{B} \\ \theta_{C} \\ \theta_{D} \\ \end{array} \right) + \left( \begin{array}{c} \epsilon_{1} \\ \epsilon_{2} \\ \epsilon_{3} \\ \epsilon_{4} \\ \epsilon_{5} \\ \end{array} \right) \end{align}\]

The important part to mention here is that in its current form, the model formula is problematic from a mathematical point of view. Right now, the model is overparameterized, meaning that too many parameters in our model have to be estimated based on the data we have. This has to do with the design matrix \(\mathbf{X}\) not being of full rank. A matrix is not of full rank whenever the rows of the matrix are not all independent. Because we are dealing with a network of comparisons it is clear that not all comparisons (rows) will be independent from each other in our matrix; in our example, the row for comparison \(B-C\) is a linear combination of comparisons \(A-B\) and \(A-C\), and thus not independent. The fact that the \(\mathbf{X}\) matrix is not of full rank, however, means that it is not invertible, which makes it impossible to directly estimate \(\mathbf{\theta}^{treatment}\) using a weighted least squares approach. While we have at best \(n-1\) independent treatment comparisons, our model at the same time has to estimate the “true” effect for each of the \(n\) treatments in \(\mathbf{\theta}^{treatment}\).

This is where the graph theoretical approach of the netmeta package comes in with a solution. We will spare you the mathematical details of this approach, particularly given that the netmeta package will do the heavy lifting for us anyway. Let us only mention that this approach involves contructing a so-called Moore-Penrose pseudoinverse matrix which then allows for calculating the fitted values of our network model using a weighted least squares approach. The procedure also takes care of multiarm studies which contribute more than one pairwise comparison (i.e., studies in which more than two treatments were compared with each other). Since such comparisons are correlated when coming from the same study (see Chapter 13.9), the standard errors of multiarm study comparisons are increased artificially. A random-effects model can also be incorporated into this approach by adding the estimated heterogeneity \(\hat\tau^2\) to the variance of each comparison \(i\): \(s^2_i + \hat\tau^2\). In the netmeta package, \(\tau^2\) is estimated using an adaptation of the DerSimonian-Laird estimator (Jackson, White, and Riley 2013). Values of \(I^2\) are also calculated, which represent the amount of inconsistency in our network. The model allows to calculate an equivalent of Cochran’s \(Q\) (see Chapter 6), \(Q_{total}\), measuring the heterogeneity in the network. This can be used to calculate an \(I^2\) equivalent for our network model:

\[I^2 = max(\frac{Q_{total}-df}{Q_{total}}, 0)\]

Where the degrees of freedom in our network (\(df\)) are:

\[df = (\sum_{k=1}^Kp_k-1)-(n-1) \] with \(K\) being the number of studies, \(p\) the number of arms in each study \(k\), and \(n\) the total number of treatments in our entire network.

11.1.2 Performing a Network Meta-Analysis using the netmeta package

After all this statistical input, it is now time that we start working with the netmeta package. As always, we first install the package and then load it into our library.

library(netmeta) Data Preprocessing & Exploration

For our first network meta-analysis, we will use an original dataset reported by Senn et al. (2013). This dataset contains effect size data of randomized controlled trials comparing different medications for diabetes. The effect size of all comparisons denotes the mean difference (MD) in diabetic patients’ HbA1c value at post-test. This value represents the concentration of glucose in the blood, which diabetic medication aims to decrease. We will use this dataset because it is already preinstalled in the netmeta package, with all columns labeled correctly. To work with the dataset, we only need to load it into our global environment using the data() function. The dataset is then available as Senn2013, which we then rename to data for our convenience. Here is how you do this:

data <- Senn2013

Now, let us have a look at the data.

We see that the data has 28 rows, representing the treatment comparisons, and seven columns, some of which may already seem familiar to you if you have worked yourself through previous chapters.

  • The first column, TE, contains the effect size of each comparison, and seTE contains the respective standard error. In case you do not have precalculated effect size data for each comparison, you can first use the metacont (Chapter 4.1.2) or metabin function (Chapter 4.3), and then extract the calculated effect sizes from the meta-analysis object you created using the $TE and $seTE selector. Another, more flexible approach may be to use the effect size calculators we describe in Chapter 13.
  • treat1.long, treat2.long, treat1 and treat2 represent the two treatments being compared. Variables treat1 and treat2 simply contain a shortened name of the original treatment name, and are thus redundant.
  • The studlab column contains the unique study label, signifying in which study the specific treatment comparison was made. We can easily check if we have multiarm studies contributing more than one comparison by using the summary() function.
##           Alex1998          Baksi2004          Costa1997 
##                  1                  1                  1 
##       Davidson2007       DeFronzo1995         Derosa2004 
##                  1                  1                  1 
##         Garber2008 Gonzalez-Ortiz2004       Hanefeld2004 
##                  1                  1                  1 
##      Hermansen2007       Johnston1994      Johnston1998a 
##                  1                  1                  1 
##      Johnston1998b        Kerenyi2004            Kim2007 
##                  1                  1                  1 
##         Kipnes2001          Lewin2007         Moulin2006 
##                  1                  1                  1 
##          Oyama2008     Rosenstock2008         Stucci1996 
##                  1                  1                  1 
## Vongthavaravat2002         Willms1999  Wolffenbuttel1999 
##                  1                  3                  1 
##           Yang2003            Zhu2003 
##                  1                  1

We see that all studies only contribute one comparison, except for Willms1999, which contributes three. For all later steps, it is essential that you (1) include the studlab column in your dataset, (2) each individual study gets a unique label/name in the column, and (3) studies which contribute 2+ comparisons are named exactly the same across comparisons. Fitting the Model

We can now proceed by fitting our first network meta-analysis model using the netmeta function. The function has many, many parameters, and the most important ones are described below.

Code Description
TE The name of the column in our dataset containing the effect sizes for each comparison
seTE The name of the column in our dataset containing the standard error of the effect size for each comparison
treat1 The column in our dataset containing the name of the first treatment in a comparison
treat2 The column in our dataset containing the name of the second treatment in a comparison
studlab The column in our dataset containing the name of the study a comparison was extracted from. Although this argument is per se optional, we recommend to always specify it, because this is the only way to let the function know if multiarm trials are part of our network
data The dataset containing all our network data
sm The summary measure underlying our TE column. This can be specified as ‘RD’ (Risk Difference), ‘RR’ (Risk Ratio), ‘OR’ (Odds Ratio), ‘HR’ (hazard ratio), ‘MD’ (mean difference), ‘SMD’ (standardized mean difference), etc.
comb.fixed Whether a fixed-effects network meta-analysis should be conducted (TRUE/FALSE)
comb.random Whether a random-effects network meta-analysis should be conducted (TRUE/FALSE)
reference.group This lets us specify which treatment should be taken as a reference treatment (e.g. reference.group = ‘placebo’) for all other treatments
tol.multiarm The effect sizes for comparisons from multi-arm studies are, by design, consistent. Sometimes however, original papers may report slightly deviating results for each comparison, which may result in a violation of consistency. This argument lets us specify a tolerance threshold (a numeric value) for the inconsistency of effect sizes and their variances allowed in our network
details.chkmultiarm Wether we want to print out effect estimates of multiarm comparisons with inconsistent effect sizes.
sep.trts The character trough which compared treatments are seperated, e.g. ’ vs. ’

I will save the the result of the function the object m.netmeta. Let’s look at the results of our first model, for now assuming a fixed-effects model.

m.netmeta <- netmeta(TE = TE,
                     seTE = seTE,
                     treat1 = treat1,
                     treat2 = treat2,
                     studlab = paste(data$studlab),
                     data = data,
                     sm = "MD",
                     comb.fixed = TRUE,
                     comb.random = FALSE,
                     reference.group = "plac",
                     details.chkmultiarm = TRUE,
                     sep.trts = " vs ")
## Original data (with adjusted standard errors for multi-arm studies):
##                    treat1 treat2      TE   seTE seTE.adj narms multiarm
## DeFronzo1995         metf   plac -1.9000 0.1414   0.1414     2         
## Lewin2007            metf   plac -0.8200 0.0992   0.0992     2         
## Willms1999           acar   metf  0.2000 0.3579   0.3884     3        *
## Davidson2007         plac   rosi  1.3400 0.1435   0.1435     2         
## Wolffenbuttel1999    plac   rosi  1.1000 0.1141   0.1141     2         
## Kipnes2001           piog   plac -1.3000 0.1268   0.1268     2         
## Kerenyi2004          plac   rosi  0.7700 0.1078   0.1078     2         
## Hanefeld2004         metf   piog -0.1600 0.0849   0.0849     2         
## Derosa2004           piog   rosi  0.1000 0.1831   0.1831     2         
## Baksi2004            plac   rosi  1.3000 0.1014   0.1014     2         
## Rosenstock2008       plac   rosi  1.0900 0.2263   0.2263     2         
## Zhu2003              plac   rosi  1.5000 0.1624   0.1624     2         
## Yang2003             metf   rosi  0.1400 0.2239   0.2239     2         
## Vongthavaravat2002   rosi   sulf -1.2000 0.1436   0.1436     2         
## Oyama2008            acar   sulf -0.4000 0.1549   0.1549     2         
## Costa1997            acar   plac -0.8000 0.1432   0.1432     2         
## Hermansen2007        plac   sita  0.5700 0.1291   0.1291     2         
## Garber2008           plac   vild  0.7000 0.1273   0.1273     2         
## Alex1998             metf   sulf -0.3700 0.1184   0.1184     2         
## Johnston1994         migl   plac -0.7400 0.1839   0.1839     2         
## Johnston1998a        migl   plac -1.4100 0.2235   0.2235     2         
## Kim2007              metf   rosi -0.0000 0.2339   0.2339     2         
## Johnston1998b        migl   plac -0.6800 0.2828   0.2828     2         
## Gonzalez-Ortiz2004   metf   plac -0.4000 0.4356   0.4356     2         
## Stucci1996           benf   plac -0.2300 0.3467   0.3467     2         
## Moulin2006           benf   plac -1.0100 0.1366   0.1366     2         
## Willms1999           metf   plac -1.2000 0.3758   0.4125     3        *
## Willms1999           acar   plac -1.0000 0.4669   0.8242     3        *
## Number of treatment arms (by study):
##                    narms
## Alex1998               2
## Baksi2004              2
## Costa1997              2
## Davidson2007           2
## DeFronzo1995           2
## Derosa2004             2
## Garber2008             2
## Gonzalez-Ortiz2004     2
## Hanefeld2004           2
## Hermansen2007          2
## Johnston1994           2
## Johnston1998a          2
## Johnston1998b          2
## Kerenyi2004            2
## Kim2007                2
## Kipnes2001             2
## Lewin2007              2
## Moulin2006             2
## Oyama2008              2
## Rosenstock2008         2
## Stucci1996             2
## Vongthavaravat2002     2
## Willms1999             3
## Wolffenbuttel1999      2
## Yang2003               2
## Zhu2003                2
## Results (fixed effect model):
##                    treat1 treat2      MD             95%-CI     Q leverage
## DeFronzo1995         metf   plac -1.1141 [-1.2309; -0.9973] 30.89     0.18
## Lewin2007            metf   plac -1.1141 [-1.2309; -0.9973]  8.79     0.36
## Willms1999           acar   metf  0.2867 [ 0.0622;  0.5113]  0.05     0.09
## Davidson2007         plac   rosi  1.2018 [ 1.1084;  1.2953]  0.93     0.11
## Wolffenbuttel1999    plac   rosi  1.2018 [ 1.1084;  1.2953]  0.80     0.17
## Kipnes2001           piog   plac -1.0664 [-1.2151; -0.9178]  3.39     0.36
## Kerenyi2004          plac   rosi  1.2018 [ 1.1084;  1.2953] 16.05     0.20
## Hanefeld2004         metf   piog -0.0477 [-0.1845;  0.0891]  1.75     0.68
## Derosa2004           piog   rosi  0.1354 [-0.0249;  0.2957]  0.04     0.20
## Baksi2004            plac   rosi  1.2018 [ 1.1084;  1.2953]  0.94     0.22
## Rosenstock2008       plac   rosi  1.2018 [ 1.1084;  1.2953]  0.24     0.04
## Zhu2003              plac   rosi  1.2018 [ 1.1084;  1.2953]  3.37     0.09
## Yang2003             metf   rosi  0.0877 [-0.0449;  0.2203]  0.05     0.09
## Vongthavaravat2002   rosi   sulf -0.7623 [-0.9427; -0.5820]  9.29     0.41
## Oyama2008            acar   sulf -0.3879 [-0.6095; -0.1662]  0.01     0.53
## Costa1997            acar   plac -0.8274 [-1.0401; -0.6147]  0.04     0.57
## Hermansen2007        plac   sita  0.5700 [ 0.3170;  0.8230]  0.00     1.00
## Garber2008           plac   vild  0.7000 [ 0.4505;  0.9495]  0.00     1.00
## Alex1998             metf   sulf -0.6746 [-0.8482; -0.5011]  6.62     0.56
## Johnston1994         migl   plac -0.9439 [-1.1927; -0.6952]  1.23     0.48
## Johnston1998a        migl   plac -0.9439 [-1.1927; -0.6952]  4.35     0.32
## Kim2007              metf   rosi  0.0877 [-0.0449;  0.2203]  0.14     0.08
## Johnston1998b        migl   plac -0.9439 [-1.1927; -0.6952]  0.87     0.20
## Gonzalez-Ortiz2004   metf   plac -1.1141 [-1.2309; -0.9973]  2.69     0.02
## Stucci1996           benf   plac -0.9052 [-1.1543; -0.6561]  3.79     0.13
## Moulin2006           benf   plac -0.9052 [-1.1543; -0.6561]  0.59     0.87
## Willms1999           metf   plac -1.1141 [-1.2309; -0.9973]  0.04     0.02
## Willms1999           acar   plac -0.8274 [-1.0401; -0.6147]  0.04     0.02
## Number of studies: k = 26
## Number of treatments: n = 10
## Number of pairwise comparisons: m = 28
## Number of designs: d = 15
## Fixed effects model
## Treatment estimate (sm = 'MD', comparison: other treatments vs 'plac'):
##           MD             95%-CI
## acar -0.8274 [-1.0401; -0.6147]
## benf -0.9052 [-1.1543; -0.6561]
## metf -1.1141 [-1.2309; -0.9973]
## migl -0.9439 [-1.1927; -0.6952]
## piog -1.0664 [-1.2151; -0.9178]
## plac       .                  .
## rosi -1.2018 [-1.2953; -1.1084]
## sita -0.5700 [-0.8230; -0.3170]
## sulf -0.4395 [-0.6188; -0.2602]
## vild -0.7000 [-0.9495; -0.4505]
## Quantifying heterogeneity / inconsistency:
## tau^2 = 0.1087; I^2 = 81.4%
## Tests of heterogeneity (within designs) and inconsistency (between designs):
##                     Q d.f.  p-value
## Total           96.99   18 < 0.0001
## Within designs  74.46   11 < 0.0001
## Between designs 22.53    7   0.0021

There is plenty to see in this output, so let us go through it one by one.

  • The first thing we see are the calculated effect sizes for each comparison, with an asterisk signifying multiarm studies, for which the standard error had to be corrected.
  • Next, we see an overview of the number of treatment arms in each included study. Again, it is the study Willms2003 which stands out here because it contains three treatment arms, and thus multiple comparisons.
  • The next table shows us the fitted values for each comparison in our network meta-analysis model. The Q column in this table is usually very interesting, because it tells us which comparison may contribute substantially to the overall inconsistency in our network. For example, we see that the Q value of DeFronzo1995 is rather high, with \(Q=30.89\).
  • We then get to the core of our network model: the Treatment estimates. As specified, the effects of all treatments are displayed in comparison to the placebo condition, which is why there is no effect shown for plac.
  • We also see that the heterogeneity/inconsistency in our network model is very high, with \(I^2 = 81.4\%\). This means that a random-effects model may be warranted, and that we should rerun the function setting comb.random to TRUE.
  • The last part of the output (Tests of heterogeneity) breaks down the total heterogeneity in our network into heterogeneity attributable to within and between-design variation, respectively. The heterogeneity between treatment designs reflects the actual inconsistency in our network, and is highly significant (\(p=0.0021\)). The (“conventional”) within-designs heterogeneity is also highly significant. The information provided here is yet another sign that the random-effects model may be indicated for our network meta-analysis model. To directly check this, we can calculate the total inconsistency from the full design-by-treatment interaction random-effects model (Higgins et al. 2012). To do this, we only have to plug the m.netmeta object into the decomp.design function.
## Q statistics to assess homogeneity / consistency
##                     Q df  p-value
## Total           96.99 18 < 0.0001
## Within designs  74.46 11 < 0.0001
## Between designs 22.53  7   0.0021
## Design-specific decomposition of within-designs Q statistic
##        Design     Q df  p-value
##  metf vs rosi  0.19  1   0.6655
##  plac vs benf  4.38  1   0.0363
##  plac vs metf 42.16  2 < 0.0001
##  plac vs migl  6.45  2   0.0398
##  plac vs rosi 21.27  5   0.0007
## Between-designs Q statistic after detaching of single designs
##       Detached design     Q df p-value
##          acar vs sulf 22.52  6  0.0010
##          metf vs piog 17.13  6  0.0088
##          metf vs rosi 22.52  6  0.0010
##          metf vs sulf  7.51  6  0.2760
##          piog vs rosi 22.48  6  0.0010
##          plac vs acar 22.44  6  0.0010
##          plac vs metf 22.07  6  0.0012
##          plac vs piog 17.25  6  0.0084
##          plac vs rosi 16.29  6  0.0123
##          rosi vs sulf  6.77  6  0.3425
##  plac vs acar vs metf 22.38  5  0.0004
## Q statistic to assess consistency under the assumption of
## a full design-by-treatment interaction random effects model
##                    Q df p-value tau.within tau2.within
## Between designs 2.19  7  0.9483     0.3797      0.1442

In the output, we are first presented with the Q statistics to assess homogeneity / consistency again. The important part of the output is in the last section (Q statistic to assess consistency under the assumption of a full design-by-treatment interaction random effects model). This test shows us that, assuming a random-effects model, the (between-design) inconsistency is non-significant (\(Q = 2.19,p=0.9483\)), again suggesting that a random-effects model may be indicated. Further examination of the network model The Network Graph

Now that we have created our network meta-analysis model, we can proceed and plot our network graph. This can be done using te netgraph() function. This function has many parameters, which you can look up by typing ?netgraph() into your console. Most of those arguments, however, have very sensible default values, so we do not have to specify much here. As a first step, we feed the function with our netmeta object m.netmeta. We can also specify the order in which the treatment nodes appear using the seq argument.

         seq = c("plac", "migl", "benf", "acar", "metf", "rosi", "sulf", "piog", "sita", "vild"))

This network graph transports several kinds of information.

  • First, we see the overall structure of comparisons in our network, allowing us to understand which treatments were compared with each other in the original data.
  • Second, we can see that the edges have a different thickness, which corresponds to how often we find this specific comparison in our network. We see that Rosiglitazone has been compared to Placebo in many, many trials
  • We also see the one multiarm trial in our network, which is represented by the blue triangle in our network. This is the study Willms2003, which compared Meformin, Acarbose and Placebo.

The netgraph() function also allows to plot a 3D graph, which may be helpful to get a better grasp of you network structure. Although it may often not be possible to include such a graph in a scientific paper, it is still a very cool feature, so we did not want to withhold this from you. The function requires the rgl package to be installed and loaded from your library. To produce a 3D graph, we then only have to specify the dim argument as "3d".

netgraph(m.netmeta, dim = "3d") Visualising direct and indirect evidence

As a next step, we can also turn our attention to the direct and indirect evidence in our network by looking at the proportion of direct and indirect contributing to each comparison. We have prepared a function for this purpose for you, which is called direct.evidence.plot(). The direct.evidence.plot function is part of the dmetar package. If you have the package installed already, you have to load it into your library first.


If you don’t want to use the dmetar package, you can find the source code for this function here. In this case, R doesn’t know this function yet, so we have to let R learn it by copying and pasting the code in its entirety into the console in the bottom left pane of RStudio, and then hit Enter ⏎. The function then requires the ggplot2, gridExtra, and scales package to work.

The function provides you with two outputs: a plot showing the percentage of direct and indirect evidence used for each estimated comparison, and a corresponding data frame containing the underlying data. The only thing the direct.evidence.plot() function needs as input is an object created by the netmeta() function.

direct.evidence.plot(x = m.netmeta)

As we can see, there are many estimates in our network model which had to be inferred by indirect evidence alone. The plot also provides us with two additional metrics: the Minimal Parallelism and the Mean Path Length of each comparison. According to König, Krahn, and Binder (2013), lower values of minimal parallelism and \(Mean Path Length > 2\) means that results for a specific comparison should be interpreted with caution. Producing a matrix containing all treatment comparison estimates

Next, we can have a look at the estimates of our network for all possible treatment combinations. To do this, we can use the result matrices saved in our netmeta results object under $TE.fixed (if we use the fixed-effects model) or $TE.random (if we use the random-effects model). We will make a few preprocessing steps to make the matrix more readable. First, we extract the matrix from our m.netmeta object, and round the numbers in the matrix to three digits.

result.matrix <- m.netmeta$TE.fixed
result.matrix <- round(result.matrix, 3)

Given that one “triangle” in our matrix will hold redundant information, we replace the lower triangle with an empty value using this code:

result.matrix[lower.tri(result.matrix, diag = FALSE)] <- "."

We then get the following results:

Acarbose Benfluorex Metformin Miglitol Pioglitazone Placebo Rosiglitazone Sitagliptin Sulfonylurea Vildagliptin
Acarbose 0 0.078 0.287 0.117 0.239 -0.827 0.374 -0.257 -0.388 -0.127
Benfluorex . 0 0.209 0.039 0.161 -0.905 0.297 -0.335 -0.466 -0.205
Metformin . . 0 -0.17 -0.048 -1.114 0.088 -0.544 -0.675 -0.414
Miglitol . . . 0 0.123 -0.944 0.258 -0.374 -0.504 -0.244
Pioglitazone . . . . 0 -1.066 0.135 -0.496 -0.627 -0.366
Placebo . . . . . 0 1.202 0.57 0.439 0.7
Rosiglitazone . . . . . . 0 -0.632 -0.762 -0.502
Sitagliptin . . . . . . . 0 -0.131 0.13
Sulfonylurea . . . . . . . . 0 0.261
Vildagliptin . . . . . . . . . 0

If we want to report these results in our research paper, a good idea might be to also include the confidence intervals for each comparison. These can be obtained the same way as above using the $lower.fixed and $upper.fixed (or $lower.random and $upper.random) matrices for the lower and upper confidence interval.

An extremely convenient way to export all estimated effect sizes is provided by the netleague() function. This function creates a matrix similar to the one above. Yet, in the matrix produced by this function, the upper triangle will display only the pooled effect sizes of the direct comparisons available in our network, like one would attain them if we had performed a conventional meta-analysis for each comparison. Because we do not have direct evidence for all comparisons, some fields in the upper triangle will remain empty. The lower triangle then contains the network meta-analysis effect sizes for each comparison. This matrix can be easily exported into a .csv file, which we can then use to report our network meta-analysis results in a table (for example in Microsoft WORD). The big plus of this function is that effect size estimates and confidence intervals will be displayed together in each cell; we only have too tell the function how the brackets for the confidence intervals should look like, and how many digits we want our estimates to have behind the comma. If i want to save the fixed-effects model results in a .csv (EXCEL) document called “netleague.csv”, i can use the following code. As always, we also need to feed the netleague function with our netmeta object.

netleague <- netleague(m.netmeta, bracket = "(", digits=2)
write.csv(netleague$fixed, "netleague.csv") The treatment ranking

The most interesting question we may want to answer in a network meta-analysis is of course: which intervention works the best?. The netrank() function implemented in netmeta allows us to generate such a ranking of treatments from most to least beneficial. The netrank() function is again based on a frequentist treatment ranking method using P-scores. These P-scores measure the certainty that one treatment is better than another treatment, averaged over all competing treatments. The P-score has been shown to be equivalent to the SUCRA score (Rücker and Schwarzer 2015), which we will describe in the chapter on bayesian network meta-analysis. The function needs our netmeta object as input. Additionally, we should specify the small.values parameter, which defines if smaller effect sizes in a comparison indicate a beneficial ("good") or harmful ("bad") effect. Let us have a look at the output for our example:

netrank(m.netmeta, small.values = "good")
##               P-score
## Rosiglitazone  0.9789
## Metformin      0.8513
## Pioglitazone   0.7686
## Miglitol       0.6200
## Benfluorex     0.5727
## Acarbose       0.4792
## Vildagliptin   0.3512
## Sitagliptin    0.2386
## Sulfonylurea   0.1395
## Placebo        0.0000

We see that the Rosiglitazone treatment has the highest P-score, indicating that this treatment may be particularly helpful. Conversely, Placebo has a P-score of zero, which seems to go along with our intuition that a placebo will not likely be the best treatment decision. Nevertheless, it should be noted that one should never automatically conclude that one treatment is the best by solely because it has the highest score (Mbuagbaw et al. 2017). A good way to also visualize the uncertainty in our network is to produce network forest plots with the “weakest” treatment as comparison. This can be done using the forest function, wich works very similar to the one of the meta package described in Chapter 5. We can specify the reference group for the forest plot using the reference.group argument.

       reference.group = "placebo",
       sortvar = TE,
       col.square = "blue",
       smlab = "Medications vs. Placebo \n (HbA1c value)")

We now see that the results are not as clear as they seemed before; we see that there are indeed several high-performing treatments with overlapping confidence intervals. This means that we cannot come to a clear-cut decision which treatment is in fact the best, but instead see that there are indeed several treatments for which we can see a high effectiveness compared to placebo.

11.1.3 Evaluating the validity of our results The Net Heat Plot

The netmeta package has an in-built function, netheat(), which allows us to produce a net heat plot. This net heat plot is very helpful to evaluate the inconsistency in our network model, and what contributes to it. We simply plug our netmeta object into the netheat() function to generate such a plot. The argument nchar.trts can be used to shorten our comparison labels to a specific number of characters. Let us try the function out now.

netheat(m.netmeta, nchar.trts = 4)

The function generates a quadratic matrix in which each element in a row is compared to all other elements in the columns. It is important here to mention that the rows and columns signify specific designs, not all \(m\) treatment comparisons in our network. Thus, we also have rows and columns for the multiarm study Willms1999, which had a design comparing “Plac”, “Metf” and Acar. Treatment comparison with only one kind of evidence (i.e. indirect or indirect evidence) are omitted in this plot, because we are interested in cases of inconsistency between direct and indirect evidence. Beyond that, the net heat plot has two important features (Schwarzer, Carpenter, and Rücker 2015):

  1. Grey boxes. The grey boxes for each comparison of designs signify how important a treatment comparison is for the estimation of another treatment comparison. The bigger the box, the more important a comparison is. An easy way to analyse this is to gow through the rows of the plot one after another, and then to check for each row in which columns the grey boxes are the largest. A common finding is that the boxes are large in a row where the row comparison and the column comparison intersect, meaning that direct evidence was used. For example, a particularly big grey box can be seen at the intersection of the “Plac vs Rosi2 row and the”Plac vs Rosi" column.
  2. Colored backgrounds. The colored backgrounds, ranging from blue to red, signify the inconsistency of the comparison in a row attributable to the design in a column. Inconsistent fields are displayed in the upper-left corner in red. For example, in the row for “Rosi vs Sulf”, we see that the entry in column “Metf vs Sulf” is displayed in red. This means that the evidence contributed by “Metf vs Sulf” for the estimation of “Metf vs Sulf” is inconsistent with the other evidence.

We can now remind ourselves that these results are based on the fixed-effects model, which we used for our network analysis to begin with. From what we have seen so far, we can conclude that the fixed-effect model is not justified, because there is too much unexpected heterogeneity. We can thus check how the net heat plot changes when we assume a random-effects model by changing the random argument of the netheat function to TRUE. We see that this results in a substantial decrease of inconsistency in our network.

        nchar.trts = 4,
        random = TRUE) Net splitting

Another method to check for consistency in our network is net splitting, also known as node splitting. This method splits our network estimates into the contribution of direct and indirect evidence, which allows us to control for inconsistency in specific comparisons in our network. To generate a net split and compare the results, we only have to plug our netmeta object into the netsplit function.

## Back-calculation method to split direct and indirect evidence
## Fixed effect model: 
##                     comparison k prop     nma  direct  indir.    Diff     z  p-value
##         Acarbose vs Benfluorex 0    0  0.0778       .  0.0778       .     .        .
##          Acarbose vs Metformin 1 0.10  0.2867  0.2000  0.2966 -0.0966 -0.26   0.7981
##           Acarbose vs Miglitol 0    0  0.1166       .  0.1166       .     .        .
##       Acarbose vs Pioglitazone 0    0  0.2391       .  0.2391       .     .        .
##            Acarbose vs Placebo 2 0.63 -0.8274 -0.8172 -0.8446  0.0274  0.12   0.9030
##      Acarbose vs Rosiglitazone 0    0  0.3745       .  0.3745       .     .        .
##        Acarbose vs Sitagliptin 0    0 -0.2574       . -0.2574       .     .        .
##       Acarbose vs Sulfonylurea 1 0.53 -0.3879 -0.4000 -0.3740 -0.0260 -0.11   0.9088
##       Acarbose vs Vildagliptin 0    0 -0.1274       . -0.1274       .     .        .
##        Benfluorex vs Metformin 0    0  0.2089       .  0.2089       .     .        .
##         Benfluorex vs Miglitol 0    0  0.0387       .  0.0387       .     .        .
##     Benfluorex vs Pioglitazone 0    0  0.1612       .  0.1612       .     .        .
##          Benfluorex vs Placebo 2 1.00 -0.9052 -0.9052       .       .     .        .
##    Benfluorex vs Rosiglitazone 0    0  0.2967       .  0.2967       .     .        .
##      Benfluorex vs Sitagliptin 0    0 -0.3352       . -0.3352       .     .        .
##     Benfluorex vs Sulfonylurea 0    0 -0.4657       . -0.4657       .     .        .
##     Benfluorex vs Vildagliptin 0    0 -0.2052       . -0.2052       .     .        .
##          Metformin vs Miglitol 0    0 -0.1702       . -0.1702       .     .        .
##      Metformin vs Pioglitazone 1 0.68 -0.0477 -0.1600  0.1866 -0.3466 -2.32   0.0201
##           Metformin vs Placebo 4 0.58 -1.1141 -1.1523 -1.0608 -0.0915 -0.76   0.4489
##     Metformin vs Rosiglitazone 2 0.18  0.0877  0.0731  0.0908 -0.0178 -0.10   0.9204
##       Metformin vs Sitagliptin 0    0 -0.5441       . -0.5441       .     .        .
##      Metformin vs Sulfonylurea 1 0.56 -0.6746 -0.3700 -1.0611  0.6911  3.88   0.0001
##      Metformin vs Vildagliptin 0    0 -0.4141       . -0.4141       .     .        .
##       Miglitol vs Pioglitazone 0    0  0.1225       .  0.1225       .     .        .
##            Miglitol vs Placebo 3 1.00 -0.9439 -0.9439       .       .     .        .
##      Miglitol vs Rosiglitazone 0    0  0.2579       .  0.2579       .     .        .
##        Miglitol vs Sitagliptin 0    0 -0.3739       . -0.3739       .     .        .
##       Miglitol vs Sulfonylurea 0    0 -0.5044       . -0.5044       .     .        .
##       Miglitol vs Vildagliptin 0    0 -0.2439       . -0.2439       .     .        .
##        Pioglitazone vs Placebo 1 0.36 -1.0664 -1.3000 -0.9363 -0.3637 -2.30   0.0215
##  Pioglitazone vs Rosiglitazone 1 0.20  0.1354  0.1000  0.1442 -0.0442 -0.22   0.8289
##    Pioglitazone vs Sitagliptin 0    0 -0.4964       . -0.4964       .     .        .
##   Pioglitazone vs Sulfonylurea 0    0 -0.6269       . -0.6269       .     .        .
##   Pioglitazone vs Vildagliptin 0    0 -0.3664       . -0.3664       .     .        .
##       Rosiglitazone vs Placebo 6 0.83 -1.2018 -1.1483 -1.4665  0.3182  2.50   0.0125
##         Sitagliptin vs Placebo 1 1.00 -0.5700 -0.5700       .       .     .        .
##        Sulfonylurea vs Placebo 0    0 -0.4395       . -0.4395       .     .        .
##        Vildagliptin vs Placebo 1 1.00 -0.7000 -0.7000       .       .     .        .
##   Rosiglitazone vs Sitagliptin 0    0 -0.6318       . -0.6318       .     .        .
##  Rosiglitazone vs Sulfonylurea 1 0.41 -0.7623 -1.2000 -0.4575 -0.7425 -3.97 < 0.0001
##  Rosiglitazone vs Vildagliptin 0    0 -0.5018       . -0.5018       .     .        .
##    Sitagliptin vs Sulfonylurea 0    0 -0.1305       . -0.1305       .     .        .
##    Sitagliptin vs Vildagliptin 0    0  0.1300       .  0.1300       .     .        .
##   Sulfonylurea vs Vildagliptin 0    0  0.2605       .  0.2605       .     .        .
## Legend:
##  comparison - Treatment comparison
##  k          - Number of studies providing direct evidence
##  prop       - Direct evidence proportion
##  nma        - Estimated treatment effect (MD) in network meta-analysis
##  direct     - Estimated treatment effect (MD) derived from direct evidence
##  indir.     - Estimated treatment effect (MD) derived from indirect evidence
##  Diff       - Difference between direct and indirect treatment estimates
##  z          - z-value of test for disagreement (direct versus indirect)
##  p-value    - p-value of test for disagreement (direct versus indirect)

The important information here is in the p-value column. If the value in this column is \(p<0.05\), there is a significant disagreement (inconsistency) between the direct and indirect estimate. We see in the output that there are indeed a few comparisons showing significant inconsistency between direct and indirect evidence when using the fixed-effects model. A good way to visualize the netsplit results is through a forest plot displaying all comparisons for which there is both direct and indirect evidence.

forest(netsplit(m.netmeta)) Comparison-adjusted Funnel Plots

Assessing the publication bias of a network meta-analysis in its aggregated form is difficult. Analyzing so-called comparison-adjusted funnel plot (see Chapter 9.1 for the general idea behind funnel plots) has been proposed to evaluate the risk of publication bias under specific circumstances (Salanti et al. 2014). Comparison-adjusted funnel plots allow to assess potential publication bias if we have an a priori hypothesis concerning which mechanism may underlie publication bias.

For example, we may have the hypothesis that publication bias exists in our data because studies which find that a new form of treatment is superior to an already known treatment have a higher chance of getting published, even if they have a small sample size, and thus a larger standard error of their effect size estimate. This is a pretty logical assumption, because it seems sensible that scientific journals may be particularly interested in publishing “novel” findings, such as that there might be a better treatment format for some kind of condition. The funnel() function in netmeta makes it easy to generate comparison-adjusted funnel plots to test such hypotheses. Here are the most important parameters of the function we may want to specify:

Code Description
x Here, we specify the name of our netmeta object.
order This paramter specifies the order of the hypothesized publication bias mechanism. We simply have to concenate (‘c()’) the names of all treatments in our network here, and order them according to our hypothesis. For example, if we want to test for publication bias favoring ‘newer’ treatments, we would insert the names of all treatments here, starting from the oldest treatment and ending with the most novel type of treatment or intervention.
pch This lets us specify the symbol used for the studies in the funnel plot. Setting this to ‘19’ gives us simple dots.
col Using this parameter, we can specifiy a concatenated selection of colors we want to use to distinguish the different comparison. The number of colors we specify here must be the same as the number of comparisons in our funnel plot. A complete list of possible color that R can use for plotting can be found here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
linreg If we set this to TRUE, Egger’s test of the intercept for funnel plot asymmetry will be conducted, and its p-value will be displayed in the plot.
xlim, ylim These parameters let us specify the scope of the x and y axis
studlab If we set this to TRUE, study labels will be printed in the plot along with each point
cex.studlab This parameter lets us control the size of the study label. A value smaller than 1 means that the study labels will become smaller than by default
funnel <- funnel(m.netmeta, 
                 order = c("placebo", "sitagliptin", "pioglitazone", "vildagliptin", "rosiglitazone",
                         "acarbose", "metformin", "miglitol",
                         "benfluorex", "sulfonylurea"),
                 pch = 19,
                 col = c("blue", "red", "purple", "yellow", "grey", "green", "black", "brown",
                         "orange", "pink", "khaki", "plum", "aquamarine", "sandybrown", "coral"),
                 linreg = TRUE,
                 xlim = c(-1, 2),
                 ylim = c(0.5, 0),
                 studlab = TRUE,
                 cex.studlab = 0.7)

If our hypothesis is true, we would expect that studies with a small sample, and thus a higher standard error would by asymmetrically distributed around the zero line in our funnel plot. This is because we would expect that small studies comparing a novel treatment to an older one, yet finding that the new treatment is not better, are less likely to get published. In our plot, and from the p-value for Egger’s Test (\(p=0.93\)), however, we see that such funnel asymmetry is not present. Therefore, we cannot say that publication bias is present in our network because of “innovative” treatments with favorable trial effect being more likely to get published.

11.1.4 Summary

This has been a long chapter, and there have been many things we have covered until now. We have showed you the core assumptions behind the statistical modeling approach of the netmeta() package, described how to fit your network meta-analysis model, visualize and interpret the results, and evaluate the validity of your findings. We would like to stress that it is crucially important in network meta-analysis not base your clinical decision-making on one single test or metric, but to explore your model and its results with open eyes, checking the patterns you find for their consistency and robustness, and to assess the uncertainty concerning some of its parameters.

In the next subchapter, we will address how to perform network meta-analysis using a bayesian hierarchical framework. Although the statistical procedures behind this approach vary considerably from what is described here at times, both methods essentially try to answer the same question. Process-wise, the analysis “pipeline” is also very similar most of the times. Time to go bayesian!


Schwarzer, Guido, James R Carpenter, and Gerta Rücker. 2015. Meta-Analysis with R. Springer.

Rücker, Gerta, Guido Schwarzer, Ulrike Krahn, Jochem König, and Maintainer Guido Schwarzer. 2015. “Package ‘Netmeta’.” Network Meta-Analysis Using Frequentist Methods (Version 0.7-0).

Rücker, Gerta. 2012. “Network Meta-Analysis, Electrical Networks and Graph Theory.” Research Synthesis Methods 3 (4). Wiley Online Library: 312–24.

Aronow, Peter M, and Benjamin T Miller. 2019. Foundations of Agnostic Statistics. Cambridge University Press.

Jackson, Dan, Ian R White, and Richard D Riley. 2013. “A Matrix-Based Method of Moments for Fitting the Multivariate Random Effects Model for Meta-Analysis and Meta-Regression.” Biometrical Journal 55 (2). Wiley Online Library: 231–45.

Senn, Stephen, Francois Gavini, David Magrez, and Andre Scheen. 2013. “Issues in Performing a Network Meta-Analysis.” Statistical Methods in Medical Research 22 (2). Sage Publications Sage UK: London, England: 169–89.

Higgins, JPT, D Jackson, JK Barrett, G Lu, AE Ades, and IR White. 2012. “Consistency and Inconsistency in Network Meta-Analysis: Concepts and Models for Multi-Arm Studies.” Research Synthesis Methods 3 (2). Wiley Online Library: 98–110.

König, Jochem, Ulrike Krahn, and Harald Binder. 2013. “Visualizing the Flow of Evidence in Network Meta-Analysis and Characterizing Mixed Treatment Comparisons.” Statistics in Medicine 32 (30). Wiley Online Library: 5414–29.

Rücker, Gerta, and Guido Schwarzer. 2015. “Ranking Treatments in Frequentist Network Meta-Analysis Works Without Resampling Methods.” BMC Medical Research Methodology 15 (1). BioMed Central: 58.

Mbuagbaw, L, B Rochwerg, R Jaeschke, D Heels-Andsell, W Alhazzani, L Thabane, and Gordon H Guyatt. 2017. “Approaches to Interpreting and Choosing the Best Treatments in Network Meta-Analyses.” Systematic Reviews 6 (1). BioMed Central: 79.

Salanti, Georgia, Cinzia Del Giovane, Anna Chaimani, Deborah M Caldwell, and Julian PT Higgins. 2014. “Evaluating the Quality of Evidence from a Network Meta-Analysis.” PloS One 9 (7). Public Library of Science: e99682.