## 11.1 Frequentist Network Meta-Analysis

In the following, we will describe how to **perform a network meta-analysis** using the `netmeta` package (Schwarzer, Carpenter, and Rücker 2015; Rücker et al. 2015). This package allows us to estimate network meta-analysis models within a **frequentist framework**, with its statistical approach derived from graph theoretical methods originally developed for electrical networks (Rücker 2012).

**The frequentist interpretation of probability**

Frequentism is a common theoretical approach to interpret the probability of an event \(E\). This approach defines the probability of \(E\) in terms of how *frequently* \(E\) occurs if we repeat some process (e.g., an experiment) many, many times (Aronow and Miller 2019).

Frequentist ideas are at the core of many statistical procedures quantitative researchers use on a day-to-day basis, such as **significance testing**, calculating **confidence intervals**, or interpreting **p-values**.

### 11.1.1 The Network Meta-Analysis Model

Let us now describe how the **network meta-analysis model** underlying the `netmeta` package is **formulated**. Let us assume that we have **collected effect size data** from \(K\) trials. We can then go through all \(K\) trials and **count the total number of treatment comparisons** (e.g., some antidepressant vs. placebo) included in these trials. This number of pairwise comparisons is denoted by \(m\). We then calculate the effect size \(\hat\theta\) for each pairwise comparison and put them together in a **vector**. You can think of a vector as an **object containing a collection of numbers**, just like a `numeric` column in an R data frame is basically a collection of numbers (see Chapter 3.3). In mathematical notation, vectors are usually **written in bold**. We therefore denote the vector containing all pairwise effect sizes \((\hat\theta_1, \hat\theta_2, ..., \hat\theta_m)\) as \(\mathbf{\hat\theta}\), and the vector containing the respective standard errors as \(\mathbf{s} = (s_1, s_2, ..., s_m)\). Our model formula then looks like this:

\[\mathbf{\hat\theta} = \mathbf{X}\mathbf{\theta}^{treatment} + \mathbf{\epsilon} \]
Using this equation, we can basically imagine that our effect size vector \(\mathbf{\hat\theta}\) was **“generated” by the right side of the formula**, our model. The first part, \(\mathbf{X}\), is the \(m \times n\) **design matrix**, in which the \(n\) columns represent the different treatments and the \(m\) rows represent the comparisons between treatments, which are signified by \(1\) and \(-1\). The parameter \(\mathbf{\epsilon}\) is a vector of sampling errors \(\epsilon_i\), which are assumed to be randomly drawn from a normal distribution with mean 0 and variance \(s_i^2\):

\[\epsilon_i \sim \mathcal{N}(0,s_i^2)\]
The really important part of the formula, and what we actually want to estimate, is the vector \(\mathbf{\theta}^{treatment}\). This vector contains the **“real” effects of the \(n\) treatments abstracted from the treatment comparisons we have in our network**, and in the end allows us to compare the effects of different treatments directly.

To exemplify this (see Schwarzer, Carpenter, and Rücker 2015), let us assume we have \(K=5\) studies, each containing **one distinct treatment comparison**: \(A-B\), \(A-C\), \(A-D\), \(B-C\), and \(B-D\). We then have a vector of (observed) treatment comparisons \(\mathbf{\hat\theta} = (\hat\theta_{1,A,B}, \hat\theta_{2,A,C}, \hat\theta_{3,A,D}, \hat\theta_{4,B,C}, \hat\theta_{5,B,D})^\top\) and our vector containing the four (unknown) “true” effects for each treatment included in our network, \(\mathbf{\theta}^{treatment} = (\theta_A, \theta_B, \theta_C, \theta_D)^\top\). If we **plug** these parameters into our **model formula**, we get the following equation:

\[\mathbf{\hat\theta} = \mathbf{X}\mathbf{\theta}^{treatment} + \mathbf{\epsilon}\] \[\begin{align} \left( \begin{array}{c} \hat\theta_{1,A,B} \\ \hat\theta_{2,A,C} \\ \hat\theta_{3,A,D} \\ \hat\theta_{4,B,C} \\ \hat\theta_{5,B,D} \\ \end{array} \right) = \left( \begin{array}{cccc} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{array} \right) \left( \begin{array}{c} \theta_{A} \\ \theta_{B} \\ \theta_{C} \\ \theta_{D} \\ \end{array} \right) + \left( \begin{array}{c} \epsilon_{1} \\ \epsilon_{2} \\ \epsilon_{3} \\ \epsilon_{4} \\ \epsilon_{5} \\ \end{array} \right) \end{align}\]

The **important part** to mention here is that, in its current form, the model formula is problematic from a **mathematical point of view**. Right now, the model is **overparameterized**: too many parameters in our model have to be estimated based on the data we have. This has to do with the design matrix \(\mathbf{X}\) not being of **full rank**. A matrix is not of full rank whenever its **rows** are not all **independent**. Because we are dealing with a network of comparisons, it is clear that not all comparisons (rows) in our matrix will be independent of each other; in our example, the row for comparison \(B-C\) is a **linear combination** of the rows for \(A-B\) and \(A-C\), and thus not independent. The fact that \(\mathbf{X}\) is not of full rank means that it is **not invertible**, which makes it impossible to directly estimate \(\mathbf{\theta}^{treatment}\) using a weighted least squares approach. While we have at best \(n-1\) *independent* treatment comparisons, our model at the same time **has to estimate the “true” effect for each of the \(n\) treatments** in \(\mathbf{\theta}^{treatment}\).
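To make the rank deficiency concrete, we can rebuild the design matrix from the example above in base R and inspect its rank. This is only an illustrative sketch; `netmeta` performs these computations internally.

```r
# Design matrix X for the five comparisons A-B, A-C, A-D, B-C and B-D
X <- rbind("A-B" = c(1, -1,  0,  0),
           "A-C" = c(1,  0, -1,  0),
           "A-D" = c(1,  0,  0, -1),
           "B-C" = c(0,  1, -1,  0),
           "B-D" = c(0,  1,  0, -1))
colnames(X) <- c("A", "B", "C", "D")

# The rank is n - 1 = 3, smaller than the n = 4 treatment effects
# we want to estimate, so X is not of full rank
qr(X)$rank

# The B-C row is a linear combination of the A-B and A-C rows
all(X["B-C", ] == X["A-C", ] - X["A-B", ])
```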

This is where the **graph theoretical approach** of the `netmeta` package comes in with a solution. We will spare you the mathematical details of this approach, particularly since the `netmeta` package will do the heavy lifting for us anyway. Let us only mention that this approach involves constructing a so-called **Moore-Penrose pseudoinverse matrix**, which then allows for calculating the fitted values of our network model using a weighted least squares approach. The procedure also takes care of **multiarm studies, which contribute more than one pairwise comparison** (i.e., studies in which more than two treatments were compared with each other). Since such comparisons are correlated when coming from the same study (see Chapter 13.9), the standard errors of multiarm study comparisons are artificially increased. A random-effects model can also be incorporated into this approach by adding the **estimated heterogeneity** \(\hat\tau^2\) to the variance of each comparison \(i\): \(s^2_i + \hat\tau^2\). In the `netmeta` package, \(\tau^2\) is estimated using an adaptation of the **DerSimonian-Laird** estimator (Jackson, White, and Riley 2013). Values of \(I^2\) are also calculated, which represent the amount of **inconsistency** in our network. The model allows us to calculate an equivalent of **Cochran’s \(Q\)** (see Chapter 6), \(Q_{total}\), measuring the heterogeneity in the network. This can be used to calculate an \(I^2\) equivalent for our network model:

\[I^2 = \max\left(\frac{Q_{total}-df}{Q_{total}}, 0\right)\]

where the **degrees of freedom** in our network (\(df\)) are:

\[df = \left(\sum_{k=1}^K p_k-1\right)-(n-1) \]

with \(K\) being the number of studies, \(p_k\) the number of arms in study \(k\), and \(n\) the total number of treatments in our entire network.
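To illustrate, we can plug in the values which `netmeta` reports for the example analyzed later in this chapter (\(Q_{total}=96.99\), \(df=18\)); the formula then reproduces the reported \(I^2\) of 81.4%:

```r
# I^2 equivalent of the network, computed from Q_total and df
Q.total <- 96.99
df <- 18
I2 <- max((Q.total - df) / Q.total, 0)
round(I2 * 100, 1)   # 81.4 (%)
```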

### 11.1.2 Performing a Network Meta-Analysis using the `netmeta` package

After all this statistical input, it is now time that we start working with the `netmeta` package. As always, we first **install the package** and then **load it into our library**.

```
install.packages("netmeta")
library(netmeta)
```

#### 11.1.2.1 Data Preprocessing & Exploration

For our first network meta-analysis, we will use an **original dataset** reported by Senn et al. (2013). This dataset contains effect size data of randomized controlled trials comparing different medications for **diabetes**. The effect size of all comparisons denotes the **mean difference** (MD) in diabetic patients’ **HbA1c value** at post-test. This value represents the concentration of glucose in the blood, which diabetic medication aims to decrease. We will use this dataset because it is already **preinstalled** in the `netmeta` package, with all columns labeled correctly. To work with the dataset, we only need to **load it into our global environment** using the `data()` function. The dataset is then available as `Senn2013`, which we rename to `data` for our convenience. Here is how you do this:

```
data(Senn2013)
data <- Senn2013
```

Now, let us have a look at the `data`.

We see that the data has 28 rows, representing the treatment comparisons, and **seven columns**, some of which may already seem familiar to you if you have worked through the previous chapters.

- The first column, **TE**, contains the **effect size** of each comparison, and **seTE** contains the respective **standard error**. In case you do not have precalculated effect size data for each comparison, you can first use the `metacont` (Chapter 4.1.2) or `metabin` function (Chapter 4.3), and then extract the calculated effect sizes from the meta-analysis object you created using the `$TE` and `$seTE` selectors. Another, more flexible approach may be to use the **effect size calculators** we describe in Chapter 13.
- **treat1.long**, **treat2.long**, **treat1** and **treat2** represent the **two treatments being compared**. Variables **treat1** and **treat2** simply contain a shortened version of the original treatment name, and are thus redundant.
- The **studlab** column contains the **unique study label**, signifying in which study the specific treatment comparison was made. We can easily check if we have **multiarm studies** contributing more than one comparison by using the `summary()` function.

`summary(data$studlab)`

```
## Alex1998 Baksi2004 Costa1997
## 1 1 1
## Davidson2007 DeFronzo1995 Derosa2004
## 1 1 1
## Garber2008 Gonzalez-Ortiz2004 Hanefeld2004
## 1 1 1
## Hermansen2007 Johnston1994 Johnston1998a
## 1 1 1
## Johnston1998b Kerenyi2004 Kim2007
## 1 1 1
## Kipnes2001 Lewin2007 Moulin2006
## 1 1 1
## Oyama2008 Rosenstock2008 Stucci1996
## 1 1 1
## Vongthavaravat2002 Willms1999 Wolffenbuttel1999
## 1 3 1
## Yang2003 Zhu2003
## 1 1
```

We see that all studies **contribute only one comparison**, except for `Willms1999`, which **contributes three**. For all later steps, it is essential that (1) you include the **studlab** column in your dataset, (2) each individual study gets a unique label/name in this column, and (3) studies which contribute two or more comparisons are named exactly the same across comparisons.
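A base-R alternative is to tabulate the study labels with `table()`; counts greater than one flag multiarm studies. A small sketch with a made-up subset of the study labels:

```r
# Count the number of comparisons per study; counts > 1 indicate
# multiarm studies contributing several pairwise comparisons
studlab <- c("Alex1998", "Baksi2004", "Willms1999", "Willms1999", "Willms1999")
counts <- table(studlab)
counts[counts > 1]   # Willms1999: 3
```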

#### 11.1.2.2 Fitting the Model

We can now proceed by fitting **our first network meta-analysis model** using the `netmeta` function. The function has many parameters; the most important ones are described below.

| Code | Description |
|---|---|
| TE | The name of the column in our dataset containing the effect sizes for each comparison |
| seTE | The name of the column in our dataset containing the standard error of the effect size for each comparison |
| treat1 | The column in our dataset containing the name of the first treatment in a comparison |
| treat2 | The column in our dataset containing the name of the second treatment in a comparison |
| studlab | The column in our dataset containing the name of the study a comparison was extracted from. Although this argument is per se optional, we recommend always specifying it, because this is the only way to let the function know if multiarm trials are part of our network |
| data | The dataset containing all our network data |
| sm | The summary measure underlying our TE column. This can be specified as ‘RD’ (risk difference), ‘RR’ (risk ratio), ‘OR’ (odds ratio), ‘HR’ (hazard ratio), ‘MD’ (mean difference), ‘SMD’ (standardized mean difference), etc. |
| comb.fixed | Whether a fixed-effects network meta-analysis should be conducted (TRUE/FALSE) |
| comb.random | Whether a random-effects network meta-analysis should be conducted (TRUE/FALSE) |
| reference.group | This lets us specify which treatment should be taken as a reference treatment (e.g. reference.group = ‘placebo’) for all other treatments |
| tol.multiarm | The effect sizes for comparisons from multiarm studies are, by design, consistent. Sometimes, however, original papers may report slightly deviating results for each comparison, which may result in a violation of consistency. This argument lets us specify a tolerance threshold (a numeric value) for the inconsistency of effect sizes and their variances allowed in our network |
| details.chkmultiarm | Whether we want to print out effect estimates of multiarm comparisons with inconsistent effect sizes |
| sep.trts | The character through which compared treatments are separated, e.g. ‘ vs ’ |

We save the result of the function in the object `m.netmeta`. Let us look at the **results of our first model**, for now assuming a **fixed-effects model**.

```
m.netmeta <- netmeta(TE = TE,
                     seTE = seTE,
                     treat1 = treat1,
                     treat2 = treat2,
                     studlab = paste(data$studlab),
                     data = data,
                     sm = "MD",
                     comb.fixed = TRUE,
                     comb.random = FALSE,
                     reference.group = "plac",
                     details.chkmultiarm = TRUE,
                     sep.trts = " vs ")
m.netmeta
```

```
## Original data (with adjusted standard errors for multi-arm studies):
##
## treat1 treat2 TE seTE seTE.adj narms multiarm
## DeFronzo1995 metf plac -1.9000 0.1414 0.1414 2
## Lewin2007 metf plac -0.8200 0.0992 0.0992 2
## Willms1999 acar metf 0.2000 0.3579 0.3884 3 *
## Davidson2007 plac rosi 1.3400 0.1435 0.1435 2
## Wolffenbuttel1999 plac rosi 1.1000 0.1141 0.1141 2
## Kipnes2001 piog plac -1.3000 0.1268 0.1268 2
## Kerenyi2004 plac rosi 0.7700 0.1078 0.1078 2
## Hanefeld2004 metf piog -0.1600 0.0849 0.0849 2
## Derosa2004 piog rosi 0.1000 0.1831 0.1831 2
## Baksi2004 plac rosi 1.3000 0.1014 0.1014 2
## Rosenstock2008 plac rosi 1.0900 0.2263 0.2263 2
## Zhu2003 plac rosi 1.5000 0.1624 0.1624 2
## Yang2003 metf rosi 0.1400 0.2239 0.2239 2
## Vongthavaravat2002 rosi sulf -1.2000 0.1436 0.1436 2
## Oyama2008 acar sulf -0.4000 0.1549 0.1549 2
## Costa1997 acar plac -0.8000 0.1432 0.1432 2
## Hermansen2007 plac sita 0.5700 0.1291 0.1291 2
## Garber2008 plac vild 0.7000 0.1273 0.1273 2
## Alex1998 metf sulf -0.3700 0.1184 0.1184 2
## Johnston1994 migl plac -0.7400 0.1839 0.1839 2
## Johnston1998a migl plac -1.4100 0.2235 0.2235 2
## Kim2007 metf rosi -0.0000 0.2339 0.2339 2
## Johnston1998b migl plac -0.6800 0.2828 0.2828 2
## Gonzalez-Ortiz2004 metf plac -0.4000 0.4356 0.4356 2
## Stucci1996 benf plac -0.2300 0.3467 0.3467 2
## Moulin2006 benf plac -1.0100 0.1366 0.1366 2
## Willms1999 metf plac -1.2000 0.3758 0.4125 3 *
## Willms1999 acar plac -1.0000 0.4669 0.8242 3 *
##
## Number of treatment arms (by study):
## narms
## Alex1998 2
## Baksi2004 2
## Costa1997 2
## Davidson2007 2
## DeFronzo1995 2
## Derosa2004 2
## Garber2008 2
## Gonzalez-Ortiz2004 2
## Hanefeld2004 2
## Hermansen2007 2
## Johnston1994 2
## Johnston1998a 2
## Johnston1998b 2
## Kerenyi2004 2
## Kim2007 2
## Kipnes2001 2
## Lewin2007 2
## Moulin2006 2
## Oyama2008 2
## Rosenstock2008 2
## Stucci1996 2
## Vongthavaravat2002 2
## Willms1999 3
## Wolffenbuttel1999 2
## Yang2003 2
## Zhu2003 2
##
## Results (fixed effect model):
##
## treat1 treat2 MD 95%-CI Q leverage
## DeFronzo1995 metf plac -1.1141 [-1.2309; -0.9973] 30.89 0.18
## Lewin2007 metf plac -1.1141 [-1.2309; -0.9973] 8.79 0.36
## Willms1999 acar metf 0.2867 [ 0.0622; 0.5113] 0.05 0.09
## Davidson2007 plac rosi 1.2018 [ 1.1084; 1.2953] 0.93 0.11
## Wolffenbuttel1999 plac rosi 1.2018 [ 1.1084; 1.2953] 0.80 0.17
## Kipnes2001 piog plac -1.0664 [-1.2151; -0.9178] 3.39 0.36
## Kerenyi2004 plac rosi 1.2018 [ 1.1084; 1.2953] 16.05 0.20
## Hanefeld2004 metf piog -0.0477 [-0.1845; 0.0891] 1.75 0.68
## Derosa2004 piog rosi 0.1354 [-0.0249; 0.2957] 0.04 0.20
## Baksi2004 plac rosi 1.2018 [ 1.1084; 1.2953] 0.94 0.22
## Rosenstock2008 plac rosi 1.2018 [ 1.1084; 1.2953] 0.24 0.04
## Zhu2003 plac rosi 1.2018 [ 1.1084; 1.2953] 3.37 0.09
## Yang2003 metf rosi 0.0877 [-0.0449; 0.2203] 0.05 0.09
## Vongthavaravat2002 rosi sulf -0.7623 [-0.9427; -0.5820] 9.29 0.41
## Oyama2008 acar sulf -0.3879 [-0.6095; -0.1662] 0.01 0.53
## Costa1997 acar plac -0.8274 [-1.0401; -0.6147] 0.04 0.57
## Hermansen2007 plac sita 0.5700 [ 0.3170; 0.8230] 0.00 1.00
## Garber2008 plac vild 0.7000 [ 0.4505; 0.9495] 0.00 1.00
## Alex1998 metf sulf -0.6746 [-0.8482; -0.5011] 6.62 0.56
## Johnston1994 migl plac -0.9439 [-1.1927; -0.6952] 1.23 0.48
## Johnston1998a migl plac -0.9439 [-1.1927; -0.6952] 4.35 0.32
## Kim2007 metf rosi 0.0877 [-0.0449; 0.2203] 0.14 0.08
## Johnston1998b migl plac -0.9439 [-1.1927; -0.6952] 0.87 0.20
## Gonzalez-Ortiz2004 metf plac -1.1141 [-1.2309; -0.9973] 2.69 0.02
## Stucci1996 benf plac -0.9052 [-1.1543; -0.6561] 3.79 0.13
## Moulin2006 benf plac -0.9052 [-1.1543; -0.6561] 0.59 0.87
## Willms1999 metf plac -1.1141 [-1.2309; -0.9973] 0.04 0.02
## Willms1999 acar plac -0.8274 [-1.0401; -0.6147] 0.04 0.02
##
## Number of studies: k = 26
## Number of treatments: n = 10
## Number of pairwise comparisons: m = 28
## Number of designs: d = 15
##
## Fixed effects model
##
## Treatment estimate (sm = 'MD', comparison: other treatments vs 'plac'):
## MD 95%-CI
## acar -0.8274 [-1.0401; -0.6147]
## benf -0.9052 [-1.1543; -0.6561]
## metf -1.1141 [-1.2309; -0.9973]
## migl -0.9439 [-1.1927; -0.6952]
## piog -1.0664 [-1.2151; -0.9178]
## plac . .
## rosi -1.2018 [-1.2953; -1.1084]
## sita -0.5700 [-0.8230; -0.3170]
## sulf -0.4395 [-0.6188; -0.2602]
## vild -0.7000 [-0.9495; -0.4505]
##
## Quantifying heterogeneity / inconsistency:
## tau^2 = 0.1087; I^2 = 81.4%
##
## Tests of heterogeneity (within designs) and inconsistency (between designs):
## Q d.f. p-value
## Total 96.99 18 < 0.0001
## Within designs 74.46 11 < 0.0001
## Between designs 22.53 7 0.0021
```

**There is plenty to see in this output, so let us go through it step by step.**

- The first thing we see are the calculated effect sizes for each comparison, with an asterisk signifying multiarm studies, for which the standard error had to be corrected.
- Next, we see an overview of the number of treatment arms in each included study. Again, it is the study `Willms1999` which stands out here, because it contains three treatment arms, and thus multiple comparisons.
- The next table shows us the fitted values for each comparison in our network meta-analysis model. The `Q` column in this table is usually very interesting, because it tells us which comparison may contribute substantially to the overall inconsistency in our network. For example, we see that the `Q` value of `DeFronzo1995` is rather high, with \(Q=30.89\).
- We then get to the core of our network model: the `Treatment estimate`s. As specified, the effects of all treatments are displayed in comparison to the placebo condition, which is why there is no effect shown for `plac`.
- We also see that the `heterogeneity/inconsistency` in our network model is very high, with \(I^2 = 81.4\%\). This means that a random-effects model may be warranted, and that we should rerun the function setting `comb.random` to `TRUE`.
- The last part of the output (`Tests of heterogeneity`) breaks down the total heterogeneity in our network into heterogeneity attributable to within-design and between-design variation, respectively. The heterogeneity between treatment designs reflects the actual inconsistency in our network, and is highly significant (\(p=0.0021\)). The (“conventional”) within-designs heterogeneity is also highly significant. The information provided here is yet another sign that a random-effects model may be indicated for our network meta-analysis. To directly check this, we can calculate the total inconsistency from the **full design-by-treatment interaction random-effects model** (Higgins et al. 2012). To do this, we only have to plug the `m.netmeta` object into the `decomp.design` function.

`decomp.design(m.netmeta)`

```
## Q statistics to assess homogeneity / consistency
##
## Q df p-value
## Total 96.99 18 < 0.0001
## Within designs 74.46 11 < 0.0001
## Between designs 22.53 7 0.0021
##
## Design-specific decomposition of within-designs Q statistic
##
## Design Q df p-value
## metf vs rosi 0.19 1 0.6655
## plac vs benf 4.38 1 0.0363
## plac vs metf 42.16 2 < 0.0001
## plac vs migl 6.45 2 0.0398
## plac vs rosi 21.27 5 0.0007
##
## Between-designs Q statistic after detaching of single designs
##
## Detached design Q df p-value
## acar vs sulf 22.52 6 0.0010
## metf vs piog 17.13 6 0.0088
## metf vs rosi 22.52 6 0.0010
## metf vs sulf 7.51 6 0.2760
## piog vs rosi 22.48 6 0.0010
## plac vs acar 22.44 6 0.0010
## plac vs metf 22.07 6 0.0012
## plac vs piog 17.25 6 0.0084
## plac vs rosi 16.29 6 0.0123
## rosi vs sulf 6.77 6 0.3425
## plac vs acar vs metf 22.38 5 0.0004
##
## Q statistic to assess consistency under the assumption of
## a full design-by-treatment interaction random effects model
##
## Q df p-value tau.within tau2.within
## Between designs 2.19 7 0.9483 0.3797 0.1442
```

In the output, we are first presented with the `Q statistics to assess homogeneity / consistency` again. The important part of the output is the last section (`Q statistic to assess consistency under the assumption of a full design-by-treatment interaction random effects model`). This test shows us that, assuming a random-effects model, the (between-design) inconsistency is non-significant (\(Q = 2.19, p=0.9483\)), again suggesting that a random-effects model may be indicated.
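Under the null hypothesis, these \(Q\) statistics follow a \(\chi^2\) distribution with the stated degrees of freedom, so the reported p-values can be reproduced with base R’s `pchisq()`:

```r
# Between-designs inconsistency under the design-by-treatment
# interaction random-effects model: Q = 2.19, df = 7
1 - pchisq(2.19, df = 7)    # ~0.948, clearly non-significant

# Total heterogeneity/inconsistency in the original analysis
1 - pchisq(96.99, df = 18)  # < 0.0001
```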

#### 11.1.2.3 Further examination of the network model

##### 11.1.2.3.1 The Network Graph

Now that we have created our network meta-analysis model, we can proceed and plot our **network graph**. This can be done using the `netgraph()` function. This function has many parameters, which you can look up by typing `?netgraph` into your console. Most of those arguments, however, have very sensible default values, so we do not have to specify much here. As a first step, we feed the function with our `netmeta` object `m.netmeta`. We can also specify the **order** in which the treatment nodes appear using the `seq` argument.

```
netgraph(m.netmeta,
         seq = c("plac", "migl", "benf", "acar", "metf",
                 "rosi", "sulf", "piog", "sita", "vild"))
```

**This network graph conveys several kinds of information.**

- First, we see the overall **structure** of comparisons in our network, allowing us to understand which treatments were compared with each other in the original data.
- Second, we can see that the edges have a **different thickness**, which corresponds to **how often** we find this specific comparison in our network. We see that Rosiglitazone has been compared to Placebo in many, many trials.
- We also see the one **multiarm** trial in our network, which is represented by the **blue triangle**. This is the study `Willms1999`, which compared Metformin, Acarbose and Placebo.

The `netgraph()` function also allows us to plot a **3D graph**, which may be helpful to get a better grasp of your network structure. Although it may often not be possible to include such a graph in a scientific paper, it is still a very cool feature, so we did not want to withhold it from you. The function requires the `rgl` package to be installed and loaded from your library. To produce a 3D graph, we then only have to set the `dim` argument to `"3d"`.

```
library(rgl)
netgraph(m.netmeta, dim = "3d")
```

##### 11.1.2.3.2 Visualising direct and indirect evidence

As a next step, we can also turn our attention to the **direct and indirect evidence** in our network by looking at the **proportion** of direct and indirect evidence contributing to each comparison. We have prepared a function for this purpose for you, which is called `direct.evidence.plot()`. The `direct.evidence.plot` function is part of the `dmetar` package. If you have the package installed already, you have to load it into your library first.

`library(dmetar)`

If you don’t want to use the `dmetar` package, you can find the source code for this function here. In this case, *R* doesn’t know this function yet, so we have to let *R* learn it by **copying and pasting** the code **in its entirety** into the **console** in the bottom left pane of RStudio, and then hitting **Enter ⏎**. The function requires the `ggplot2`, `gridExtra`, and `scales` packages to work.

The function provides you with two outputs: a **plot** showing the percentage of direct and indirect evidence used for each estimated comparison, and a corresponding data frame containing the underlying data. The only thing the `direct.evidence.plot()` function needs as input is an object created by the `netmeta()` function.

`direct.evidence.plot(x = m.netmeta)`

As we can see, there are many estimates in our network model which had to be inferred by indirect evidence alone. The plot also provides us with two additional metrics: the **Minimal Parallelism** and the **Mean Path Length** of each comparison. According to König, Krahn, and Binder (2013), lower values of minimal parallelism and a mean path length greater than 2 indicate that results for a specific comparison should be interpreted with caution.

##### 11.1.2.3.3 Producing a matrix containing all treatment comparison estimates

Next, we can have a look at the **estimates** of our network for **all possible treatment combinations**. To do this, we can use the result **matrices** saved in our `netmeta` results object under `$TE.fixed` (if we use the fixed-effects model) or `$TE.random` (if we use the random-effects model). We will make a few preprocessing steps to make the matrix more readable. First, we extract the matrix from our `m.netmeta` object, and **round** the numbers in the matrix to **three digits**.

```
result.matrix <- m.netmeta$TE.fixed
result.matrix <- round(result.matrix, 3)
```

Given that one “triangle” in our matrix will **hold redundant information**, we replace the lower triangle with an empty value using this code:

`result.matrix[lower.tri(result.matrix, diag = FALSE)] <- "."`

**We then get the following results:**

| | Acarbose | Benfluorex | Metformin | Miglitol | Pioglitazone | Placebo | Rosiglitazone | Sitagliptin | Sulfonylurea | Vildagliptin |
|---|---|---|---|---|---|---|---|---|---|---|
| Acarbose | 0 | 0.078 | 0.287 | 0.117 | 0.239 | -0.827 | 0.374 | -0.257 | -0.388 | -0.127 |
| Benfluorex | . | 0 | 0.209 | 0.039 | 0.161 | -0.905 | 0.297 | -0.335 | -0.466 | -0.205 |
| Metformin | . | . | 0 | -0.17 | -0.048 | -1.114 | 0.088 | -0.544 | -0.675 | -0.414 |
| Miglitol | . | . | . | 0 | 0.123 | -0.944 | 0.258 | -0.374 | -0.504 | -0.244 |
| Pioglitazone | . | . | . | . | 0 | -1.066 | 0.135 | -0.496 | -0.627 | -0.366 |
| Placebo | . | . | . | . | . | 0 | 1.202 | 0.57 | 0.439 | 0.7 |
| Rosiglitazone | . | . | . | . | . | . | 0 | -0.632 | -0.762 | -0.502 |
| Sitagliptin | . | . | . | . | . | . | . | 0 | -0.131 | 0.13 |
| Sulfonylurea | . | . | . | . | . | . | . | . | 0 | 0.261 |
| Vildagliptin | . | . | . | . | . | . | . | . | . | 0 |

If we want to report these results in our research paper, a good idea might be to also include the **confidence intervals** for each comparison. These can be obtained the same way as above, using the `$lower.fixed` and `$upper.fixed` (or `$lower.random` and `$upper.random`) matrices for the lower and upper confidence interval bounds.
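To combine an estimate with its bounds into a single report-ready cell, `sprintf()` can be used. Below is a minimal sketch with a hypothetical helper `format.cell()`, applied to the acarbose vs. placebo values from the output above; the `netleague()` function described next automates this.

```r
# Hypothetical helper: paste a point estimate and its CI bounds
# into one printable string
format.cell <- function(te, lower, upper) {
  sprintf("%.2f (%.2f; %.2f)", te, lower, upper)
}

# Acarbose vs. placebo: MD = -0.8274, 95% CI [-1.0401; -0.6147]
format.cell(-0.8274, -1.0401, -0.6147)   # "-0.83 (-1.04; -0.61)"
```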

An extremely convenient way to export all estimated effect sizes is provided by the `netleague()` function. This function creates a matrix similar to the one above. Yet, in the matrix produced by this function, the **upper triangle** will display only the pooled effect sizes of the **direct comparisons** available in our network, as one would obtain them if we had performed a conventional meta-analysis for each comparison. Because we do not have direct evidence for all comparisons, some fields in the upper triangle will remain empty. The **lower triangle** then contains the **network meta-analysis effect sizes** for each comparison. This matrix can easily be exported into a .csv file, which we can then use to report our network meta-analysis results in a table (for example in Microsoft Word). The big plus of this function is that effect size estimates and confidence intervals are displayed together in each cell; we only have to tell the function what the brackets for the confidence intervals should look like, and how many digits we want our estimates to have. If we want to save the fixed-effects model results in a .csv (Excel) document called **“netleague.csv”**, we can use the following code. As always, we also need to feed the `netleague()` function with our `netmeta` object.

```
netleague <- netleague(m.netmeta, bracket = "(", digits = 2)
write.csv(netleague$fixed, "netleague.csv")
```

##### 11.1.2.3.4 The treatment ranking

The most interesting question we may want to answer in a network meta-analysis is, of course: **which intervention works the best?** The `netrank()` function implemented in `netmeta` allows us to generate such a **ranking** of treatments from most to least beneficial. The `netrank()` function is again based on a frequentist treatment ranking method using **P-scores**. These P-scores measure the certainty that one treatment is better than another treatment, averaged over all competing treatments. The P-score has been shown to be equivalent to the **SUCRA** score (Rücker and Schwarzer 2015), which we will describe in the chapter on Bayesian network meta-analysis. The function needs our `netmeta` object as input. Additionally, we should specify the `small.values` parameter, which defines whether smaller effect sizes in a comparison indicate a beneficial (`"good"`) or harmful (`"bad"`) effect. Let us have a look at the output for our example:

`netrank(m.netmeta, small.values = "good")`

```
## P-score
## Rosiglitazone 0.9789
## Metformin 0.8513
## Pioglitazone 0.7686
## Miglitol 0.6200
## Benfluorex 0.5727
## Acarbose 0.4792
## Vildagliptin 0.3512
## Sitagliptin 0.2386
## Sulfonylurea 0.1395
## Placebo 0.0000
```

We see that the `Rosiglitazone` treatment has the highest P-score, indicating that this treatment may be particularly helpful. Conversely, `Placebo` has a P-score of zero, which goes along with our intuition that a placebo is not likely to be the best treatment decision. Nevertheless, it should be noted that one should **never automatically conclude** that one treatment is the best solely because it has the highest score (Mbuagbaw et al. 2017). A good way to also visualize the **uncertainty** in our network is to produce **network forest plots** with the **“weakest”** treatment as comparison. This can be done using the `forest()` function, which works very similarly to the one of the `meta` package described in Chapter 5. We can specify the reference group for the forest plot using the `reference.group` argument.

```
forest(m.netmeta,
       reference.group = "plac",
       sortvar = TE,
       xlim = c(-1.5, 0.5),
       col.square = "blue",
       smlab = "Medications vs. Placebo \n (HbA1c value)")
```

We now see that the results are not as clear-cut as they seemed before: there are in fact **several high-performing treatments** with overlapping confidence intervals. This means that we cannot come to a definitive conclusion about which treatment is the best; rather, several treatments show high effectiveness compared to placebo.

### 11.1.3 Evaluating the validity of our results

#### 11.1.3.1 The Net Heat Plot

The `netmeta` package has an in-built function, `netheat()`, which allows us to produce a **net heat plot**. This net heat plot is very helpful to evaluate the inconsistency in our network model, and what contributes to it. We simply plug our `netmeta` object into the `netheat()` function to generate such a plot. The argument `nchar.trts` can be used to shorten our comparison labels to a specific number of characters. Let us try the function out now.

`netheat(m.netmeta, nchar.trts = 4)`

The function generates a quadratic matrix in which each element in a row is compared to all other elements in the columns. It is important to mention here that the rows and columns signify specific **designs**, not all \(m\) treatment comparisons in our network. Thus, we also have rows and columns for the multiarm study `Willms1999`, which had a design comparing **"Plac"**, **"Metf"** and **"Acar"**. Treatment comparisons with only one kind of evidence (i.e., only direct or only indirect evidence) are omitted in this plot, because we are interested in **cases of inconsistency** between direct and indirect evidence. Beyond that, the net heat plot has two important features (Schwarzer, Carpenter, and Rücker 2015):

- **Grey boxes**. The grey boxes for each comparison of designs signify how **important** a treatment comparison is for the estimation of another treatment comparison. The bigger the box, the more important a comparison is. An easy way to analyze this is to go through the rows of the plot one after another, and then to check for each row in which columns the grey boxes are the largest. A common finding is that the boxes are largest where the row comparison and the column comparison intersect, meaning that direct evidence was used. For example, a particularly big grey box can be seen at the intersection of the "Plac vs Rosi" row and the "Plac vs Rosi" column.
- **Colored backgrounds**. The colored backgrounds, ranging from blue to red, signify the inconsistency of the comparison in a row attributable to the design in a column. Inconsistent fields are displayed in the upper-left corner in red. For example, in the row for "Rosi vs Sulf", we see that the entry in column "Metf vs Sulf" is displayed in red. This means that the evidence contributed by "Metf vs Sulf" for the estimation of "Rosi vs Sulf" is inconsistent with the other evidence.

We can now remind ourselves that these results are based on the **fixed-effects model**, which we used for our network analysis to begin with. From what we have seen so far, we can conclude that the fixed-effect model is not justified, because there is too much unexpected heterogeneity. We can thus check how the net heat plot changes when we assume a random-effects model by changing the `random` argument of the `netheat()` function to `TRUE`. We see that this results in a substantial decrease of inconsistency in our network.

```
netheat(m.netmeta,
        nchar.trts = 4,
        random = TRUE)
```

#### 11.1.3.2 Net splitting

Another method to check for consistency in our network is **net splitting**, also known as **node splitting**. This method splits our network estimates into the contribution of direct and indirect evidence, which allows us to control for inconsistency in specific comparisons in our network. To generate a net split and compare the results, we only have to plug our `netmeta` object into the `netsplit()` function.

`netsplit(m.netmeta)`

```
## Back-calculation method to split direct and indirect evidence
##
## Fixed effect model:
##
## comparison k prop nma direct indir. Diff z p-value
## Acarbose vs Benfluorex 0 0 0.0778 . 0.0778 . . .
## Acarbose vs Metformin 1 0.10 0.2867 0.2000 0.2966 -0.0966 -0.26 0.7981
## Acarbose vs Miglitol 0 0 0.1166 . 0.1166 . . .
## Acarbose vs Pioglitazone 0 0 0.2391 . 0.2391 . . .
## Acarbose vs Placebo 2 0.63 -0.8274 -0.8172 -0.8446 0.0274 0.12 0.9030
## Acarbose vs Rosiglitazone 0 0 0.3745 . 0.3745 . . .
## Acarbose vs Sitagliptin 0 0 -0.2574 . -0.2574 . . .
## Acarbose vs Sulfonylurea 1 0.53 -0.3879 -0.4000 -0.3740 -0.0260 -0.11 0.9088
## Acarbose vs Vildagliptin 0 0 -0.1274 . -0.1274 . . .
## Benfluorex vs Metformin 0 0 0.2089 . 0.2089 . . .
## Benfluorex vs Miglitol 0 0 0.0387 . 0.0387 . . .
## Benfluorex vs Pioglitazone 0 0 0.1612 . 0.1612 . . .
## Benfluorex vs Placebo 2 1.00 -0.9052 -0.9052 . . . .
## Benfluorex vs Rosiglitazone 0 0 0.2967 . 0.2967 . . .
## Benfluorex vs Sitagliptin 0 0 -0.3352 . -0.3352 . . .
## Benfluorex vs Sulfonylurea 0 0 -0.4657 . -0.4657 . . .
## Benfluorex vs Vildagliptin 0 0 -0.2052 . -0.2052 . . .
## Metformin vs Miglitol 0 0 -0.1702 . -0.1702 . . .
## Metformin vs Pioglitazone 1 0.68 -0.0477 -0.1600 0.1866 -0.3466 -2.32 0.0201
## Metformin vs Placebo 4 0.58 -1.1141 -1.1523 -1.0608 -0.0915 -0.76 0.4489
## Metformin vs Rosiglitazone 2 0.18 0.0877 0.0731 0.0908 -0.0178 -0.10 0.9204
## Metformin vs Sitagliptin 0 0 -0.5441 . -0.5441 . . .
## Metformin vs Sulfonylurea 1 0.56 -0.6746 -0.3700 -1.0611 0.6911 3.88 0.0001
## Metformin vs Vildagliptin 0 0 -0.4141 . -0.4141 . . .
## Miglitol vs Pioglitazone 0 0 0.1225 . 0.1225 . . .
## Miglitol vs Placebo 3 1.00 -0.9439 -0.9439 . . . .
## Miglitol vs Rosiglitazone 0 0 0.2579 . 0.2579 . . .
## Miglitol vs Sitagliptin 0 0 -0.3739 . -0.3739 . . .
## Miglitol vs Sulfonylurea 0 0 -0.5044 . -0.5044 . . .
## Miglitol vs Vildagliptin 0 0 -0.2439 . -0.2439 . . .
## Pioglitazone vs Placebo 1 0.36 -1.0664 -1.3000 -0.9363 -0.3637 -2.30 0.0215
## Pioglitazone vs Rosiglitazone 1 0.20 0.1354 0.1000 0.1442 -0.0442 -0.22 0.8289
## Pioglitazone vs Sitagliptin 0 0 -0.4964 . -0.4964 . . .
## Pioglitazone vs Sulfonylurea 0 0 -0.6269 . -0.6269 . . .
## Pioglitazone vs Vildagliptin 0 0 -0.3664 . -0.3664 . . .
## Rosiglitazone vs Placebo 6 0.83 -1.2018 -1.1483 -1.4665 0.3182 2.50 0.0125
## Sitagliptin vs Placebo 1 1.00 -0.5700 -0.5700 . . . .
## Sulfonylurea vs Placebo 0 0 -0.4395 . -0.4395 . . .
## Vildagliptin vs Placebo 1 1.00 -0.7000 -0.7000 . . . .
## Rosiglitazone vs Sitagliptin 0 0 -0.6318 . -0.6318 . . .
## Rosiglitazone vs Sulfonylurea 1 0.41 -0.7623 -1.2000 -0.4575 -0.7425 -3.97 < 0.0001
## Rosiglitazone vs Vildagliptin 0 0 -0.5018 . -0.5018 . . .
## Sitagliptin vs Sulfonylurea 0 0 -0.1305 . -0.1305 . . .
## Sitagliptin vs Vildagliptin 0 0 0.1300 . 0.1300 . . .
## Sulfonylurea vs Vildagliptin 0 0 0.2605 . 0.2605 . . .
##
## Legend:
## comparison - Treatment comparison
## k - Number of studies providing direct evidence
## prop - Direct evidence proportion
## nma - Estimated treatment effect (MD) in network meta-analysis
## direct - Estimated treatment effect (MD) derived from direct evidence
## indir. - Estimated treatment effect (MD) derived from indirect evidence
## Diff - Difference between direct and indirect treatment estimates
## z - z-value of test for disagreement (direct versus indirect)
## p-value - p-value of test for disagreement (direct versus indirect)
```

The important information here is in the `p-value` column. If the value in this column is \(p<0.05\), there is a **significant disagreement** (inconsistency) between the direct and indirect estimate. We see in the output that there are indeed a few comparisons showing significant inconsistency between direct and indirect evidence when using the fixed-effects model. A good way to visualize the netsplit results is through a **forest plot** displaying all comparisons for which there is both direct and indirect evidence.

`forest(netsplit(m.netmeta))`
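To build some intuition for the "back-calculation method" mentioned in the printout header, we can sketch the underlying arithmetic ourselves: the network estimate can be viewed as a weighted average of the direct and indirect estimates, with the direct evidence proportion `prop` as the weight. This is an illustrative sketch of that logic, not `netmeta`'s actual internals, using the rounded values from the "Metformin vs Pioglitazone" row of the output above.

```r
# Assumed relationship behind the back-calculation:
#   nma = prop * direct + (1 - prop) * indirect
# Solving for the indirect estimate:
nma    <- -0.0477   # network estimate ("Metformin vs Pioglitazone" row)
direct <- -0.1600   # direct estimate from the single head-to-head trial
prop   <-  0.68     # direct evidence proportion (rounded in the printout)

indirect <- (nma - prop * direct) / (1 - prop)
indirect
# Gives a value close to the 0.1866 shown in the printout; the small
# deviation comes from rounding of 'prop' and the estimates.
```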

#### 11.1.3.3 Comparison-adjusted Funnel Plots

Assessing the publication bias of a network meta-analysis in its aggregated form is difficult. Analyzing so-called **comparison-adjusted funnel plots** (see Chapter 9.1 for the general idea behind funnel plots) has been proposed to evaluate the risk of publication bias under specific circumstances (Salanti et al. 2014). Comparison-adjusted funnel plots allow us to assess potential publication bias if we have an *a priori* hypothesis concerning which mechanism may underlie publication bias.

For example, we may have the hypothesis that publication bias exists in our data because studies which find that a **new form of treatment is superior** to an already known treatment have a **higher chance** of getting published, even if they have a small sample size, and thus a larger standard error of their effect size estimate. This is a pretty logical assumption, because it seems sensible that scientific journals may be particularly interested in publishing "novel" findings, such as that there might be a better treatment format for some kind of condition. The `funnel()` function in `netmeta` makes it easy to generate comparison-adjusted funnel plots to test such hypotheses. Here are the most important parameters of the function we may want to specify:

| Code | Description |
|---|---|
| `x` | Here, we specify the name of our `netmeta` object. |
| `order` | This parameter specifies the order of the hypothesized publication bias mechanism. We simply have to concatenate (`c()`) the names of all treatments in our network here, and order them according to our hypothesis. For example, if we want to test for publication bias favoring "newer" treatments, we would insert the names of all treatments here, starting from the oldest treatment and ending with the most novel type of treatment or intervention. |
| `pch` | This lets us specify the symbol used for the studies in the funnel plot. Setting this to `19` gives us simple dots. |
| `col` | Using this parameter, we can specify a concatenated selection of colors we want to use to distinguish the different comparisons. The number of colors we specify here must be the same as the number of comparisons in our funnel plot. A complete list of possible colors that R can use for plotting can be found here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf |
| `linreg` | If we set this to `TRUE`, Egger's test of the intercept for funnel plot asymmetry will be conducted, and its p-value will be displayed in the plot. |
| `xlim`, `ylim` | These parameters let us specify the scope of the x and y axes. |
| `studlab` | If we set this to `TRUE`, study labels will be printed in the plot along with each point. |
| `cex.studlab` | This parameter lets us control the size of the study labels. A value smaller than 1 means that the study labels will become smaller than by default. |

```
funnel <- funnel(m.netmeta,
                 order = c("placebo", "sitagliptin", "pioglitazone",
                           "vildagliptin", "rosiglitazone", "acarbose",
                           "metformin", "miglitol", "benfluorex",
                           "sulfonylurea"),
                 pch = 19,
                 col = c("blue", "red", "purple", "yellow", "grey", "green",
                         "black", "brown", "orange", "pink", "khaki", "plum",
                         "aquamarine", "sandybrown", "coral"),
                 linreg = TRUE,
                 xlim = c(-1, 2),
                 ylim = c(0.5, 0),
                 studlab = TRUE,
                 cex.studlab = 0.7)
```

If our hypothesis is true, we would expect that studies with a **small sample**, and thus a higher **standard error**, would be asymmetrically distributed around the zero line in our funnel plot. This is because we would expect that small studies comparing a novel treatment to an older one, yet finding that the new treatment is not better, are less likely to get published. In our plot, and from the p-value for Egger's test (\(p=0.93\)), however, we see that such funnel asymmetry is **not present**. Therefore, we cannot say that publication bias is present in our network because of "innovative" treatments with favorable trial effects being more likely to get published.
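To see what kind of regression sits behind a test like this, here is a minimal, self-contained sketch of Egger's regression on simulated data. This is an illustration of the general idea (standardized effect regressed on precision; the intercept captures asymmetry), not a reproduction of what `funnel(..., linreg = TRUE)` computes internally for comparison-adjusted effects.

```r
# Simulate 30 unbiased, comparison-centered effect sizes: with no
# publication bias, small studies scatter symmetrically around zero.
set.seed(123)
se <- runif(30, 0.1, 0.5)            # simulated standard errors
te <- rnorm(30, mean = 0, sd = se)   # simulated centered effect sizes

# Egger's regression: standardized effect (te / se) on precision (1 / se).
# An intercept significantly different from zero indicates asymmetry.
egger <- lm(I(te / se) ~ I(1 / se))
summary(egger)$coefficients["(Intercept)", ]
```

With biased data (e.g., if small studies with unfavorable effects were removed before fitting), the intercept would drift away from zero.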

### 11.1.4 Summary

This has been a long chapter, and we have covered many things until now. We have shown you the core assumptions behind the statistical modeling approach of the `netmeta` package, described how to fit your network meta-analysis model, visualize and interpret the results, and evaluate the validity of your findings. We would like to stress that in network meta-analysis, it is crucially important not to base your clinical decision-making on one single test or metric, but to **explore your model and its results with open eyes**, checking the patterns you find for their consistency and robustness, and assessing the uncertainty concerning some of its parameters.

In the next subchapter, we will address how to perform network meta-analysis using a Bayesian hierarchical framework. Although the statistical procedures behind this approach at times vary considerably from what is described here, both methods essentially try to answer the same question. Process-wise, the analysis "pipeline" is also very similar most of the time. Time to go Bayesian!

### References

Schwarzer, Guido, James R Carpenter, and Gerta Rücker. 2015. *Meta-Analysis with R*. Springer.

Rücker, Gerta, Guido Schwarzer, Ulrike Krahn, and Jochem König. 2015. “Package ‘Netmeta’.” *Network Meta-Analysis Using Frequentist Methods (Version 0.7-0)*.

Rücker, Gerta. 2012. “Network Meta-Analysis, Electrical Networks and Graph Theory.” *Research Synthesis Methods* 3 (4). Wiley Online Library: 312–24.

Aronow, Peter M, and Benjamin T Miller. 2019. *Foundations of Agnostic Statistics*. Cambridge University Press.

Jackson, Dan, Ian R White, and Richard D Riley. 2013. “A Matrix-Based Method of Moments for Fitting the Multivariate Random Effects Model for Meta-Analysis and Meta-Regression.” *Biometrical Journal* 55 (2). Wiley Online Library: 231–45.

Senn, Stephen, Francois Gavini, David Magrez, and Andre Scheen. 2013. “Issues in Performing a Network Meta-Analysis.” *Statistical Methods in Medical Research* 22 (2). Sage Publications Sage UK: London, England: 169–89.

Higgins, JPT, D Jackson, JK Barrett, G Lu, AE Ades, and IR White. 2012. “Consistency and Inconsistency in Network Meta-Analysis: Concepts and Models for Multi-Arm Studies.” *Research Synthesis Methods* 3 (2). Wiley Online Library: 98–110.

König, Jochem, Ulrike Krahn, and Harald Binder. 2013. “Visualizing the Flow of Evidence in Network Meta-Analysis and Characterizing Mixed Treatment Comparisons.” *Statistics in Medicine* 32 (30). Wiley Online Library: 5414–29.

Rücker, Gerta, and Guido Schwarzer. 2015. “Ranking Treatments in Frequentist Network Meta-Analysis Works Without Resampling Methods.” *BMC Medical Research Methodology* 15 (1). BioMed Central: 58.

Mbuagbaw, L, B Rochwerg, R Jaeschke, D Heels-Andsell, W Alhazzani, L Thabane, and Gordon H Guyatt. 2017. “Approaches to Interpreting and Choosing the Best Treatments in Network Meta-Analyses.” *Systematic Reviews* 6 (1). BioMed Central: 79.

Salanti, Georgia, Cinzia Del Giovane, Anna Chaimani, Deborah M Caldwell, and Julian PT Higgins. 2014. “Evaluating the Quality of Evidence from a Network Meta-Analysis.” *PloS One* 9 (7). Public Library of Science: e99682.