## 6.2 Detecting outliers & influential cases

As mentioned before, **between-study heterogeneity** can also be caused by one or more studies with **extreme effect sizes** which don’t quite **fit in**. Especially when the **quality of these studies is low** or the **studies are very small**, this may **distort** our pooled effect estimate, and it’s a good idea to have a look at the **pooled effect again once we remove such outliers from the analysis**.

On the other hand, we also want to know **if the pooled effect estimate we found is robust**, meaning that the effect does not depend heavily on **one single study**. Therefore, we also want to know **whether there are studies which heavily push the effect of our analysis in one direction**. Such studies are called **influential cases**, and we’ll devote the second part of this chapter to them.

It should be noted that there are **many methods** to spot **outliers and influential cases**, and the methods described here are not comprehensive. If you want to read more about the underpinnings of this topic, we can recommend the paper by Wolfgang Viechtbauer and Mike Cheung (Viechtbauer and Cheung 2010).

### 6.2.1 Searching for extreme effect sizes (outliers)

A common method to detect outliers directly is to define a study as an outlier if the **study’s confidence interval does not overlap with the confidence interval of the pooled effect**. This means that we define a study as an outlier when its effect size estimate is **so extreme that we have high certainty that the study cannot be part of the “population” of effect sizes we determined when pooling our results** (i.e., the individual study differs significantly from the overall effect).

To detect such outliers in our dataset, the `filter` function in the `dplyr` package we introduced in Chapter 3.3.3 comes in handy again. Using this function, we can search for all studies:

- for which the **upper bound of the 95% confidence interval is lower than the lower bound of the pooled effect confidence interval** (i.e., extremely small effects)
- for which the **lower bound of the 95% confidence interval is higher than the upper bound of the pooled effect confidence interval** (i.e., extremely large effects)
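As a minimal, self-contained sketch of this criterion in base R, consider three **hypothetical** studies (the effect sizes below are made up for illustration) checked against pooled-effect CI bounds of 0.39 and 0.80:

```r
# Hypothetical per-study 95% CI bounds (not real data)
studies <- data.frame(
  Author  = c("Study A", "Study B", "Study C"),
  lowerci = c(0.10, 0.45, 0.95),
  upperci = c(0.30, 0.90, 1.60)
)

# Hard-coded pooled-effect CI bounds for this toy example
te.lower <- 0.39
te.upper <- 0.80

# A study is flagged if its CI lies entirely below (extremely small
# effect) or entirely above (extremely large effect) the pooled CI
outliers <- subset(studies, upperci < te.lower | lowerci > te.upper)
outliers$Author
## [1] "Study A" "Study C"
```

Study B overlaps the pooled interval and is kept; Studies A and C fall entirely outside it and are flagged.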

Here, I’ll use my `m.hksj` meta-analysis output from Chapter 4.2.2 again. Let’s see what the **upper and lower bounds of my pooled effect confidence interval** are. As I performed a **random-effects meta-analysis in this example**, I will use the values stored under `$lower.random` and `$upper.random`. If you performed a **fixed-effect meta-analysis**, the objects would be `$lower.fixed` and `$upper.fixed`, respectively.

```
m.hksj$lower.random
## [1] 0.389147
m.hksj$upper.random
## [1] 0.7979231
```

Here we go. I now see that my **pooled effect confidence interval** stretches from \(g = 0.389\) to \(g = 0.798\). We can use these values to filter out outliers now.

To filter out outliers **automatically**, we have prepared two **functions** for you, `spot.outliers.random` and `spot.outliers.fixed`. Both need the `dplyr` package (see Chapter 3.3.3) to function, so we need to have this package **installed** and **loaded into our library**.

`library(dplyr)`

The function we’ll use in the case of my `m.hksj` dataset is `spot.outliers.random`, because we conducted a **random-effects meta-analysis to get this output object**. R doesn’t know this function yet, so we have to let R learn it by copying and pasting the code underneath **in its entirety** into the **console** in the bottom-left pane of RStudio, and then hitting **Enter ⏎**.

```
spot.outliers.random <- function(data) {
  # Study labels and the 95% CI bounds of each individual study
  Author <- data$studlab
  lowerci <- data$lower
  upperci <- data$upper
  m.outliers <- data.frame(Author, lowerci, upperci)
  # CI bounds of the pooled effect (random-effects model)
  te.lower <- data$lower.random
  te.upper <- data$upper.random
  # Flag studies whose CI lies entirely below or entirely above the
  # pooled-effect CI. Both conditions are combined with "|": if they
  # were left as two separate filter() calls, only the result of the
  # last one would be returned by the function
  dplyr::filter(m.outliers, upperci < te.lower | lowerci > te.upper)
}
```

Now, the function is ready to be used. The only thing we have to tell the `spot.outliers.random` function is the **meta-analysis output** that we want to check for outliers, which is defined by `data`. In my case, this is `m.hksj`.

`spot.outliers.random(data=m.hksj)`

This is the output we get from the function:

We see that the function has detected **two outliers**. Looking at the `lowerci` value, the lower bound of each study’s confidence interval, we see that both have extremely large positive effects: their lower bounds are both much higher than the **upper bound of the confidence interval of our pooled effect**, which was \(g = 0.798\).

Thus, we can conduct a **sensitivity analysis** in which we **exclude these two outliers**. We can do this with the `update.meta` function in `meta`. This creates an update of our meta-analysis output `m.hksj` without the outliers.

```
# Exclude the two outlying studies by name; %in% avoids the
# vectorized-recycling pitfall of comparing Author != c(...)
m.hksj.outliers <- update.meta(m.hksj,
                               subset = !(Author %in% c("DanitzOrsillo",
                                                        "Shapiro et al.")))
m.hksj.outliers
```

```
## SMD 95%-CI %W(random)
## Call et al. 0.7091 [ 0.1979; 1.2203] 5.0
## Cavanagh et al. 0.3549 [-0.0300; 0.7397] 6.9
## de Vibe et al. 0.1825 [-0.0484; 0.4133] 10.4
## Frazier et al. 0.4219 [ 0.1380; 0.7057] 9.1
## Frogeli et al. 0.6300 [ 0.2458; 1.0142] 7.0
## Gallego et al. 0.7249 [ 0.2846; 1.1652] 6.0
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412] 6.4
## Hintz et al. 0.2840 [-0.0453; 0.6133] 8.1
## Kang et al. 1.2751 [ 0.6142; 1.9360] 3.5
## Kuhlmann et al. 0.1036 [-0.2781; 0.4853] 7.0
## Lever Taylor et al. 0.3884 [-0.0639; 0.8407] 5.8
## Phang et al. 0.5407 [ 0.0619; 1.0196] 5.4
## Rasanen et al. 0.4262 [-0.0794; 0.9317] 5.1
## Ratanasiripong 0.5154 [-0.1731; 1.2039] 3.3
## SongLindquist 0.6126 [ 0.1683; 1.0569] 5.9
## Warnecke et al. 0.6000 [ 0.1120; 1.0880] 5.3
##
## Number of studies combined: k = 16
##
## SMD 95%-CI t p-value
## Random effects model 0.4708 [0.3406; 0.6010] 7.71 < 0.0001
## Prediction interval [0.0426; 0.8989]
##
## Quantifying heterogeneity:
## tau^2 = 0.0361; H = 1.15 [1.00; 1.56]; I^2 = 24.8% [0.0%; 58.7%]
##
## Test of heterogeneity:
## Q d.f. p-value
## 19.95 15 0.1739
##
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
```

The entire procedure works the same if you **conducted a fixed-effect meta-analysis**. However, you then need to copy and paste the code for the `spot.outliers.fixed` function instead, which can be found **below**.

```
spot.outliers.fixed <- function(data) {
  # Study labels and the 95% CI bounds of each individual study
  Author <- data$studlab
  lowerci <- data$lower
  upperci <- data$upper
  m.outliers <- data.frame(Author, lowerci, upperci)
  # CI bounds of the pooled effect (fixed-effect model)
  te.lower <- data$lower.fixed
  te.upper <- data$upper.fixed
  # Same criterion as above, combined into a single filter() call
  dplyr::filter(m.outliers, upperci < te.lower | lowerci > te.upper)
}
```
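Since the two functions differ only in which pooled-effect CI fields they read, a **hypothetical** unified variant (not part of the book's code) could take the pooled bounds as arguments, so one helper covers both model types:

```r
# Hypothetical helper (illustration only): pass the pooled-effect
# CI bounds explicitly instead of hard-coding the field names
spot.outliers <- function(data, te.lower, te.upper) {
  m.outliers <- data.frame(Author  = data$studlab,
                           lowerci = data$lower,
                           upperci = data$upper)
  subset(m.outliers, upperci < te.lower | lowerci > te.upper)
}

# Usage (random-effects model):
# spot.outliers(m.hksj, m.hksj$lower.random, m.hksj$upper.random)
# Usage (fixed-effect model):
# spot.outliers(m.hksj, m.hksj$lower.fixed, m.hksj$upper.fixed)
```

Using `subset` here keeps the sketch free of the `dplyr` dependency; the filtering logic is identical.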
```

### References

Viechtbauer, Wolfgang, and Mike W-L Cheung. 2010. “Outlier and Influence Diagnostics for Meta-Analysis.” *Research Synthesis Methods* 1 (2). Wiley Online Library: 112–25.