## 6.2 Detecting outliers & influential cases

As mentioned before, **between-study heterogeneity** can also be caused by one or more studies with **extreme effect sizes** which do not quite **fit in**. Especially when the **quality of these studies is low**, or the **studies are very small**, this may **distort** our pooled effect estimate, and it is a good idea to have a look at the **pooled effect again once we remove such outliers from the analysis**.

On the other hand, we also want to know **if the pooled effect estimate we found is robust**, meaning that the effect does not depend heavily on **one single study**. Therefore, we also want to know **whether there are studies which heavily push the effect of our analysis into one direction**. Such studies are called **influential cases**, and we’ll devote some time to this topic in the second part of this chapter.

It should be noted that there are **many methods** to spot **outliers and influential cases**, and the methods described here are not comprehensive. If you want to read more about the underpinnings of this topic, we can recommend the paper by Wolfgang Viechtbauer and Mike Cheung (Viechtbauer and Cheung 2010).

### 6.2.1 Searching for extreme effect sizes (outliers)

A common method to detect outliers directly is to define a study as an outlier if the **study’s confidence interval does not overlap with the confidence interval of the pooled effect**. This means that we define a study as an outlier when its effect size estimate is **so extreme that we have high certainty that the study cannot be part of the “population” of effect sizes we actually pool in our meta-analysis** (i.e., the individual study differs significantly from the overall effect). To detect such outliers in our dataset, we can search for all studies:

- for which the **upper bound of the 95% confidence interval is lower than the lower bound of the pooled effect confidence interval** (i.e., extremely small effects)
- for which the **lower bound of the 95% confidence interval is higher than the upper bound of the pooled effect confidence interval** (i.e., extremely large effects)
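To make this rule concrete, here is a minimal sketch as a small helper function. The function name `is.outlier` is hypothetical (it is not part of any package); it simply applies the two conditions above to vectors of study-level confidence interval bounds:

```r
# Hypothetical helper illustrating the rule above (not part of dmetar).
# lower, upper: vectors of study-level 95% CI bounds;
# pooled.lower, pooled.upper: bounds of the pooled effect's CI.
is.outlier <- function(lower, upper, pooled.lower, pooled.upper) {
  upper < pooled.lower |   # CI entirely below the pooled CI (extremely small effect)
  lower > pooled.upper     # CI entirely above the pooled CI (extremely large effect)
}

# Toy example: the third study's CI lies entirely above the pooled CI
is.outlier(lower = c(0.1, -0.2, 0.9), upper = c(0.6, 0.4, 1.5),
           pooled.lower = 0.39, pooled.upper = 0.80)
## [1] FALSE FALSE  TRUE
```

Note that a study is only flagged when its confidence interval does not overlap the pooled interval at all; a study whose point estimate is extreme but whose interval is wide enough to touch the pooled interval is not counted as an outlier under this rule.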

Here, I will use my `m.hksj` meta-analysis output from Chapter 4.2.2 again. Let us see what the **upper and lower bounds of my pooled effect confidence interval** are. As I performed a **random-effects meta-analysis in this example**, I will use the values stored under `$lower.random` and `$upper.random`. If you performed a **fixed-effect meta-analysis**, the objects would be `$lower.fixed` and `$upper.fixed`, respectively.

```
m.hksj$lower.random
## [1] 0.389147
m.hksj$upper.random
## [1] 0.7979231
```

Here we go. I now see that my **pooled effect confidence interval** stretches from \(g = 0.389\) to \(g = 0.798\). We can use these values to filter out outliers now.
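Before turning to a prepackaged function, the filtering rule can also be sketched by hand. This assumes (as in current versions of `meta`) that the analysis object stores the study-level confidence interval bounds under `$lower` and `$upper` and the study labels under `$studlab`:

```r
# Sketch: flag studies whose 95% CI does not overlap the pooled CI.
# Assumes m.hksj is a meta object with study-level $lower/$upper slots.
too.small <- m.hksj$upper < m.hksj$lower.random  # CI entirely below the pooled CI
too.large <- m.hksj$lower > m.hksj$upper.random  # CI entirely above the pooled CI
m.hksj$studlab[too.small | too.large]
```

This returns the labels of all studies meeting the outlier definition above. In practice, however, it is more convenient to use a dedicated function, which is what we do next.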

To do this, we have prepared a **function** called `find.outliers` for you. The function is part of the `dmetar` package. If you have the package installed already, you only have to load it from your library first.

`library(dmetar)`

If you do not want to use the `dmetar` package, you can find the source code for this function here. In this case, *R* doesn’t know this function yet, so we have to let *R* learn it by **copying and pasting** the code **in its entirety** into the **console** in the bottom left pane of RStudio, and then hitting **Enter ⏎**. The function then requires the `meta` and `metafor` packages to work.

The only thing we have to provide the `find.outliers` function with is the **meta-analysis object** that we want to check for outliers. In my case, this is `m.hksj`.

`find.outliers(m.hksj)`

This is the output we get from the function:

```
## Identified outliers (random-effects model)
## ------------------------------------------
## "DanitzOrsillo", "Shapiro et al."
##
## Results with outliers removed
## -----------------------------
## SMD 95%-CI %W(random) exclude
## Call et al. 0.7091 [ 0.1979; 1.2203] 5.0
## Cavanagh et al. 0.3549 [-0.0300; 0.7397] 6.9
## DanitzOrsillo 1.7912 [ 1.1139; 2.4685] 0.0 *
## de Vibe et al. 0.1825 [-0.0484; 0.4133] 10.4
## Frazier et al. 0.4219 [ 0.1380; 0.7057] 9.1
## Frogeli et al. 0.6300 [ 0.2458; 1.0142] 7.0
## Gallego et al. 0.7249 [ 0.2846; 1.1652] 6.0
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412] 6.4
## Hintz et al. 0.2840 [-0.0453; 0.6133] 8.1
## Kang et al. 1.2751 [ 0.6142; 1.9360] 3.5
## Kuhlmann et al. 0.1036 [-0.2781; 0.4853] 7.0
## Lever Taylor et al. 0.3884 [-0.0639; 0.8407] 5.8
## Phang et al. 0.5407 [ 0.0619; 1.0196] 5.4
## Rasanen et al. 0.4262 [-0.0794; 0.9317] 5.1
## Ratanasiripong 0.5154 [-0.1731; 1.2039] 3.3
## Shapiro et al. 1.4797 [ 0.8618; 2.0977] 0.0 *
## SongLindquist 0.6126 [ 0.1683; 1.0569] 5.9
## Warnecke et al. 0.6000 [ 0.1120; 1.0880] 5.3
##
## Number of studies combined: k = 16
##
## SMD 95%-CI t p-value
## Random effects model 0.4708 [0.3406; 0.6010] 7.71 < 0.0001
## Prediction interval [0.0426; 0.8989]
##
## Quantifying heterogeneity:
## tau^2 = 0.0361; H = 1.15 [1.00; 1.56]; I^2 = 24.8% [0.0%; 58.7%]
##
## Test of heterogeneity:
## Q d.f. p-value
## 19.95 15 0.1739
##
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
```

We see that the function has detected **two outliers**, “DanitzOrsillo” and “Shapiro et al.”. Conveniently, the `find.outliers` function has also automatically rerun our initial analysis, this time excluding the identified outliers. From the output, we see that the between-study heterogeneity shrinks considerably when the two studies are excluded, from \(I^2 = 62\%\) to \(I^2 = 24.8\%\), and that the test of heterogeneity is no longer significant (\(p = 0.1739\)).

We can also produce an updated forest plot in which the outliers are excluded by plugging the results of the `find.outliers` function into the `forest` function (this only works if you have already loaded the `meta` and `metafor` packages from your library). The appearance of the resulting forest plot can be changed using the arguments of the `forest` function in `meta` (these arguments are covered in detail in Chapter 5).

```
fo <- find.outliers(m.hksj)
forest(fo, col.predict = "blue")
```

In the resulting forest plot, the outlying studies are still displayed. However, their **weight** in the meta-analysis has been set to 0% (as shown in the column to the right), meaning that they are excluded from pooling.
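If you prefer to stay within the `meta` package itself, a similar zero-weight exclusion can be sketched with the `exclude` argument of `meta`'s `update` function. This is a sketch, assuming a reasonably recent version of `meta` that supports `exclude`; the study labels are the two outliers identified above:

```r
# Sketch: rerun the meta-analysis with the flagged studies excluded.
# The exclude argument keeps the studies in the printout and forest
# plot, but assigns them zero weight in the pooling.
m.no.outliers <- update(m.hksj,
                        exclude = m.hksj$studlab %in%
                          c("DanitzOrsillo", "Shapiro et al."))
forest(m.no.outliers, col.predict = "blue")
```

This should reproduce essentially the same pooled result as the `find.outliers` output shown above.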

### References

Viechtbauer, Wolfgang, and Mike W-L Cheung. 2010. “Outlier and Influence Diagnostics for Meta-Analysis.” *Research Synthesis Methods* 1 (2). Wiley Online Library: 112–25.