6.2 Detecting outliers & influential cases

As mentioned before, between-study heterogeneity can also be caused by one or more studies with extreme effect sizes which do not quite fit in. Especially when the quality of these studies is low, or the studies are very small, this may distort our pooled effect estimate, and it’s a good idea to look at the pooled effect again once we remove such outliers from the analysis.

On the other hand, we also want to know whether the pooled effect estimate we found is robust, meaning that it does not depend heavily on a single study. We therefore also want to know whether there are studies which heavily push the effect of our analysis in one direction. Such studies are called influential cases, and we’ll devote some time to this topic in the second part of this chapter.

It should be noted that there are many methods to spot outliers and influential cases, and the methods described here are not comprehensive. If you want to read more about the underpinnings of this topic, we can recommend the paper by Wolfgang Viechtbauer and Mike Cheung (Viechtbauer and Cheung 2010).

6.2.1 Searching for extreme effect sizes (outliers)

A common method to detect outliers directly is to define a study as an outlier if the study’s confidence interval does not overlap with the confidence interval of the pooled effect. This means that we define a study as an outlier when its effect size estimate is so extreme that we have high certainty that the study cannot be part of the “population” of effect sizes we actually pool in our meta-analysis (i.e., the individual study differs significantly from the overall effect). To detect such outliers in our dataset, we can search for all studies:

  • for which the upper bound of the 95% confidence interval is lower than the lower bound of the pooled effect confidence interval (i.e., extremely small effects)
  • for which the lower bound of the 95% confidence interval is higher than the upper bound of the pooled effect confidence interval (i.e., extremely large effects)
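
The two rules above boil down to a simple comparison of interval bounds. The following is a minimal sketch in base R, not the actual find.outliers implementation: the helper name flag_outliers is hypothetical, and the numbers are copied from the outputs shown later in this section (the pooled confidence interval of m.hksj and five of the study confidence intervals).

```r
# Hypothetical helper: flags studies whose 95% CI does not overlap
# with the CI of the pooled effect.
flag_outliers <- function(lower, upper, pooled_lower, pooled_upper) {
  too_small <- upper < pooled_lower  # whole study CI lies below the pooled CI
  too_large <- lower > pooled_upper  # whole study CI lies above the pooled CI
  too_small | too_large
}

# Study CI bounds copied from the find.outliers output in this section
studies <- data.frame(
  study = c("de Vibe et al.", "Kuhlmann et al.", "Kang et al.",
            "Shapiro et al.", "DanitzOrsillo"),
  lower = c(-0.0484, -0.2781, 0.6142, 0.8618, 1.1139),
  upper = c( 0.4133,  0.4853, 1.9360, 2.0977, 2.4685),
  stringsAsFactors = FALSE
)

# Pooled CI bounds: m.hksj$lower.random and m.hksj$upper.random
is_outlier <- flag_outliers(studies$lower, studies$upper,
                            pooled_lower = 0.389147,
                            pooled_upper = 0.7979231)

flagged <- studies$study[is_outlier]
print(flagged)
```

Note that “Kang et al.” has a large effect size but is not flagged, because the lower bound of its confidence interval (0.6142) still lies below the upper bound of the pooled effect confidence interval.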

Here, I will use my m.hksj meta-analysis output from Chapter 4.2.2 again. Let us see what the upper and lower bounds of my pooled effect confidence interval are. As I performed a random-effects meta-analysis in this example, I will use the values stored under $lower.random and $upper.random. If you performed a fixed-effect meta-analysis, the objects would be $lower.fixed and $upper.fixed, respectively.

m.hksj$lower.random
## [1] 0.389147
m.hksj$upper.random
## [1] 0.7979231

Here we go. I now see that my pooled effect confidence interval stretches from \(g = 0.389\) to \(g = 0.798\). We can use these values to filter out outliers now.

To do this, we have prepared a function called find.outliers for you. The function is part of the dmetar package. If you have the package installed already, you have to load it from your library first.

library(dmetar)

If you do not want to use the dmetar package, you can find the source code for this function here. In this case, R doesn’t know this function yet, so we have to let R learn it by copying and pasting the code in its entirety into the console in the bottom left pane of RStudio, and then hitting Enter ⏎. The function then requires the meta and metafor packages to work.

The only thing we have to provide the find.outliers function with is the meta-analysis object that we want to check for outliers. In my case, this is m.hksj.

find.outliers(m.hksj)

This is the output we get from the function:

## Identified outliers (random-effects model) 
## ------------------------------------------ 
## "DanitzOrsillo", "Shapiro et al." 
##  
## Results with outliers removed 
## ----------------------------- 
##                           SMD            95%-CI %W(random) exclude
## Call et al.            0.7091 [ 0.1979; 1.2203]        5.0        
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]        6.9        
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]        0.0       *
## de Vibe et al.         0.1825 [-0.0484; 0.4133]       10.4        
## Frazier et al.         0.4219 [ 0.1380; 0.7057]        9.1        
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]        7.0        
## Gallego et al.         0.7249 [ 0.2846; 1.1652]        6.0        
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]        6.4        
## Hintz et al.           0.2840 [-0.0453; 0.6133]        8.1        
## Kang et al.            1.2751 [ 0.6142; 1.9360]        3.5        
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]        7.0        
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]        5.8        
## Phang et al.           0.5407 [ 0.0619; 1.0196]        5.4        
## Rasanen et al.         0.4262 [-0.0794; 0.9317]        5.1        
## Ratanasiripong         0.5154 [-0.1731; 1.2039]        3.3        
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]        0.0       *
## SongLindquist          0.6126 [ 0.1683; 1.0569]        5.9        
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]        5.3        
## 
## Number of studies combined: k = 16
## 
##                         SMD           95%-CI    t  p-value
## Random effects model 0.4708 [0.3406; 0.6010] 7.71 < 0.0001
## Prediction interval         [0.0426; 0.8989]              
## 
## Quantifying heterogeneity:
## tau^2 = 0.0361; H = 1.15 [1.00; 1.56]; I^2 = 24.8% [0.0%; 58.7%]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  19.95   15  0.1739
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model

We see that the function has detected two outliers, “DanitzOrsillo” and “Shapiro et al.”. Conveniently, the find.outliers function has also automatically rerun our initial analysis, this time excluding the identified outliers. From the output, we see that the \(I^2\)-heterogeneity shrinks considerably when the two studies are excluded, from \(I^2 = 62\%\) to \(24.8\%\), and that the test of heterogeneity is no longer significant (\(Q = 19.95\), \(p = 0.1739\)).
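
The heterogeneity statistics reported for the analysis without outliers can be checked by hand from \(Q\) and its degrees of freedom. The sketch below uses base R only and the standard definitions \(I^2 = (Q - df)/Q\) and \(H = \sqrt{Q/df}\). The input is the rounded \(Q\) value printed in the output, so the results should agree with the output only up to rounding.

```r
# Values copied from the "Test of heterogeneity" part of the output
Q  <- 19.95
df <- 15   # k - 1, with k = 16 studies remaining

# I^2: share of variability not attributable to sampling error (floored at 0)
I2 <- max(0, (Q - df) / Q)

# H: square root of Q divided by its degrees of freedom
H <- sqrt(Q / df)

# p-value of the Q test (chi-squared distribution with df degrees of freedom)
p <- pchisq(Q, df = df, lower.tail = FALSE)

round(100 * I2, 1)  # I^2 in percent
round(H, 2)
round(p, 4)
```

The first two values should come out as 24.8 and 1.15, matching the output above.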

We can also produce an updated forest plot in which the outliers are excluded by plugging the results of the find.outliers function into the forest function (this only works if you have already loaded the meta and metafor packages from your library). The appearance of the resulting forest plot can be changed using arguments of the forest function in meta (these arguments are covered in detail in Chapter 5).

fo <- find.outliers(m.hksj)
forest(fo, col.predict = "blue")

In the resulting forest plot, the outlying studies are still displayed. However, their weight in the meta-analysis has been set to 0% (as shown in the column to the right), meaning that they are excluded from pooling.
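
To see what a weight of 0% means in practice, the weights and the pooled effect can be approximately reproduced from the numbers in the find.outliers output. This is a rough base R sketch under stated assumptions, not the code used by meta or dmetar. The standard errors are back-calculated from the printed 95% confidence intervals assuming normal-based intervals, \(\tau^2 = 0.0361\) is taken from the output, and standard inverse-variance random-effects weights are used. Because all inputs are rounded, the results can only match the output approximately.

```r
# Effect sizes and 95% CI bounds copied from the output (same study order)
smd   <- c(0.7091, 0.3549, 1.7912, 0.1825, 0.4219, 0.6300, 0.7249, 0.5287,
           0.2840, 1.2751, 0.1036, 0.3884, 0.5407, 0.4262, 0.5154, 1.4797,
           0.6126, 0.6000)
lower <- c(0.1979, -0.0300, 1.1139, -0.0484, 0.1380, 0.2458, 0.2846, 0.1162,
           -0.0453, 0.6142, -0.2781, -0.0639, 0.0619, -0.0794, -0.1731, 0.8618,
           0.1683, 0.1120)
upper <- c(1.2203, 0.7397, 2.4685, 0.4133, 0.7057, 1.0142, 1.1652, 0.9412,
           0.6133, 1.9360, 0.4853, 0.8407, 1.0196, 0.9317, 1.2039, 2.0977,
           1.0569, 1.0880)

# Positions of the excluded studies (DanitzOrsillo, Shapiro et al.)
excluded <- c(3, 16)

# Assumption: the study CIs are normal-based, so SE = CI width / (2 * 1.96)
se <- (upper - lower) / (2 * qnorm(0.975))

# Between-study variance from the output with outliers removed
tau2 <- 0.0361

# Inverse-variance random-effects weights; excluded studies get weight 0
w <- 1 / (se^2 + tau2)
w[excluded] <- 0

weight_pct <- 100 * w / sum(w)       # compare with the %W(random) column
pooled     <- sum(w * smd) / sum(w)  # compare with the random effects model SMD

round(weight_pct, 1)
round(pooled, 4)
```

Setting a weight to zero removes the study from both the numerator and the denominator of the weighted average, which is why the two outliers no longer affect the pooled effect although they still appear in the plot.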




References

Viechtbauer, Wolfgang, and Mike W-L Cheung. 2010. “Outlier and Influence Diagnostics for Meta-Analysis.” Research Synthesis Methods 1 (2): 112–25.
