6.2 Detecting outliers & influential cases

As mentioned before, between-study heterogeneity can also be caused by one or more studies with extreme effect sizes which don’t quite fit in. Especially when the quality of these studies is low, or the studies are very small, such outliers may distort our pooled effect estimate, and it’s a good idea to have a look at the pooled effect again once we remove them from the analysis.

On the other hand, we also want to know if the pooled effect estimate we found is robust, meaning that the effect does not depend heavily on one single study. Therefore, we also want to know whether there are studies which heavily pull the pooled effect in one direction. Such studies are called influential cases, and we’ll devote some time to this topic in the second part of this chapter.

It should be noted that there are many methods to spot outliers and influential cases, and the methods described here are not comprehensive. If you want to read more about the statistical underpinnings of this topic, we can recommend the paper by Wolfgang Viechtbauer and Mike Cheung (Viechtbauer and Cheung 2010).

6.2.1 Searching for extreme effect sizes (outliers)

A common method to detect outliers directly is to define a study as an outlier if the study’s confidence interval does not overlap with the confidence interval of the pooled effect. This means that we define a study as an outlier when its effect size estimate is so extreme that we have high certainty that the study cannot be part of the “population” of effect sizes we determined when pooling our results (i.e., the individual study differs significantly from the overall effect).

To detect such outliers in our dataset, the filter function in the dplyr package, which we introduced in Chapter 3.3.3, comes in handy again.

Using this function, we can search for all studies:

  • for which the upper bound of the 95% confidence interval is lower than the lower bound of the pooled effect confidence interval (i.e., extremely small effects)
  • for which the lower bound of the 95% confidence interval is higher than the upper bound of the pooled effect confidence interval (i.e., extremely large effects)

Here, I’ll use my m.hksj meta-analysis output from Chapter 4.2.2 again. Let’s see what the upper and lower bounds of my pooled effect confidence interval are. As I performed a random-effect meta-analysis in this example, I will use the values stored under $lower.random and $upper.random. If you performed a fixed-effect meta-analysis, the objects would be $lower.fixed and $upper.fixed, respectively.

m.hksj$lower.random
## [1] 0.389147
m.hksj$upper.random
## [1] 0.7979231

Here we go. I now see that my pooled effect confidence interval stretches from \(g = 0.389\) to \(g = 0.798\). We can use these values to filter out outliers now.
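Before we wrap this logic into a function, here is a minimal sketch of the filtering step done by hand. The helper data frame study.cis is a name I made up for this illustration; the $studlab, $lower and $upper slots come directly from the meta output object.

# Minimal sketch: flag studies whose CI does not overlap the pooled effect's CI
study.cis <- data.frame(Author  = m.hksj$studlab,
                        lowerci = m.hksj$lower,
                        upperci = m.hksj$upper)
dplyr::filter(study.cis,
              upperci < m.hksj$lower.random |   # extremely small effects
              lowerci > m.hksj$upper.random)    # extremely large effects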

To filter out outliers automatically, we have prepared two functions for you, spot.outliers.random and spot.outliers.fixed. Both need the dplyr package (see Chapter 3.3.3) to function, so we need to have this package installed and loaded into our library.

library(dplyr)

The function we’ll use in the case of my m.hksj dataset is spot.outliers.random, because we conducted a random-effect meta-analysis to get this output object. R doesn’t know this function yet, so we have to let R learn it by copying and pasting the code underneath in its entirety into the console in the bottom left pane of RStudio, and then hitting Enter ⏎.

spot.outliers.random <- function(data){
  # Study labels and the individual studies' confidence interval bounds
  Author <- data$studlab
  lowerci <- data$lower
  upperci <- data$upper
  m.outliers <- data.frame(Author, lowerci, upperci)
  # Confidence interval bounds of the pooled effect (random-effects model)
  te.lower <- data$lower.random
  te.upper <- data$upper.random
  # Return all studies whose confidence interval does not overlap
  # with the confidence interval of the pooled effect
  dplyr::filter(m.outliers, upperci < te.lower | lowerci > te.upper)
}

Now, the function is ready to be used. The only thing we have to tell spot.outliers.random is the meta-analysis output that we want to check for outliers, which is specified through the data argument. In my case, this is m.hksj.

spot.outliers.random(data=m.hksj)

The function has detected two outliers, “DanitzOrsillo” and “Shapiro et al.”. Looking at the lowerci value, the lower bound of the two studies’ confidence intervals, we see that both studies have extremely large positive effects: their lower bounds are both much higher than the upper bound of the confidence interval of our pooled effect, which was \(g = 0.798\).

Thus, we can conduct a sensitivity analysis in which we exclude these two outliers. We can do this with the update.meta function in the meta package, which creates an updated version of our meta-analysis output m.hksj without the outliers.

m.hksj.outliers <- update.meta(m.hksj,
                               subset = !(Author %in% c("DanitzOrsillo",
                                                        "Shapiro et al.")))
m.hksj.outliers
##                           SMD            95%-CI %W(random)
## Call et al.            0.7091 [ 0.1979; 1.2203]        5.0
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]        6.9
## de Vibe et al.         0.1825 [-0.0484; 0.4133]       10.4
## Frazier et al.         0.4219 [ 0.1380; 0.7057]        9.1
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]        7.0
## Gallego et al.         0.7249 [ 0.2846; 1.1652]        6.0
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]        6.4
## Hintz et al.           0.2840 [-0.0453; 0.6133]        8.1
## Kang et al.            1.2751 [ 0.6142; 1.9360]        3.5
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]        7.0
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]        5.8
## Phang et al.           0.5407 [ 0.0619; 1.0196]        5.4
## Rasanen et al.         0.4262 [-0.0794; 0.9317]        5.1
## Ratanasiripong         0.5154 [-0.1731; 1.2039]        3.3
## SongLindquist          0.6126 [ 0.1683; 1.0569]        5.9
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]        5.3
## 
## Number of studies combined: k = 16
## 
##                         SMD           95%-CI    t  p-value
## Random effects model 0.4708 [0.3406; 0.6010] 7.71 < 0.0001
## Prediction interval         [0.0426; 0.8989]              
## 
## Quantifying heterogeneity:
## tau^2 = 0.0361; H = 1.15 [1.00; 1.56]; I^2 = 24.8% [0.0%; 58.7%]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  19.95   15  0.1739
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
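To see how much excluding the two outliers changed the results, it can be instructive to compare the pooled estimate and the heterogeneity of both meta-analysis objects side by side. A quick sketch, using the slots that meta stores in its output:

# Pooled effect (SMD) with all studies vs. without the two outliers
m.hksj$TE.random
m.hksj.outliers$TE.random

# I-squared heterogeneity with all studies vs. without the two outliers
m.hksj$I2
m.hksj.outliers$I2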

The entire procedure works the same if you conducted a fixed-effect meta-analysis. However, you then need to copy and paste the code for the spot.outliers.fixed function instead, which can be found below.

spot.outliers.fixed <- function(data){
  # Study labels and the individual studies' confidence interval bounds
  Author <- data$studlab
  lowerci <- data$lower
  upperci <- data$upper
  m.outliers <- data.frame(Author, lowerci, upperci)
  # Confidence interval bounds of the pooled effect (fixed-effect model)
  te.lower <- data$lower.fixed
  te.upper <- data$upper.fixed
  # Return all studies whose confidence interval does not overlap
  # with the confidence interval of the pooled effect
  dplyr::filter(m.outliers, upperci < te.lower | lowerci > te.upper)
}
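
Usage is the same as before; assuming your fixed-effect meta-analysis output is stored in an object called m.fixed (a hypothetical name, yours will likely differ), you would call:

spot.outliers.fixed(data = m.fixed)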

References

Viechtbauer, Wolfgang, and Mike W.-L. Cheung. 2010. “Outlier and Influence Diagnostics for Meta-Analysis.” Research Synthesis Methods 1 (2): 112–25.
