6.4 GOSH Plot Analysis

In the previous subchapter, we explored the robustness of our meta-analysis results using influence analyses and the leave-one-out method. An even more sophisticated way to explore the patterns of effect sizes and heterogeneity in our data are so-called Graphic Display of Heterogeneity (GOSH) plots (Olkin, Dahabreh, and Trikalinos 2012). For those plots, we fit the same meta-analysis model to all possible subsets of our included studies. In constrast to the leave-one-out method, we therefore not only fit \(K-1\) models, but all \(2^{k-1}\) possible study combinations. This means that creating GOSH plots can become quite computationally intensive when the number of included studies is large. The R implementation we cover here therefore only fits a maximum of 1 million randomly selected models.

Once the models are calculated, we can plot them, displaying the pooled effect size on the x-axis and the between-study heterogeneity at the y-axis. This allows us to look for specific patterns, for example subclusters with different effect sizes. This would indicate that there is in fact more than one “population” of effect sizes in our data, warranting a subgroup analysis. If the effect sizes in our sample are homogeneous, the GOSH plot should form a symmetric distribution with one peak. To generate GOSH plots, we can use the gosh function in the metafor package. If you have not installed the package yet, install it and then load it from your library.


I will generate plots for my m.hksj meta-analysis object. Before we can generate the plot, we have to “transform” this object created by the meta package into a metafor meta-analysis object which can be used for the gosh function. To do this, i can simply use the data stored in m.hksj. The function to perform a meta-analysis in metafor is called rma(). I only have to provide the function with the effect size (TE), Standard Error (seTE), and between-study heterogeneity estimator (method.tau) stored in m.hksj. Because I want to replicate the results completely, I also use Knapp-Hartung adjustments in rma() by setting test = "knha".

m.rma <- rma(yi = m.hksj$TE, 
             sei = m.hksj$seTE,
             method = m.hksj$method.tau,
             test = "knha")

Please note that if you use the fixed-effect model, you have to set method = "FE".

We can then use this object to generate the GOSH plot object. Depending on the number of studies in your analysis, this can take some time, up to a few hours. I save it as dat.gosh.

dat.gosh <- gosh(m.rma)

I can then display the the GOSH plot by pluggin the dat.gosh object into the plot() function.

plot(dat.gosh, alpha= 0.1, col = "blue")

Interestingly, we see that there are two patterns in our data: one with lower effect sizes and lower heterogeneity, and one with higher effects, but considerable between-study heterogeneity: it seems like there are two subclusters in our data.

Now that we know the effect size-heterogeneity patterns in our data, the really important question of course is: which studies cause those patterns, and may thus belong to which subcluster? To answer this question we have developed the gosh.diagnostics function. This function uses three clustering (also known as supervised machine learning) algorithms to detect clusters in the GOSH plot data and determine which studies contribute the most to them automatically. The function is part of the dmetar package. If you have the package installed already, you have to load it into your library first.


If you don’t want to use the dmetar package, you can find the source code for this function here. In this case, R doesn’t know this function yet, so we have to let R learn it by copying and pasting the code in its entirety into the console in the bottom left pane of RStudio, and then hit Enter ⏎. The function then requires the dplyr, cluster, mvtnorm, factoextra, fpc, cowplot, reshape2, and flexmix package installed and loaded in your library to function properly. We will use the default settings for the function here, but many more are available (see documentation). We simply plug the dat.gosh object into the function. Again, as the there is a lot of data to process, the function may take need some time before it is finished.

 Perform Clustering... 
[==============================================================================================] DONE  

 Number of k-means clusters used: 3 

 Number of DBSCAN clusters detected: 3 

 Number of GMM clusters detected: 2 

 Identification of potential outliers 
 Studies identified as potential outliers: 

 - K-means: 3 16 

 - DBSCAN: 3 10 16 

 - Gaussian Mixture Model: 3 16 

We see that the three algorithms, \(k\)-means, DBSCAN and the Gaussian Mixture Model, have detected three studies which might potentially contribute to the cluster imbalance: study 3, study 10, and study 16. These studies seem to nearly fully account for the second high-effect size, high-heterogeneity cluster we found.

Let us see what happens if we rerun the meta-analysis, this time removing these three studies.

        comb.fixed = FALSE,
        comb.random = TRUE,
        method.tau = "SJ",
        hakn = TRUE,
        exclude = c(3, 10, 16))
##                           SMD            95%-CI %W(random) exclude
## Call et al.            0.7091 [ 0.1979; 1.2203]        4.6        
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]        7.1        
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]        0.0       *
## de Vibe et al.         0.1825 [-0.0484; 0.4133]       13.2        
## Frazier et al.         0.4219 [ 0.1380; 0.7057]       10.6        
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]        7.2        
## Gallego et al.         0.7249 [ 0.2846; 1.1652]        5.8        
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]        6.5        
## Hintz et al.           0.2840 [-0.0453; 0.6133]        8.8        
## Kang et al.            1.2751 [ 0.6142; 1.9360]        0.0       *
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]        7.2        
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]        5.6        
## Phang et al.           0.5407 [ 0.0619; 1.0196]        5.1        
## Rasanen et al.         0.4262 [-0.0794; 0.9317]        4.7        
## Ratanasiripong         0.5154 [-0.1731; 1.2039]        2.8        
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]        0.0       *
## SongLindquist          0.6126 [ 0.1683; 1.0569]        5.8        
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]        5.0        
## Number of studies combined: k = 15
##                         SMD           95%-CI    t  p-value
## Random effects model 0.4299 [0.3234; 0.5364] 8.66 < 0.0001
## Prediction interval         [0.1428; 0.7169]              
## Quantifying heterogeneity:
## tau^2 = 0.0152; H = 1.00 [1.00; 1.44]; I^2 = 0.0% [0.0%; 51.8%]
## Test of heterogeneity:
##      Q d.f. p-value
##  13.48   14  0.4890
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model

We see that the between-study heterogeneity has dropped to \(0\%\), indicating that the studies we are now analyzing stem from one homogeneous (sub)population, the lower cluster in our GOSH plot.


Olkin, Ingram, Issa J Dahabreh, and Thomas A Trikalinos. 2012. “GOSH–a Graphical Display of Study Heterogeneity.” Research Synthesis Methods 3 (3). Wiley Online Library: 214–23.