In the last chapter, we showed how you can apply Egger’s test of the intercept and Duval & Tweedie’s trim-and-fill procedure, and how to inspect funnel plots in R.
As we have mentioned before, recent research has shown that the assumptions of small-study effect methods may be inaccurate in many cases. The Duval & Tweedie trim-and-fill procedure in particular has been shown to be prone to producing inaccurate effect size estimates (Simonsohn, Nelson, and Simmons 2014b).
\(P\)-curve analysis has been proposed as an alternative way to assess publication bias and to estimate the true effect behind our collected data. \(P\)-curve assumes that publication bias arises not primarily because researchers do not publish non-significant results, but because they “play” around with their data (e.g., selectively removing outliers, choosing different outcomes, controlling for different variables) until a non-significant finding becomes significant. This (bad) practice is called \(p\)-hacking, and it has been shown to be very common among researchers (Head et al. 2015).
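The premise behind \(p\)-curve can be illustrated with a small simulation. The sketch below is not part of `dmetar`; it assumes two-sample \(t\)-tests with 30 participants per group (hypothetical settings) and draws the \(t\)-values directly from the central or noncentral \(t\)-distribution instead of simulating raw data. If there is no true effect, significant \(p\)-values are spread evenly between 0 and 0.05 (a flat \(p\)-curve). If there is a true effect (here \(d=0.5\)), very small \(p\)-values become much more common than values just below 0.05 (a right-skewed \(p\)-curve).

```r
# Simulated p-curves with and without a true effect (hypothetical settings).
set.seed(42)
n    <- 30            # participants per group (assumption)
dof  <- 2 * n - 2     # degrees of freedom of a two-sample t-test
reps <- 100000        # number of simulated studies per scenario

# Two-sided p-values; for a true effect d, the t-statistic is noncentral
# with ncp = d * sqrt(n / 2) when both groups contain n participants
p_null   <- 2 * pt(-abs(rt(reps, dof)), dof)
p_effect <- 2 * pt(-abs(rt(reps, dof, ncp = 0.5 * sqrt(n / 2))), dof)

# Keep only significant ("publishable") results and bin them like a p-curve
breaks       <- seq(0, 0.05, by = 0.01)
curve_null   <- prop.table(table(cut(p_null[p_null < 0.05], breaks)))
curve_effect <- prop.table(table(cut(p_effect[p_effect < 0.05], breaks)))

# Share of significant p-values per bin: roughly 0.2 each without an effect,
# clearly decreasing from left to right with a true effect
round(rbind(no_effect = curve_null, d_0.5 = curve_effect), 2)
```

Working with \(t\)-values rather than raw data keeps the simulation fast; the resulting \(p\)-values have the same distribution as those of actual two-sample \(t\)-tests under these assumptions.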
9.2.1 Performing a p-curve analysis
To conduct a \(p\)-curve analysis, you can use the `pcurve` function we prepared for you. This function is part of the `dmetar` package. If you have the package installed already, you have to load it into your library first: `library(dmetar)`.
If you don’t want to use the `dmetar` package, you can find the source code for this function here. In this case, R doesn’t know this function yet, so we have to let R learn it by copying and pasting the code in its entirety into the console (the bottom-left pane of RStudio) and then hitting Enter ⏎. The function requires the `poibin` package to work.
For this function, the following parameters can be specified:

| Parameter | Description |
|---|---|
| x | The meta-analysis results object generated by functions of the `meta` package. |
| effect.estimation | Logical. Should the true effect size underlying the \(p\)-curve be estimated? If set to `TRUE`, a vector containing the total sample size of each study must be provided for `N`. `FALSE` by default. |
| N | A numeric vector of the same length as the number of effect sizes included in `x`, specifying the total sample size \(N\) corresponding to each effect. Only needed if `effect.estimation = TRUE`. |
| dmin | If `effect.estimation = TRUE`: lower limit of the effect size (\(d\)) space in which the true effect size should be searched. Must be greater than or equal to 0. Default is 0. |
| dmax | If `effect.estimation = TRUE`: upper limit of the effect size (\(d\)) space in which the true effect size should be searched. Must be greater than 0. Default is 1. |
First, let’s use the `pcurve` function with `effect.estimation` set to `FALSE`. As this is the default, we only have to plug the `m.hksj` meta-analysis object into the function to generate the \(p\)-curve: `pcurve(m.hksj)`.
```
## P-curve analysis
## -----------------------
## - Total number of provided studies: k = 18
## - Total number of p<0.05 studies included into the analysis: k = 11 (61.11%)
## - Total number of studies with p<0.025: k = 10 (55.56%)
##
## Results
## -----------------------
##                     pBinomial  zFull pFull  zHalf pHalf
## Right-skewness test     0.006 -5.943 0.000 -4.982     0
## Flatness test           0.975  3.260 0.999  5.158     1
## Note: p-values of 0 or 1 correspond to p<0.001 and p>0.999, respectively.
## Power Estimate: 84% (62.7%-94.6%)
##
## Evidential value
## -----------------------
## - Evidential value present: yes
## - Evidential value absent/inadequate: no
```
The function produces a large output, so let us go through it one by one:
- P-Curve plot. The figure shows the \(p\)-curve for your results (in blue). At the bottom, you can also find the number of effect sizes with \(p<0.05\) that were included in the analysis. Results of the Right-Skewness and Flatness tests are also displayed (see Results).
- P-curve analysis. This section of the output provides general details about the studies used for the \(p\)-curve analysis: the total number of studies in the meta-analysis, the number of significant effect sizes used for the \(p\)-curve analysis, and the number of studies with effect sizes for which \(p<0.025\) (the so-called “half-curve”).
- Results. This section displays the results of the Right-Skewness test and the Flatness test. The Right-Skewness test analyzes whether the \(p\)-curve resulting from your data is significantly right-skewed (i.e., very small \(p\)-values are more frequent than \(p\)-values just below 0.05), which would indicate that there is a “true” effect behind your data. The Flatness test analyzes whether the \(p\)-curve is flat, which could indicate that the power is insufficient or that there is no “true” effect behind your data. For both tests, results are reported for the full \(p\)-curve (all values for which \(p<0.05\)) and for the half \(p\)-curve (all values for which \(p<0.025\)).
- Power Estimate. This line displays the estimated power of the studies in your analysis, along with its confidence interval.
- Evidential value. In terms of interpretation, this section is the most important one of the output. It shows whether \(P\)-curve estimates that evidential value (a “true” effect) is present in the analysis or not. This interpretation is done automatically based on the results of the Flatness and Right-Skewness tests (you can read the documentation of the function for more information on how this is done). Two pieces of information are provided: (\(i\)) whether evidential value is present, and (\(ii\)) whether it is absent or inadequate. This may look a little peculiar at first, because we would expect a simple yes/no interpretation of the evidential value of our analysis. However, it is possible that both `Evidential value present` and `Evidential value absent/inadequate` result in a `no` decision. This means that while the presence of evidential value could not be ascertained, it could also not be verified that evidential value is indeed absent or inadequate (for example, because a very small effect exists).
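To see where the numbers in the Results table come from, the Right-Skewness test can be sketched in a few lines of base R. This is a simplified illustration under our own assumptions, not the code of `pcurve`: it takes hypothetical two-sided \(p\)-values directly, while the actual function derives them from the meta-analysis object, so its numbers can differ. The binomial test asks whether more than half of the significant \(p\)-values lie below 0.025; the Stouffer-type tests convert each \(p\)-value into a “pp-value” (its probability under a flat \(p\)-curve) and combine them into a single \(z\)-score, where strongly negative values speak for right skew.

```r
# Simplified sketch of the Right-Skewness tests (not the pcurve source code).
# The p-values are HYPOTHETICAL two-sided p-values of 11 significant studies.
p <- c(0.0001, 0.0005, 0.001, 0.002, 0.003, 0.004,
       0.01, 0.012, 0.015, 0.02, 0.04)

sig <- p[p < 0.05]     # full p-curve: all significant results
k   <- length(sig)

# Binomial test: without a true effect, p < .025 and .025 <= p < .05 are
# equally likely, so we test whether more than half fall below .025
p_binomial <- binom.test(sum(sig < 0.025), k, p = 0.5,
                         alternative = "greater")$p.value

# Stouffer test on pp-values: without a true effect, p / .05 is uniform on (0, 1)
z_full <- sum(qnorm(sig / 0.05)) / sqrt(k)
p_full <- pnorm(z_full)

# Half p-curve: only p < .025, rescaled by .025
half   <- sig[sig < 0.025]
z_half <- sum(qnorm(half / 0.025)) / sqrt(length(half))
p_half <- pnorm(z_half)

round(c(pBinomial = p_binomial, zFull = z_full, pFull = p_full,
        zHalf = z_half, pHalf = p_half), 3)
```

With 10 of 11 significant \(p\)-values below 0.025, as in the `m.hksj` example, the binomial test gives \(12/2048 \approx 0.006\), which agrees with the pBinomial value reported for the Right-Skewness test in the output above.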
Looking at the results for the `m.hksj` object, we see that 11 of the 18 studies were included in the analysis (those with \(p<0.05\)), of which 10 had a \(p\)-value lower than 0.025. We also see that the estimated power was 84% (95% CI: 62.7%-94.6%).
We are provided with the interpretation that evidential value is present, and that evidential value is not absent or inadequate. This means that \(P\)-curve estimates that there is a “true” effect size behind our findings, and that the results are not the product of publication bias and \(p\)-hacking alone.
9.2.2 Estimating the “true” effect
The `pcurve` function also allows you to obtain \(P\)-curve’s estimate of the “true” effect size underlying your data (much like the Duval & Tweedie trim-and-fill procedure we described before). However, we need one important piece of information to do this: the total sample size of each study. Thankfully, this information is reported in most research publications.
Let’s assume we have stored a variable called `N.m.hksj` in R, which contains the total sample size for each study in `m.hksj` (in the same order as the studies in `m.hksj`). Let’s have a look at `N.m.hksj`:

```
##  105 161 60 37 141 82 97 61 200 79 124 25 166 59 201 95 166
##  144
```
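If you want to follow along, `N.m.hksj` is simply a numeric vector, which can be created with `c()` from the values printed above. The length check assumes that `m.hksj` is a meta-analysis object from the `meta` package, which stores the number of studies in its `k` element; it only runs if `m.hksj` exists in your session.

```r
# Total sample size of each study, in the same order as the studies in m.hksj
N.m.hksj <- c(105, 161, 60, 37, 141, 82, 97, 61, 200,
              79, 124, 25, 166, 59, 201, 95, 166, 144)

# The vector must contain exactly one sample size per study
if (exists("m.hksj")) {
  stopifnot(length(N.m.hksj) == m.hksj$k)
}

length(N.m.hksj)  # 18 studies
sum(N.m.hksj)     # 2003 participants in total
```

Keeping the order identical to the studies in the meta-analysis object matters, because each sample size is matched to its effect size by position.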
With this information, we can extend the `pcurve` call from before with some additional arguments. First, we have to set `effect.estimation` to `TRUE` to estimate the effect size. Second, we have to provide the function with the study sample sizes, `N.m.hksj`. Lastly, we can specify the range of effect sizes in which the function should search for the true effect (expressed as Cohen’s \(d\)) through `dmin` and `dmax`. We will search for the effect between \(d=0.0\) and \(d=1.0\). The function returns the same output as before, and one additional plot:
```r
# Estimate the "true" effect size, searching the range d = 0 to d = 1
pcurve(m.hksj, effect.estimation = TRUE, N = N.m.hksj, dmin = 0, dmax = 1)
```
As can be seen in the plot, the function provides an effect estimate of \(d=0.48\). This estimate mirrors the one we found for `m.hksj` when using the fixed-effect model, while the effect size for the random-effects model (\(g=0.59\)) was somewhat higher.
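Conceptually, \(p\)-curve’s effect estimate is the effect size that makes the observed significant results look most like what that effect would produce. Simonsohn, Nelson, and Simmons (2014b) do this by computing, for each candidate effect, “pp-values” (the probability of a result at least as extreme as the observed one, given that it is significant) and choosing the effect for which these pp-values are closest to a uniform distribution, measured by the Kolmogorov-Smirnov distance. The sketch below is a simplified base-R version that assumes every study is a two-group \(t\)-test with equal group sizes; the helper names are hypothetical, and this is not how `pcurve` in `dmetar` is written.

```r
# Simplified sketch of p-curve effect estimation via a Kolmogorov-Smirnov loss.
# ASSUMPTION: every study is a two-group t-test with equal group sizes (N/2 each).
# Helper names are hypothetical; this is not the dmetar::pcurve implementation.

# pp-value: probability of a t-value at least as large as the observed one,
# given that the result is significant, if the true effect were d
pp_values <- function(d, t_obs, N) {
  dof    <- N - 2
  ncp    <- d * sqrt(N) / 2            # n1 = n2 = N/2  ->  ncp = d * sqrt(N/4)
  t_crit <- qt(0.975, dof)             # two-sided alpha = .05
  pt(t_obs, dof, ncp, lower.tail = FALSE) /
    pt(t_crit, dof, ncp, lower.tail = FALSE)
}

# Loss: Kolmogorov-Smirnov distance between the pp-values and a uniform distribution
ks_loss <- function(d, t_obs, N) {
  unname(ks.test(pp_values(d, t_obs, N), "punif")$statistic)
}

# Grid search for the d in [dmin, dmax] that makes the pp-values most uniform
estimate_d <- function(t_obs, N, dmin = 0, dmax = 1, step = 0.01) {
  grid <- seq(dmin, dmax, by = step)
  loss <- vapply(grid, ks_loss, numeric(1), t_obs = t_obs, N = N)
  grid[which.min(loss)]
}

# Usage on simulated data: 1000 studies with N = 100 and a true effect of d = 0.5,
# of which only the significant ones are "published"
set.seed(1)
n_studies <- 1000
N_study   <- 100
t_sim <- rt(n_studies, df = N_study - 2, ncp = 0.5 * sqrt(N_study) / 2)
sig   <- t_sim > qt(0.975, N_study - 2)
d_hat <- estimate_d(t_sim[sig], rep(N_study, sum(sig)))
d_hat   # should land close to 0.5
```

Because this sketch only evaluates the loss on a grid between `dmin` and `dmax`, its estimate can never fall outside that range, which is why the search range should be chosen with some care.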
It should be noted that this chapter is only an introduction to \(P\)-curve and is not comprehensive. Simonsohn et al. (Simonsohn, Simmons, and Nelson 2015) also stress that \(P\)-curve should only be used for outcome data that were actually of interest to the authors of the specific article, because those are the ones likely to get \(p\)-hacked. They also ask meta-researchers to provide a detailed table documenting the reported results of each outcome used in the \(P\)-curve (a guide can be found here).
It has also been shown that \(P\)-curve’s effect estimates are not robust when the heterogeneity of a meta-analysis is high. For this reason, van Aert et al. (Aert, Wicherts, and Assen 2016) propose not to determine the “true” effect using \(P\)-curve when heterogeneity is high, defined as \(I^2 > 50\%\). The `pcurve` function therefore automatically prints a warning message at the end of the output if a true effect is estimated and heterogeneity is considerable. A possible solution to this problem might be to reduce the overall heterogeneity using outlier removal, or to conduct separate \(p\)-curve analyses in more homogeneous subgroups.
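A quick way to see whether this warning applies is to compute \(I^2\) yourself. The sketch below uses the standard definition based on Cochran’s \(Q\) with inverse-variance weights, \(I^2 = \max\{0, (Q - \text{df})/Q\}\), where \(\text{df} = k - 1\). The helper names are hypothetical and not part of `dmetar`; in practice, the \(I^2\) reported in your meta-analysis output serves the same purpose.

```r
# Hypothetical helper (not part of dmetar): I^2 from Cochran's Q
i_squared <- function(yi, sei) {
  wi  <- 1 / sei^2                   # inverse-variance weights
  mu  <- sum(wi * yi) / sum(wi)      # fixed-effect pooled estimate
  Q   <- sum(wi * (yi - mu)^2)       # Cochran's Q
  dfQ <- length(yi) - 1
  if (Q <= dfQ) return(0)            # I^2 is truncated at 0
  (Q - dfQ) / Q
}

# Hypothetical helper: warn before p-curve effect estimation if I^2 > 50%
check_pcurve_heterogeneity <- function(yi, sei, threshold = 0.5) {
  i2 <- i_squared(yi, sei)
  if (i2 > threshold) {
    warning(sprintf("I^2 = %.1f%%: p-curve effect estimates may not be robust.",
                    100 * i2))
  }
  invisible(i2)
}

# Usage with made-up effect sizes and standard errors
i_squared(yi = c(0, 2), sei = c(0.5, 0.5))   # 0.875, i.e. I^2 = 87.5%
```

For data like these, `check_pcurve_heterogeneity` would issue a warning, mirroring the behavior described for `pcurve`; computing \(I^2\) separately within subgroups shows whether splitting the data brings heterogeneity below the threshold.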
Simonsohn, Uri, Leif D Nelson, and Joseph P Simmons. 2014b. “P-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results.” Perspectives on Psychological Science 9 (6): 666–81.
Head, Megan L, Luke Holman, Rob Lanfear, Andrew T Kahn, and Michael D Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” PLoS Biology 13 (3): e1002106.
Simonsohn, Uri, Joseph P Simmons, and Leif D Nelson. 2015. “Better P-Curves: Making P-Curve Analysis More Robust to Errors, Fraud, and Ambitious P-Hacking, a Reply to Ulrich and Miller (2015).” American Psychological Association.
Aert, Robbie CM van, Jelte M Wicherts, and Marcel ALM van Assen. 2016. “Conducting Meta-Analyses Based on P Values: Reservations and Recommendations for Applying P-Uniform and P-Curve.” Perspectives on Psychological Science 11 (5): 713–29.