Chapter 16 Power Analysis

A big asset of meta-analysis (and arguably one of the reasons the method is so useful in practical research) is that it allows many studies to be combined into a more precise pooled effect. Lack of statistical power, however, may still play an important role, even in meta-analysis. This is particularly true in the clinical field, where it is often the case that only a few studies are available for synthesis. The median number of included studies in the Cochrane Database of Systematic Reviews, for example, is six (Borenstein et al. 2011). This is even more concerning once we consider that (1) many meta-analysts will also want to perform subgroup analyses and meta-regression, for which even more power is required, and (2) many meta-analyses show high between-study heterogeneity, which reduces our precision, and thus our power.

Power is directly related to the Type II error level (\(\beta\)) we defined earlier: \(Power = 1- \beta\). It is common practice to set the Type I error level (\(\alpha\)) to \(\alpha=0.05\), and thus to assume that a Type I error is four times as grave as a Type II error (i.e., falsely finding an effect when there is none in reality is four times as bad as not finding an effect when there is one). The Type II error level is therefore set at \(\beta=0.20\), which corresponds to a power of \(1-\beta=1-0.20=80\%\).
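The relationship between \(\alpha\), \(\beta\), and power can be made concrete with a short sketch. For a two-sided z-test, power depends on the noncentrality parameter \(\lambda\) (the true effect divided by its standard error); the function and example value below are purely illustrative, not part of the chapter's own calculations:

```python
from statistics import NormalDist

def power_two_sided(lmbda, alpha=0.05):
    """Power of a two-sided z-test given noncentrality parameter lambda
    (true effect divided by its standard error)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    # P(reject H0 | true effect): mass beyond both critical values
    return 1 - nd.cdf(z_crit - lmbda) + nd.cdf(-z_crit - lmbda)

# With lambda around 2.80, power is roughly 80% (beta ~ 0.20)
print(round(power_two_sided(2.80), 2))
```

Note that with \(\lambda = 0\) (no true effect), the "power" reduces to \(\alpha\) itself: we reject the null hypothesis 5% of the time by chance alone.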

What assumptions should I make for my meta-analysis?

While researchers conducting primary studies can plan their sample size based on the effect size they want to detect, the situation is somewhat different in meta-analysis, where we can only work with the published material. However, we do have some control over the number of studies we include in our meta-analysis (e.g., through more leniently or more strictly defined inclusion criteria), so we can change our power to some extent by including more or fewer studies. There are four things we have to make assumptions about when assessing the power of our meta-analysis a priori.

  • The number of included or includable studies \(k\)
  • The overall size of the studies we want to include (are the studies in the field rather small or large?)
  • The effect size we want to be able to detect. This is particularly important, as we have to make assumptions about how large an effect size must be to still be clinically meaningful. One study calculated that for interventions against depression, even effects as small as \(SMD=0.24\) may still be meaningful for patients (Cuijpers et al. 2014). If we want to study negative effects of an intervention (e.g., death or symptom deterioration), even very small effect sizes are extremely important and should be detected.
  • The heterogeneity of our studies’ effect sizes, as this also affects the precision of our meta-analysis, and thus its potential to find significant effects.
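Given assumptions about these four quantities, a-priori power can be approximated analytically. The sketch below uses the standard sampling variance of the standardized mean difference and assumes \(k\) equally sized two-arm studies; the heterogeneity inflation factors (1.33, 1.67, 2.0 for small, moderate, and large heterogeneity) follow one common convention, and all example numbers are purely illustrative:

```python
from statistics import NormalDist

def meta_power(delta, k, n1, n2, heterogeneity=1.0, alpha=0.05):
    """A-priori power of a meta-analytic test of H0: SMD = 0.

    delta          assumed true standardized mean difference
    k              number of included studies
    n1, n2         per-arm sample size of a typical study
    heterogeneity  variance inflation factor (1.0 = fixed-effect;
                   1.33 / 1.67 / 2.0 are sometimes used to approximate
                   small / moderate / large between-study heterogeneity)
    """
    # Sampling variance of the SMD in a single two-arm study
    v = (n1 + n2) / (n1 * n2) + delta**2 / (2 * (n1 + n2))
    # Variance of the pooled effect across k studies, inflated for heterogeneity
    v_pooled = heterogeneity * v / k
    lmbda = delta / v_pooled**0.5  # noncentrality parameter
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return 1 - nd.cdf(z_crit - lmbda) + nd.cdf(-z_crit - lmbda)

# e.g., 6 studies with 25 patients per arm and a true SMD of 0.30:
print(round(meta_power(0.30, k=6, n1=25, n2=25), 2))
```

With these example numbers, power stays below the conventional 80% threshold, and adding heterogeneity lowers it further; in practice, dedicated meta-analysis software can perform such calculations for us.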

Besides these parameters, it is also important to think about other analyses, such as the subgroup analyses we want to conduct. How many studies are there for each subgroup, and what effects do we want to find in the subgroups? This is particularly important if we hypothesize that an intervention is not effective in a subgroup of patients, because we do not want to falsely find a treatment to be ineffective simply because the power was insufficient.
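For a subgroup analysis, the quantity of interest is often the difference between the pooled effects of two subgroups. A minimal sketch, assuming two independent subgroups under a fixed-effect model with equally sized two-arm studies (the function name and example values are illustrative):

```python
from statistics import NormalDist

def subgroup_power(delta1, delta2, k1, k2, n1, n2, alpha=0.05):
    """Power to detect a difference delta1 - delta2 between the pooled
    SMDs of two independent subgroups (fixed-effect sketch; per-arm
    sample sizes n1, n2 assumed identical across studies)."""
    def pooled_var(delta, k):
        # Variance of one subgroup's pooled effect (k studies)
        v = (n1 + n2) / (n1 * n2) + delta**2 / (2 * (n1 + n2))
        return v / k
    # Variance of the difference of two independent pooled estimates
    v_diff = pooled_var(delta1, k1) + pooled_var(delta2, k2)
    lmbda = (delta1 - delta2) / v_diff**0.5
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return 1 - nd.cdf(z_crit - lmbda) + nd.cdf(-z_crit - lmbda)

# e.g., 3 studies per subgroup (25 per arm), true SMDs 0.50 vs. 0.10:
print(round(subgroup_power(0.50, 0.10, k1=3, k2=3, n1=25, n2=25), 2))
```

Even with a substantial true difference between subgroups, power in this example is well below 50%, which illustrates why nonsignificant subgroup tests should not be over-interpreted as evidence of "no difference".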

Post-hoc power tests: the abuse of power

Please note that power analyses should always be conducted a priori, meaning before you perform the meta-analysis.

Power analyses conducted after an analysis ("post hoc") are fundamentally flawed (Hoenig and Heisey 2001). They suffer from the so-called "power approach paradox": a nonsignificant result is taken as stronger evidence for the null hypothesis when the observed power is higher. However, higher observed power corresponds to a smaller p-value, which, if anything, constitutes more evidence against the null hypothesis.


Borenstein, Michael, Larry V Hedges, Julian PT Higgins, and Hannah R Rothstein. 2011. Introduction to Meta-Analysis. John Wiley & Sons.

Cuijpers, Pim, Erick H Turner, Sander L Koole, Annemiek Van Dijke, and Filip Smit. 2014. “What Is the Threshold for a Clinically Relevant Effect? The Case of Major Depressive Disorders.” Depression and Anxiety 31 (5). Wiley Online Library: 374–78.

Hoenig, John M, and Dennis M Heisey. 2001. “The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis.” The American Statistician 55 (1). Taylor & Francis: 19–24.