2 Choosing an effect size

The effect size is the backbone of meta-analysis: every other analysis in this book builds on the initial effect size calculations we make here.

When we talk about effect sizes, we distinguish between measurement effect sizes and the summary effect size. A measurement effect size is calculated from the means and standard deviations for one pair of treatment and control sites, so a meta-analysis contains many of them. The summary effect size is the statistical aggregation of those measurement effect sizes, and a typical meta-analysis has one.

There are many helpful texts and reviews that lay out the benefits and costs of the many effect sizes found in meta-analytic research. One of the most popular packages for ecological meta-analysis, metafor, can calculate at least 15 effect sizes just by changing a few letters of your R code. Please see these resources for more information on effect size options (Koricheva, Gurevitch, and Mengersen 2013, @schwarzer_meta–analysis_2015).

Within the metafor package, the escalc function handles the measurement-level effect size calculations. Essentially, the function takes in paired data from treatment sites (abbreviated with a “t”) and control sites (abbreviated with a “c”). The paired data typically comprise sample means (\(\bar{x_t}\) and \(\bar{x_c}\)), standard deviations (\(s_t\) and \(s_c\)), and sample sizes (\(n_t\) and \(n_c\)). Using these paired data, escalc outputs a single effect size and its sampling variance for each row (i.e., each paired measurement in your database).

Here’s how the paired data appear in our example database. Take a look at the table below and scroll to the right. You’ll find columns that contain all of the quantitative information needed to calculate an effect size for each pair of measurements (each row) in the database. The column mean_control has the mean species richness at uninvaded sites, and the next column, sample_size_control, provides the number of control study sites, using whatever definition of “study site” the authors adopted in their article. You will be able to locate \(\bar{x}\), \(s\), and \(n\) for every invasive species included in the database. When a necessary metric is missing (most often the standard deviation), we can use methods to impute the missing values; these will be covered in a later chapter.

require(DT) #This package allows for scrollable tables in R Markdown
## Loading required package: DT
datatable(meta_analysis_data, fillContainer = TRUE)

Now, let’s see how we can use these data and the escalc function in the metafor package to calculate measurement-level effect sizes.

2.1 Standardized Mean Difference (aka Hedges’ g)

The Standardized Mean Difference (SMD) was one of the first effect sizes used in published meta-analyses (Hedges and Olkin 1985) and it remains widely used in ecological meta-research (Crystal-Ornelas et al., n.d.; Závorka, Buoro, and Cucherousset 2018; Doherty et al. 2020). There are many ways to calculate the SMD for a dataset; the one we will use here, which is also the default in the metafor package, is called Hedges’ g. Hedges’ g is a useful effect size in ecological meta-analysis because it statistically corrects for the bias that the standardized mean difference shows when sample sizes are small (Hedges 1981).
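For reference, the textbook form of Hedges’ g (Hedges 1981) can be sketched as follows. The symbols \(s_p\) (pooled standard deviation), \(d\) (uncorrected standardized difference), and \(J\) (small-sample correction factor) are notation introduced here for illustration, and the expression for \(J\) is the commonly quoted approximation; consult the metafor documentation for the exact variant that escalc computes.

```latex
\[
s_p = \sqrt{\frac{(n_t - 1)\,s_t^2 + (n_c - 1)\,s_c^2}{n_t + n_c - 2}}, \qquad
d = \frac{\bar{x_t} - \bar{x_c}}{s_p}, \qquad
g = J\,d, \qquad
J \approx 1 - \frac{3}{4\,(n_t + n_c - 2) - 1}
\]
```

Because \(J\) is slightly less than 1, \(g\) is always a little closer to zero than \(d\), and the correction becomes negligible as sample sizes grow.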

Another benefit of using Hedges’ g as an effect size is that a “standardized” effect lets us synthesize data that were measured on different scales. What counts as a scale here depends on your research question and system. For example, we would likely not want to directly pool the raw data on species richness from many different articles, because the articles likely vary in exactly how they quantified species richness (e.g., did authors measure richness of all native tree species, or richness of native trees and herbaceous plants?). The Hedges’ g effect size converts all of these richness measurements to a common unitless scale for each study, so that eventually pooling the data in a meta-analytic framework is more appropriate. The Hedges’ g calculations in metafor also take into account the sample sizes of the treatment and control groups as well as the measures of variation for both groups.

Below, we include the code necessary to calculate Hedges’ g for the invasive tree dataset included in this book.

require(metafor) # Make sure the metafor package is loaded
## Loading required package: metafor
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## Loading 'metafor' package (version 2.4-0). For an overview 
## and introduction to the package please type: help(metafor).
SMD_effect_sizes <-
  escalc( # This is the function in metafor that allows us to calculate an effect size for each row in a database
    "SMD",
    # Specify the effect size we want to calculate. In this case SMD for Standardized mean difference
    m1i = mean_invaded,
    # mean richness at invaded sites
    n1i = sample_size_invaded,
    # invaded site sample size
    sd1i = SD_invaded,
    # invaded site SD
    m2i = mean_control,
    # mean richness at control sites
    n2i = sample_size_control,
    # control site sample size
    sd2i = SD_control,
    # control site SD
    data = meta_analysis_data # This is where the escalc function can find all the data for our meta-analysis
  )

Now, take a look at the last two columns in the new data frame we’ve created. One of these, yi, is the effect size calculated for each row in the database. The other, vi, is the sampling variance of each effect size.

datatable(SMD_effect_sizes, fillContainer = TRUE)

See the yi value for the first row. This is a study (Ellis et al. 2000) that investigated how arthropod richness changes in different types of vegetation along rivers in New Mexico. The authors included sites with stands of introduced saltcedar (treatment) and sites without saltcedar (control) and counted the number of arthropod species at both kinds of sites. The yi in this first row is the calculated value of Hedges’ g for this study. The value of -0.62 indicates that invaded sites had lower species richness than control sites. We know this because the data assigned to “position #1” in our effect size calculations (see R code above) are all of the data for the invaded sites. If we changed the code to assign control sites and their associated data to “position #1”, we would expect an effect size of +0.62. The value for vi shows the variance calculated for this effect size estimate.
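The effect of swapping which group sits in “position #1” can be checked with a small hand calculation in base R. This is only a sketch: the function name hedges_g and all of the input numbers are hypothetical, and it uses the common approximate small-sample correction rather than whatever exact form metafor applies internally.

```r
# Hand calculation of Hedges' g for one made-up pair of measurements.
# The function name and all numbers are hypothetical, for illustration only.
hedges_g <- function(m1, sd1, n1, m2, sd2, n2) {
  df <- n1 + n2 - 2
  # Pooled standard deviation across the two groups
  s_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)
  # Uncorrected standardized mean difference (group 1 minus group 2)
  d <- (m1 - m2) / s_pooled
  # Approximate small-sample correction factor (an assumption of this sketch)
  J <- 1 - 3 / (4 * df - 1)
  J * d
}

# Invaded sites in "position #1": lower richness at invaded sites gives g < 0
g_invaded_first <- hedges_g(m1 = 10, sd1 = 4, n1 = 12,
                            m2 = 13, sd2 = 5, n2 = 12)

# Control sites in "position #1": same magnitude, opposite sign
g_control_first <- hedges_g(m1 = 13, sd1 = 5, n1 = 12,
                            m2 = 10, sd2 = 4, n2 = 12)

print(c(invaded_first = g_invaded_first, control_first = g_control_first))
```

The two values differ only in sign, which is why it matters to record which group was entered first before interpreting yi.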

The next chapters in this book all rely on the values for yi and vi that we just calculated to generate an overall, single, effect size to summarize findings from the many studies included in this database.

Figure 2.1: Flowering tamarisk. Photo credit: Teun Spaans, CC BY-SA 3.0

2.2 Response Ratio (aka Ratio of Means)

The next effect size we’ll discuss is called the Response Ratio (RR). It’s another frequently used effect size in both ecological and medical meta-analysis (Koricheva, Gurevitch, and Mengersen 2013). This effect size is most often used when the two means being compared both have positive signs or both have negative signs. For example, in our invasion impact dataset, both the values from the invaded sites and the values from the control sites will have positive signs, since richness values (the average number of species at a site or across several sites) can’t be negative.

Using our example data, an effect size is calculated for each pair of richness measurements. It is the ratio of the average richness at invaded sites in one article, \(\bar{x_i}\), to the average richness at sites without the focal invasive species, \(\bar{x_c}\). The mathematical representation of the response ratio is relatively straightforward:

\(RR = \frac{\bar{x_i}}{\bar{x_c}}\)

Because a ratio is bounded below by zero and is asymmetric around 1 (a halving gives 0.5 but a doubling gives 2), its distribution is skewed, which presents challenges for models based on a normal distribution. Meta-analysts therefore typically convert the RR to its natural log prior to making any statistical calculations.

\(\ln(RR) = \ln\left(\frac{\bar{x_i}}{\bar{x_c}}\right)\)

As you will see in later calculations, a positive value for \(\ln(RR)\) means that the numerator of the response ratio is larger than the denominator, and a negative value indicates that the denominator is larger than the numerator. In the context of our example data, a positive \(\ln(RR)\) suggests that invasive species increase richness where they are found, and a negative \(\ln(RR)\) suggests that invasive species decrease richness.
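As a minimal sketch of this arithmetic, the log response ratio can be computed by hand in base R for two made-up studies, one where richness is lower at invaded sites and one where it is higher. All values are hypothetical. The variance line uses the commonly cited large-sample approximation for \(\ln(RR)\), which is an assumption of this sketch and is not stated elsewhere in this chapter.

```r
# Made-up summary data for two studies; not taken from the example database
mean_invaded <- c(8, 15)
sd_invaded   <- c(3, 4)
n_invaded    <- c(10, 12)

mean_control <- c(12, 10)
sd_control   <- c(4, 3)
n_control    <- c(10, 12)

# Natural log of the response ratio (invaded over control)
ln_rr <- log(mean_invaded / mean_control)

# Large-sample approximation to the sampling variance of ln(RR)
# (assumed formula, shown for illustration)
var_ln_rr <- sd_invaded^2 / (n_invaded * mean_invaded^2) +
  sd_control^2 / (n_control * mean_control^2)

print(data.frame(ln_rr = ln_rr, var_ln_rr = var_ln_rr))
```

The first study yields a negative \(\ln(RR)\) because its numerator is smaller than its denominator, and the second yields a positive value.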

One benefit of using the Response Ratio is that \(\ln(RR)\) can quickly be back-transformed to the \(RR\) to provide percent increases or decreases in richness with invasive species.
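The back-transformation is a one-liner: exponentiate \(\ln(RR)\) to recover the ratio, subtract 1, and multiply by 100. The sketch below uses two hypothetical \(\ln(RR)\) values, and the variable names are chosen here for illustration.

```r
# Hypothetical log response ratios, e.g. from two different analyses
ln_rr <- c(-0.25, 0.10)

# Back-transform to the response ratio, then express as a percent change
rr <- exp(ln_rr)
percent_change <- (rr - 1) * 100

print(round(percent_change, 1))
```

A negative percent change corresponds to lower richness at invaded sites relative to control sites, and a positive one to higher richness.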

You can find many examples of ecological meta-analyses from a wide variety of disciplines that use the response ratio as an effect size. For example, there are meta-analyses using the Response Ratio that investigate how cover crops change biomass on farms (Thapa et al. 2018), how bee density influences crop production (Rollin and Garibaldi 2019), how bat activity differs between wet and dry Australian landscapes (Blakey et al. 2018), and how changes in natural habitat affect reptile abundance (Doherty et al. 2020).

RR_effect_sizes <- escalc( # Function for calculating effect sizes.
    "ROM",
    # Specify the effect size we want to calculate. In this case ROM for the Ratio of Means or Response Ratio
    m1i = mean_invaded,
    # mean richness at invaded sites
    n1i = sample_size_invaded,
    # invaded site sample size
    sd1i = SD_invaded,
    # invaded site SD
    m2i = mean_control,
    # mean richness at control sites
    n2i = sample_size_control,
    # control site sample size
    sd2i = SD_control,
    # control site SD
    data = meta_analysis_data # This is where the escalc function can find all the data for our meta-analysis
  )

One limitation of the \(RR\) is that the log response ratio cannot be calculated if the mean of either the control or the treatment group is zero. Some researchers suggest replacing the 0 with a very small non-zero value (e.g., 0.1) so that the response ratio can still be used for the meta-analysis (Thapa, Mirsky, and Tully 2018). In contrast, another recent meta-analysis opted to use Hedges’ g as an effect size precisely because some of their data had a mean of 0 (Doherty et al. 2020).
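The zero-mean problem, and the substitution workaround described above, can be illustrated with a few made-up means in base R. This is a hedged sketch only: the data are hypothetical, small_value simply mirrors the 0.1 example, and the right replacement value (if any) depends on the measurement scale of the study.

```r
# Made-up means; the first pair has a zero mean at invaded sites
mean_invaded <- c(0, 6, 9)
mean_control <- c(4, 8, 9)

# Without any adjustment, the first log response ratio is not finite
ln_rr_raw <- log(mean_invaded / mean_control)
print(is.finite(ln_rr_raw))

# Replace exact zeros with a small non-zero value (an assumption, not a rule)
small_value <- 0.1
mean_invaded_adj <- ifelse(mean_invaded == 0, small_value, mean_invaded)
mean_control_adj <- ifelse(mean_control == 0, small_value, mean_control)

ln_rr_adj <- log(mean_invaded_adj / mean_control_adj)
print(ln_rr_adj)
```

Because the adjusted effect size depends directly on the value substituted, it may be worth rerunning the analysis with a different small_value to see how sensitive the conclusions are.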

Going forward, we illustrate meta-analytic models using the Response Ratio as an effect size for two main reasons. First, we did not have richness values of zero in our dataset, so our data could work seamlessly with the response ratio. Second, we wanted to back-transform our \(\ln(RR)\) values to provide land managers with an average percent increase or decrease in richness with the presence of invasive species.

References

Blakey, R V, B S Law, T M Straka, R T Kingsford, and D J Milne. 2018. “Importance of Wetlands to Bats on a Dry Continent: A Review and Meta-Analysis.” Hystrix, 13.

Crystal-Ornelas, Robert, Jeffrey A. Brown, Rafael E. Valentin, Caroline Beardsley, and Julie L Lockwood. n.d. “Meta-Analysis Shows That Overabundant Deer (Cervidae) Populations Consistently Decrease Average Species Abundance and Richness of Forest Birds.”

Doherty, Tim S., Sara Balouch, Kristian Bell, Thomas J. Burns, Anat Feldman, Charles Fist, Timothy F. Garvey, Tim S. Jessop, Shai Meiri, and Don A. Driscoll. 2020. “Reptile Responses to Anthropogenic Habitat Modification: A Global Meta-Analysis.” Edited by Brian McGill. Global Ecology and Biogeography, March. https://doi.org/10/ggrbd9.

Ellis, L. M., M. C. Molles, C. S. Crawford, and F. Heinzelmann. 2000. “Surface-Active Arthropod Communities in Native and Exotic Riparian Vegetation in the Middle Rio Grande Valley, New Mexico.” Southwestern Naturalist 45 (4): 456–71. https://doi.org/10.2307/3672594.

Hedges, Larry V. 1981. “Distribution Theory for Glass’s Estimator of Effect Size and Related Estimators.” Journal of Educational Statistics 6 (2): 107–28.

Hedges, Larry V., and I. Olkin. 1985. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press.

Koricheva, Julia, Jessica Gurevitch, and Kerrie Mengersen. 2013. Handbook of Meta-Analysis in Ecology and Evolution. Princeton University Press.

Rollin, Orianne, and Lucas A. Garibaldi. 2019. “Impacts of Honeybee Density on Crop Yield: A Meta-Analysis.” Journal of Applied Ecology, February. https://doi.org/10.1111/1365-2664.13355.

Thapa, Resham, Steven B. Mirsky, and Katherine L. Tully. 2018. “Cover Crops Reduce Nitrate Leaching in Agroecosystems: A Global Meta-Analysis.” Journal of Environmental Quality 47 (6): 1400. https://doi.org/10.2134/jeq2018.03.0107.

Thapa, Resham, Hanna Poffenbarger, Katherine L. Tully, Victoria J. Ackroyd, Matt Kramer, and Steven B. Mirsky. 2018. “Biomass Production and Nitrogen Accumulation by Hairy Vetch–Cereal Rye Mixtures: A Meta-Analysis.” Agronomy Journal 110 (4): 1197. https://doi.org/10.2134/agronj2017.09.0544.

Závorka, Libor, Mathieu Buoro, and Julien Cucherousset. 2018. “The Negative Ecological Impacts of a Globally Introduced Species Decrease with Time Since Introduction.” Global Change Biology 24 (9): 4428–37. https://doi.org/10.1111/gcb.14323.