19 Effect size calculation

This section was written by Olha Guley.

19.1 Introduction

As we saw throughout the course, effect size is one of the most meaningful ways to convey the impact of your findings in research. It goes beyond simply reporting whether a result is statistically significant and quantifies how large the effect actually is. In the behavioral sciences, knowing how strongly a predictor affects the outcome variable is often more insightful than knowing merely that the effect exists.

In this section, we will focus on using the effectsize package in R to compute effect sizes for the most common types of analyses. By the end, you’ll know how to calculate and interpret effect sizes for:

  1. t-tests: Comparing two groups.
  2. ANOVAs: Comparing more than two groups.
  3. Linear Models: Understanding relationships between predictors and outcomes.
  4. Linear Mixed Models: Analyzing data with hierarchical or repeated measures.

19.2 How to calculate and report effect sizes

We use effect sizes because they help answer questions like:

  • How strong is the relationship?
  • Is the difference meaningful?

19.2.1 Loading the ‘effectsize’ package

First, ensure you have the necessary packages installed and loaded. For calculating effect sizes, you need the “effectsize” package.
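If any of these packages are missing on your machine, a one-time install from CRAN (sketched here, commented out so it is not re-run on every render) takes care of it:

# install.packages(c("effectsize", "lme4", "lmerTest", "tidyverse"))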

library(effectsize)
library(lme4)       # for linear mixed models, which we will fit later in this section
library(lmerTest)   # adds Satterthwaite p-values to lmer() summaries (used in 19.6)
library(tidyverse)

Now, let’s load and explore a dataset. To calculate the effect sizes, we will use data from a study exploring to what extent the general public perceives seven common mental disorders as brain disorders, and whether perceiving a disorder as a brain disorder is related to perceptions of its heritability, age of onset, duration, severity, recovery potential, and of the illness as a genetic or inherited disorder. The pre-registration is available at https://osf.io/43wrg/?view_only=b7aa9af1cf884bdba81191e27bf59429

19.2.2 Loading the data

Download the data from the OSF link to your working directory, and load it into R.

d <- read.csv("clean_data_brainstudy.csv")

head(d)
##             participant_id condition brain genetic
## 1 5e3421311ddb123e1de9e6ad       ocd    90      72
## 2 5b5f23420085cb0001872849      depr    85      59
## 3 63f0ea7188b4cb3a4cc3658a      adhd    85      80
## 4 542c323bfdf99b324ea3808d      depr    72      35
## 5 665c050b07508f6c4f8d01e0       ocd    90      50
## 6 5d8cef3c28e183001a335ab0      anor    92      32
##   knowledgeability onset duration severity recovery drug
## 1               70    37       71       73       75   43
## 2               68    36       76       77       30   68
## 3               30    10      100       15        0   90
## 4               44    26       61       64       41   61
## 5               32    30      100       86       19   78
## 6               88    36       73       99       13   26
##   therapy Consent        Attention Age Gender Student
## 1      63     Yes  I PAY ATTENTION  37 Female      No
## 2      78     Yes  I pay attention  29 Female      No
## 3     100     Yes  I pay attention  30   Male      No
## 4      64     Yes  I pay attention  52 Female      No
## 5      92     Yes I pay attention   32 Female      No
## 6      51     Yes  I pay attention  37 Female      No
descriptive_stats_brain <- d %>%
  group_by(condition) %>%
  summarize(
    mean_brain = mean(brain),
    sd_brain = sd(brain))

print(descriptive_stats_brain)
## # A tibble: 7 × 3
##   condition mean_brain sd_brain
##   <chr>          <dbl>    <dbl>
## 1 adhd            74.2     14.8
## 2 anor            71.0     21.4
## 3 asd             74.5     19.6
## 4 bipo            81.7     13.6
## 5 depr            68.8     28.3
## 6 ocd             74.8     25.0
## 7 schi            84.9     12.7

19.3 Effect size for t-tests (Cohen’s d)

19.3.1 Example: Comparing perceived severity between two different disorder categories

T-tests are commonly used to compare two groups. For example, we might want to compare the perceived severity of major depressive disorder versus schizophrenia. Here’s how to calculate the effect size (Cohen’s d) for this comparison:

Let’s first filter data for two disorders of interest and perform a t-test:

x <- d %>% 
  filter(condition %in% c("depr", "schi"))

x_t_test <- t.test(severity ~ condition, data = x)
print(x_t_test)
## 
##  Welch Two Sample t-test
## 
## data:  severity by condition
## t = -0.8544, df = 57.402, p-value = 0.3964
## alternative hypothesis: true difference in means between group depr and group schi is not equal to 0
## 95 percent confidence interval:
##  -12.14752   4.88085
## sample estimates:
## mean in group depr mean in group schi 
##           79.23333           82.86667

Now let’s calculate the effect size (Cohen’s d):

eff_size_x <- cohens_d(severity ~ condition, data = x)
print(eff_size_x)
## Cohen's d |        95% CI
## -------------------------
## -0.22     | [-0.73, 0.29]
## 
## - Estimated using pooled SD.

19.3.2 What is Cohen’s d?

Cohen’s d quantifies the magnitude of differences between two groups in terms of standard deviations. This way of representing the effect size is particularly useful in comparing group means on the same scale, regardless of sample size. The formula is:

\[ Cohen's\,d = \frac{M_1 - M_2}{SD_{pooled}} \]

Where:

  • \(M_1\) and \(M_2\) are the means of the two groups.
  • \(SD_{pooled}\) is the pooled standard deviation, accounting for variability in both groups.
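To see the formula in action, here is a minimal hand computation for the depr vs. schi comparison above (g1, g2, n1, n2, and sd_pooled are illustrative names introduced here; the result should reproduce the cohens_d() output):

g1 <- x$severity[x$condition == "depr"]
g2 <- x$severity[x$condition == "schi"]
n1 <- length(g1); n2 <- length(g2)

# Pooled SD with n - 1 weights, as in the formula above
sd_pooled <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
(mean(g1) - mean(g2)) / sd_pooled  # ~ -0.22, matching cohens_d()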

19.3.3 Interpretation guidelines

  • Small effect: \(|d| = 0.2\).
  • Medium effect: \(|d| = 0.5\).
  • Large effect: \(|d| = 0.8\).

So, Cohen’s d quantifies the difference in means between two groups in terms of standard deviations, making it easy to understand the magnitude of the effect.
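The effectsize package also ships interpretation helpers that map an estimate onto these conventional labels. Assuming the current effectsize API, interpret_cohens_d() applies Cohen’s (1988) rules by default:

interpret_cohens_d(-0.22)  # classified as "small" (thresholds use |d|)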

By the way, you may also encounter Hedges’ g. Hedges’ g is Cohen’s d with a correction that gives a less biased estimate in small sample sizes. In large samples the two statistics are essentially identical; even in small samples they are very similar, with Hedges’ g slightly smaller than Cohen’s d. The interpretation guidelines are the same.
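In effectsize, Hedges’ g uses the same formula interface as cohens_d(), so switching is a one-liner on our filtered data x:

hedges_g(severity ~ condition, data = x)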

19.4 Effect size for ANOVA (Eta-squared)

19.4.1 Example: Comparing recovery ratings across multiple disorder categories

When comparing more than two groups, an ANOVA is the go-to method. For example, we can compare the perceived recovery ratings of all seven disorders in the dataset, and then calculate the effect size for the ANOVA:

recovery_anova <- aov(recovery ~ condition, data = d)
summary(recovery_anova)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## condition     6   8184  1364.1   2.599 0.0191 *
## Residuals   197 103406   524.9                 
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
recovery_anova_eff <- eta_squared(recovery_anova)
## For one-way between subjects designs, partial eta
##   squared is equivalent to eta squared. Returning eta
##   squared.
print(recovery_anova_eff)
## # Effect Size for ANOVA
## 
## Parameter | Eta2 |       95% CI
## -------------------------------
## condition | 0.07 | [0.01, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].

Here, you will likely see a warning message indicating that partial eta squared is equivalent to eta squared. This is normal and nothing to worry about. In one-way designs, the two effect size measures are the same, so the function returns eta squared for simplicity.

In this example, eta-squared (\(\eta^2\)) measures the proportion of variance in the outcome explained by the grouping variable (condition). A higher value indicates that the disorder category explains a larger share of the variance in perceived recovery.

The formula for eta-squared is:

\[ \eta^2 = \frac{SS_{effect}}{SS_{total}} \]

Where:

  • \(SS_{effect}\): the sum of squares for the effect (a predictor or grouping variable).
  • \(SS_{total}\): the total sum of squares in the model.
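As a quick check of the formula, we can recompute eta-squared directly from the ANOVA table printed above (ss is an illustrative name for the extracted sums of squares):

ss <- summary(recovery_anova)[[1]][["Sum Sq"]]  # SS for condition and residuals
ss[1] / sum(ss)  # 8184 / (8184 + 103406), roughly 0.07, matching eta_squared()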

19.4.2 Interpretation guidelines

  • Small effect: \(\eta^2 = 0.01\).
  • Medium effect: \(\eta^2 = 0.06\).
  • Large effect: \(\eta^2 = 0.14\).

For example, an eta-squared value of \(\eta^2 = 0.07\) would indicate that 7% of the variance in perceived recovery for the disorders is explained by the disorder category. This would be considered a medium effect size.

19.5 Effect size for linear models (standardized coefficients)

19.5.1 Example: Predicting severity based on brain and genetic ratings

As we learned earlier in this course, linear models allow us to investigate the relationship between predictors and an outcome. Let’s fit an lm to analyze how perceiving disorders as ‘brain’ and ‘genetic’ disorders predicts their perceived severity:

m1 <- lm(severity ~ brain + genetic, data = d)
summary(m1)
## 
## Call:
## lm(formula = severity ~ brain + genetic, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -58.69 -13.92   1.96  15.58  37.35 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 53.59736    5.91295   9.064  < 2e-16 ***
## brain        0.27103    0.07097   3.819 0.000179 ***
## genetic     -0.06400    0.05665  -1.130 0.259930    
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.59 on 201 degrees of freedom
## Multiple R-squared:  0.06948,    Adjusted R-squared:  0.06022 
## F-statistic: 7.504 on 2 and 201 DF,  p-value: 0.0007194

We can calculate an effect size for the linear model directly.

m1_effect_size <- standardize_parameters(m1)
print(m1_effect_size)
## # Standardization method: refit
## 
## Parameter   | Std. Coef. |        95% CI
## ----------------------------------------
## (Intercept) |   8.23e-17 | [-0.13, 0.13]
## brain       |       0.26 | [ 0.13, 0.40]
## genetic     |      -0.08 | [-0.21, 0.06]

Calling standardize_parameters() provides effect sizes for the individual predictors in the model. Such standardized coefficients help us understand the relative importance of the predictors on a comparable scale.

19.5.2 What are Standardized Coefficients?

For linear models, effect sizes are often expressed as standardized coefficients (beta values), as we just saw in the current example. These coefficients indicate the change in the outcome variable (in standard deviation units) for a one standard deviation change in the predictor variable.

The formula is:

\[ \beta_{standardized} = \frac{b \cdot SD_{X}}{SD_{Y}} \]

Where:

  • \(b\): the unstandardized regression coefficient.
  • \(SD_{X}\): the standard deviation of the predictor.
  • \(SD_{Y}\): the standard deviation of the outcome.
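We can verify this formula against the standardize_parameters() output, either by rescaling the unstandardized coefficient or by refitting the model on z-scored variables (a minimal sketch, assuming no missing values in these columns):

# Rescale the unstandardized coefficient for 'brain'
coef(m1)["brain"] * sd(d$brain) / sd(d$severity)  # ~0.26, as reported above

# Equivalently, refit on z-scored variables
coef(lm(scale(severity) ~ scale(brain) + scale(genetic), data = d))

This also explains the essentially zero standardized intercept in the output above: once all variables are standardized to mean zero, the intercept collapses to zero as well.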

19.5.3 Interpretation guidelines

  • Small effect: \(\beta = 0.1\).
  • Medium effect: \(\beta = 0.3\).
  • Large effect: \(\beta = 0.5\).

So, coming back to our example, the results suggest that a one standard deviation increase in the perception of a disorder as brain-based is associated with a 0.26 standard deviation increase in severity ratings, while the genetic ratings have a smaller, negative, and non-significant effect.

19.6 Effect sizes for linear mixed models (standardized coefficients)

19.6.1 Example: Do brain ratings predict the perceived duration of the disorder in a within-participants design?

We use linear mixed models for hierarchical data or repeated measures. For instance, we might model duration as a function of brain ratings while accounting for variability between participants. The present study did not collect repeated measures from the same participants, but we can pretend that it did by randomly reassigning a small pool of participant IDs across the rows of the data frame, so that each ID appears on multiple rows:

# Let's imitate a within-participants design by overwriting "participant_id"
# (call set.seed() first if you want this random reassignment to be reproducible)
d$participant_id <- sample(1:50, nrow(d), replace = TRUE)

m2 <- lmer(duration ~ brain + (1|participant_id), data = d)
## boundary (singular) fit: see help('isSingular')
summary(m2)
## Linear mixed model fit by REML. t-tests use
##   Satterthwaite's method [lmerModLmerTest]
## Formula: duration ~ brain + (1 | participant_id)
##    Data: d
## 
## REML criterion at convergence: 1733.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4757 -0.5413  0.3436  0.7211  1.4701 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  participant_id (Intercept)   0.0     0.00   
##  Residual                   287.1    16.94   
## Number of obs: 204, groups:  participant_id, 49
## 
## Fixed effects:
##              Estimate Std. Error        df t value
## (Intercept)  75.08866    4.54501 202.00000  16.521
## brain         0.14932    0.05793 202.00000   2.577
##             Pr(>|t|)    
## (Intercept)   <2e-16 ***
## brain         0.0107 *  
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## brain -0.965
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
m2_effect_size <- standardize_parameters(m2)
## boundary (singular) fit: see help('isSingular')
print(m2_effect_size)
## # Standardization method: refit
## 
## Parameter   | Std. Coef. |        95% CI
## ----------------------------------------
## (Intercept) |   4.28e-16 | [-0.14, 0.14]
## brain       |       0.18 | [ 0.04, 0.31]

To recap the material from this semester, mixed models allow for random effects, which enable us to account for repeated measures or clustered data. (The singular-fit message above is expected here: our randomly reassigned participant IDs carry no real between-participant variance, so the random-intercept variance is estimated as zero.) The standardized coefficients show the effect sizes for the fixed effects only; the formula and the interpretation are the same as for linear models.
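Because standardizing the response and a predictor simply rescales a fixed-effect coefficient, we can sanity-check the standardized estimate with the same formula as in Section 19.5.2 (a sketch assuming no missing values; fixef() is lme4’s accessor for the fixed effects):

fixef(m2)["brain"] * sd(d$brain) / sd(d$duration)  # ~0.18, matching the output above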

Usually, for mixed models, only the coefficients for the fixed effects are reported. Calculating effect sizes for the random effects is more complex, so we will not cover it in this section. If you are interested in this topic, you can check the R package “Durga” and its accompanying paper.

19.7 Conclusion

In this lesson, we explored how to compute and interpret effect sizes for:

  • t-tests: Quantifying differences between two groups (Cohen’s d).
  • ANOVAs: Measuring variance explained by groups (eta-squared).
  • Linear Models and Linear Mixed Models: Understanding predictors’ relative importance (standardized coefficients).

For more information, explore the effectsize package documentation: CRAN effectsize vignette.

19.8 References

Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863. https://doi.org/10.3389/fpsyg.2013.00863