19 Effect size calculation
This section was written by Olha Guley.
19.1 Introduction
As we saw throughout the course, effect size is one of the most meaningful ways to understand the impact of your findings. It goes beyond reporting whether a result is statistically significant and quantifies how large the effect actually is. In the behavioral sciences, knowing how strongly a predictor affects the outcome variable is often more insightful than knowing only that the effect exists.
In this section, we will focus on using the effectsize package in R to compute effect sizes for the most common types of analyses. By the end, you'll know how to calculate and interpret effect sizes for:
- t-tests: Comparing two groups.
- ANOVAs: Comparing more than two groups.
- Linear Models: Understanding relationships between predictors and outcomes.
- Linear Mixed Models: Analyzing data with hierarchical or repeated measures.
19.2 How to calculate and report effect sizes
We use effect sizes because they help answer questions like:
- How strong is the relationship?
- Is the difference meaningful?
19.2.1 Loading the ‘effectsize’ package
First, ensure you have the necessary packages installed and loaded. For calculating effect sizes, you need the effectsize package.
library(effectsize)
library(lme4) #For linear mixed models, which we will fit later in this section
library(tidyverse)
Now, let's load and explore a dataset. To calculate the effect sizes, we will use data from a study exploring to what extent the general public perceives seven common mental disorders as brain disorders, and whether perceiving a given disorder as a brain disorder is related to perceptions of its heritability, age of onset, duration, severity, and recovery potential, as well as to perceiving it as a genetic or inherited disorder. The pre-registration is available at https://osf.io/43wrg/?view_only=b7aa9af1cf884bdba81191e27bf59429
19.2.2 Loading the data
Download the data from the OSF link to your working directory and load it into R.
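The chunk that reads the file is not echoed here; assuming the downloaded CSV is called brain_disorders.csv (a hypothetical filename; use whatever name the OSF file actually has), loading and inspecting the data could look like this:
d <- read.csv("brain_disorders.csv") # hypothetical filename; replace with the actual file
head(d)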
## participant_id condition brain genetic
## 1 5e3421311ddb123e1de9e6ad ocd 90 72
## 2 5b5f23420085cb0001872849 depr 85 59
## 3 63f0ea7188b4cb3a4cc3658a adhd 85 80
## 4 542c323bfdf99b324ea3808d depr 72 35
## 5 665c050b07508f6c4f8d01e0 ocd 90 50
## 6 5d8cef3c28e183001a335ab0 anor 92 32
## knowledgeability onset duration severity recovery drug
## 1 70 37 71 73 75 43
## 2 68 36 76 77 30 68
## 3 30 10 100 15 0 90
## 4 44 26 61 64 41 61
## 5 32 30 100 86 19 78
## 6 88 36 73 99 13 26
## therapy Consent Attention Age Gender Student
## 1 63 Yes I PAY ATTENTION 37 Female No
## 2 78 Yes I pay attention 29 Female No
## 3 100 Yes I pay attention 30 Male No
## 4 64 Yes I pay attention 52 Female No
## 5 92 Yes I pay attention 32 Female No
## 6 51 Yes I pay attention 37 Female No
Let's start by computing the mean and standard deviation of the brain ratings for each disorder category:
descriptive_stats_brain <- d %>%
  group_by(condition) %>%
  summarize(
    mean_brain = mean(brain),
    sd_brain = sd(brain))
print(descriptive_stats_brain)
## # A tibble: 7 × 3
## condition mean_brain sd_brain
## <chr> <dbl> <dbl>
## 1 adhd 74.2 14.8
## 2 anor 71.0 21.4
## 3 asd 74.5 19.6
## 4 bipo 81.7 13.6
## 5 depr 68.8 28.3
## 6 ocd 74.8 25.0
## 7 schi 84.9 12.7
19.3 Effect size for t-tests (Cohen’s d)
19.3.1 Example: Comparing perceived severity between two different disorder categories
T-tests are commonly used to compare two groups. For example, we might want to compare the perceived severity of major depressive disorder versus schizophrenia. Here’s how to calculate the effect size (Cohen’s d) for this comparison:
Let’s first filter data for two disorders of interest and perform a t-test:
x <- d %>%
filter(condition %in% c("depr", "schi"))
x_t_test <- t.test(severity ~ condition, data = x)
print(x_t_test)
##
## Welch Two Sample t-test
##
## data: severity by condition
## t = -0.8544, df = 57.402, p-value = 0.3964
## alternative hypothesis: true difference in means between group depr and group schi is not equal to 0
## 95 percent confidence interval:
## -12.14752 4.88085
## sample estimates:
## mean in group depr mean in group schi
## 79.23333 82.86667
Now let’s calculate the effect size (Cohen’s d):
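The chunk that produced the output below is not echoed; one way to obtain it with the effectsize package is the formula interface of cohens_d():
cohens_d(severity ~ condition, data = x) # the pooled SD is used by default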
## Cohen's d | 95% CI
## -------------------------
## -0.22 | [-0.73, 0.29]
##
## - Estimated using pooled SD.
19.3.2 What is Cohen’s d?
Cohen’s d quantifies the magnitude of differences between two groups in terms of standard deviations. This way of representing the effect size is particularly useful in comparing group means on the same scale, regardless of sample size. The formula is:
\[ Cohen's\,d = \frac{M_1 - M_2}{SD_{pooled}} \]
Where:
- \(M_1\) and \(M_2\) are the means of the two groups.
- \(SD_{pooled}\) is the pooled standard deviation, accounting for variability in both groups.
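As a quick sanity check, we can apply the formula by hand to the two filtered groups in x, recomputing the pooled SD from the group standard deviations and sample sizes (a sketch, not part of the original analysis):
# Manual Cohen's d, following the formula above
grp_means <- tapply(x$severity, x$condition, mean)
grp_sds <- tapply(x$severity, x$condition, sd)
grp_ns <- table(x$condition)
sd_pooled <- sqrt(((grp_ns[1] - 1) * grp_sds[1]^2 + (grp_ns[2] - 1) * grp_sds[2]^2) / (sum(grp_ns) - 2))
unname((grp_means[1] - grp_means[2]) / sd_pooled) # should be close to -0.22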
19.3.3 Interpretation guidelines
- Small effect: \(|d| = 0.2\).
- Medium effect: \(|d| = 0.5\).
- Large effect: \(|d| = 0.8\).
Cohen's d thus expresses the difference between two group means in standard deviation units. In our example, |d| = 0.22 corresponds to a small effect: the perceived severity of depression and schizophrenia differs by roughly a fifth of a standard deviation.
By the way, you may also encounter Hedges' g. Hedges' g is Cohen's d with a correction that gives a better estimate in small samples. In large samples the two statistics are virtually identical, and even in small samples they are very similar, with Hedges' g slightly smaller than Cohen's d. The interpretation guidelines are the same.
19.4 Effect size for ANOVA (Eta-squared)
19.4.1 Example: Comparing recovery ratings across multiple disorder categories
When comparing more than two groups, an ANOVA is the go-to method. For example, we can compare the perceived recovery ratings of all seven disorders in the dataset, and then calculate the effect size for the ANOVA:
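The chunk that fit the ANOVA is not echoed; code along these lines (m_aov is an assumed object name) would produce the output below, with eta_squared() from the effectsize package computing the effect size:
m_aov <- aov(recovery ~ condition, data = d) # one-way ANOVA across the seven disorders
summary(m_aov)
eta_squared(m_aov)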
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 6 8184 1364.1 2.599 0.0191 *
## Residuals 197 103406 524.9
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## For one-way between subjects designs, partial eta
## squared is equivalent to eta squared. Returning eta
## squared.
## # Effect Size for ANOVA
##
## Parameter | Eta2 | 95% CI
## -------------------------------
## condition | 0.07 | [0.01, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
Here, you will likely see a warning message indicating that partial eta squared is equivalent to eta squared. This is normal and nothing to worry about. In one-way designs, the two effect size measures are the same, so the function returns eta squared for simplicity.
In this example, eta-squared (\(\eta^2\)) measures the proportion of variance in the recovery ratings explained by the grouping variable (condition). A higher value indicates that the disorder category explains a larger share of the variance in perceived recovery.
The formula for eta-squared is:
\[ \eta^2 = \frac{SS_{effect}}{SS_{total}} \]
Where:
- \(SS_{effect}\): The sum of squares for the effect (a predictor or group).
- \(SS_{total}\): The total sum of squares in the model.
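Plugging the sums of squares from the ANOVA table above into this formula reproduces the value reported by the package:
8184 / (8184 + 103406) # SS_condition / SS_total ≈ 0.073, i.e. the Eta2 of 0.07 above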
19.4.2 Interpretation guidelines
- Small effect: \(\eta^2 = 0.01\).
- Medium effect: \(\eta^2 = 0.06\).
- Large effect: \(\eta^2 = 0.14\).
For example, an eta-squared value of \(\eta^2 = 0.07\) would indicate that 7% of the variance in perceived recovery for the disorders is explained by the disorder category. This would be considered a medium effect size.
19.5 Effect size for linear models (standardized coefficients)
19.5.1 Example: Predicting severity based on brain and genetic ratings
As we learned earlier in this course, linear models allow us to investigate the relationship between predictors and an outcome. Let's fit an lm to analyze how perceiving disorders as 'brain-based' and 'genetic' predicts their perceived severity:
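The model-fitting chunk is not echoed; it presumably looked like this (m1 is an assumed object name, reused below):
m1 <- lm(severity ~ brain + genetic, data = d)
summary(m1)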
##
## Call:
## lm(formula = severity ~ brain + genetic, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58.69 -13.92 1.96 15.58 37.35
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.59736 5.91295 9.064 < 2e-16 ***
## brain 0.27103 0.07097 3.819 0.000179 ***
## genetic -0.06400 0.05665 -1.130 0.259930
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.59 on 201 degrees of freedom
## Multiple R-squared: 0.06948, Adjusted R-squared: 0.06022
## F-statistic: 7.504 on 2 and 201 DF, p-value: 0.0007194
We can calculate an effect size for the linear model directly.
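The output below is the kind produced by effectsize::standardize_parameters(), which by default refits the model on standardized variables:
standardize_parameters(m1) # assumes the lm object fitted above is called m1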
## # Standardization method: refit
##
## Parameter | Std. Coef. | 95% CI
## ----------------------------------------
## (Intercept) | 8.23e-17 | [-0.13, 0.13]
## brain | 0.26 | [ 0.13, 0.40]
## genetic | -0.08 | [-0.21, 0.06]
This gives an effect size for each individual predictor in the model. Standardized coefficients help us judge the relative importance of the predictors on a comparable scale.
19.5.2 What are Standardized Coefficients?
For linear models, effect sizes are often expressed as standardized coefficients (beta values), as we just saw in the current example. These coefficients indicate the change in the outcome variable (in standard deviation units) for a one standard deviation change in the predictor variable.
The formula is:
\[ \beta_{standardized} = \frac{b \cdot SD_{X}}{SD_{Y}} \]
Where:
- \(b\): Unstandardized regression coefficient.
- \(SD_{X}\): Standard deviation of the predictor.
- \(SD_{Y}\): Standard deviation of the outcome.
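As a quick check of the formula (again assuming the fitted model object is called m1), we can compute the standardized coefficient for brain by hand:
coef(m1)["brain"] * sd(d$brain) / sd(d$severity) # should reproduce the 0.26 reported above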
19.5.3 Interpretation guidelines
- Small effect: \(\beta = 0.1\).
- Medium effect: \(\beta = 0.3\).
- Large effect: \(\beta = 0.5\).
So, coming back to our example, the results suggest that a one standard deviation increase in the perception of a disorder as brain-based is associated with a 0.26 standard deviation increase in severity ratings, while the genetic ratings have a smaller, non-significant negative effect.
19.6 Effect sizes for linear mixed models (standardized coefficients)
19.6.1 Example: Do brain ratings predict the perceived duration of the disorder in a within-participants design?
We use linear mixed models for hierarchical data or repeated measures. For instance, we might model duration as a function of brain ratings while accounting for variability between participants. The present study did not collect repeated measures from the same participants, but we can pretend that it did by reusing a small pool of participant IDs across the rows of the data frame:
# Let's imitate a within-participants design by overwriting the "participant_id" variable
# (the assignment is random, so set a seed first if you want reproducible output)
d$participant_id <- sample(1:50, nrow(d), replace = TRUE)
m2 <- lmer(duration ~ brain + (1 | participant_id), data = d)
## boundary (singular) fit: see help('isSingular')
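The summary call itself is not echoed. Note that the p-values in the output are computed with Satterthwaite's method, which suggests the lmerTest package (which wraps lme4::lmer) was also loaded when the model was fit:
summary(m2) # with lmerTest loaded, summary() adds df and p-values for the fixed effects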
## Linear mixed model fit by REML. t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula: duration ~ brain + (1 | participant_id)
## Data: d
##
## REML criterion at convergence: 1733.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4757 -0.5413 0.3436 0.7211 1.4701
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant_id (Intercept) 0.0 0.00
## Residual 287.1 16.94
## Number of obs: 204, groups: participant_id, 49
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 75.08866 4.54501 202.00000 16.521
## brain 0.14932 0.05793 202.00000 2.577
## Pr(>|t|)
## (Intercept) <2e-16 ***
## brain 0.0107 *
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## brain -0.965
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
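As with the linear model, the standardized coefficients below can be obtained with effectsize::standardize_parameters(); refitting the (singular) model on standardized data is presumably what triggers the repeated boundary warning:
standardize_parameters(m2)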
## boundary (singular) fit: see help('isSingular')
## # Standardization method: refit
##
## Parameter | Std. Coef. | 95% CI
## ----------------------------------------
## (Intercept) | 4.28e-16 | [-0.14, 0.14]
## brain | 0.18 | [ 0.04, 0.31]
To recap the material from this semester, mixed models include random effects, which let us account for repeated measures or clustered data. The standardized coefficients shown here are effect sizes for the fixed effects only; the formula and interpretation are the same as for linear models.
Usually, for mixed models, only the coefficients for the fixed effects are reported. Calculating effect sizes for the random effects is more complex, so we will not cover it in this section. If you are interested in this topic, you can check the R package "Durga" and this paper.
19.7 Conclusion
In this lesson, we explored how to compute and interpret effect sizes for:
- t-tests: Quantifying differences between two groups (Cohen’s d).
- ANOVAs: Measuring variance explained by groups (eta-squared).
- Linear Models and Linear Mixed Models: Understanding predictors’ relative importance (standardized coefficients).
For more information, explore the effectsize package documentation: CRAN effectsize vignette.
19.8 References
Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863. https://doi.org/10.3389/fpsyg.2013.00863