STM1001 Topic 7: One-way ANOVA
2023-11-03
Introduction
In the previous topic, we learnt how to test for differences in means between two independent groups via the independent samples \(t\)-test. This was a situation where the independent variable was a categorical variable with only two categories.
What happens when we want to test for differences in means between two or more independent groups? In this case, we can use one-way Analysis of Variance, commonly referred to as one-way ANOVA.
What type of variables are required for a one-way ANOVA?
A one-way ANOVA will always involve two variables:
- The dependent variable, sometimes also called the response variable. This should be a numeric, continuous variable.
- The independent variable. This should be a categorical variable with two or more categories.
For example, let's consider a data set called penguins from the R package called palmerpenguins (Horst, Hill, and Gorman 2020; Gorman, Williams, and Fraser 2014). This data set includes various measurements for 344 penguins, as well as other characteristics, such as species and the island in the Palmer Archipelago on which each penguin was observed. For this example, we are interested in the following variables:
- Dependent variable: Flipper length (mm)
- Independent variable: Species (Adelie, Chinstrap, or Gentoo).
The boxplot below shows the flipper lengths of the penguins, separated by species:
As we can see, the differences between the sample flipper lengths of the 3 groups suggest that the population mean flipper lengths of the 3 groups may differ. But are these differences statistically significant? This is the type of question we can examine using a one-way ANOVA. The hypotheses for a one-way ANOVA can be set up as follows:
\[H_0:\mu_1 = \mu_2 = \ldots = \mu_k \;\;\text{versus}\;\;H_1: \text{not all }\mu_i\text{'s are equal},\]
where:
- \(k\) is the number of groups, and \(\mu_1, \mu_2, \ldots, \mu_k\) denote the population means for Group 1, Group 2, ..., and Group \(k\), respectively.
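For the penguins example, there are \(k = 3\) groups (the three species). One way to write the hypotheses for this case is to use each species name as the subscript of its population mean flipper length:
\[H_0:\mu_{\text{Adelie}} = \mu_{\text{Chinstrap}} = \mu_{\text{Gentoo}} \;\;\text{versus}\;\;H_1: \text{not all three population means are equal},\]
where \(\mu_{\text{Adelie}}\), \(\mu_{\text{Chinstrap}}\) and \(\mu_{\text{Gentoo}}\) denote the population mean flipper lengths (mm) of Adelie, Chinstrap and Gentoo penguins, respectively.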
By carrying out a one-way ANOVA, we will be able to determine whether or not there is evidence that at least one of the \(\mu_i\)'s differs from the others. The test does not tell us which groups differ from each other, nor how many of them do; it only tells us whether at least one group differs from the others. However, post-hoc tests can be carried out afterwards to investigate which groups differ.
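To make the idea concrete, the sketch below computes the one-way ANOVA \(F\) statistic, which compares the variation between the group means with the variation within the groups. This is a minimal sketch assuming the standard sums-of-squares formulation; the function name one_way_anova_f and the three samples are hypothetical, made up purely for illustration, and are not taken from the penguins data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic and its degrees of freedom.

    `groups` is a list of samples (lists of numbers), one per category of
    the independent variable. Returns (F, df_between, df_within).
    """
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-group sum of squares: how far each group mean lies from the
    # grand mean, weighted by that group's sample size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    # F = (between-group mean square) / (within-group mean square)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three small made-up samples (NOT the penguins data), one per hypothetical group.
group_a = [180, 185, 190]
group_b = [190, 195, 200]
group_c = [200, 205, 210]

f_stat, df1, df2 = one_way_anova_f([group_a, group_b, group_c])
print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
# prints: F = 12.00 on (2, 6) degrees of freedom
```

A large \(F\) value indicates that the group means are spread out relative to the variation within the groups. In practice, software converts \(F\) into a \(p\)-value using the \(F\) distribution with \((k-1, n-k)\) degrees of freedom; for example, scipy.stats.f_oneway(group_a, group_b, group_c) in SciPy returns both the \(F\) statistic and the \(p\)-value.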