Chapter 3 Carrying out the test

In this section, we will carry out a one-way ANOVA to test whether mean flipper length differs between penguin species. We can pose this question as the following hypotheses:

$H_0: \mu_1 = \mu_2 = \mu_3 \text{ versus } H_1: \text{not all } \mu_i\text{'s are equal,}$

where:

$\mu_1$ denotes the population mean flipper length of Adelie penguins,
$\mu_2$ denotes the population mean flipper length of Chinstrap penguins,
$\mu_3$ denotes the population mean flipper length of Gentoo penguins.

Previously, when carrying out $t$-tests, we calculated a test statistic and then evaluated how extreme it was using the $t$-distribution. For ANOVA tests, we instead use the $F$-distribution, which is defined by two degrees-of-freedom parameters, $d_1$ and $d_2$. For a one-way ANOVA:

$d_1 = k - 1$, where $k$ is the number of groups. This is the "between group" degrees of freedom.
$d_2 = N - k$, where $N$ is the total sample size. This is the "within group" degrees of freedom.

For our example, there are $k = 3$ species and $N = 342$ penguins with a recorded flipper length, so $d_1 = 3 - 1 = 2$ and $d_2 = 342 - 3 = 339$, and the test uses the $F_{2, 339}$ distribution. The figure below shows some example density curves of the $F$-distribution for varying degrees of freedom:
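The density curves can also be evaluated numerically. The following is a minimal sketch in Python using only the standard library; the function name `f_pdf` and the evaluation points are illustrative choices, not part of the original analysis. It implements the standard density formula of the $F$-distribution on the log scale and evaluates it for the degrees of freedom used in this example.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom, for x > 0."""
    # log of the beta function B(d1/2, d2/2), via log-gamma to avoid overflow
    log_beta = (
        math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    )
    log_density = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_density)


# Degrees of freedom from the text: k = 3 species, N = 342 penguins.
k, N = 3, 342
d1, d2 = k - 1, N - k

# Illustrative evaluation points (an assumption, chosen only for display).
for x in (0.5, 1.0, 3.0):
    print(f"F({d1}, {d2}) density at x = {x}: {f_pdf(x, d1, d2):.4f}")
```

Calling `f_pdf` with other pairs of degrees of freedom shows how the shape of the curve changes as $d_1$ and $d_2$ vary.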

For a one-way ANOVA, the test statistic, or $F$ value, is the ratio of between group variation to within group variation: $\displaystyle F = \frac{\text{between group variation}}{\text{within group variation}}$. The between group variation measures how much the group sample means vary around the overall mean. The within group variation measures how much individual values vary around their own group's sample mean. If the between group variation is much larger than the within group variation, the $F$-statistic will be large and will lead to a statistically significant result. If, on the other hand, the between group variation is not large compared to the within group variation, the $F$-statistic will not be large and will not lead to a statistically significant result.
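To make the ratio concrete, here is a small sketch in Python of how an $F$ value can be computed from grouped data. The function name `one_way_f` and the toy numbers are made up for illustration and are not penguin measurements; the variation in each direction is measured by a sum of squares divided by its degrees of freedom (a mean square).

```python
from statistics import fmean


def one_way_f(groups):
    """Return (F, d1, d2) for a one-way ANOVA on a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]

    # Between group: how far each group mean lies from the overall mean,
    # weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within group: how far each value lies from its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    d1, d2 = k - 1, n_total - k
    ms_between = ss_between / d1
    ms_within = ss_within / d2
    return ms_between / ms_within, d1, d2


# Hypothetical toy data: three groups of three values each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, d1, d2 = one_way_f(toy_groups)
print(f"F({d1}, {d2}) = {f_value:.2f}")
```

For these toy numbers the between group mean square is 13 and the within group mean square is 1, so the $F$ value is 13: the group means differ by much more than the values within each group do.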

We are now ready to carry out the one-way ANOVA. The results of the test are as follows:

             Df Sum Sq Mean Sq F value Pr(>F)    
species       2  52473   26237   594.8 <2e-16 ***
Residuals   339  14953      44                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
2 observations deleted due to missingness
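The columns of the table are linked by simple arithmetic: each Mean Sq is the Sum Sq divided by its Df, and the F value is the ratio of the two mean squares. The short Python sketch below checks this using the printed values; the variable names are illustrative, and because the printed sums of squares are rounded, small discrepancies from the table are expected.

```python
# Values copied from the ANOVA table above (rounded as printed).
df_between, ss_between = 2, 52473    # species row
df_within, ss_within = 339, 14953    # Residuals row

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F value = between group mean square / within group mean square.
f_value = ms_between / ms_within

print(f"Mean Sq (species):   {ms_between:.1f}")
print(f"Mean Sq (Residuals): {ms_within:.2f}")
print(f"F value:             {f_value:.1f}")
```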

Assuming (for now) that the assumptions of the test have been met, we note the following:

The $p$-value (read from the Pr(>F) column) is reported as less than $2 \times 10^{-16}$, far below 0.05, so we reject $H_0$. That is, we have enough evidence to conclude that mean flipper length differs between species.
The significant result tells us that at least one group mean differs from the others, but it does not tell us which group(s), or how many. We will carry out post-hoc tests later for further analysis.
The test statistic (F value) is $F = 594.8$.
$d_1 = 2$ (read from the Df column, species row).
$d_2 = 339$ (read from the Df column, Residuals row).
To summarise, we can write: There was a significant difference in mean flipper length [$F(2, 339) = 594.8$, $p < .001$] between penguin species.
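The output only reports the $p$-value as being below $2 \times 10^{-16}$. When the first degrees of freedom equal 2, as here, the upper-tail probability of the $F$-distribution has a closed form, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, so the size of the $p$-value can be checked by hand. The Python sketch below does this; the function name `f_upper_tail_d1_2` is hypothetical, and the formula does not apply when $d_1 \neq 2$.

```python
import math


def f_upper_tail_d1_2(f_value, d2):
    """P(F > f_value) for an F distribution with 2 and d2 degrees of freedom.

    Uses the closed form (1 + 2 f / d2) ** (-d2 / 2), valid only for d1 = 2.
    """
    # Compute on the log scale so very small probabilities stay accurate.
    return math.exp(-(d2 / 2) * math.log1p(2 * f_value / d2))


# Test statistic and within group degrees of freedom read from the table.
p_value = f_upper_tail_d1_2(594.8, 339)

print(f"p = {p_value:.3g}")
print("p < 2e-16:", p_value < 2e-16)
print("p < .001: ", p_value < 0.001)
```

The result is many orders of magnitude below any conventional significance level, which is why it is reported simply as $p < .001$.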