Chapter 5 Post-hoc tests

When a one-way ANOVA leads to a significant result, it is common to follow up with post-hoc tests to see which particular groups differ significantly from each other. Post-hoc tests essentially involve carrying out a \(t\)-test for the difference between each pair of categories. However, it is not as simple as running a series of ordinary pairwise \(t\)-tests. Every time we carry out a hypothesis test, we have a chance of making a Type I error (the probability of this is \(\alpha\), normally 0.05). When multiple hypothesis tests are carried out, our overall chance of making at least one Type I error increases, because we are exposed to that risk not just once but once per test. For that reason, we need to apply an adjustment to the resulting \(p\)-values to account for the number of comparisons. Several adjustment methods are available, and different disciplines tend to favour different ones. For our purposes, we will use the Tukey adjustment.
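To make the inflation of the Type I error rate concrete, the sketch below computes the family-wise error rate, \(1 - (1 - \alpha)^m\), for \(m\) tests each run at level \(\alpha\), where \(k\) groups give \(m = k(k-1)/2\) pairwise comparisons. It assumes the tests are independent, which pairwise comparisons sharing the same groups are not, so treat the numbers as an illustration rather than an exact rate; it is also not the Tukey adjustment itself. The chapter's analysis is done in R; this is only a small Python sketch of the arithmetic, and the function names `n_pairwise` and `familywise_error` are hypothetical.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairs among k groups, i.e. k choose 2."""
    return comb(k, 2)


def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m tests.

    Assumes (hypothetically) that the m tests are independent,
    each run at significance level alpha.
    """
    return 1 - (1 - alpha) ** m


# Show how the risk grows as the number of groups (and hence comparisons) grows.
for k in (3, 5, 10):
    m = n_pairwise(k)
    print(f"{k} groups -> {m} comparisons -> FWER = {familywise_error(0.05, m):.3f}")
```

With three species there are three pairwise comparisons, so under the independence assumption the chance of at least one false positive rises from 0.05 to roughly 0.14.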

For this example, the results of the post-hoc tests with the Tukey adjustment are as follows:

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = flipper_length_mm ~ species, data = penguins)

$species
                      diff       lwr       upr p adj
Chinstrap-Adelie  5.869887  3.586583  8.153191     0
Gentoo-Adelie    27.233349 25.334376 29.132323     0
Gentoo-Chinstrap 21.363462 19.000841 23.726084     0

For each pair of groups, the output provides the difference in group means (diff), a 95% confidence interval for that difference (lwr and upr), and an adjusted \(p\)-value (p adj) indicating whether the difference is statistically significant.

Considering the comparison between the Chinstrap and Adelie species, we can see that the mean difference in flipper length was 5.87 mm (read from the diff column), with a 95% confidence interval of (3.59, 8.15) mm (read from the lwr and upr columns). Since this confidence interval does not include zero, we can conclude that the mean flipper lengths of these two species are significantly different. Looking at the p adj column, we can see that the \(p\)-value is close to 0 (the output shows 0 because the value is extremely small, but a \(p\)-value is never exactly 0), which leads to the same conclusion.
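The two checks described above, which lead to the same conclusion here, can be sketched as a small Python function applied to one row of the output. The function name `interpret` and its return keys are hypothetical. The numbers are copied from the Chinstrap-Adelie row; because p adj is displayed as 0, we pass 0.0 as a stand-in for a very small value. The sketch also confirms that each interval is centred on the diff value, and that a positive diff means the first-named group (here Chinstrap) has the larger mean.

```python
def interpret(diff: float, lwr: float, upr: float, p_adj: float, alpha: float = 0.05) -> dict:
    """Apply the interpretation checks to one row of Tukey output (hypothetical helper)."""
    return {
        # A positive diff means the first-named group has the larger mean.
        "first_group_larger": diff > 0,
        # Zero lies outside [lwr, upr] exactly when both bounds have the same sign.
        "ci_excludes_zero": lwr > 0 or upr < 0,
        # Compare the adjusted p-value with the chosen significance level.
        "p_below_alpha": p_adj < alpha,
        # The interval should be centred on the estimated difference.
        "midpoint": (lwr + upr) / 2,
    }


# Chinstrap-Adelie row from the output; p adj is displayed as 0, so 0.0 stands in for it.
result = interpret(diff=5.869887, lwr=3.586583, upr=8.153191, p_adj=0.0)
print(result)
```

You can call the same function with the Gentoo-Adelie and Gentoo-Chinstrap rows to check your answers to the exercise that follows.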

Your turn

See if you can interpret the results of the comparisons for Gentoo-Adelie and Gentoo-Chinstrap:

  1. There (is / is not) evidence of a statistically significant difference in mean flipper length between the Gentoo and Adelie species.
  2. There (is / is not) evidence of a statistically significant difference in mean flipper length between the Gentoo and Chinstrap species.

Since both \(p\)-values are approximately 0 (see the p adj column), there is a statistically significant difference in mean flipper length for both pairs of species. We can also see that neither confidence interval, (25.33, 29.13) for Gentoo-Adelie and (19.00, 23.73) for Gentoo-Chinstrap, includes 0.