# Chapter 5 Post-hoc tests

When a one-way ANOVA test gives a significant result, it is common to follow up with post-hoc tests to see which particular groups differ significantly from each other. Post-hoc tests essentially involve carrying out multiple \(t\)-tests, one for each pair of categories. However, it is not quite as simple as running pairwise \(t\)-tests. Every time we carry out a hypothesis test, we have a chance of making a Type I error (the probability of this occurring is \(\alpha\), normally 0.05). When multiple hypothesis tests are carried out, the chance of making at least one Type I error increases, because we are exposed to that risk not just once but once per test. For that reason, we need to adjust the resulting \(p\)-values. Several adjustment methods are available, and the preferred method varies between disciplines. For our purposes, we will be using the Tukey adjustment.
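To get a feel for how quickly this risk grows, the following R sketch computes the chance of at least one Type I error across all pairwise comparisons. It assumes the tests are independent, which is a simplification (pairwise comparisons that share a group are not independent), so the figures are a rough guide rather than exact values. The variable names are illustrative only.

```r
# Per-test significance level (the conventional 0.05)
alpha <- 0.05

# Number of groups being compared (illustrative range)
k <- 2:6

# Number of pairwise comparisons among k groups
m <- choose(k, 2)

# Probability of at least one Type I error across m tests,
# ASSUMING the tests are independent (a simplification)
fwer <- 1 - (1 - alpha)^m

# Summary table: one row per number of groups
fwer_table <- data.frame(groups = k, comparisons = m, fwer = round(fwer, 3))
print(fwer_table)
```

With three groups there are three pairwise comparisons, and under this assumption the chance of at least one Type I error is roughly 0.14 rather than 0.05.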

For this example, the results of the post-hoc tests with the Tukey adjustment are as follows:

```
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = flipper_length_mm ~ species, data = penguins)

$species
                      diff       lwr       upr p adj
Chinstrap-Adelie  5.869887  3.586583  8.153191     0
Gentoo-Adelie    27.233349 25.334376 29.132323     0
Gentoo-Chinstrap 21.363462 19.000841 23.726084     0
```

For each pair of groups, the output provides the difference in group means, a 95% confidence interval for that difference, and an adjusted \(p\)-value indicating whether or not the difference is statistically significant.
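Output of this form can be produced in R with a sketch along the following lines. The model formula is taken from the `Fit:` line of the output. That the `penguins` data come from the palmerpenguins package is an assumption, as the output does not say where the data were loaded from.

```r
# Assumption: `penguins` comes from the palmerpenguins package
library(palmerpenguins)

# One-way ANOVA of flipper length on species (formula from the "Fit:" line)
fit <- aov(flipper_length_mm ~ species, data = penguins)

# Tukey-adjusted pairwise comparisons; 0.95 is the default
# family-wise confidence level
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)

# The comparison table is a matrix with columns diff, lwr, upr and "p adj"
tukey$species[, c("diff", "lwr", "upr")]
```

Each row is labelled with the two groups being compared, and `diff` is the mean of the first-named group minus the mean of the second.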

Considering the comparison between the Chinstrap and Adelie species, we can see that the mean difference in flipper length was 5.87 mm (read from the `diff` column), with a 95% confidence interval of (3.59, 8.15) (read from the `lwr` and `upr` columns). Since this confidence interval does not include zero, we can conclude that the difference between these two groups is statistically significant. By looking at the `p adj` column, we can see that the \(p\)-value is close to 0 (although the output says 0, a \(p\)-value is never exactly equal to 0), leading to the same conclusion.

**Your turn**

See if you can interpret the results of the comparisons for Gentoo-Adelie and Gentoo-Chinstrap:

- There is evidence of a statistically significant difference in mean flipper length between the Gentoo and Adelie species.
- There is evidence of a statistically significant difference in mean flipper length between the Gentoo and Chinstrap species.

Since both \(p\)-values are approximately 0 (see the `p adj` column), both of these differences are statistically significant, so there is a statistically significant difference between all pairs of species. Neither confidence interval includes 0, which leads to the same conclusion.
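The confidence-interval check can also be carried out programmatically. The sketch below re-enters the `diff`, `lwr` and `upr` values from the Tukey output by hand (the data frame and column name `excludes_zero` are illustrative) and flags each interval that does not contain zero.

```r
# Values copied from the Tukey output shown earlier
tukey_tbl <- data.frame(
  comparison = c("Chinstrap-Adelie", "Gentoo-Adelie", "Gentoo-Chinstrap"),
  diff = c(5.869887, 27.233349, 21.363462),
  lwr  = c(3.586583, 25.334376, 19.000841),
  upr  = c(8.153191, 29.132323, 23.726084)
)

# An interval excludes zero when both limits have the same sign,
# i.e. the lower limit is above 0 or the upper limit is below 0
tukey_tbl$excludes_zero <- tukey_tbl$lwr > 0 | tukey_tbl$upr < 0

print(tukey_tbl)
```

All three intervals lie entirely above zero, which matches the conclusion drawn from the adjusted \(p\)-values.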