Chapter 2 Visualising the data

As with any hypothesis test, it is a good idea to begin by visualising the data and looking at some basic descriptive statistics. This can give us an idea about what we may expect when we carry out the hypothesis test, and also when checking the assumptions later.

We have already seen the boxplot below, which shows the penguins' flipper lengths by species:

We can also examine the sample size (excluding missing values), sample mean, and sample standard deviation (SD) for each group:

Table 2.1: Sample size, mean and standard deviation of flipper lengths by species
Species      Sample size   Mean (mm)   SD (mm)
Adelie               151      189.95      6.54
Chinstrap             68      195.82      7.13
Gentoo               123      217.19      6.48
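A summary table like this one can be computed in a few lines. The following is a minimal sketch in Python using only the standard library. The function name `summarise_by_group` and the toy records are hypothetical and chosen for illustration only; they are not the penguin data, and this is not necessarily how the table above was produced. Missing values (represented here as `None`) are dropped before counting, which mirrors the "excluding missing values" convention used for the sample sizes.

```python
from statistics import mean, stdev


def summarise_by_group(records):
    """Return {group: (n, mean, sd)}, ignoring records whose value is None."""
    groups = {}
    for group, value in records:
        if value is None:  # drop missing measurements before summarising
            continue
        groups.setdefault(group, []).append(value)
    # stdev() is the sample standard deviation (divides by n - 1)
    return {g: (len(v), mean(v), stdev(v)) for g, v in groups.items()}


# Hypothetical toy values for illustration only; NOT the penguin data.
toy_records = [
    ("A", 180.0), ("A", 190.0), ("A", None), ("A", 200.0),
    ("B", 210.0), ("B", 220.0),
]

summary = summarise_by_group(toy_records)
for group, (n, m, sd) in summary.items():
    print(f"{group}: n={n}, mean={m:.2f}, SD={sd:.2f}")
```

Applied to the real data, the records would be (species, flipper length) pairs, one per penguin.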

From the above, we can observe the following:

  1. The boxplots and sample means suggest that the population mean flipper lengths may differ between groups. When we carry out the one-way ANOVA, we will see whether or not these differences are statistically significant.
  2. From the boxplots, the data appear to be similarly spread out within each group. The SDs are also similar to each other. This suggests that the equal population variances assumption is reasonable, with no obvious sign of violation. We will check this assumption more formally later using Levene's test.
  3. The group sample sizes are 151 (Adelie), 68 (Chinstrap), and 123 (Gentoo), for a total sample size of 342. Although there are 344 penguins in the data set, two have missing values for the flipper length variable, so only 342 penguins are included in the analysis.
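
The arithmetic behind observations 2 and 3 can be checked directly from Table 2.1. The short Python sketch below sums the group sample sizes, counts the excluded penguins, and computes the ratio of the largest to the smallest SD. The variable names are illustrative. The ratio is only an informal descriptive check and an assumption of this sketch, not something the chapter relies on: a ratio close to 1 is consistent with similar spread, but it does not replace the formal Levene's test carried out later.

```python
# Values copied from Table 2.1: species -> (sample size, mean, SD)
table = {
    "Adelie": (151, 189.95, 6.54),
    "Chinstrap": (68, 195.82, 7.13),
    "Gentoo": (123, 217.19, 6.48),
}

# Total number of penguins with a recorded flipper length
total_n = sum(n for n, _, _ in table.values())

# The data set has 344 penguins, so the remainder were excluded as missing
missing = 344 - total_n

# Informal spread comparison: largest SD divided by smallest SD
sds = [sd for _, _, sd in table.values()]
sd_ratio = max(sds) / min(sds)

print(f"Total sample size: {total_n}")
print(f"Excluded (missing flipper length): {missing}")
print(f"Largest/smallest SD ratio: {sd_ratio:.2f}")
```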