Chapter 4 Checking assumptions

The two assumptions we need to check here are:

Equality of variances
Normality of the random errors

Using Levene's test to test for equal variances

We will begin by using Levene's test to test for equality of variances. Recall the following null and alternative hypotheses:

$H_0 : \text{The groups have equal variances}$
$H_1 : \text{The groups do not have equal variances}.$

We reject the null hypothesis of equal variances only if we get a small $p$-value. That is, a small $p$-value indicates that the groups do not have equal variances. To summarise:

Levene's test for equality of variances:

If $p$ < 0.05, equal variances cannot be assumed
If $p \geq 0.05$, equal variances can be assumed

Let's carry out Levene's test for the penguin example:

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  0.3306 0.7188
      339

As we can see, we have $p = 0.7188$. Since $p > 0.05$, equal variances can be assumed. Given our observations from the box plots and standard deviations earlier, this is not a surprising result.
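To make the mechanics of the test more concrete, here is a minimal Python sketch of the statistic behind output like that shown above. The output itself appears to come from R; Python is used here purely for illustration. The sketch assumes the median-centred version named in the output header (center = median): each observation is replaced by its absolute deviation from its group median, and a one-way ANOVA F statistic is then computed on those deviations. The function name `levene_median_statistic` and the three small groups are hypothetical and are not the penguin data. The $p$-value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
from statistics import mean, median


def levene_median_statistic(groups):
    """Return (F, df1, df2) for Levene's test with median centring."""
    # Step 1: absolute deviation of each value from its own group's median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]

    k = len(deviations)                          # number of groups
    n_total = sum(len(d) for d in deviations)    # total number of observations
    grand_mean = mean([x for d in deviations for x in d])

    # Step 2: one-way ANOVA sums of squares, computed on the deviations.
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    within = sum((x - mean(d)) ** 2 for d in deviations for x in d)

    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2


# Hypothetical measurements for three groups (illustration only).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
group_c = [3, 4, 5, 6, 7]

f_value, df1, df2 = levene_median_statistic([group_a, group_b, group_c])
print(f"F = {f_value:.4f} on {df1} and {df2} degrees of freedom")
```

The degrees of freedom are the number of groups minus one, and the number of observations minus the number of groups. With three groups and 342 penguins, that gives $3 - 1 = 2$ and $342 - 3 = 339$, which matches the Df column in the output above.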

Checking for normality

When checking for normality for a one-way ANOVA, this needs to be done for the random errors (rather than the response variable). The "random errors" can be approximated by the "residuals". For a one-way ANOVA, each observation has a corresponding residual, which is the difference between the observed value of the dependent variable for that observation and the mean of the dependent variable for that observation's group. For example, the first penguin in the data set is from the Adelie group, which has an average flipper length of 189.95 mm. This particular penguin's flipper length is 181 mm, so its residual is $181 - 189.95 = -8.95$. The residual can be calculated in the same way for all 342 penguins. We will check for normality using these 342 residual values.
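The residual calculation described above can be sketched in a few lines of Python. This is an illustration only: the function name `group_residuals`, the group labels, and the six flipper lengths are made up for the example and are not taken from the penguin data set.

```python
from statistics import mean


def group_residuals(values, labels):
    """Residual = observed value minus the mean of that observation's group."""
    # Collect the observed values belonging to each group.
    by_group = {}
    for value, label in zip(values, labels):
        by_group.setdefault(label, []).append(value)

    # One mean per group, then subtract it from each member of that group.
    group_means = {label: mean(vals) for label, vals in by_group.items()}
    return [value - group_means[label] for value, label in zip(values, labels)]


# Hypothetical flipper lengths in mm (illustration only).
lengths = [181, 190, 196, 210, 220, 215]
species = ["A", "A", "A", "B", "B", "B"]

residuals = group_residuals(lengths, species)
print(residuals)
```

Because each group mean is subtracted from its own group, the residuals within every group sum to zero.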

Let's take a look at the histogram, Normal Q-Q plot, and Shapiro-Wilk test results:

Shapiro-Wilk test:


    Shapiro-Wilk normality test

data:  residuals
W = 0.99452, p-value = 0.2609

Considering the histogram, Normal Q-Q plot, and Shapiro-Wilk test result ($p = 0.2609 > 0.05$), we have no evidence against normality, so we can conclude that the normality assumption has been met.
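For readers curious about what a Normal Q-Q plot places on its axes, the sketch below pairs each sorted residual with the standard normal quantile at the same relative position. It is a rough illustration under stated assumptions: the plotting positions $(i - 0.5)/n$ are one common convention (statistical software may use a slightly different one), the function name `normal_qq_points` is hypothetical, and the residuals are made-up values rather than the 342 penguin residuals.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Pair each sorted residual with a matching standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1

    # Assumed plotting positions: (i - 0.5) / n for i = 1, ..., n.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

    # Each pair is (theoretical quantile, observed residual).
    return list(zip(theoretical, ordered))


# Hypothetical residuals (illustration only).
example_residuals = [-8, 1, 7, -5, 5, 0]

points = normal_qq_points(example_residuals)
for theoretical, observed in points:
    print(f"{theoretical:7.3f}  {observed:5.1f}")
```

If the residuals are roughly normal, these points fall close to a straight line when plotted, which is what we look for in the Normal Q-Q plot.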

Note that we have used the unstandardised residuals to check for normality here. Some people prefer to use the standardised residuals, which are expressed in standard deviation units and so can be more easily interpreted.
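As a rough sketch of what standardising involves, the Python example below divides each residual by the residual standard deviation, that is, the square root of the mean squared error. This is only one common convention and is an assumption here: some software also adjusts each residual for leverage, so its "standardised residuals" may differ slightly. The function name `standardise` and the residual values are hypothetical.

```python
from math import sqrt


def standardise(residuals, n_groups):
    """Divide each residual by sqrt(MSE), the residual standard deviation."""
    # Error degrees of freedom: observations minus number of groups.
    df_error = len(residuals) - n_groups
    mse = sum(r ** 2 for r in residuals) / df_error
    return [r / sqrt(mse) for r in residuals]


# Hypothetical residuals from two groups of three observations each.
raw_residuals = [-8, 1, 7, -5, 5, 0]

standardised = standardise(raw_residuals, n_groups=2)
print([round(z, 3) for z in standardised])
```

On this scale, a value such as 2 means the observation lies about two residual standard deviations above its group mean, whatever the original units were.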

What if the assumptions are not met?

If the assumptions have not been met, there are several options available, such as:

Use of the Welch ANOVA, which does not require equal variances
Use of a non-parametric alternative such as the Kruskal-Wallis test
Transformation of the dependent variable.

However, for a one-way ANOVA, these techniques are beyond the scope of this subject.