Topic 9 One-way ANOVA
Previously, we used the independent-samples t-test to test whether the population mean of one group was equal to the population mean of another group.
Now, we are interested in testing the equality of population means for multiple groups. This is called the one-way ANOVA test.
We use this test to examine the means of one continuous variable (we call it the dependent variable) across different groups defined by a categorical variable (we call it the independent variable).
9.1 Details
If we are looking at only two groups (two categories), the one-way ANOVA and the independent sample t-test give us the same results.
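The equivalence for two groups can be checked numerically. The sketch below assumes the pooled-variance (equal variances) form of the independent-samples t-test and uses made-up numbers, not the course data; under those assumptions the F statistic should equal the square of the t statistic.

```python
from math import sqrt

# Hypothetical data for two groups (not from the course data set).
group_a = [2, 4, 6]
group_b = [5, 7, 9]

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

n_a, n_b = len(group_a), len(group_b)

# Within-group variability: pooled variance with df = (n_a - 1) + (n_b - 1).
within = (sum_of_squares(group_a) + sum_of_squares(group_b)) / (n_a + n_b - 2)

# Independent-samples t statistic using the pooled variance.
t = (mean(group_a) - mean(group_b)) / sqrt(within * (1 / n_a + 1 / n_b))

# One-way ANOVA F statistic with k = 2 groups, so df_b = k - 1 = 1.
grand_mean = mean(group_a + group_b)
ss_b = (n_a * (mean(group_a) - grand_mean) ** 2
        + n_b * (mean(group_b) - grand_mean) ** 2)
F = (ss_b / 1) / within

print("t squared =", t ** 2)
print("F         =", F)
```

Both printed values should agree up to rounding, which is why the two tests lead to the same decision when there are only two groups.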
Now, our null hypothesis is that the population means of all groups are equal. Therefore, if we reject the null of the one-way ANOVA test, this simply means that at least one of the population means differs from the others. We do further analysis to identify which one.
The reasoning behind the one-way ANOVA is the same as the one behind the t-test. Now, however, our distribution is an F distribution, not a t distribution. Therefore, we look for the critical value of F in an F-table, not a t-table.
I will show you how to use the F-table in the following sections.
9.2 Formulas and calculations
Steps:
1. Calculate the mean and the sum of squares for each of the groups you are interested in. This gives you the within-group variability.
2. Use the means from each group to calculate the between-group variability (see below).
3. Use (1) and (2) in the F formula (below).
Formula for (3):
\[F = \frac{\text{between-group variability}}{\text{within-group variability}}\]
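Formulas for (1): \[\text{within-group variability} = \frac{SS_1 + SS_2 + SS_3}{df_1 + df_2 + df_3}\] where \(df_i = n_i - 1\) and \(n_i\) is the sample size of group \(i\) (shown here for three groups).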
Remember how to calculate \(SS_1 + SS_2 + SS_3\)? Just make the tables like we did last time (see below). It is important that you calculate the mean for each group.
\(x\) | f | fx | \(x - \overline{x}\) | \((x - \overline{x})^2\) | \(f(x - \overline{x})^2\) |
---|---|---|---|---|---|
… | … | … | … | … | … |
. | \(\sum f\) | \(\sum fx\) | | | \(\sum = SS_k\) |
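As a rough illustration of step (1), the sketch below computes the mean, \(SS\), and \(df\) for each group and then the within-group variability. The three groups are made-up numbers, and the helper name `sum_of_squares` is only illustrative; it works on raw observations, which is equivalent to a table where every \(f = 1\).

```python
# Hypothetical data: three small groups (not from the course data set).
groups = {
    "group 1": [2, 4, 6],
    "group 2": [5, 7, 9],
    "group 3": [8, 10, 12],
}

def sum_of_squares(values):
    """Return (mean, SS), where SS is the sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    ss = sum((x - mean) ** 2 for x in values)
    return mean, ss

ss_total_within = 0.0
df_total_within = 0
for name, values in groups.items():
    mean, ss = sum_of_squares(values)
    df = len(values) - 1          # df for one group is n - 1
    ss_total_within += ss         # SS_1 + SS_2 + SS_3
    df_total_within += df         # df_1 + df_2 + df_3
    print(f"{name}: mean = {mean}, SS = {ss}, df = {df}")

within_group_variability = ss_total_within / df_total_within
print("within-group variability =", within_group_variability)
```

With these numbers each group has \(SS = 8\) and \(df = 2\), so the within-group variability is \(24 / 6 = 4\).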
Formulas for (2): \[\text{between-group variability} = \frac{SS_b}{df_b}\] \(df_b = k -1\) where \(k\) is the number of groups you are comparing.
But where does \(SS_b\) come from?
It comes from a new table (see below).
Important
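In the table below, \(x\) is the mean you calculate for each group. \(\overline{x}\) is the mean for this table, which you calculate using the usual formula:
\[\overline{x} = \frac{\sum fx}{\sum f}\]
When the group sizes \(n\) differ, weight each group mean by its \(n\) instead, so that \(\overline{x} = \frac{\sum nx}{\sum n}\) is the mean of all observations.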
\(x\) (means) | n | f | fx | \(x - \overline{x}\) | \((x - \overline{x})^2\) | \(n(x - \overline{x})^2\) |
---|---|---|---|---|---|---|
mean (group 1) | n (group 1) | 1 (usually) | for g. 1 | for g. 1 | for g. 1 | for g. 1 |
mean (group 2) | n (group 2) | 1 (usually) | for g. 2 | for g. 2 | for g. 2 | for g. 2 |
mean (group 3) | n (group 3) | 1 (usually) | for g. 3 | for g. 3 | for g. 3 | for g. 3 |
mean (group k) | n (group k) | 1 (usually) | for g. k | for g. k | for g. k | for g. k |
\(\sum x\) | | \(\sum f\) | \(\sum fx\) | | | \(\sum = SS_b\) |
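Steps (2) and (3) can be sketched in the same way. The group means and sizes below are made-up values, and the within-group variability (\(24/6\)) is assumed to have been computed already from the per-group tables. The grand mean here is weighted by each group's \(n\); with equal group sizes this coincides with the plain mean of the group means.

```python
# Hypothetical group summaries (not from the course data set).
group_means = [4.0, 7.0, 10.0]
group_sizes = [3, 3, 3]

k = len(group_means)   # number of groups
N = sum(group_sizes)   # total number of observations

# Grand mean, weighting each group mean by its sample size.
grand_mean = sum(n * m for m, n in zip(group_means, group_sizes)) / N

# SS_b: the last column of the table, n * (x - x_bar)^2, summed over groups.
ss_b = sum(n * (m - grand_mean) ** 2 for m, n in zip(group_means, group_sizes))
df_b = k - 1
between_group_variability = ss_b / df_b

# Assumed to come from step (1): (SS_1 + SS_2 + SS_3) / (df_1 + df_2 + df_3).
within_group_variability = 24 / 6

F = between_group_variability / within_group_variability
print("SS_b =", ss_b, "df_b =", df_b)
print("F =", F)
```

Here \(SS_b = 54\), the between-group variability is \(54 / 2 = 27\), and \(F = 27 / 4 = 6.75\).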
9.3 Reading the F-table
The F-table tells us the critical values of F for a given \(\alpha\). The only tricky thing about this table is that you need to look for the correct degrees of freedom.
\[\text{numerator df} = k-1, \qquad \text{denominator df} = N-k\]
where \(N\) is the sum of all sample sizes \(n\) and \(k\) is the number of groups you are comparing.
Our interpretation is as follows:
- if the calculated \(F\) is higher than the critical value (\(F_{critical}\)), we reject the null hypothesis.
- if the calculated \(F\) is lower than the critical value (\(F_{critical}\)), we do not reject the null hypothesis.
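The lookup and the decision rule can be sketched as follows. The function names are illustrative, the sample sizes and the calculated \(F\) of 6.75 are made-up, and the critical value is typed in by hand after reading it from an F-table at \(\alpha = 0.05\) for 2 and 6 degrees of freedom.

```python
def f_degrees_of_freedom(sample_sizes):
    """Return (numerator df, denominator df) = (k - 1, N - k)."""
    k = len(sample_sizes)   # number of groups
    N = sum(sample_sizes)   # sum of all sample sizes
    return k - 1, N - k

def decide(f_calculated, f_critical):
    """Apply the decision rule: reject only if F exceeds the critical value."""
    if f_calculated > f_critical:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

# Hypothetical example: three groups of three observations each.
df_num, df_den = f_degrees_of_freedom([3, 3, 3])
f_critical = 5.14   # read from an F-table: alpha = 0.05, df = (2, 6)

print("degrees of freedom:", df_num, "and", df_den)
print(decide(6.75, f_critical))
```

Keeping the critical value as a hand-entered number mirrors the table lookup; statistical software would usually report a p-value instead.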
9.4 Assumptions
For the one-way ANOVA test to work, three assumptions need to hold:
1. Normality: The variable of interest should be normally distributed within each group.
   - We usually do not perform mathematical tests to check for normality.
   - Looking at measures of central tendency and histograms is probably the best way to do so.
2. Independence: There must be no relationship between the observations in different groups.
   - This is not simple to test.
   - Usually, we assume it holds if each group is composed of observations from different units. In other words, there must be no individual that belongs to more than one group.
3. Equality of variances: The population variances of the groups must not be statistically different.
   - If variances are equal, we say we have homoskedasticity.
   - If variances are not equal, we say we have heteroskedasticity.
   - We can test this!
9.5 Levene’s test for equality of variances
We test the following:
- Null hypothesis: population variances are equal
- Alternative hypothesis: population variances are not equal
Interpretation
As before, we will look at the p-value:
- If \(p \leq \alpha\), we reject the null.
- If \(p > \alpha\), we fail to reject the null.
Conclusion
If we reject the null, then variances are not equal. Therefore, our assumption (3) does not hold.
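To give a feel for what the test computes, here is a minimal sketch of the Levene statistic in its mean-centered form: a one-way ANOVA F statistic applied to the absolute deviations of each observation from its group mean. This is an assumption about which variant is meant (some software centers on the median instead), the data are made-up, and the function names are illustrative. The p-value would come from an F distribution with \(k-1\) and \(N-k\) degrees of freedom, which this sketch does not compute.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_f(groups):
    """F = between-group variability / within-group variability."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

def levene_statistic(groups):
    """Mean-centered Levene statistic: ANOVA on absolute deviations."""
    abs_deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(abs_deviations)

# Hypothetical data: the second group is visibly more spread out.
groups = [[1, 2, 3], [2, 6, 10]]
W = levene_statistic(groups)
print("Levene statistic W =", round(W, 4))
```

In practice the statistic and its p-value are read from software output; in Python, `scipy.stats.levene` with `center="mean"` is one option for the same mean-centered variant.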
9.6 Interpretation of the one-way ANOVA test
Once again, we look at the p-value:
- If \(p \leq \alpha\), we reject the null.
- If \(p > \alpha\), we fail to reject the null.
Conclusion
If we reject the null, we conclude that the population means of the different groups are not all equal. In other words, at least one of the population means is different.
We do further analysis (a post-hoc test) to identify which one. This analysis does t-tests for all possible pairs of groups. We can easily do this using SPSS.
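The idea of testing all possible pairs can be sketched as below. The data are made-up, the helper `pooled_t` is illustrative, and it assumes a pooled-variance t statistic for each pair. Post-hoc procedures in statistical software typically also adjust for the number of comparisons, which this sketch does not do.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: three small groups (not from the course data set).
groups = {
    "group 1": [2, 4, 6],
    "group 2": [5, 7, 9],
    "group 3": [8, 10, 12],
}

def mean(values):
    return sum(values) / len(values)

def pooled_t(a, b):
    """Pooled-variance t statistic for two independent samples."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

# With k groups there are k * (k - 1) / 2 possible pairs.
pairs = list(combinations(groups, 2))
results = {}
for first, second in pairs:
    results[(first, second)] = pooled_t(groups[first], groups[second])
    print(f"{first} vs {second}: t = {results[(first, second)]:.3f}")
```

With three groups this produces three comparisons; each t statistic would then be compared with its critical value or p-value as in the independent-samples t-test.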
9.7 Exercise
First, I will illustrate using the variable race in the health data.
Using “health_data.sav”, carry out the necessary procedures to check whether there is a statistically significant difference in bmi (body mass index) across regions.
- What is your null hypothesis?
- What is the alternative hypothesis?
- What is your alpha?
- Perform and interpret a Levene’s test.
- Do a one-way ANOVA test. Interpret your p-value.
- What conclusions do you reach after a post-hoc test?