# Topic 9 One-way ANOVA

Before, with the independent sample t-test, we wanted to test if the population mean of a group was equal to the population mean of another group.

Now, we are interested in testing the equality of population means for *multiple* groups. This is called the *one-way ANOVA test*.

We use this test to examine the means of one continuous variable (we call it the dependent variable) across different groups defined by a categorical variable (we call it the independent variable).

## 9.1 Details

If we are looking at only two groups (two categories), the one-way ANOVA and the independent sample t-test give us the same results.
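A small numerical check can make this concrete. The sketch below uses made-up scores for two groups (not taken from any course dataset) and assumes the equal-variance version of the t-test. Under that assumption, the squared t statistic and the ANOVA F statistic should coincide.

```python
import math

# Hypothetical scores for two groups (made up for illustration).
group_a = [4, 5, 6]
group_b = [6, 7, 8]


def mean(values):
    return sum(values) / len(values)


def sum_of_squares(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


n_a, n_b = len(group_a), len(group_b)

# Pooled variance: used by the t-test and as the ANOVA denominator.
pooled_var = (sum_of_squares(group_a) + sum_of_squares(group_b)) / (n_a + n_b - 2)

# Independent-samples t statistic (equal variances assumed).
t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA F statistic for the same two groups (k = 2, so df_b = 1).
grand_mean = mean(group_a + group_b)
ss_between = (n_a * (mean(group_a) - grand_mean) ** 2
              + n_b * (mean(group_b) - grand_mean) ** 2)
f_stat = (ss_between / (2 - 1)) / pooled_var

print(t_stat ** 2, f_stat)  # the two numbers agree up to rounding
```

Because the two statistics carry the same information, both tests lead to the same decision when there are only two groups.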

Now, our null hypothesis is: *the population means for all groups are equal*. Therefore, if we reject the null of the one-way ANOVA test, this simply means that *at least* one of the population means differs from the others. We do further analysis to identify which one.

The reasoning behind the one-way ANOVA is the same as the reasoning behind the t-test. Now, however, our distribution is an F distribution, not a t distribution. Therefore, we look for the critical F value on an F table, not a t table.

I will show you how to use the F-table in the following sections.

## 9.2 Formulas and calculations

Steps:

1. Calculate the mean and sum of squares for each of the groups you are interested in. This gives you the within-group variability.
2. Use the means from each group to calculate the between-group variability (see below).
3. Use (1) and (2) in the F formula (below).

Formula for (3):

\[F = \frac{\text{between-group variability}}{\text{within-group variability}}\]

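Formulas for (1): \[\text{within-group variability} = \frac{SS_1 + SS_2 + SS_3}{df_1 + df_2 + df_3}\]

Here \(df_i = n_i - 1\) for each group. The formula is written for three groups; with more groups, add one \(SS\) and one \(df\) term per group.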

Remember how to calculate \(SS_1 + SS_2 + SS_3\)? Just make the tables like we did last time (see below). It is important that you calculate the mean for each group.

\(x\) | f | fx | \(x - \overline{x}\) | \((x - \overline{x})^2\) | \(f(x - \overline{x})^2\) |
---|---|---|---|---|---|
… | … | … | … | … | … |
. | \(\sum f\) | \(\sum fx\) | | | \(\sum = SS_k\) |
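The within-group calculation can also be sketched in a few lines of Python. The scores are made up for illustration, and the sketch works on raw scores rather than a frequency table, which gives the same sum of squares. The function names are illustrative only.

```python
# Hypothetical raw scores for three groups (made up for illustration).
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]


def sum_of_squares(values):
    """SS_k: squared deviations from the group's own mean, summed."""
    group_mean = sum(values) / len(values)
    return sum((x - group_mean) ** 2 for x in values)


def within_group_variability(groups):
    """(SS_1 + ... + SS_k) / (df_1 + ... + df_k), with df_i = n_i - 1."""
    total_ss = sum(sum_of_squares(g) for g in groups)
    total_df = sum(len(g) - 1 for g in groups)
    return total_ss / total_df


ss_per_group = [sum_of_squares(g) for g in groups]
ms_within = within_group_variability(groups)
print(ss_per_group, ms_within)
```

Each group is measured against its own mean, which is why the mean of every group has to be calculated first.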

Formulas for (2): \[\text{between-group variability} = \frac{SS_b}{df_b}\]

\(df_b = k - 1\), where \(k\) is the number of groups you are comparing.

But where does \(SS_b\) come from?

Those values come from a new table (see below).

**Important**

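In the table below, \(x\) is the mean you calculate for each group. \(\overline{x}\) is the mean for this table, which you calculate using the usual formula:

\[\overline{x} = \frac{\sum fx}{\sum f}\]

With \(f = 1\) for every group, this is the plain average of the group means. It equals the overall mean of all observations only when the groups have the same size \(n\). If the group sizes differ, weight each group mean by its \(n\): \(\overline{x} = \frac{\sum nx}{\sum n}\).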

\(x\) (means) | n | f | fx | \(x - \overline{x}\) | \((x - \overline{x})^2\) | \(n(x - \overline{x})^2\) |
---|---|---|---|---|---|---|
mean (group 1) | n (group 1) | 1 (usually) | for g. 1 | for g. 1 | for g. 1 | for g. 1 |
mean (group 2) | n (group 2) | 1 (usually) | for g. 2 | for g. 2 | for g. 2 | for g. 2 |
mean (group 3) | n (group 3) | 1 (usually) | for g. 3 | for g. 3 | for g. 3 | for g. 3 |
mean (group k) | n (group k) | 1 (usually) | for g. k | for g. k | for g. k | for g. k |
\(\sum x\) | | \(\sum f\) | \(\sum fx\) | | | \(\sum = SS_b\) |
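Putting the pieces together, here is a minimal sketch of the whole F calculation on made-up scores. The function names are illustrative. The grand mean is weighted by group size, which matches the plain average of the group means whenever the groups are equally large.

```python
# Hypothetical raw scores for three groups (made up for illustration).
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]


def mean(values):
    return sum(values) / len(values)


def between_group_variability(groups):
    """SS_b / df_b, where SS_b sums n * (group mean - grand mean)^2."""
    n_total = sum(len(g) for g in groups)
    # Grand mean weighted by group size n.
    grand_mean = sum(len(g) * mean(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1  # k - 1
    return ss_between / df_between


def within_group_variability(groups):
    """Pooled sum of squares divided by pooled degrees of freedom."""
    total_ss = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    total_df = sum(len(g) - 1 for g in groups)
    return total_ss / total_df


ms_between = between_group_variability(groups)
ms_within = within_group_variability(groups)
f_stat = ms_between / ms_within
print(ms_between, ms_within, f_stat)
```

For these scores the group means are 5, 7 and 9, so the groups are far apart relative to the spread inside each group, and F comes out well above 1.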

## 9.3 Reading the F-table

The F-table tells us the critical values of F for a given \(\alpha\). The only tricky thing about this table is that you need to look for the correct degrees of freedom.

\[\text{numerator df} = k - 1 \qquad \text{denominator df} = N - k\]

where \(N\) is the sum of all sample sizes \(n\) and \(k\) is the number of groups you are comparing. The two values are looked up separately in the table; they are not divided by each other.

Our interpretation is as follows:

- if the calculated \(F\) is higher than the critical value (\(F_{critical}\)), we *reject the null hypothesis*.
- if the calculated \(F\) is lower than the critical value (\(F_{critical}\)), we *do not reject the null hypothesis*.
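The lookup and the decision rule can be sketched as follows. The group sizes and the calculated F are hypothetical, and the function names are illustrative. The critical value 5.14 is what a standard F table lists for \(\alpha = 0.05\) with 2 and 6 degrees of freedom.

```python
def f_degrees_of_freedom(group_sizes):
    """Return (numerator df, denominator df) = (k - 1, N - k)."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    return k - 1, n_total - k


def decide(f_calculated, f_critical):
    """Apply the decision rule; a tie is treated as 'do not reject' here."""
    if f_calculated > f_critical:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Three hypothetical groups of three observations each.
df_num, df_den = f_degrees_of_freedom([3, 3, 3])

# Critical value read from an F table for alpha = 0.05 and df = (2, 6).
f_critical = 5.14

decision = decide(12.0, f_critical)  # 12.0 is a hypothetical calculated F
print(df_num, df_den, decision)
```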

## 9.4 Assumptions

For the *one-way ANOVA test* to work, three assumptions need to hold:

**Normality:** The variable of interest should be normally distributed within each group.

- We usually do not perform mathematical tests to check for normality.
- Looking at measures of central tendency and histograms is probably the best way to do so.

**Independence:** There must be no relationship between the observations in the different groups.

- This is not simple to test.
- Usually, we assume it holds if each group is composed of observations from different units. In other words, there must be no individual that belongs to more than one group.

**Equality of variances:** The population variances of the groups must not be statistically different.

- If variances are equal, we say we have *homoskedasticity*.
- If variances are not equal, we say we have *heteroskedasticity*.
- We can test this!

## 9.5 Levene’s test for equality of variances

We test the following:

- Null hypothesis: population variances are equal
- Alternative hypothesis: population variances are not equal

**Interpretation**

As before, we will look at the p-value:

- If \(p \leq \alpha\), we reject the null
- If \(p > \alpha\), we fail to reject the null

**Conclusion**

If we reject the null, then variances are *not* equal. Therefore, the third assumption above (equality of variances) *does not hold*.
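To see where the Levene statistic comes from, here is a rough sketch on made-up scores. It assumes the mean-based version of the test: each score is replaced by its absolute distance from its group mean, and a one-way ANOVA F is computed on those distances. Statistical software may use a different variant. Since plain Python has no F distribution, the sketch stops at the statistic and does not compute a p-value.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_f(groups):
    """One-way ANOVA F: between-group over within-group variability."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_statistic(groups):
    """Mean-based Levene statistic: ANOVA F on absolute deviations."""
    abs_devs = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(abs_devs)


# Hypothetical scores; the third group is visibly more spread out.
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [5, 9, 13],
]

w = levene_statistic(groups)
print(w)
```

The statistic has the same degrees of freedom as the ANOVA itself, \(k - 1\) and \(N - k\), so it can be compared with a critical F value in the same way.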

## 9.6 Interpretation of the one-way ANOVA test

Once again, we look at the p-value:

- If \(p \leq \alpha\), we reject the null
- If \(p > \alpha\), we fail to reject the null

**Conclusion**

If we reject the null, we conclude that the population means of the different groups are *not* all equal. In other words, *at least* one of the population means is different.

We then do further analysis (a post-hoc test) to identify which one. This analysis runs t-tests for all possible pairs of groups. We can easily do this using SPSS.
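As a rough illustration of the pairwise idea, the sketch below computes a pooled-variance t statistic for every pair of groups in a made-up dataset. The Bonferroni-adjusted alpha is an assumption added here to account for the number of comparisons; the post-hoc procedure you select in SPSS may adjust differently.

```python
import math
from itertools import combinations

# Hypothetical scores (made up for illustration).
groups = {
    "group 1": [4, 5, 6],
    "group 2": [6, 7, 8],
    "group 3": [8, 9, 10],
}


def mean(values):
    return sum(values) / len(values)


def sum_of_squares(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


def pooled_t(a, b):
    """Independent-samples t statistic, equal variances assumed."""
    pooled_var = (sum_of_squares(a) + sum_of_squares(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


# Every possible pair of group labels.
pairs = list(combinations(groups, 2))

# Bonferroni adjustment (an assumption): split alpha across the comparisons.
alpha = 0.05
adjusted_alpha = alpha / len(pairs)

t_stats = {(a, b): pooled_t(groups[a], groups[b]) for a, b in pairs}
for pair, t in t_stats.items():
    print(pair, round(t, 3))
```

With \(k\) groups there are \(k(k-1)/2\) pairs, so the number of comparisons grows quickly as groups are added.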

## 9.7 Exercise

First, I will illustrate using the variable race in the health data.

Using “health_data.sav”, do the necessary procedures to check whether there is a statistically significant difference in bmi (body mass index) across regions.

- What is your null hypothesis?
- What is the alternative hypothesis?
- What is your alpha?
- Perform and interpret a Levene’s test.
- Do a one-way ANOVA test. Interpret your p-value.
- What conclusions do you reach after a post-hoc test?