# 15 One-Way ANOVA Test in R

## 15.1 The one-way ANOVA in R

The one-way analysis of variance (ANOVA), also known as one-factor or single-factor ANOVA, is an extension of the independent two-samples t-test for comparing means when there are more than two groups. In one-way ANOVA, the data are organized into several groups based on a single grouping variable (also called the factor variable).

One might ask why we need a new test if we could just use the t-test to test all possible pairs of means. First of all, if you had \(k\) groups, that would mean you have to run \(\frac{k(k-1)}{2}\) different tests. Second, as we increase the number of tests we run we increase the risk of finding a difference by chance alone.
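For example, with \(k=4\) groups there would already be \(\binom{4}{2} = 6\) pairwise t-tests. A quick back-of-the-envelope calculation in R (assuming, for illustration only, that the tests were independent and each run at the 5% level):

```r
k <- 4
m <- choose(k, 2)    # number of pairwise tests, k(k-1)/2
m
#> [1] 6

# If the tests were independent, each at alpha = 0.05, the chance
# of at least one false positive across all of them would be:
1 - (1 - 0.05)^m
#> [1] 0.2649081
```

So roughly a one-in-four chance of finding a spurious "difference" even when all the means are equal.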

ANOVA is based on analyzing the ratio of the variance between samples and the variance within the samples. \[F=\frac{ \text{variance between samples}}{\text{variance within samples}}\] This statistic follows an F-distribution with numerator degrees of freedom = k-1 and denominator degrees of freedom = N-k, where k = number of groups (4 in the example below) and N = total number of samples in all groups combined (20 in the example). The density of an F-distribution with k-1 = 3 and N-k = 16 degrees of freedom is right-skewed, with most of its mass between 0 and about 3.
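You can draw this density yourself in base R using the built-in F density function `df()`:

```r
# Density of the F-distribution with df1 = 3 and df2 = 16;
# the curve is right-skewed with a long tail to the right
curve(df(x, df1 = 3, df2 = 16), from = 0, to = 6,
      xlab = "F", ylab = "density",
      main = "F-distribution with (3, 16) degrees of freedom")
```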

Notation:

\(\bar{\bar{x}}\) = mean of all sample values combined

k = number of groups you have

\(n_i\) = number of values in group i

\(\overline{x_i}\) = mean of the values in group i

\(s_i^2\) = variance of the values in group i

We define: \[ \small MSB = \text{weighted sum of squared differences between the sample means and the overall mean} \] \[ \small = \text{variance between} = \frac{\sum_{i=1}^{k} n_i (\overline{x}_i - \bar{\bar{x}})^2}{k-1}\] and

\[\small MSW = \text{weighted average variance within samples} \] \[ \small = \text{variance within} = \frac{\sum_{i=1}^{k} (n_i-1) s_i^2}{\sum_{i=1}^{k} (n_i-1)} = \frac{\sum_{i=1}^k \sum_{j=1}^{n_i} (x_{i,j}-\overline{x_i})^2}{N-k}\] \(\small MSB\) and \(\small MSW\) stand for mean square between and mean square within. The test statistic \(F\) is then computed as \[ \small F=\frac{MSB}{MSW}.\]

Assumptions:

The samples are simple random samples.

The samples are independent of each other.

The data at each factor level are normally distributed. (This is a loose requirement; the test works reasonably well unless the population is very far from normal. We should still check this, though.)

These normal populations have a common variance. (This is also a loose requirement. It has been shown that, as long as the sample sizes are equal or nearly equal, the smallest and largest of the variances can differ by a factor of up to nine, though other sources set the threshold at a factor of two. We should check this, too.)

The different samples are from populations that are categorized in only one way.

ANOVA test hypotheses:

\(H_o\): The means of all the different groups are the same

\(H_A\): At least one sample mean is not equal to the others.

A large test-statistic F would lead us to reject the assumption that all the means are the same.

**Example**: Potato yields per acre can vary between 16 and 28 tons per acre. Assume a farmer tests different management strategies and grows potatoes under four different treatments: fertilizer, irrigation, both, or none. Here I enter the data from 20 acres:

```
#entering the data
yield <- c(19,16,19,19,18,21,20,25,17,22,25,21,24,28,22,24,16,18,18,22)
treatment <- rep("none",5)
treatment[6:10] <- "fertilized"
treatment[11:15] <- "irrigated"
treatment[16:20] <- "irrigated_fertilized"
potato.data <- data.frame(yield, treatment)
```
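As an aside, the `treatment` vector can also be built in a single call: `rep()` with the `each` argument repeats every label five times in order, producing the same vector as the four assignments above.

```r
# same vector as above: 5 copies of each label, in order
treatment <- rep(c("none", "fertilized", "irrigated", "irrigated_fertilized"),
                 each = 5)
table(treatment)
```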

I want to know if there is a difference in mean yield between the groups. We use the `dplyr` library to find the mean yield for each treatment.

```
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.3.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
potato.data %>%
  group_by(treatment) %>%
  summarize(av_yield_per_acre = mean(yield))
#> # A tibble: 4 × 2
#>   treatment            av_yield_per_acre
#>   <chr>                            <dbl>
#> 1 fertilized                        21
#> 2 irrigated                         24
#> 3 irrigated_fertilized              19.6
#> 4 none                              18.2
```

It sure looks like there is a difference in mean yield, so let’s run a formal test. Here, the groups are of the same size, so the formulas for the variances simplify. We have

\(n_1 = n_2 = n_3 = n_4 = 5\)

\(k=4\)

so \[MSB = \frac{\sum_{i=1}^{k} n_i (\overline{x}_i - \bar{\bar{x}})^2}{k-1}=5 \cdot \frac{\sum_{i=1}^{4} (\overline{x}_i - \bar{\bar{x}})^2}{3} = 5 \cdot \text{variance of the }\overline{x_i} \]
and

\[MSW = \frac{\sum_{i=1}^{k} (n_i-1) s_i^2}{\sum_{i=1}^{k} (n_i-1)} = 4 \cdot \frac{\sum_{i=1}^{4} s_i^2}{\sum_{i=1}^{4} 4} = \frac{ (s_1^2 + s_2^2 + s_3^2 + 2_4^2)}{4}\]

```
# group variances
v1 <- var(potato.data$yield[which(potato.data$treatment=="none")])
v2 <- var(potato.data$yield[which(potato.data$treatment=="irrigated_fertilized")])
v3 <- var(potato.data$yield[which(potato.data$treatment=="fertilized")])
v4 <- var(potato.data$yield[which(potato.data$treatment=="irrigated")])
# group means
m1 <- mean(potato.data$yield[which(potato.data$treatment=="none")])
m2 <- mean(potato.data$yield[which(potato.data$treatment=="irrigated_fertilized")])
m3 <- mean(potato.data$yield[which(potato.data$treatment=="fertilized")])
m4 <- mean(potato.data$yield[which(potato.data$treatment=="irrigated")])
# MSB: n times the variance of the group means (n = 5 per group)
variance.between <- var(c(m1, m2, m3, m4))*5
variance.between
#> [1] 30.73333
# MSW: average of the group variances (equal group sizes)
variance.within <- (v1+v2+v3+v4)/4
variance.within
#> [1] 7.125
F.stat <- variance.between/variance.within
F.stat
#> [1] 4.31345
# p-value: area to the right of F.stat under the F(3,16) distribution
p.value <- 1-pf(F.stat,3,16)
p.value
#> [1] 0.02075182
```

Here the p-value is small-ish, less than 5%, so we reject the claim that all means are equal. At least one mean is different.
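Equivalently, we can compare the test statistic to the 5% critical value of the F(3, 16) distribution, which `qf()` computes; we reject \(H_o\) whenever F exceeds it:

```r
F.stat <- 4.31345                   # the value computed above
crit <- qf(0.95, df1 = 3, df2 = 16) # about 3.24
F.stat > crit
#> [1] TRUE
```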

The next chunk of code runs the built-in ANOVA function. Note that we first have to make sure the data is in two columns, one for x and one for y. We already did that when we created the data frame `potato.data` above. I am saving the results of our ANOVA as `potato.aov`.

```
potato.aov <- aov(potato.data$yield~potato.data$treatment)
summary(potato.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> potato.data$treatment 3 92.2 30.733 4.313 0.0208 *
#> Residuals 16 114.0 7.125
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
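Note that the `Mean Sq`, `F value`, and `Pr(>F)` entries match our hand computation. This is also a good place to check the normality and equal-variance assumptions listed earlier. A minimal sketch using the built-in `bartlett.test()` (equal variances) and `shapiro.test()` (normality of the residuals), applied to the `potato.data` and `potato.aov` objects from above; large p-values indicate no evidence against the assumptions:

```r
# equal variances across the four treatment groups?
bartlett.test(yield ~ treatment, data = potato.data)

# are the residuals of the fitted ANOVA approximately normal?
shapiro.test(residuals(potato.aov))
```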

Note that the ANOVA test only tells us that there are some means that are different, not which one(s). We *can* however look at the data table and make an educated guess. Here, it sure seems that irrigated fields do better.

## 15.2 The Tukey HSD test

If the ANOVA test shows that there are statistically significant differences between means, we can use the Tukey HSD (Tukey Honest Significant Differences) test to find them. TukeyHSD is a *post-hoc* test, performed *after* ANOVA has confirmed that we have a significant difference between means. TukeyHSD basically performs multiple pairwise comparisons between the group means. Making multiple comparisons leads to an increased chance of making a false discovery, i.e. rejecting a null hypothesis that should not have been rejected (a type I error), which is why we use ANOVA first.

The function `TukeyHSD()` takes the fitted ANOVA as an argument.

```
TukeyHSD(potato.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = potato.data$yield ~ potato.data$treatment)
#>
#> $`potato.data$treatment`
#> diff lwr upr
#> irrigated-fertilized 3.0 -1.829957 7.8299574
#> irrigated_fertilized-fertilized -1.4 -6.229957 3.4299574
#> none-fertilized -2.8 -7.629957 2.0299574
#> irrigated_fertilized-irrigated -4.4 -9.229957 0.4299574
#> none-irrigated -5.8 -10.629957 -0.9700426
#> none-irrigated_fertilized -1.4 -6.229957 3.4299574
#> p adj
#> irrigated-fertilized 0.3194523
#> irrigated_fertilized-fertilized 0.8398187
#> none-fertilized 0.3761527
#> irrigated_fertilized-irrigated 0.0807519
#> none-irrigated 0.0161016
#> none-irrigated_fertilized 0.8398187
```

You can see here that there is only a *statistically* significant difference between irrigated and non-irrigated fields. Note how the 95% confidence intervals for the differences between the treatment methods support this result: only the none-irrigated interval excludes zero. However, whether a 5.8-ton increase is *significant* (as opposed to statistically significant) requires an analysis of the cost of irrigation versus the improved yield.
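There is also a `plot()` method for `TukeyHSD` objects that draws each pairwise confidence interval; intervals that cross zero correspond to differences that are not statistically significant:

```r
# one horizontal confidence interval per pairwise comparison;
# las = 1 turns the comparison labels horizontal for readability
plot(TukeyHSD(potato.aov), las = 1)
```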

Another example: the `cabbages` data from the MASS library. I want to see if there is a statistically significant difference in the mean vitamin C content or mean head weight depending on planting date. The steps are:

1. Eyeball the data to see if there might be a difference.

2. Run ANOVA to confirm that there is indeed a statistically significant difference (or to show that there isn't).

3. If ANOVA comes back significant, run Tukey to find where the difference might be.

*Remember the difference between statistically significant and significant!*

First we read in the data and have a look at the variables we are dealing with.

```
library(MASS)
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#>
#> select
str(cabbages)
#> 'data.frame': 60 obs. of 4 variables:
#> $ Cult : Factor w/ 2 levels "c39","c52": 1 1 1 1 1 1 1 1 1 1 ...
#> $ Date : Factor w/ 3 levels "d16","d20","d21": 1 1 1 1 1 1 1 1 1 1 ...
#> $ HeadWt: num 2.5 2.2 3.1 4.3 2.5 4.3 3.8 4.3 1.7 3.1 ...
#> $ VitC : int 51 55 45 42 53 50 50 52 56 49 ...
```

Step 1:

```
cabbages %>%
  group_by(Date) %>%
  summarize(weight = mean(HeadWt), vitamin = mean(VitC))
#> # A tibble: 3 × 3
#>   Date  weight vitamin
#>   <fct>  <dbl>   <dbl>
#> 1 d16     2.72    56.4
#> 2 d20     2.96    54.2
#> 3 d21     2.10    63.3
```

It looks like there might be differences in the means.

Step 2: Run ANOVA

First for the head weight

```
weight.aov <- aov(cabbages$HeadWt~cabbages$Date)
summary(weight.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> cabbages$Date 2 7.71 3.853 5.745 0.00533 **
#> Residuals 57 38.23 0.671
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Next for the vitamin C content

```
vit.aov <- aov(cabbages$VitC ~ cabbages$Date)
summary(vit.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> cabbages$Date 2 909 454.6 5.05 0.00957 **
#> Residuals 57 5132 90.0
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Looking at the p-values, we can say that the differences are statistically significant.

Step 3: Tukey to find the differences

For weight:

```
TukeyHSD(weight.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = cabbages$HeadWt ~ cabbages$Date)
#>
#> $`cabbages$Date`
#> diff lwr upr p adj
#> d20-d16 0.235 -0.3882202 0.858220165 0.6378918
#> d21-d16 -0.615 -1.2382202 0.008220165 0.0538386
#> d21-d20 -0.850 -1.4732202 -0.226779835 0.0049415
TukeyHSD(weight.aov, conf.level = 0.90)
#> Tukey multiple comparisons of means
#> 90% family-wise confidence level
#>
#> Fit: aov(formula = cabbages$HeadWt ~ cabbages$Date)
#>
#> $`cabbages$Date`
#> diff lwr upr p adj
#> d20-d16 0.235 -0.3074101 0.77741009 0.6378918
#> d21-d16 -0.615 -1.1574101 -0.07258991 0.0538386
#> d21-d20 -0.850 -1.3924101 -0.30758991 0.0049415
```

Only the difference between d21 and d20 is statistically significant at the 95% confidence level. At 90%, the difference between d21 and d16 is also significant.

For vitamin C:

```
TukeyHSD(vit.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = cabbages$VitC ~ cabbages$Date)
#>
#> $`cabbages$Date`
#> diff lwr upr p adj
#> d20-d16 -2.25 -9.470345 4.970345 0.7349269
#> d21-d16 6.90 -0.320345 14.120345 0.0639292
#> d21-d20 9.15 1.929655 16.370345 0.0095834
TukeyHSD(vit.aov, conf.level = 0.90)
#> Tukey multiple comparisons of means
#> 90% family-wise confidence level
#>
#> Fit: aov(formula = cabbages$VitC ~ cabbages$Date)
#>
#> $`cabbages$Date`
#> diff lwr upr p adj
#> d20-d16 -2.25 -8.5341163 4.034116 0.7349269
#> d21-d16 6.90 0.6158837 13.184116 0.0639292
#> d21-d20 9.15 2.8658837 15.434116 0.0095834
```

Again, only the difference between d21 and d20 is statistically significant at the 95% confidence level. At 90%, the difference between d21 and d16 is also significant.

## 15.3 Assignment

Suppose there are six different dogs representing six different breeds (A, B, C, D, E, F) competing for best in show at a dog show, and a panel of 15 judges (J1 through J15) is evaluating them on a scale from 0 to 7.

I want you to consider two questions:

1. Is there a difference in the mean average scores for the six different dogs?

2. Is there a difference between the mean scores each judge gives out?

Run the code chunk below to enter the data.

```
score <-
c(5,7,3,6,6,4,
4,2,6,5,4,3,
5,3,6,4,7,6,
5,6,0,2,1,6,
7,4,0,1,1,2,
7,7,0,2,6,3,
6,6,0,5,6,3,
4,6,1,5,4,1,
6,4,0,5,7,6,
7,7,0,6,6,2,
2,4,0,1,2,2,
5,7,4,6,7,4,
7,5,0,4,6,2,
4,5,0,4,3,7,
6,6,3,6,7,8)
dog <- rep(c("A", "B", "C","D","E","F"), 15)
judge <- rep(c("J1","J2","J3","J4","J5","J6","J7","J8","J9","J10","J11","J12","J13","J14","J15"), rep(6,15))
dog.data <- data.frame(score, dog,judge)
```

Next, let's summarize the data. I am doing it for the dogs; you do it for the judges.

```
library(dplyr)
dog.data %>%
group_by(dog) %>%
summarize(avscore = mean(score))
#> # A tibble: 6 × 2
#> dog avscore
#> <chr> <dbl>
#> 1 A 5.33
#> 2 B 5.27
#> 3 C 1.53
#> 4 D 4.13
#> 5 E 4.87
#> 6 F 3.93
```

Looking at the output, it sure looks like dog C is unpopular.

Write \(\small H_o\) and \(\small H_a\) here:

Then run the ANOVA test and interpret. Make sure you use plain language. Finally, if the ANOVA finds a statistically significant difference, run a Tukey test as well and interpret.

Finally, discuss if this is an appropriate use of ANOVA and TUKEY.