## 2.6 Two-way ANOVA

This is another long one, so buckle up, and pause and stretch when you need to!

### 2.6.1 Introduction: context

We’ve seen that ANOVA can be helpful when we have some treatment factor with multiple levels – it allows us to think about the effect of the factor overall, rather than having to do everything for one level at a time.

But hey, why stop at one factor? In most cases, there are lots of factors that might affect our response variable, and we might well want to test more than one of them at a time. In that case, we can do ANOVA with multiple factors. For today, we’ll content ourselves with just two factors and a response; this is called *two-way* ANOVA.

It’s important to pause here and recall a concept that I hope you’ve encountered before. When you have a model with multiple predictors in it – regression, ANOVA, whatever – those predictors don’t exist in a vacuum. Any data point you observe that has level A of predictor 1 *also* has some level of predictor 2. Back in regression, you (again, I hope) learned to chant the phrase “after accounting for other predictors.”

Suppose I have a model to, say, predict people’s blood pressure using their age and BMI. Then the coefficient of “age” in this model describes the relationship between age and blood pressure *after accounting for BMI*. Or we might say “for people of a given BMI,” or “where BMI is held constant.” Likewise, the coefficient of BMI describes the relationship between BMI and blood pressure *after accounting for age*.
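A tiny simulated sketch of this idea (all the numbers and the data here are invented purely for illustration):

```r
# Simulated illustration: blood pressure depends on both age and BMI.
# Every number here is made up for the example.
set.seed(1)
n <- 200
age <- runif(n, 20, 70)
bmi <- rnorm(n, mean = 26, sd = 4)
bp  <- 90 + 0.5 * age + 1.2 * bmi + rnorm(n, sd = 8)

bp_mod <- lm(bp ~ age + bmi)
coef(bp_mod)
# The "age" coefficient estimates the age-blood pressure relationship
# for people of a given BMI -- that is, after accounting for BMI.
```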

In ANOVA, a similar idea applies. When we put multiple predictors into our model, we gain the ability to examine the effects of one factor *after accounting for* the other ones.

One of the really common situations where this comes up is when you have a treatment factor and a blocking factor. Say you are examining the growth of baby chicks on different diets, but you have two breeds of chicken. So you use the breed as a blocking factor, randomizing within each block, so that some chickens of each type get each diet. Then you can do two-way ANOVA to examine the effect of diet on growth *after* taking the type of chicken into account.

It’s also quite possible, though, to do this when both of your predictors are considered interesting and controllable – that is, they’re both treatment factors. That’s the case we’ll focus on here.

### 2.6.2 Example: guinea pigs

We’ll use an example dataset about guinea pigs – not the figure of speech, I mean actual literal guinea pigs.

Awwwww.

In this case, we’re investigating whether feeding guinea pigs vitamin C helps enhance their tooth growth. The response will be the length of the teeth (well, technically, the length of certain cells responsible for tooth growth), and we’ll use two treatment factors: the dosage level of vitamin C, and the way it’s administered – via orange juice or an ascorbic acid supplement.

Here’s a quick look at a few of the cases:

```
pig_dat = ToothGrowth %>%
  mutate(dose = as.factor(dose)) # treat dosage as a categorical factor
pig_dat %>% slice_sample(n = 6)
```

```
## len supp dose
## 1 10.0 OJ 0.5
## 2 5.8 VC 0.5
## 3 21.2 OJ 1
## 4 15.5 VC 1
## 5 25.8 OJ 1
## 6 15.2 OJ 0.5
```

### 2.6.3 Conditions and cautions

Before we start, a word about conditions. Two-way ANOVA operates under the same assumptions as one-way ANOVA: independent errors, normal (or nearly-normal) errors, an absence of weird behaviors or extreme outliers, and roughly equal variance of the errors for each group. As before, independence of errors is something we have to think about using the context and practical considerations. (Are some of these guinea pigs from the same litter? Were they all kept under the same conditions apart from their vitamin C supplements? And so on.)

We’ll want to check the normality and equal-variance conditions by examining the residuals after we fit the model, since the residuals are our best estimates of the true errors.

In models with multiple factors, there’s another condition we need to think about: *additivity*. As we’ll see shortly, the basic two-way ANOVA model assumes that there’s some effect of dosage on tooth growth, and some effect of supplement type on tooth growth, but they act independently of each other. To predict a guinea pig’s tooth growth, we can just add together the effect of its dosage and the effect of its supplement type. According to this model, we don’t need to know which supplement the guinea pig is getting in order to know the effect of dosage; dosage has the same effect whether it comes by OJ or ascorbic acid.

This may not, in fact, be true: there may be an *interaction* between the predictors. We won’t go into this issue in detail in this lecture, but it’s important to keep it in mind. Later on, you’ll do more thinking about when you should include interaction effects in your models.

### 2.6.4 Visuals

As the statisticians do, let’s start by making some pictures. Here’s a boxplot of tooth length (called `len`) grouped by vitamin C dosage:

```
pig_dat %>%
  ggplot() +
  geom_boxplot(aes(x = dose, y = len))
```

And here’s one of tooth length grouped by supplement type:

```
pig_dat %>%
  ggplot() +
  geom_boxplot(aes(x = supp, y = len))
```

It looks like both of these factors may have some effect on the response. For the most information, though, let’s look at both factors at the same time:

```
pig_dat %>%
  ggplot() +
  geom_boxplot(aes(x = dose, y = len, color = supp))
```

Now this is intriguing. It still looks like there’s a clear effect of dose, and also an effect of supplement type…sort of. Supplement type definitely seems to matter at lower doses, but it *doesn’t* seem to matter if the dosage level is 2.

This is in fact an example of an *interaction* effect. To tell you whether supplement type matters, I have to know what dosage level you’re talking about. Again, this is mostly a topic for another time – just bear in mind as we do this analysis that we are not actually meeting all the conditions. But, for now, we’ll forge ahead.

Finally, an important point is that our dataset is what’s called *balanced*. There are the same number of guinea pigs on each combination of dosage and supplement type. This does not have to be true to run an experiment, as we’ll see later; but the math is a looot simpler if it is true.
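We can check this directly by counting the observations in each cell — a quick sketch using the raw built-in `ToothGrowth` data (assuming dplyr is available):

```r
# Count observations in each dose x supplement combination;
# a balanced design has the same count in every cell.
library(dplyr)
ToothGrowth %>%
  mutate(dose = as.factor(dose)) %>%
  count(dose, supp)
```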

Now then. We’ve observed using the time-honored Eyeball Test that dosage and supplement type seem to matter, but how certain are we? Time for some math!

### 2.6.5 Two-way ANOVA equation

In one-way ANOVA, we used this equation for the theoretical, True model:
\[y_{ij} = \mu + \tau_i + \varepsilon_{ij}\]
Here \(\mu\) represented the grand or overall mean, \(\tau_i\) represented an *adjustment* to this mean for group \(i\), and \(\varepsilon_{ij}\) was the error for this specific observation.

Now we generalize this so that instead of one treatment factor there are two.

In order to specify a particular case, or in this case guinea pig, we now need to specify which level of dosage it gets, which level of supplement it gets, and which replicate it is within that. So \(y_{ijk}\) would be the \(k\)th guinea pig getting dosage \(i\) and supplement \(j\).

“Hold on a sec,” you may say. “I remember looking at R output for stuff like this, and it chooses one of the levels as a baseline. All the other effects are *relative to that*.”

Well, yes, that’s a thing. You can choose one of the levels of each factor to be the baseline. In that case, \(\mu\) is actually the mean for this baseline combination of factor levels, and the \(\alpha_i\)’s and \(\beta_j\)’s are adjustments to that baseline for all the other groups.

Or you may say: “I’ve seen an equation like this before, but there was no intercept.”

Also a thing!

This all relates to the problem that if you have an overall mean *and* a treatment adjustment for every level, your model is not what we call *identifiable*. There are actually multiple sets of coefficients that work. For example, I could just add 5 to the intercept coefficient, and subtract 5 from all the \(\alpha\)’s, and the model would give the same predictions!

If you want to have a model with an intercept *and* an adjustment for every level of the treatments – and I like to, because I think it’s easier to read – you actually also have to introduce a constraint: all the treatment effects must sum to 0. So:

\[\sum_i \alpha_i = 0 \quad\mbox{and}\quad \sum_j \beta_j = 0\]

For the purposes of ANOVA inference, it doesn’t actually matter which of these methods you use. But it’s good to know which one is in play when you’re looking at a particular model, so that you can interpret the coefficients correctly.
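In R you can inspect both conventions on the same fitted model — a sketch using the guinea pig data and the additive model we fit later in this section:

```r
# Fit the additive model, then look at the two parameterizations.
library(dplyr)
pig_dat <- ToothGrowth %>% mutate(dose = as.factor(dose))
pig_aov <- aov(len ~ dose + supp, data = pig_dat)

# Default "treatment" coding: an intercept for the baseline cell
# (dose 0.5, supp OJ) plus adjustments relative to that baseline.
coef(pig_aov)

# Sum-to-zero "effects" coding: adjustments to the grand mean,
# which sum to zero within each factor.
model.tables(pig_aov, type = "effects")
```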

Then the model becomes: \[y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}\]

Notice how we’ve replaced the \(\tau\) with an \(\alpha\) and a \(\beta\). We commonly refer to factors “A and B,” so we use the corresponding Greek letters, \(\alpha\) and \(\beta\), to represent their effects. I realize that we were already using \(\alpha\) and \(\beta\) for completely different things, but oh well, there’s only so many letters. Hopefully it will be clear from context what we’re talking about.

Everybody’s got a different opinion on what to call the number of replicates. Some use \(k\), some use \(n\). I like \(r\) because it stands for “replicate” and because we weren’t previously using it for anything else. Except correlation, I guess, but the context is usually going to make it clear which thing I’m talking about.

Let’s also say that factor A (dosage) has \(a\) levels, so \(i=1,\dots,a\). Likewise factor B (supplement type) has \(b\) levels. And there are \(r\) replicates on each combination of levels, so \(k=1,\dots,r\).

Now, what about the sample version? Back in the day, we had: \[ y_{ij} = \overline{\overline{y}} + (\overline{y}_i - \overline{\overline{y}}) + (y_{ij} - \overline{y}_i) \]

Because it’s going to get silly if I keep adding more bars on top of things, I’m going to introduce a bit of new notation. When I want to take the mean *over a particular subscript*, I put a bar on top of \(y\) and replace that subscript with a dot. So \(\overline{y}_{ij.}\) is the average tooth length for all guinea pigs with dosage \(i\) and supplement \(j\).

In this notation, the mean for all guinea pigs on dosage \(i\) is \(\overline{y}_{i..}\) and the mean for all guinea pigs on supplement \(j\) is \(\overline{y}_{.j.}\). And the grand mean of them all is \(\overline{y}_{...}\)!

So the estimated equation becomes: \[ y_{ijk} = \overline{y}_{...} + (\overline{y}_{i..} - \overline{y}_{...}) + (\overline{y}_{.j.} - \overline{y}_{...}) + (y_{ijk} - \overline{y}_{ij.}) \]

This equation will allow us to obtain, as before, sums of squares. Also as before, the *total* sum of squares, SSTot, can be broken up into sums of squares due to each term on the right-hand side of the equation: sums of squares for A, for B, and for error.

Let’s start with the treatment effect for factor A. The sum of squares for A, *SSA*, is:

\[\sum_{ijk} (\overline{y}_{i..} - \overline{y}_{...})^2\] …which, since the summand doesn’t depend on \(j\) or \(k\), is actually just \[\sum_{i} br(\overline{y}_{i..} - \overline{y}_{...})^2\]

And so on for the other terms.
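To see that the shortcut form of SSA really is the same sum, we can compute it both ways on the guinea pig data — a sketch assuming dplyr and the built-in `ToothGrowth` data:

```r
# Verify the SSA shortcut numerically on the guinea pig data.
library(dplyr)
pig_dat <- ToothGrowth %>% mutate(dose = as.factor(dose))
grand_mean <- mean(pig_dat$len)
dose_means <- pig_dat %>%
  group_by(dose) %>%
  summarize(m = mean(len), n = n())

# Summing over every observation (i, j, k)...
ssa_long <- sum((dose_means$m[match(pig_dat$dose, dose_means$dose)] - grand_mean)^2)

# ...versus summing over dose levels, weighted by the b*r observations per level:
ssa_short <- sum(dose_means$n * (dose_means$m - grand_mean)^2)

all.equal(ssa_long, ssa_short)  # TRUE: the two forms agree
```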

Just as before, each term has an associated number of degrees of freedom, representing the number of independent quantities we’re estimating. We divide each sum of squares by its associated degrees of freedom to obtain mean squares.

To be more concise, let’s use an ANOVA table to show all the pieces:

Source | df | Sum of squares | Mean squares |
---|---|---|---|
A | \(a-1\) | \(SSA\) | \(MSA = SSA/(a-1)\) |
B | \(b-1\) | \(SSB\) | \(MSB = SSB/(b-1)\) |
Error | \(abr - (a-1) - (b-1) - 1\) | \(SSE\) | \(MSE = SSE/dfE\) |

Note the degrees of freedom: \(a\), the number of levels, minus 1 for factor A; \(b-1\) for factor B. For the error, \(abr\), the total number of observations, minus the \(a-1\) that went to A, minus the \(b-1\) that went to B, minus 1 for the grand mean.
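As a quick check against the guinea pig example: dosage has \(a = 3\) levels and supplement type has \(b = 2\), so

\[df_A = 3 - 1 = 2, \qquad df_B = 2 - 1 = 1, \qquad df_E = abr - (a-1) - (b-1) - 1 = 6r - 4.\]

You can compare the first two of these to the Df column of the R output later in this section.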

### 2.6.6 Inference

As in one-way ANOVA, our mean squares can be thought of as estimates of variation. For example, MSA represents the variation *between* levels of factor A – different dosages. And MSB represents the variation *between* levels of factor B, supplement type. Finally, MSE represents variation of individuals *within* the same combination of treatment levels.

So just as before, if we want to know whether, say, dosage matters, we ask whether the variation *between* dosage levels is noticeably bigger than the variation among individual guinea pigs *within* a dosage level. We do this by looking at the ratio of the relevant mean squares, and comparing this to an \(F\) distribution with the appropriate degrees of freedom. It’s just that now we add “…after accounting for the type of supplement.”

Let’s add a column to the table:

Source | df | Sum of squares | Mean squares | \(F\) stat |
---|---|---|---|---|
A | \(a-1\) | \(SSA\) | \(MSA = SSA/(a-1)\) | \(MSA/MSE\) |
B | \(b-1\) | \(SSB\) | \(MSB = SSB/(b-1)\) | \(MSB/MSE\) |
Error | \(abr - (a-1) - (b-1) - 1\) | \(SSE\) | \(MSE = SSE/dfE\) | |

Consider the null hypotheses here. For factor A, dosage, the null would be that dosage doesn’t matter overall – the average for each dosage level is, at the population level, the same:

\[H_0:\alpha_i=0\quad\mbox{for all}~i\]

And the null hypothesis about factor B, supplement type, would be: \[H_0:\beta_j=0\quad\mbox{for all}~j\]

Let’s see what R has to say about it!

```
pig_aov = aov(len ~ dose + supp, data = pig_dat)
pig_aov %>% summary()
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 2426.4 1213.2 82.81 < 2e-16 ***
## supp 1 205.4 205.4 14.02 0.000429 ***
## Residuals 56 820.4 14.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Looks like, as we suspected from the Eyeball Test, both these factors are significant! We are very confident that vitamin C dosage has an effect on tooth growth, even after taking into account how the vitamin C is administered. And we are *also* very confident that the supplement type has an effect, even after taking the dosage into account.
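As a sanity check, the values in the Pr(>F) column are just upper-tail areas of the relevant \(F\) distributions, which we can recompute from the (rounded) F statistics and the Df column:

```r
# Upper-tail F probabilities using the Df values from the table above.
pf(82.81, df1 = 2, df2 = 56, lower.tail = FALSE)  # dose: vanishingly small
pf(14.02, df1 = 1, df2 = 56, lower.tail = FALSE)  # supp: about 0.0004
```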

All of this math, again, relies on certain conditions. We can check some of them quickly using the built-in plots associated with `aov` objects:

```
pig_aov %>% plot(1:2)
```

I don’t see any major deviations from normality or huge differences in variance in each group, so I’d say we’re okay.

There were some other assumptions underlying the math, though, remember! I assumed that the model was *additive* – no interaction between dosage and supplement type – and I also relied on the fact that my dataset was *balanced* – the same number of replicates in each combination of dosage level and supplement type. We’ll explore what to do when these *aren’t* true…another time.

**Response moment:** Degrees of freedom time again! Keeping in mind that I observed the same number of guinea pigs for each combination of dosage and supplement type, what was \(r\), the number of replicates in each combination?