The general framework of frequentist inference is surprisingly flexible. As it turns out, there are many different tests you can do to work with different kinds of data and answer different questions.
Some examples you might have encountered in your intro days include tests for a proportion, a mean, the difference between two proportions, the difference of two means, a regression slope, and so on.
Now that “difference of two means” one sounds like it could come in handy. After all, as experimenters, we’re often interested in comparing some variable for two groups – the yield of tomato plants given fertilizer A or B, say, or whether a coffee shop does better business when they put a picture on their sign or advertise a special, or whether baby chicks grow bigger if you feed them diet 1 or diet 2.
But hold on: why two groups? That’s so limiting! There are more than two kinds of fertilizer in the world. There are more than two ways to feed a chicken. Maybe we’d like to compare several different levels of some grouping factor.
The question that we’d ask, then, is: are any of these groups different? Does fertilizer matter? Does diet matter?
That’s what we do in ANOVA. It’s a way of comparing multiple groups, or in other words, multiple levels of a grouping factor, in order to ask whether that factor has any effect overall.
Here’s the fundamental idea of ANOVA, in words: Observations vary. The question is, can we explain why? Well, some of that variation is just due to randomness. But maybe, some of it is because the observations come from different groups. This interest in variation is where the name ANOVA comes from, by the way: it stands for ANalysis Of VAriance.
Consider the baby chicks example. Different baby chicks weigh different amounts. To some extent, that’s just random: chickens vary. But possibly, some of it is because we gave the different chicks different diets.
If it helps, you can think of this in terms of signal and noise. We’re really interested in seeing if the chickens’ diet is reflected in their weight – that’s the signal. But there’s also that random variation between individual chickens, which is just noise.
The other pair of terms we use here is “between” and “within.” We want to know if there’s a big difference between groups – the chickens who get each type of diet. But what constitutes a big difference in terms of baby chick weights? To get a sense of scale, we look at the variation within each group. If chicks who get the same diet have different weights, well, that’s got nothing to do with our treatment. It’s simply a reflection of the natural, random variation between individual chicks.
If we see that the difference between chicks on different diets is a lot bigger than this natural, random, individual variation – well, that’s when we start to think that the diet matters!
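To make “between” and “within” precise, here is the standard one-way ANOVA bookkeeping, sketched with our own labels (not notation from the text above): there are $k$ groups, group $i$ has $n_i$ observations $y_{i1}, \dots, y_{in_i}$ with mean $\bar{y}_i$, and $\bar{y}$ is the mean of all $N = n_1 + \cdots + n_k$ observations. The total variation splits exactly into a between-group piece and a within-group piece, and the $F$ ratio compares the two after dividing each by its degrees of freedom:

$$
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}\right)^2}_{SS_{\text{total}}}
= \underbrace{\sum_{i=1}^{k} n_i\left(\bar{y}_i-\bar{y}\right)^2}_{SS_{\text{between}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2}_{SS_{\text{within}}}
$$

$$
F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
$$

A large $F$ means the group means are spread out a lot relative to the individual, chick-to-chick noise.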
Notice how generally I’m talking about diet here. I’m not asking if diet 1 is better than diet 3; I’m asking if there is, overall, a difference between diets. ANOVA has a very general null hypothesis: the factor I’m looking at, overall, doesn’t matter. There’s no difference between any of the groups.
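Written out, with $\mu_i$ standing for the true (population) mean response in group $i$ (our notation) and $k$ groups in total, the hypotheses look like this:

$$
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad \text{vs.} \qquad
H_A:\ \mu_i \neq \mu_j \ \text{for at least one pair } i, j
$$

Note that $H_A$ doesn’t say which groups differ, or how many; rejecting $H_0$ only tells you that the group means are not all equal.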
Okay, so suppose we are doing this experiment with the baby chicks.
We feed each little chicken one of four different diets, and we record how much it weighs at, say, 20 days old. I go look at the data and here’s what I see:
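```r
library(dplyr)  # provides %>%, filter(), group_by(), summarize()

# Average day-20 weight for each diet, from R's built-in ChickWeight data
mean_weights = ChickWeight %>%
  filter(Time == 20) %>%
  group_by(Diet) %>%
  summarize(avgWeight = mean(weight))
mean_weights
```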
```
## # A tibble: 4 x 2
##   Diet  avgWeight
##   <fct>     <dbl>
## 1 1          170.
## 2 2          206.
## 3 3          259.
## 4 4          234.
```
Nifty! Looks like the chicks on the different diets have different average weight.
But, of course, my next question is: how different? Neighboring averages are roughly 25 to 35 grams apart, and diets 1 and 3 differ by almost 90 grams. 25 grams really isn’t much. But then, these are tiny fluffy baby chicks; maybe 25 grams is a lot for them.
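One way to get a sense of scale is to look at how much chicks on the same diet differ from one another. Here is a minimal base-R sketch, assuming the built-in `ChickWeight` data and the same day-20 cutoff as above (the names `day20` and `within_sd` are just illustrative):

```r
# Day-20 rows from R's built-in ChickWeight data
day20 = ChickWeight[ChickWeight$Time == 20, ]

# Standard deviation of weight within each diet: a rough yardstick
# for how much chicks on the *same* diet vary from one another
within_sd = tapply(day20$weight, day20$Diet, sd)
round(within_sd, 1)
```

If these within-diet standard deviations are comparable to, or larger than, the gaps between the diet averages, those gaps start to look less impressive.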
Well, let’s do what statisticians do, and draw a picture. Suppose I made a side-by-side boxplot of weights and it looked like this:
Ohoho! Pretty promising. Looks like the diet really matters!
But what if the boxplot looked like this:
Mmm. Now I’m not so sure. I wouldn’t be confident in saying that there’s really any difference here based on diet.
And yet: in both of those plots, the group means were exactly the same! In the first plot, it looked like diet mattered because the difference between the groups was large compared to the spread within each group. In the second plot, the variation within each group swamped the differences between them.
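To see this numerically, here is a small simulation sketch. It builds two hypothetical datasets whose diet means are exactly 170, 206, 259, and 234 (the rounded day-20 averages) but whose within-diet spreads differ, then computes the ratio of between-group to within-group mean squares (the ANOVA $F$ statistic) for each. The 10 chicks per diet, the spreads of 10 and 150 grams, and the seed are arbitrary assumptions, not values from the real experiment.

```r
set.seed(42)  # arbitrary seed, just for reproducibility

group_means = c(170, 206, 259, 234)  # rounded day-20 diet averages
n_per_group = 10                     # assumed sample size per diet
diet = factor(rep(1:4, each = n_per_group))

# One set of random noise, centered so each group's noise averages to zero
z = rnorm(length(diet))
z = z - ave(z, diet)

# Same group means, different within-group spread
tight = group_means[as.integer(diet)] + 10 * z   # small spread
loose = group_means[as.integer(diet)] + 150 * z  # large spread

# Between-group mean square divided by within-group mean square
f_ratio = function(y, g) {
  k = nlevels(g)
  n = length(y)
  ss_between = sum((ave(y, g) - mean(y))^2)  # group means vs. grand mean
  ss_within  = sum((y - ave(y, g))^2)        # observations vs. own group mean
  (ss_between / (k - 1)) / (ss_within / (n - k))
}

rbind(tight = tapply(tight, diet, mean), loose = tapply(loose, diet, mean))
c(tight = f_ratio(tight, diet), loose = f_ratio(loose, diet))
```

Both datasets have identical group means, and because they share the same noise scaled by a different factor, the ratio for `tight` comes out (150/10)^2 = 225 times the ratio for `loose`, up to rounding error.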
That’s what ANOVA is all about: deciding if the differences between groups are large compared to the variation within them.
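Here is that comparison carried out on the actual day-20 chick weights, as a base-R sketch (variable names are illustrative). It splits the total squared deviation from the overall mean into a between-diet part and a within-diet part, checks that the two parts add up, and compares them per degree of freedom:

```r
# Day-20 weights and diet labels from R's built-in ChickWeight data
day20 = ChickWeight[ChickWeight$Time == 20, ]
y = day20$weight
diet = day20$Diet

grand_mean = mean(y)
group_mean = ave(y, diet)  # each chick's own diet average

ss_total   = sum((y - grand_mean)^2)           # all the variation
ss_between = sum((group_mean - grand_mean)^2)  # diet-to-diet differences
ss_within  = sum((y - group_mean)^2)           # chick-to-chick noise

# Scale each piece by its degrees of freedom, then compare
k = nlevels(diet)
n = length(y)
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))

c(total = ss_total, between = ss_between, within = ss_within, F = f_ratio)
```

The three sums of squares satisfy total = between + within exactly, so every bit of variation in chick weight is credited either to diet or to individual noise.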
Response moment: If you couldn’t do ANOVA – which I guess you can’t yet – and you wanted to know whether diet mattered, what test(s) might you do instead? Can you think of any possible drawbacks or pitfalls of doing that?