Chapter 14 ANOVA
In the last chapter we covered one- and two-sample hypothesis tests. In these tests, you are either comparing one group to a hypothesized value, or assessing the relationship between two groups (either their means or their correlation). In this chapter, we’ll cover how to analyse more complex experimental designs with ANOVAs.
When do you conduct an ANOVA? You conduct an ANOVA when you are testing the effect of one or more nominal (aka factor) independent variables on a numerical dependent variable. A nominal (factor) variable is one that contains a finite number of categories with no inherent order. Gender, profession, experimental conditions, and Justin Bieber albums are good examples of factors (not necessarily of good music). If you only include one independent variable, this is called a One-way ANOVA. If you include two independent variables, this is called a Two-way ANOVA. If you include three independent variables, it is called a Menage a trois NOVA.
Ok, maybe it’s not yet, but if we repeat it enough it will be, and we can change the world.
For example, let’s say you want to test how good each of three different cleaning fluids is at getting poop off of your poop deck. To test this, you could do the following: over the course of 300 cleaning days, you clean different areas of the deck with the three different cleaners. You then record how long it takes for each cleaner to clean its portion of the deck. At the same time, you could also measure how well each cleaner cleans the two different types of poop that typically show up on your deck: shark and parrot. Here, your independent variables cleaner and type are factors, and your dependent variable time is numeric.
Thankfully, this experiment has already been conducted. The data are recorded in a dataframe called poopdeck in the yarrr package. Here’s how the first few rows of the data look:
library(yarrr)  # provides the poopdeck data and pirateplot()
head(poopdeck)
##   day cleaner   type time int.fit me.fit
## 1   1       a parrot   47      46     54
## 2   1       b parrot   55      54     54
## 3   1       c parrot   64      56     47
## 4   1       a  shark  101      86     78
## 5   1       b  shark   76      77     77
## 6   1       c  shark   63      62     71
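Before running any ANOVA, it can help to confirm that the independent variables really are factors and the dependent variable really is numeric. Here is a minimal sketch that rebuilds the six rows printed above by hand. The name first_rows is made up for this illustration, and the factor() calls are an assumption: in the real poopdeck dataframe the columns may be stored as character strings, in which case factor() converts them.

```r
# First six rows of poopdeck, copied by hand from the head() output above.
# `first_rows` is a hypothetical name used only for this sketch.
first_rows <- data.frame(
  day     = c(1, 1, 1, 1, 1, 1),
  cleaner = factor(c("a", "b", "c", "a", "b", "c")),
  type    = factor(c("parrot", "parrot", "parrot", "shark", "shark", "shark")),
  time    = c(47, 55, 64, 101, 76, 63)
)

# Independent variables: factors with a finite set of unordered levels
levels(first_rows$cleaner)
levels(first_rows$type)

# Dependent variable: numeric
is.numeric(first_rows$time)

# Cross-tabulate the two factors to see the cells of the design
design_cells <- table(first_rows$cleaner, first_rows$type)
design_cells
```

Each combination of cleaner and type is one cell of the design, so on any given day every cleaner is tested once on each type of poop.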
We can visualize the poopdeck data using (of course) a pirate plot:
pirateplot(formula = time ~ cleaner + type,
           data = poopdeck,
           ylim = c(0, 150),
           xlab = "Cleaner",
           ylab = "Cleaning Time (minutes)",
           main = "poopdeck data",
           back.col = gray(.97),
           cap.beans = TRUE,
           theme = 2)
Given these data, we can use ANOVAs to answer four separate questions:
Question | Analysis | Formula |
---|---|---|
Is there a difference between the different cleaners on cleaning time (ignoring poop type)? | One-way ANOVA | time ~ cleaner |
Is there a difference between the different poop types on cleaning time (ignoring which cleaner is used)? | One-way ANOVA | time ~ type |
Is there a unique effect of the cleaner or poop types on cleaning time? | Two-way ANOVA | time ~ cleaner + type |
Does the effect of cleaner depend on the poop type? | Two-way ANOVA with interaction term | time ~ cleaner * type |
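To make the table concrete, here is a rough sketch of how each formula could be handed to R's built-in aov() function. The helper fit_four_anovas() and the toy dataframe are made up for this illustration: the toy values are invented and are NOT the real poopdeck data, they just have the same column names so the sketch runs on its own.

```r
# Hypothetical helper: fits the four models from the table above to any
# dataframe that has columns `time`, `cleaner`, and `type`.
fit_four_anovas <- function(data) {
  list(
    cleaner     = aov(time ~ cleaner, data = data),         # one-way
    type        = aov(time ~ type, data = data),            # one-way
    additive    = aov(time ~ cleaner + type, data = data),  # two-way
    interaction = aov(time ~ cleaner * type, data = data)   # two-way with interaction
  )
}

# Made-up toy data (NOT the real poopdeck values):
# 4 days x 3 cleaners x 2 poop types = 24 rows
toy <- expand.grid(day = 1:4,
                   cleaner = c("a", "b", "c"),
                   type = c("parrot", "shark"))
toy$time <- 40 +
  5 * as.integer(toy$cleaner) +                     # cleaner effect
  20 * (toy$type == "shark") +                      # type effect
  10 * (toy$cleaner == "c" & toy$type == "shark") + # interaction
  toy$day %% 3                                      # day-to-day variation

models <- fit_four_anovas(toy)

# ANOVA table for the model with the interaction term
summary(models$interaction)

# With the yarrr package loaded, the same call on the real data would be:
# fit_four_anovas(poopdeck)
```

In an R formula, cleaner * type is shorthand for cleaner + type + cleaner:type, so the last model contains both main effects plus the interaction, while the + version contains the main effects only.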