It is often said, by the people who say this kind of thing, that the three foundational principles of experimental design are randomization, replication, and blocking.
We’ve talked some about randomization and blocking, but what about replication?
Replication is the idea that you should try things more than once. To see why, consider the tomatoes again.
Suppose I grow one tomato plant with fertilizer A, obtaining 20 pounds of yield, and a second plant with fertilizer B, obtaining 22 pounds. So…great? Is fertilizer B better? I don’t really know, because I don’t know how much plant yields tend to vary. Maybe 2 pounds is just noise. Maybe it’s a huge difference. I have no way to estimate the typical variation in plant yield, so I can’t tell whether this observed difference is just ordinary plant-to-plant variation or reflects a real effect of the fertilizer.
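A tiny Python sketch makes the problem concrete. It uses only the two yields above; the helper `within_group_variance` is a hypothetical name introduced just for this illustration. With one plant per fertilizer, the difference in means is easy to compute, but a sample variance within either group is not, because a sample variance needs at least two observations.

```python
import statistics

# Yields in pounds from the example above: one plant per fertilizer.
yields = {"A": [20.0], "B": [22.0]}

def within_group_variance(values):
    """Sample variance of one treatment group, or None when it cannot be estimated."""
    try:
        return statistics.variance(values)  # raises StatisticsError with fewer than 2 values
    except statistics.StatisticsError:
        return None

difference = statistics.mean(yields["B"]) - statistics.mean(yields["A"])
spread = {level: within_group_variance(v) for level, v in yields.items()}

print(difference)  # 2.0
print(spread)      # {'A': None, 'B': None}
```

We get a 2-pound difference and nothing at all to compare it against.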
We’ll come back to this idea of being able to estimate variance in a bit.
So, okay, let’s up the budget and buy another tomato seedling. I’ll give this one fertilizer A as well. Now I have two replicates at level A of the fertilizer. That is, I have two experimental units that received the very same treatment. Yay!
In classical experimental design, it’s common to talk about replicating an entire experiment. That is, you end up with the same number of replicates for each combination of factor levels. In this case, I’d need two plants with fertilizer A and two with B: now I have two replicates for each treatment. We’ll break this rule later, but a balanced design like this turns out to be mathematically handy in some ways.
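Here is a rough Python simulation of that balanced two-replicate design, showing what replication buys and one way balance is handy. The true mean yields (borrowed from the 20 and 22 pounds above), the plant-to-plant standard deviation, and the random seed are all assumptions, so whatever it prints is simulated, not measured. `pooled_variance` is a hypothetical helper that weights each group’s variance by its degrees of freedom. With equal group sizes, that weighting collapses to a plain average, and the standard error of the difference in means simplifies to sqrt(2 * s_pooled^2 / n).

```python
import math
import random
import statistics

def pooled_variance(groups):
    """Pool within-treatment sample variances, weighting each by its degrees of freedom."""
    num = sum((len(v) - 1) * statistics.variance(v) for v in groups.values())
    df = sum(len(v) - 1 for v in groups.values())
    return num / df, df

# Hypothetical simulation settings (assumptions, not real data).
random.seed(1)
TRUE_MEAN = {"A": 20.0, "B": 22.0}  # assumed true mean yields, pounds
PLANT_SD = 1.5                      # assumed plant-to-plant SD, pounds
REPLICATES = 2                      # same number of plants per treatment: a balanced design

groups = {level: [random.gauss(mu, PLANT_SD) for _ in range(REPLICATES)]
          for level, mu in TRUE_MEAN.items()}

s2_pooled, df = pooled_variance(groups)

# Balanced design: the pooled variance is just the average of the group variances...
group_vars = [statistics.variance(v) for v in groups.values()]
assert math.isclose(s2_pooled, statistics.mean(group_vars))

# ...and the standard error of the difference in means has a simple closed form.
se_diff = math.sqrt(2 * s2_pooled / REPLICATES)
diff = statistics.mean(groups["B"]) - statistics.mean(groups["A"])
print(f"difference = {diff:.2f} lb, SE = {se_diff:.2f} lb, df = {df}")
```

Now the observed difference can be judged against an estimate of plant-to-plant noise, even if two degrees of freedom is not much to go on.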
There is a tricky point here that our textbook, BHH, does not really focus on, but that comes up disturbingly often in the real world: duplicates, also called repeated measurements or, most awesomely, pseudo-replicates.
The thing about replication is that your replicates have to be different, independent experimental units. The whole point of replication is to see the inherent variance between experimental units. If your replicates aren’t actually different units, then you are not going to be able to estimate this variation properly.
For example, suppose I can’t afford four tomato seedlings, so I just use two, but I measure each plant’s yield twice. That’s kind of nice, I guess, in that it helps me see whether there’s measurement error in my scale. But it doesn’t tell me anything about the variation between tomato plants! What I’ve done here is obtain repeated measurements; I don’t have any true replicates.
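To see that duplicates only measure the scale, here is a Python simulation under assumed, hypothetical values: a plant-to-plant standard deviation of 1.5 pounds, a scale error of 0.25 pounds, and a mean yield of 20 pounds. It uses many plants rather than two so the averages settle down. The variance within each duplicate pair lands near the scale’s error variance, while the variance across different plants is far larger.

```python
import random
import statistics

# Assumed, hypothetical parameters (pounds): plants differ a lot, the scale only a little.
PLANT_SD = 1.5   # true plant-to-plant variation in yield
SCALE_SD = 0.25  # measurement error of the scale
random.seed(2)

def weigh(true_yield):
    """One reading on the noisy scale."""
    return true_yield + random.gauss(0.0, SCALE_SD)

# Many plants, each weighed twice, so the averages below are stable.
plants = [random.gauss(20.0, PLANT_SD) for _ in range(10_000)]
readings = [(weigh(y), weigh(y)) for y in plants]

# Spread within a duplicate pair: only the scale's noise shows up here.
dup_var = statistics.mean([statistics.variance(pair) for pair in readings])

# Spread across different plants: the variation that true replication captures.
unit_var = statistics.variance([first for first, _ in readings])

print(f"duplicate variance ~ {dup_var:.3f} (SCALE_SD**2 = {SCALE_SD**2:.3f})")
print(f"between-plant variance ~ {unit_var:.3f} "
      f"(PLANT_SD**2 + SCALE_SD**2 = {PLANT_SD**2 + SCALE_SD**2:.3f})")
```

No matter how many times each plant is weighed, the duplicate variance never picks up the plant-to-plant term.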
This is a problem that’s easy to spot in this example, but it can be more subtle in the real world. For example, let’s say you’re investigating the right temperature at which to bake bread. You mix up a batch of dough, divide it into three loaves, and pop ’em in the oven at 350 degrees. Then you do the same thing (mix a batch of dough and divide it into three loaves), but this time you bake ’em at 400 degrees. Then you measure each loaf on the well-known International Fluffy Deliciousness Scale. So you have three measurements for each level of the temperature factor.
…But you don’t have three replicates. Those three loaves that were in at 400 degrees aren’t independent of each other; they’re pseudo-replicates, or duplicates. They can tell you about the variation among loaves made from the same batch of dough, but they can’t tell you about the variation between loaves made from different batches. If the 400-degree loaves and the 350-degree loaves come out different, you don’t know how much of that is due to the temperature, and how much of it is just typical variation between batches of dough.
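The consequence can be sketched with a Python simulation. It assumes, hypothetically, that temperature has no effect at all, that batches of dough differ with standard deviation 1.0 and loaves within a batch with standard deviation 0.3 (in whatever units the Fluffy Deliciousness Scale uses). Each simulated experiment is then analyzed with an ordinary pooled two-sample t-test that treats all six loaves as independent replicates; `naive_t` and `false_positive_rate` are names made up for this sketch. The cutoff 2.776 is the two-sided 5% critical value of Student’s t with 4 degrees of freedom. With no batch effect, the false-alarm rate should sit near 5%; with a batch effect, the test “finds” a temperature effect far more often, because the loaf-to-loaf spread understates how much two batches can differ.

```python
import math
import random
import statistics

LOAVES = 3      # loaves per batch (one batch per temperature)
T_CRIT = 2.776  # two-sided 5% critical value of Student's t, df = 2 * LOAVES - 2 = 4

def naive_t(x, y):
    """Pooled two-sample t statistic that (wrongly) treats every loaf as a true replicate."""
    sp2 = ((len(x) - 1) * statistics.variance(x)
           + (len(y) - 1) * statistics.variance(y)) / (len(x) + len(y) - 2)
    se = math.sqrt(sp2 * (1 / len(x) + 1 / len(y)))
    return (statistics.mean(x) - statistics.mean(y)) / se

def false_positive_rate(batch_sd, loaf_sd, sims=4000, seed=3):
    """Share of simulated experiments where the naive test flags a nonexistent temperature effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # One batch per temperature; temperature itself has NO effect in this simulation.
        batch_350 = rng.gauss(0.0, batch_sd)
        batch_400 = rng.gauss(0.0, batch_sd)
        loaves_350 = [batch_350 + rng.gauss(0.0, loaf_sd) for _ in range(LOAVES)]
        loaves_400 = [batch_400 + rng.gauss(0.0, loaf_sd) for _ in range(LOAVES)]
        if abs(naive_t(loaves_400, loaves_350)) > T_CRIT:
            hits += 1
    return hits / sims

print(false_positive_rate(batch_sd=0.0, loaf_sd=0.3))  # no batch effect: near the nominal 0.05
print(false_positive_rate(batch_sd=1.0, loaf_sd=0.3))  # batch effect: far above 0.05
```

The exact rates depend entirely on the assumed standard deviations; the point is only that the within-batch spread is the wrong yardstick for a between-batch comparison.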
Response moment: What might be a better way to run this experiment? (Think about the principles: randomization, replication, and blocking.)