Why experiment? Experimental design textbooks tend to get really into this question, with long chapters about philosophy and human learning and whatnot. But the short answer is: to find out something about how the world works. Which is a pretty good goal.
Of course, as is usually the case in statistics (and life), it’s a little more complicated than the short answer. One of the themes we’ll revisit a lot during this course is the idea of research goals – what it is that you want to learn from an experiment, given the context.
For example, you might want to discover any of the following:
- Which factors affect some variable that interests you
- Which ones don’t affect it much
- The actual relationship between two or more variables
- How you can minimize variation in some outcome variable or process
…and so on. Those goals will ultimately determine the kinds of experiments and analysis you do.
You may distantly recall from intro stats the distinction between observational data and experimental data. To review:
Observational data comes from, well, your observation of the world. You don’t change anything: you just look at what’s happening. Now, you do get to choose what to look at – that’s where sampling comes in. But once you’ve chosen your sample, all you do is look. For example, you might choose a sample of adorable baby birds and observe how old they are when they learn to fly.
Experimental data is produced when you actually change or determine something about the environment, and then see what happens. For example, you might decide to feed some of the baby birds on seeds, and others on worms. You are determining something here – the diet of each bird. If you just let the birds eat whatever they wanted and recorded whether they had seeds or worms, that would be observational.
Another thing that I dearly hope you recall from intro stats is that beloved statistician catchphrase: CORRELATION DOES NOT IMPLY CAUSATION. When you work with observational data, you do not get to make any causal statements based on your results. You might observe that A and B are associated, but you can’t say that A causes B, or vice versa. One of the main attractions of experiments is that you do get to make causal statements. We’ll say more about why in a little bit.
“Correlation does not imply causation” is probably the #2 statistician catchphrase, behind “given these assumptions” and just ahead of “I’ve found a problem with the data.” Also, if somehow you got through intro stats and nobody told you about tylervigen.com, go check it out now.
Okay, experiments are great. Why not just jump in – change some stuff and see what happens?
Well, experiments also tend to be expensive. They cost money and time and effort. There may be ethical considerations. (These are all reasons why people do observational studies instead!) If you’re going to do an experiment, you want to make sure it’s actually going to tell you those things you decided you wanted to discover.
Here is a cautionary tale that, I’m told, actually happened to a colleague of mine who does consulting. A credit card company wanted to know how to get people to sign up for their cards. They mailed out a whole bunch of offers – 50,000 of them – advertising credit cards with different terms. On some offers, the card had a low APR (that’s the exorbitant interest they charge you if you don’t pay off the card right away) and a low annual fee. On other offers, the card had a high APR and a high annual fee. So they were doing an experiment: they decided which offer each potential customer would receive. Then they sat back and observed the response rate – how many people signed up for each offer.
Response moment: What do you think happened? Pause for a minute and note down your thoughts.
Turns out, they got a very high response rate for the low-APR/low-fee offers, and a very low response rate for the high-APR/high-fee offers. Then they turned to my colleague and said: Okay, so what makes people sign up for credit cards?
There was no way to tell! Sure, one offer worked better than the other, but was it because of the lower APR, the lower fee, or both? Which was more important? Who knows??
This is an example of confounding, a concept we will revisit extensively – you can see an effect, but you don’t know which variable is actually driving it.
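To see the problem concretely, here is a minimal sketch (hypothetical, not the company’s actual data) comparing the two-offer design they used against a 2×2 factorial design that includes all four combinations of APR and fee. In the two-offer design, the APR column and the fee column of the design are identical, so the two effects are aliased: no analysis, however clever, can separate them.

```python
# Coding: 0 = low, 1 = high for each factor (APR, fee).
confounded = [(0, 0), (1, 1)]                   # the two offers actually mailed
factorial = [(0, 0), (0, 1), (1, 0), (1, 1)]    # all four APR/fee combinations

def aliased(design):
    """APR and fee are aliased if their columns are identical (or exact
    opposites) across every run -- then their effects can't be separated."""
    apr = [run[0] for run in design]
    fee = [run[1] for run in design]
    return apr == fee or apr == [1 - f for f in fee]

print(aliased(confounded))  # True: any difference could be APR, fee, or both
print(aliased(factorial))   # False: each factor varies independently
```

With the factorial design, comparing low-APR offers to high-APR offers while the fee varies independently would have let the company estimate each factor’s effect on its own.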
This company had spent months, plus the expense of a huge mail campaign, to discover that people like good credit cards better.
So this is why you take the time to design an experiment in advance – and why there is a whole field of study, some parts of which we will wander through this semester, about the best ways to do so. It’s going to get more complicated than this credit card example, but ultimately, it all comes back to the same principle: how to make sure your experiment actually tells you what you want to know.