## 4.6 Causation and Internal Validity Evidence

Medical researchers may be interested in showing that a drug helps improve people’s health (the cause of improvement is the drug), while educational researchers may be interested in showing a curricular innovation improves students’ learning (the curricular innovation causes improved learning).

To attribute a causal relationship, there are three criteria a researcher needs to establish:

• Association of the Cause and Effect: There needs to be a association between the cause and effect.
• Timing: The cause needs to happen BEFORE the effect.
• No Plausible Alternative Explanations: ALL other possible explanations for the effect need to be ruled out.

The third criterion can be quite difficult to meet. To rule out ALL other possible explanations for the effect, we want to compare the world with the cause applied to the world without the cause. In practice, we do this by comparing two different groups: a “treatment” group that gets the cause applied to them, and a “control” group that does not. To rule out alternative explanations, the groups need to be “identical” with respect to every possible characteristic (aside from the treatment) that could explain differences. This way the only characteristic that will be different is that the treatment group gets the treatment and the control group doesn’t. If there are differences in the outcome, then it must be attributable to the treatment, because the other possible explanations are ruled out.

So, the key is to make the control and treatment groups “identical” when you are forming them. One thing that makes this task (slightly) easier is that they don’t have to be exactly identical, only probabilistically equivalent. This means, for example, that if you were matching groups on age that you don’t need the two groups to have identical age distributions; they would only need to have roughly the same AVERAGE age. Here roughly means “the average ages should be the same within what we expect because of sampling error.”

Now we just need to create the groups so that they have, on average, the same characteristics … for EVERY POSSIBLE CHARCTERISTIC that could explain differences in the outcome.

It turns out that creating probabilistically equivalent groups is a really difficult problem. One method that works pretty well for doing this is to randomly assign participants to the groups. This works best when you have large sample sizes, but even with small sample sizes random assignment has the advantage of at least removing the systematic bias between the two groups (any differences are due to chance and will probably even out between the groups). As Wikipedia’s page on random assignment points out,

Random assignment of participants helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Thus, any differences between groups recorded at the end of the experiment can be more confidently attributed to the experimental procedures or treatment. … Random assignment does not guarantee that the groups are matched or equivalent. The groups may still differ on some preexisting attribute due to chance. The use of random assignment cannot eliminate this possibility, but it greatly reduces it.

We use the term internal validity to describe the degree to which cause-and-effect inferences are accurate and meaningful. Causal attribution is the goal for many researchers. Thus, by using random assignment we have a pretty high degree of evidence for internal validity; we have a much higher belief in causal inferences. Much like evidence used in a court of law, it is useful to think about validity evidence on a continuum. For example, a visualization of the internal validity evidence for a study that employed random assignment in the design might be:

The degree of internal validity evidence is high (in the upper-third). How high depends on other factors such as sample size.

### 4.6.1 Example: Does sleep deprivation cause an decrease in performance?

Let’s consider the criteria with respect to the [sleep deprivation study][Activity: Sleep deprivation] we explored in class.

#### 4.6.1.1 Association of cause and effect

First, we ask, Is there an association between the cause and the effect? In the sleep deprivation study, we would ask, “Is sleep deprivation associated with an decrease in performance?”

This is what a hypothesis test helps us answer! If the result is statistically significant, then we have an association between the cause and the effect. If the result is not statistically significant, then there is not sufficient evidence for an association between cause and effect.

In the case of the sleep deprivation experiment, the result was statistically significant, so we can say that sleep deprivation is associated with a decrease in performance.

#### 4.6.1.2 Timing

Second, we ask, Did the cause come before the effect? In the sleep deprivation study, the answer is yes. The participants were sleep deprived before their performance was tested. It may seem like this is a silly question to ask, but as the link above describes, it is not always so clear to establish the timing. Thus, it is important to consider this question any time we are interested in establishing causality.

#### 4.6.1.3 No plausible alternative explanations

Finally, we ask Are there any plausible alternative explanations for the observed effect? In the sleep deprivation study, we would ask, “Are there plausible alternative explanations for the observed difference between the groups, other than sleep deprivation?” Because this is a question about plausibility, human judgment comes into play. Researchers must make an argument about why there are no plausible alternatives. As described above, a strong study design can help to strengthen the argument.

At first, it may seem like there are a lot of plausible alternative explanations for the difference in performance. There are a lot of things that might affect someone’s performance on a visual task! Sleep deprivation is just one of them! For example, artists may be more adept at visual discrimination than other people. This is an example of a potential confounding variable. A confounding variable is a variable that might affect the results, other than the causal variable that we are interested in.

Here’s the thing though. We are not interested in figuring out why any particular person got the score that they did. Instead, we are interested in determining why one group was different from another group. In the sleep deprivation study, the participants were randomly assigned. This means that the there is no systematic difference between the groups, with respect to any confounding variables. Yes—artistic experience is a possible confounding variable, and it may be the reason why two people score differently. BUT: There is no systematic difference between the groups with respect to artistic experience, and so artistic experience is not a plausible explanation as to why the groups would be different. The same can be said for any possible confounding variable. Because the groups were randomly assigned, it is not plausible to say that the groups are different with respect to any confounding variable. Random assignment helps us rule out plausible alternatives.

#### 4.6.1.4 Making a causal claim

Now, let’s see about make a causal claim for the sleep deprivation study:

• Association: There is a statistically significant result, so the cause is associated with the effect
• Timing: The participants were sleep deprived before their performance was measured, so the cause came before the effect
• Plausible alternative explanations: The participants were randomly assigned, so the groups are not systematically different on any confounding variable. The only systematic difference between the groups was sleep deprivation. Thus, there are no plausible alternative explanations for the difference between the groups, other than sleep deprivation

Thus, the internal validity evidence for this study is high, and we can make a causal claim. For the participants in this study, we can say that sleep deprivation caused a decrease in performance.

##### Key points: Causation and internal validity

To make a cause-and-effect inference, you need to consider three criteria:

• Association of the Cause and Effect: There needs to be a association between the cause and effect. This can be established by a hypothesis test.
• Timing: The cause needs to happen BEFORE the effect.
• No Plausible Alternative Explanations: ALL other possible explanations for the effect need to be ruled out.

Random assignment removes any systematic differences between the groups (other than the treatment), and thus helps to rule out plausible alternative explanations.

##### Vocab
• Internal validity describes the degree to which cause-and-effect inferences are accurate and meaningful.

• Confounding variables are variables that might affect the results, other than the causal variable that we are interested in.

• Probabilistic equivalence means that there is not a systematic difference between groups. The groups are the same on average.