2.15 Drawing conclusions and “statistical significance”

We have seen that statistical hypothesis testing is a process of comparing the real-world observed result to a null hypothesis where there is no effect. At the end of the process, we compare the observed result to the distribution of simulated results if the null hypothesis were true, and from this we determine whether the observed result is compatible with the null hypothesis.

The conclusions that we can draw from a hypothesis test are based on the comparison between the observed result and the null hypothesis. For example, in the Monday breakups study, we concluded:

The observed result is not compatible with the null hypothesis. This suggests that breakups may be more likely to be reported on Monday.

There are two important points to notice in how this conclusion is written:

- The conclusion is stated in terms of compatibility with the null hypothesis.
- The conclusion uses soft language like “suggests.” This is because we did not prove that breakups are more likely to be reported on Monday. Instead, we simply have strong evidence against the null hypothesis (that breakups are equally likely each day). This, in turn, suggests that breakups are more likely to be reported on Monday.

Similarly, if the observed result had been within the range of likely results if the null hypothesis were true, we would still write the conclusion in terms of compatibility with the null hypothesis:

The observed result is compatible with the null hypothesis. We do not have sufficient evidence to suggest that breakups are more likely to be reported on Monday.

In both cases, notice that the conclusion is limited to whether there is an effect or not. There are many additional aspects that we might be interested in but that the hypothesis test does not tell us about. For example:

- We don’t know what caused the effect.
- We don’t know the size of the effect. Perhaps the true percentage of Monday breakups is 26%. Perhaps it is slightly more or slightly less. We only have evidence that the results are incompatible with the null hypothesis.
- We don’t know the scope of the effect. Perhaps the phenomenon is limited to this particular year, or to breakups that are reported on Facebook, etc.

(We will learn about size, scope, and causation later in the course. The key point to understand now is that a hypothesis test, by itself, cannot tell us about these things, so the conclusion should not address them.)

2.15.1 Statistical significance

In news reports and scientific literature, we often hear the term, “statistical significance.” What does it mean for a result to be “statistically significant?” In short, it means that the observed result is not compatible with the null hypothesis.

Different scientific communities have different standards for determining whether a result is statistically significant. In the social sciences, there are two common approaches for determining statistical significance.

- Use the range of likely results: The first approach is to determine whether the observed result is within the range of likely results if the null hypothesis were true. If the observed result is outside that range, social scientists consider that to be sufficient evidence that the observed result is not compatible with the null hypothesis, and thus that the observed result is statistically significant.
- Use $p<0.05$: A second common approach is to use a $p$-value of 0.05 as a threshold. If $p<0.05$, social scientists consider that to be sufficient evidence that the observed result is not compatible with the null hypothesis, and thus that the observed result is statistically significant.
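
To make the two approaches concrete, here is a minimal Python sketch that applies both to a made-up version of the Monday breakups study. Everything specific in it is an assumption for illustration, not a value from the actual study: the sample of 100 breakups with 26 reported on Monday, the function names, the 10,000 simulations, the choice to treat the middle 95% of simulated results as the “range of likely results,” and the choice to count only simulated results at least as large as the observed one when computing the $p$-value.

```python
import random

def simulate_null_counts(n, p_null, reps, seed=0):
    """Simulate `reps` studies of `n` breakups each, assuming the null hypothesis is true.

    Each simulated study returns the number of breakups that landed on Monday.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p_null for _ in range(n)) for _ in range(reps)]

def likely_range(sim_counts, coverage=0.95):
    """Return (low, high) bounds of the middle `coverage` share of simulated results."""
    ordered = sorted(sim_counts)
    tail = (1 - coverage) / 2
    low = ordered[round(tail * len(ordered))]
    high = ordered[round((1 - tail) * len(ordered)) - 1]
    return low, high

def p_value_upper(sim_counts, observed):
    """Share of simulated results that are at least as large as the observed result."""
    return sum(count >= observed for count in sim_counts) / len(sim_counts)

# Hypothetical data (not from the actual study): 100 breakups, 26 on Monday.
n = 100
observed = 26

# Null hypothesis: breakups are equally likely each day, so 1/7 fall on Monday.
sims = simulate_null_counts(n, p_null=1 / 7, reps=10_000)

# Approach 1: is the observed result outside the range of likely results?
low, high = likely_range(sims)
outside_range = observed < low or observed > high

# Approach 2: is the p-value below 0.05?
p = p_value_upper(sims, observed)

print(f"Range of likely results under the null: {low} to {high}")
print(f"Observed result outside that range: {outside_range}")
print(f"p-value: {p:.4f}; below 0.05: {p < 0.05}")
```

With these made-up numbers, both approaches point to the same conclusion. That need not happen for every data set, because the two approaches as sketched here draw their cutoffs slightly differently.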

Other scientific communities may have different standards. Moreover, there is currently a lot of discussion about whether the current thresholds should be reconsidered, and even whether we should have a threshold at all. Some scholars advocate that researchers should just report the $p$-value and make an argument as to whether it provides sufficient evidence against the null model.

For our class, you can use either the “range of likely values” approach, the “$p<0.05$” approach, or the “report the $p$-value and make an argument” approach to determining whether an observed result is statistically significant. As you become a member of a scientific community, you will learn which approaches that community uses.

2.15.2 Statistical significance vs. practical significance

Don’t confuse statistical significance with practical significance. Often, statistical significance is taken to be an indication of whether the result is meaningful in the real world (i.e., “practically significant”). But statistical significance has nothing to do with real-world importance. Remember, statistical significance just tells us whether the observed result is compatible with the null hypothesis. The question of whether the result is of real-world (or practical) significance cannot be determined statistically. Instead, this is something that people have to make an argument about.
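
One way to see the difference is to hold a very small effect fixed and change only the amount of data. The sketch below does this with hypothetical numbers: an observed Monday share of 14.5%, barely above the 1/7 (about 14.3%) expected under the null hypothesis. To keep it fast, it uses a normal approximation to compute the $p$-value rather than the simulation approach used elsewhere in this chapter; that substitution, the function name, and the sample sizes are all assumptions made for illustration.

```python
import math

def upper_p_value(observed_share, null_share, n):
    """Approximate one-sided p-value for an observed share, via a normal approximation.

    This is a stand-in for simulation, chosen only because it runs quickly
    for very large samples.
    """
    standard_error = math.sqrt(null_share * (1 - null_share) / n)
    z = (observed_share - null_share) / standard_error
    # Upper-tail area of the standard normal distribution beyond z.
    return 0.5 * math.erfc(z / math.sqrt(2))

null_share = 1 / 7        # breakups equally likely each day
observed_share = 0.145    # hypothetical: only slightly above 1/7

# Same tiny effect, two hypothetical sample sizes.
for n in (1_000, 100_000):
    p = upper_p_value(observed_share, null_share, n)
    print(f"n = {n:>7,}  p = {p:.4f}  below 0.05: {p < 0.05}")
```

The effect is equally tiny in both rows, yet only the larger sample falls below the 0.05 threshold. Whether a difference of a fraction of a percentage point matters in the real world is a separate question that the $p$-value cannot answer.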

2.15.3 Other things that statistical significance can’t tell us

Again, statistical significance only tells us that an observed result is not compatible with the null hypothesis. It does not tell us about other important aspects, including:

- Statistical significance does not mean that we have proven something. It only tells us that there is evidence against a null model, which in turn would suggest that the effect is real.
- Statistical significance says nothing about what caused the effect.
- Statistical significance does not tell us the scope of the effect (that is, how broadly the result applies).

2.15.4 Examples

Here is how to write a conclusion to a hypothesis test.

If the result is statistically significant:

The observed result is not compatible with the null hypothesis. This suggests that there may be an effect.

If the result is not statistically significant:

The observed result is compatible with the null hypothesis. We do not have sufficient evidence to suggest that there is an effect.

2.15.5 Summary

The box below summarizes the key points about drawing conclusions and statistical significance.

Key points about drawing conclusions and statistical significance

- Conclusions from a hypothesis test are stated in terms of compatibility with the null hypothesis.
- We do not prove anything, so conclusions should use softer language like “suggests.”
- Statistical significance simply means that the observed result is not compatible with the null hypothesis.
- Statistical significance says nothing about what caused the effect.
- Statistical significance does not tell us the scope of the effect (that is, how broadly the result applies).
- Statistical significance does not tell us the size of the effect, or whether it is large enough to have real-world importance.