C Informal review on hypothesis testing

The process of hypothesis testing has an interesting analogy with a trial. The analogy helps in understanding, in an intuitive way, the elements present in a formal hypothesis test.²⁹⁷

Hypothesis test	Trial
Null hypothesis \(H_0\)	The defendant: an individual accused of committing a crime. He²⁹⁸ is backed by the presumption of innocence, which means that he is considered not guilty until there is enough evidence to support his guilt.
Sample \(X_1,\ldots,X_n\)	Collection of evidence supporting either the innocence or the guilt of the defendant. This evidence contains a certain degree of uncontrollable randomness due to how it is collected and the context of the case.²⁹⁹
Test statistic³⁰⁰ \(T_n\)	Summary of the evidence presented by the prosecutor and defense lawyer.
Distribution of \(T_n\) under \(H_0\)	The judge conducting the trial. He evaluates and measures the evidence presented by both sides and presents a verdict for the defendant.
Significance level \(\alpha\)	\(1-\alpha\) is the strength of the evidence required by the judge for condemning the defendant. The judge allows evidence that, on average, condemns \(100\alpha\%\) of the innocent defendants, due to the randomness inherent to the evidence collection process. The level \(\alpha=0.05\) is usually considered reasonable.³⁰¹
\(p\)-value	Decision of the judge that measures the degree of compatibility, on a scale \(0\)–\(1,\) of the presumption of innocence with the summary of the evidence presented. If \(p\text{-value}<\alpha,\) the defendant is declared guilty. Otherwise, he is declared not guilty.
\(H_0\) is rejected	The defendant is declared guilty: there is strong evidence supporting his guilt.
\(H_0\) is not rejected	The defendant is declared not guilty: either he is innocent or there is not enough evidence supporting his guilt.
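The role of \(\alpha\) in the table can be checked with a small simulation. The following is a minimal sketch, not part of the original text, under assumptions chosen only for illustration: the evidence is a sample of size \(n=20\) from a \(\mathcal{N}(\mu,1),\) the null hypothesis is \(H_0:\mu=0,\) and the judge uses a two-sided \(z\)-test. The names `p_value` and `conviction_rate` are hypothetical helpers.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # distribution of T_n under H0 in this sketch


def p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of the z-test for H0: mu = mu0 (sigma assumed known)."""
    n = len(sample)
    t_n = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(t_n)))


def conviction_rate(mu_true, alpha, n=20, trials=10_000, seed=1):
    """Fraction of simulated trials in which H0: mu = 0 is rejected."""
    rng = random.Random(seed)
    convictions = 0
    for _ in range(trials):
        # Evidence collected with randomness around the defendant's true condition.
        evidence = [rng.gauss(mu_true, 1.0) for _ in range(n)]
        if p_value(evidence) < alpha:
            convictions += 1
    return convictions / trials


innocent_005 = conviction_rate(mu_true=0.0, alpha=0.05)  # H0 true
guilty_005 = conviction_rate(mu_true=0.5, alpha=0.05)    # H0 false
guilty_0001 = conviction_rate(mu_true=0.5, alpha=0.001)  # much stricter judge

print(f"innocent declared guilty (alpha = 0.05):  {innocent_005:.3f}")
print(f"guilty declared guilty   (alpha = 0.05):  {guilty_005:.3f}")
print(f"guilty declared guilty   (alpha = 0.001): {guilty_0001:.3f}")
```

In this setup, the fraction of innocent defendants declared guilty should be close to \(\alpha,\) while lowering \(\alpha\) reduces the fraction of guilty defendants that the judge manages to condemn.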

More formally, the \(p\)-value of a hypothesis test about \(H_0\) is defined as:

The \(p\)-value is the probability of obtaining a test statistic at least as unfavorable to \(H_0\) as the one observed, assuming that \(H_0\) is true.

Therefore, if the \(p\)-value is small (smaller than the chosen level \(\alpha\)), it is unlikely that the evidence against \(H_0\) is due to randomness. As a consequence, \(H_0\) is rejected. If the \(p\)-value is large (larger than \(\alpha\)), then it is plausible that the evidence against \(H_0\) is merely due to the randomness of the data. In this case, we do not reject \(H_0.\)
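As an illustration of the definition and the decision rule, the sketch below computes a \(p\)-value in two ways: exactly, and by simulating the test statistic under \(H_0.\) Everything specific in it is an assumption made for the example: the data are invented, the model is \(\mathcal{N}(\mu,1)\) with \(H_0:\mu=0\) tested against \(\mu>0,\) and the statistic is \(T_n=\sqrt{n}\,\bar{X},\) so that large values of \(T_n\) are unfavorable to \(H_0.\)

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
# Hypothetical observations, invented for this illustration.
sample = [0.8, -0.3, 1.2, 0.5, 0.1, 0.9, -0.4, 1.1, 0.6, 0.3]
n = len(sample)


def statistic(xs):
    """T_n = sqrt(n) * mean(xs); it is N(0, 1) under H0 when sigma = 1."""
    return len(xs) ** 0.5 * fmean(xs)


t_obs = statistic(sample)

# Exact p-value: probability under H0 that T_n is at least as large as t_obs.
p_exact = 1.0 - NormalDist().cdf(t_obs)

# Monte Carlo version of the same definition: generate samples for which H0
# is true and count how often their statistic is at least as unfavorable.
rng = random.Random(42)
sims = 20_000
as_unfavorable = sum(
    statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) >= t_obs
    for _ in range(sims)
)
p_mc = as_unfavorable / sims

decision = "reject H0" if p_exact < ALPHA else "do not reject H0"
print(f"T_n = {t_obs:.3f}, exact p-value = {p_exact:.4f}, simulated = {p_mc:.4f}")
print(f"Decision at alpha = {ALPHA}: {decision}")
```

With these invented data, the \(p\)-value is about \(0.065,\) slightly above \(\alpha=0.05,\) so \(H_0\) is not rejected. The simulated value approximates the exact one and only serves to make the definition concrete.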

If \(H_0\) holds, then the \(p\)-value (which is a random variable) is uniformly distributed in \((0,1).\) If \(H_0\) does not hold, then the distribution of the \(p\)-value is not uniform but concentrated near \(0\) (where the rejections of \(H_0\) take place).
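The behavior of the \(p\)-value as a random variable can be visualized numerically. The sketch below is an illustration under assumed settings (samples of size \(n=20\) from a \(\mathcal{N}(\mu,1),\) a two-sided \(z\)-test of \(H_0:\mu=0,\) and \(\mu=0.5\) as an example in which \(H_0\) does not hold). It tabulates the fraction of simulated \(p\)-values falling in each tenth of \((0,1).\) The function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def p_value(sample):
    """Two-sided p-value of the z-test for H0: mu = 0 with sigma = 1."""
    t_n = len(sample) ** 0.5 * fmean(sample)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(t_n)))


def decile_fractions(mu_true, n=20, trials=10_000, seed=7):
    """Fraction of simulated p-values in [0, 0.1), [0.1, 0.2), ..., [0.9, 1]."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(trials):
        sample = [rng.gauss(mu_true, 1.0) for _ in range(n)]
        k = min(int(10 * p_value(sample)), 9)  # p = 1 goes into the last bin
        counts[k] += 1
    return [c / trials for c in counts]


under_h0 = decile_fractions(mu_true=0.0)  # H0 holds
under_h1 = decile_fractions(mu_true=0.5)  # H0 does not hold

print("H0 true: ", " ".join(f"{f:.2f}" for f in under_h0))
print("H0 false:", " ".join(f"{f:.2f}" for f in under_h1))
```

When \(H_0\) holds, each tenth should receive roughly \(10\%\) of the \(p\)-values, as expected from a uniform distribution. When \(H_0\) does not hold, most of the \(p\)-values should accumulate in the first tenth.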

References

Molina Peralta, I., and E. García-Portugués. 2025. A First Course on Statistical Inference. https://bookdown.org/egarpor/inference/.

²⁹⁷ This review is not intended to replace a formal introduction to hypothesis tests. The interested reader can find one, e.g., in Chapter 6 of Molina Peralta and García-Portugués (2025).
²⁹⁸ The masculine pronoun in no case indicates gender ascription. It is used as a neutral form and could be replaced by any other personal pronoun.
²⁹⁹ Think about phenomena that may randomly support the defendant’s innocence or guilt, irrespective of his true condition. For example: spurious coincidences (“happening to be in the wrong place at the wrong time”), loss of evidence during the case, past statements of the defendant, doubtful identification by witnesses, imprecise witness testimonies, unverifiable alibis, etc.
³⁰⁰ Usually referred to simply as the statistic.
³⁰¹ A compromise is needed because the judge must retain the power to condemn a guilty defendant: setting \(\alpha=0\) (no innocents are declared guilty) would result in a judge that systematically declares everybody not guilty.