6.5 \(p\)-value of a test
Example 6.15 In Example 6.14:
Will the data support the decision of the foreman at the significance level \(\alpha=0.05\)?
In this case, the critical value of the standard normal is \(z_{0.05}\approx 1.64\) and the observed value of the statistic was \(Z=5/3\approx 1.67>1.64.\) Therefore, at the significance level \(\alpha=0.05,\) it is concluded that there is evidence in favor of the foreman's decision.
What would be the lowest significance level for which the data would support the foreman's decision and, therefore, \(H_0: p=0.10\) would be rejected?
The lowest significance level for which we would reject \(H_0\) is
\[\begin{align*} \mathbb{P}(Z>5/3|p=0.10)\approx0.0478. \end{align*}\]
This probability is precisely the level \(\alpha\) at which the decision of the test flips. It is the so-called \(p\)-value of the test.
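The computation in this example can be sketched in Python with the standard library only. The variable names (`z_obs`, `alpha`, `p_value`) are illustrative and not part of the text; the sketch assumes the standard normal approximation for \(Z\) used in the example.

```python
from statistics import NormalDist

z_obs = 5 / 3   # observed value of the statistic in Example 6.15
alpha = 0.05    # significance level

# p-value of the right-sided test: P(Z > z_obs) under H0, with Z ~ N(0, 1)
p_value = 1.0 - NormalDist().cdf(z_obs)

print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Any \(\alpha\) above the printed \(p\)-value leads to rejection of \(H_0,\) and any \(\alpha\) at or below it does not.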
The following relation is the key operative rule for using the \(p\)-value of a test to reach a decision.
Given a test for \(H_0,\) the rejection decision of \(H_0\) at significance level \(\alpha\) depends on the \(p\)-value of the test:
\[\begin{align*} \begin{cases} p\text{-value}<\alpha \iff \text{Reject $H_0$ at level $\alpha;$}\\ p\text{-value}\geq\alpha \iff \text{Do not reject $H_0$ at level $\alpha.$} \end{cases} \end{align*}\]
Definition 6.5 (\(p\)-value) The \(p\)-value of a hypothesis test is defined as the lowest significance level \(\alpha\) for which the test rejects the null hypothesis \(H_0.\)
Remark. The \(p\)-value can be informally regarded as a “measure of the degree of compatibility of \(H_0\) with the data”. A valid interpretation in terms of probability is a restatement of that in Definition 6.5: “the probability of obtaining a test statistic at least as unfavorable to \(H_0\) as the observed one, under \(H_0.\)”
Remark. The following are erroneous interpretations of the \(p\)-value: (1) “the probability of \(H_0\) being true given the data”; (2) “the probability of \(H_0\) vs. \(H_1\)”; (3) “the probability of the data given \(H_0\)”.
Depending on the kind of hypothesis to test, the \(p\)-value is computed in a different way. We differentiate three cases:
One-sided right tests; \(H_0: \theta=\theta_0\) vs. \(H_1:\theta>\theta_0.\) Assume that the test statistic is \(T(X_1,\ldots,X_n)\) and the critical region is \[\begin{align*} C=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:T(x_1,\ldots,x_n)>k\}. \end{align*}\] If the observed value of the test statistic is \(T=t,\) then the \(p\)-value is \[\begin{align*} \text{$p$-value}:=\mathbb{P}(T\geq t|\theta=\theta_0). \end{align*}\]
One-sided left tests; \(H_0: \theta=\theta_0\) vs. \(H_1:\theta<\theta_0.\) In this case the critical region is \[\begin{align*} C=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:T(x_1,\ldots,x_n)<k\} \end{align*}\] and, therefore, the \(p\)-value is \[\begin{align*} \text{$p$-value}:=\mathbb{P}(T\leq t|\theta=\theta_0). \end{align*}\]
Two-sided tests; \(H_0: \theta=\theta_0\) vs. \(H_1:\theta\neq\theta_0.\) In this case, the critical region is of the form \[\begin{align*} C=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:T(x_1,\ldots,x_n)<k_1\ \text{or}\ T(x_1,\ldots,x_n)>k_2\}. \end{align*}\] The \(p\)-value is given by \[\begin{align*} \text{$p$-value}:=2\min\left\{\mathbb{P}(T\leq t|\theta=\theta_0),\mathbb{P}(T\geq t|\theta=\theta_0)\right\}. \end{align*}\] Observe that if the distribution of the test statistic \(T\) is symmetric about \(0,\)\(^{79}\) such as the standard normal or Student’s \(t\) distribution, then \(\mathbb{P}(T\leq t|\theta=\theta_0) = \mathbb{P}(-T \leq t | \theta=\theta_0) = \mathbb{P}(T \geq -t | \theta=\theta_0)\) and the above minimum is not required: \[\begin{align*} \text{$p$-value}=2\mathbb{P}(T\leq -|t|\,|\,\theta=\theta_0)=2\mathbb{P}(T\geq |t|\,|\,\theta=\theta_0). \end{align*}\]
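The three cases above can be collected in a single helper. The following Python sketch is only illustrative: the function name `p_value`, its `alternative` argument, and the labels `"greater"`, `"less"`, and `"two-sided"` are assumptions made here, not notation from the text. It also assumes that \(T\) is continuous under \(H_0,\) so that \(\mathbb{P}(T\geq t|\theta=\theta_0)=1-\mathbb{P}(T\leq t|\theta=\theta_0).\)

```python
from statistics import NormalDist

def p_value(t, cdf, alternative):
    """p-value for an observed statistic t, given the cdf of T under H0.

    Assumes T is continuous under H0, so P(T >= t) = 1 - cdf(t).
    alternative: "greater" (right), "less" (left), or "two-sided".
    """
    left = cdf(t)         # P(T <= t | theta = theta_0)
    right = 1.0 - left    # P(T >= t | theta = theta_0)
    if alternative == "greater":
        return right
    if alternative == "less":
        return left
    if alternative == "two-sided":
        return 2.0 * min(left, right)
    raise ValueError(f"unknown alternative: {alternative!r}")

# Usage with a standard normal statistic and observed value t = 5/3
std_normal_cdf = NormalDist().cdf
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(5 / 3, std_normal_cdf, alt), 4))
```

Passing the cdf as an argument keeps the helper independent of the particular null distribution of \(T.\) For a discrete \(T,\) the right tail would have to be computed as \(\mathbb{P}(T\geq t)\) directly rather than as one minus the cdf.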
From the above definitions of the \(p\)-value, it is clear that the \(p\)-value is a function of the observed value of the test statistic \(T=t.\) Therefore, before the sample is observed, the \(p\)-value is a function of the rv \(T\) and hence is itself a rv. In addition, if \(T\) is continuous, the \(p\)-value is uniformly distributed in \([0,1]\) under \(H_0\) (see Exercise 6.16). It is also not difficult to see that \(p\text{-value}<\alpha\) if and only if the observed test statistic \(T=t\) belongs to the critical region \(C\) of the test at level \(\alpha.\)
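Both claims can be checked empirically with a small simulation. The Python sketch below assumes a setting chosen here purely for illustration (it is not an example from the text): a right-sided \(Z\)-test of \(H_0:\mu=0\) with known \(\sigma=1\) and samples of size \(n=25\) generated under \(H_0.\) The function name `simulate_z_tests`, the seed, and the number of replicates are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def simulate_z_tests(n_sim, n=25, seed=1):
    """Simulate right-sided Z-tests of H0: mu = 0 (sigma = 1 known) under H0.

    Returns a list of (z, p_value) pairs, one per simulated sample.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sim):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)                # Z ~ N(0, 1) under H0
        results.append((z, 1.0 - STD_NORMAL.cdf(z)))  # p-value = P(Z >= z | H0)
    return results

results = simulate_z_tests(20000)

# If the p-value is uniform under H0, P(p-value < alpha) is close to alpha.
for alpha in (0.01, 0.05, 0.10, 0.50):
    frac = sum(p < alpha for _, p in results) / len(results)
    print(f"alpha = {alpha:.2f}: fraction of p-values below alpha = {frac:.3f}")

# p-value < alpha should coincide with z lying in C = {z > z_alpha}.
alpha = 0.05
z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
agree = all((p < alpha) == (z > z_alpha) for z, p in results)
print("p-value rule matches critical region:", agree)
```

The printed fractions should be close to the corresponding \(\alpha,\) up to Monte Carlo error, which is what uniformity of the \(p\)-value under \(H_0\) predicts.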
Example 6.16 Assume that in Example 6.3 it has been observed that \(Y=3\) of the \(n=15\) sampled voters support the candidate. Would that result indicate that the candidate is going to lose the elections (reject \(H_0:p=0.5\)) at significance level \(\alpha=0.05\)?
The hypothesis to test is
\[\begin{align*} H_0:p=0.5\quad \text{vs.}\quad H_1:p<0.5. \end{align*}\]
Since \(Y\sim \mathrm{Bin}(n,0.5)\) under \(H_0:p=0.5,\) the \(p\)-value is given by
\[\begin{align*} \text{$p$-value} &=\mathbb{P}(Y\leq 3|p=0.5)=\sum_{y=0}^3 \binom{15}{y}(0.5)^{15} \\ &\approx0.018<\alpha=0.05. \end{align*}\]
Equivalently, it can be computed as:
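A minimal Python sketch of this computation, using only the standard library (the helper name `binom_cdf` is illustrative and not from the text):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Bin(n, p), summing the probability mass function."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

# p-value of the left-sided test: P(Y <= 3 | p = 0.5) with n = 15
p_value = binom_cdf(3, 15, 0.5)
print(f"p-value = {p_value:.4f}")  # p-value = 0.0176
```

The exact value is \(576/2^{15},\) in agreement with the sum above.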
Therefore, \(H_0: p=0.5\) is rejected in favor of \(H_1:p<0.5;\) that is, at significance level \(\alpha=0.05,\) this result indicates that the candidate will not win the elections.
Example 6.17 It is estimated that a particular flight is profitable if the average occupation rate during a year is at least \(60\%.\) An airline is interested in determining whether it is profitable to keep a particular flight operative. For that, they record the occupation rates of \(120\) random flights scattered around the year, resulting in a mean occupation rate of \(58\%\) and a quasistandard deviation of \(11\%.\) Considering that the occupation rates (in proportion) have an approximately normal distribution, is there enough evidence to cancel the flight because it is not profitable? Employ a significance level of \(\alpha=0.10.\)
Let \(\mu\) be the average occupation rate of the flight in one year. It is desired to test
\[\begin{align*} H_0:\mu=0.6\quad \text{vs.}\quad H_1:\mu<0.6. \end{align*}\]
The test statistic is
\[\begin{align*} T=\frac{\bar{X}-0.6}{S'/\sqrt{n}}=\frac{0.58-0.6}{0.11/\sqrt{120}}\approx-1.992. \end{align*}\]
Under \(H_0:\mu=0.6,\) the statistic is distributed as \(t_{119},\) so the \(p\)-value of this one-sided left test is
\[\begin{align*} \mathbb{P}(T\leq -1.992|\mu=0.6)\approx0.0243<\alpha=0.10. \end{align*}\]
The last probability can be computed as
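One possible way to do this in Python without external libraries is to integrate the \(t_{119}\) density numerically. The sketch below assumes composite Simpson's rule with a fixed number of steps; `t_pdf` and `t_cdf` are illustrative helper names, not functions from any library, and in practice a statistical package would normally provide this cdf directly. The statistic is recomputed without rounding, so the result may differ slightly in the last digit from a computation based on \(-1.992.\)

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, n_steps=2000):
    """P(T <= x) for T ~ t_df, via composite Simpson's rule on [0, |x|].

    Uses the symmetry of the density about 0: the cdf is 0.5 plus (x >= 0)
    or minus (x < 0) the area under the density on [0, |x|].
    n_steps must be even.
    """
    a = abs(x)
    h = a / n_steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density over [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area

n = 120
t_obs = (0.58 - 0.6) / (0.11 / math.sqrt(n))  # observed test statistic
p_value = t_cdf(t_obs, df=n - 1)              # P(T <= t_obs | mu = 0.6)
print(f"t = {t_obs:.3f}, p-value = {p_value:.4f}")
```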
Therefore, \(H_0:\mu=0.6\) is rejected, that is, the sample indicates that the flight is not profitable.
\(^{79}\) In this case, the distributions of \(T\) and \(-T\) are equal!