Chapter 6 Hypothesis tests

6.1 Introduction

The following example serves to illustrate the main philosophy of hypothesis testing.

Example 6.1 A pharmaceutical company suspects that a drug under testing produces an increase of the ocular tension as a secondary effect. The baseline tension level has mean \(15\) and the suspected increase is to a mean of \(18\) units. This increase may raise the risk of suffering glaucoma. Since the drug has a positive primary effect, it is important to check whether this suspicion is true before moving forward with the commercialization of the drug.

The medical trials have established that, at a population level, the ocular tension tends to follow a distribution \(\mathcal{N}(15,1).\) The suspicion about the secondary effect then means that the ocular tension among the takers of the drug is \(\mathcal{N}(18,1).\) We denote by \(X\sim\mathcal{N}(\mu,1)\) the r.v. “ocular tension of the takers of the drug”. Then, the question the pharmaceutical company faces is to decide whether \(\mu=15\) (the drug has no secondary effect) or \(\mu=18\) (the drug has a secondary effect) based on empirical evidence. To formalize this, we define:

\[ \left\{ \begin{array}{ll} H_0:\mu=15 & \text{(null hypothesis)} \\ H_1:\mu=18 & \text{(alternative hypothesis)} \end{array} \right. \]

Then, we want to find evidence in favour of \(H_0\) or of \(H_1.\) But, since the drug has been proved effective, the pharmaceutical company is only willing to stop its commercialization if there is enough evidence against \(H_0\) (or in favor of \(H_1\)), that is, if there is enough evidence pointing to the presence of a secondary effect.

In order to look for evidence in favour of \(H_0\) or of \(H_1,\) a sample of the ocular tension level of four drug takers is measured, and its sample mean \(\bar{X}\) is computed. Then, the following is verified:

  • If \(H_0\) is true, then \(\bar{X}\sim \mathcal{N}(15,1/4).\)
  • If \(H_1\) is true, then \(\bar{X}\sim \mathcal{N}(18,1/4).\)

Then, if the sample gives a small value of \(\bar{X},\) we will have more evidence in favour of \(H_0,\) and if it gives a large value of \(\bar{X},\) then the sample supports \(H_1.\) But up to which value \(k_1\) of \(\bar{X}\) are we willing to admit that \(H_0\) is true?

This question can be answered in the following way: we can limit the probability of incorrectly stopping the commercialization of the drug to a small value \(\alpha,\) that is, \[ \mathbb{P}(\text{Reject}\ H_0|H_0 \ \text{true})\leq \alpha, \] and choose the constant \(k_1\) that satisfies this, that is, choose the smallest \(k_1\) that verifies \[ \mathbb{P}(\text{Reject}\ H_0|H_0 \ \text{true})=\mathbb{P}(\bar{X}>k_1|\mu=15)\leq \alpha. \]

Standardizing \(\bar{X},\) we obtain the standard normal distribution, and \(k_1\) follows from a quantile of that distribution, \[ \mathbb{P}(\bar{X}> k_1|\mu=15)=\mathbb{P}(2(\bar{X}-15)> 2(k_1-15)|\mu=15)=\mathbb{P}(Z> 2(k_1-15))\leq \alpha. \]

From here, we have that \(2(k_1-15)=z_{\alpha},\) so \(k_1=15+z_{\alpha}/2.\) For example, if we take \(\alpha=0.05,\) then \(z_{\alpha}=1.645\) and \(k_1=15+1.645/2.\) Therefore, if we obtain \(\bar{X}>15+1.645/2,\) we have observed an event that, if \(H_0\) were true, would happen with probability only \(0.05.\) This implies that one of the following two possibilities is happening:

  1. either \(H_0\) is true but the obtained sample was “extreme” and is not very representative of its distribution;
  2. or \(H_0\) is not true.

If we progressively decrease \(\alpha,\) we will reach a value \(k_1\) beyond which we are not willing to accept that such an extreme event has happened.

We could apply the same reasoning with respect to \(H_1.\) For small values of \(\bar{X},\) we would think that \(H_1\) is not true. But down to which value \(k_2\) of \(\bar{X}\) would we accept \(H_1\)?

If we again fix a bound \(\beta\) for the probability of committing an error (in this case, allowing the commercialization of the drug while it has secondary effects), that is, \[ \mathbb{P}(\text{Reject}\ H_1| H_1\ \text{true})\leq \beta, \] then we will choose the largest constant \(k_2\) that verifies that relation, that is, that verifies \[ \mathbb{P}(\text{Reject}\ H_1| H_1\ \text{true})=\mathbb{P}(\bar{X}\leq k_2|\mu=18)\leq \beta. \]
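The threshold \(k_1\) derived above can be checked numerically. The following is a minimal sketch in R using the base functions `qnorm()` and `pnorm()`; the variable names are illustrative and not part of the original example.

```r
# Setting of Example 6.1: n = 4 observations from N(mu, 1), H0: mu = 15
n <- 4
mu0 <- 15
alpha <- 0.05

# Upper alpha-quantile of the N(0, 1), denoted z_alpha in the text
z_alpha <- qnorm(alpha, lower.tail = FALSE)

# Smallest threshold k1 such that P(Xbar > k1 | mu = 15) <= alpha
k1 <- mu0 + z_alpha / sqrt(n)

# Type I error probability attained by the rule "reject H0 if Xbar > k1"
type1 <- pnorm(k1, mean = mu0, sd = 1 / sqrt(n), lower.tail = FALSE)
c(k1 = k1, type1 = type1)
```

Rerunning the sketch with smaller values of `alpha` shows how the threshold moves to the right.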

Standardizing \(\bar{X}\) in the previous probability, we obtain \[\begin{align*} \mathbb{P}(\bar{X}\leq k_2|\mu=18)&=\mathbb{P}(2(\bar{X}-18)\leq 2(k_2-18)|\mu=18)\\ &=\mathbb{P}(Z\leq 2(k_2-18))\leq \beta, \end{align*}\] in such a way that \(2(k_2-18)=-z_{\beta},\) so \(k_2=18-z_{\beta}/2.\) Taking \(\beta=0.05,\) we have \(z_{\beta}=1.645,\) and \(k_2=18-1.645/2.\)

Then, following this argument, the decision would be:

  • If \(\bar{X}\leq 15+1.645/2,\) then we accept \(H_0.\)
  • If \(\bar{X}\geq 18-1.645/2,\) then we accept \(H_1.\)

The following question arises immediately: what shall we do if \(15+1.645/2<\bar{X}<18-1.645/2\)? Also, imagine that instead of \(15\) units, the baseline level of ocular tension was \(16.5\) units. Then \(16.5+1.645/2=17.322>18-1.645/2=17.177\) and we would be accepting \(H_0\) and \(H_1\) at the same time! These drawbacks point towards choosing just a single value \(k\) from which to make a decision. But in this case, only one of the probabilities for the two types of error, \(\alpha\) and \(\beta,\) can be controlled.

If we decrease \(\alpha\) too much, \(\beta\) will increase. In addition, it may happen that \(\bar{X}>k,\) so we would have evidence against \(H_0,\) but that the sample is not representative of \(H_1\) either. It may also happen that \(\bar{X}\leq k,\) so we would not have evidence against \(H_0,\) but that the sample is nevertheless more representative of \(H_1\) than of \(H_0.\) Therefore, if we want to control \(\alpha,\) we first have to fix the null hypothesis \(H_0\) as the most conservative statement, that is, the statement that will be assumed true unless there is enough evidence against it. As a consequence, the decision to take is going to be one of the following:
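The gap and the overlap between the two decision rules can be reproduced with a short sketch in R. The helper `thresholds()` is hypothetical (introduced only for this illustration) and uses \(n=4,\) \(\alpha=\beta=0.05,\) and \(\mu=18\) under \(H_1,\) as in the example.

```r
n <- 4
alpha <- 0.05
beta <- 0.05
mu1 <- 18

# Hypothetical helper: thresholds k1 and k2 for a given baseline mean mu0
thresholds <- function(mu0) {
  c(k1 = mu0 + qnorm(alpha, lower.tail = FALSE) / sqrt(n),
    k2 = mu1 - qnorm(beta, lower.tail = FALSE) / sqrt(n))
}

thr_15 <- thresholds(15)     # k1 < k2: undecided if Xbar falls in between
thr_165 <- thresholds(16.5)  # k1 > k2: both "accept" rules overlap
rbind(thr_15, thr_165)
```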

  • “Reject \(H_0\)” if \(\bar{X}>k\) (without commitment to accept \(H_1\));
  • “Do not reject \(H_0\)” if \(\bar{X}\leq k\) (without commitment to reject \(H_1,\) which could be valid).
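The trade-off between \(\alpha\) and \(\beta\) for a single threshold \(k\) can be illustrated numerically. This is a sketch for the setting of Example 6.1 (\(n=4,\) \(\mu=15\) under \(H_0,\) \(\mu=18\) under \(H_1\)); the grid of significance levels is an arbitrary choice made for the illustration.

```r
n <- 4
mu0 <- 15
mu1 <- 18
sd_mean <- 1 / sqrt(n)  # standard deviation of the sample mean

# Arbitrary grid of significance levels, from larger to smaller
alphas <- c(0.10, 0.05, 0.01, 0.001)

# Threshold k controlling the type I error at each level
k <- mu0 + qnorm(alphas, lower.tail = FALSE) * sd_mean

# Resulting type II error probability: P(Xbar <= k | mu = 18)
betas <- pnorm(k, mean = mu1, sd = sd_mean)

data.frame(alpha = alphas, k = k, beta = betas)
```

As \(\alpha\) decreases, \(k\) moves to the right and \(\beta\) grows, which is why only one of the two error probabilities can be fixed in advance.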

In general, throughout this section we assume a r.v. \(X\) with distribution within the family of distributions \(\{F(\,\cdot\,;\theta)\,:\,\theta\in\Theta\}\) for which we want to determine the validity of a statement \(H_0\) about the parameter \(\theta\) against an alternative statement \(H_1.\) Splitting the parameter space as \(\Theta=\Theta_0\cup\Theta_1\cup\Theta_2,\) where typically \(\Theta_1=\Theta_0^c\) and \(\Theta_2=\emptyset,\) the hypotheses to test are of the form \[ H_0:\theta\in\Theta_0 \quad \text{vs.}\quad H_1:\theta\in\Theta_1. \] Recall that a statement about the unknown parameter \(\theta\) is equivalent to a statement about the distribution \(F(\,\cdot\,;\theta).\)

Definition 6.1 (Null and alternative hypotheses) The null hypothesis (denoted by \(H_0\)) is the statement that is assumed true unless there is enough evidence against it. The opposite statement is the alternative hypothesis (denoted by \(H_1\)).

Definition 6.2 (Simple and composite hypotheses) If the set \(\Theta_j\subset \Theta\) that determines the hypothesis \(H_j\) possesses a single element then, under (the validity of) such hypothesis \(H_j,\) the distribution of \(X\) is completely known, and in this case \(H_j\) is said to be a simple hypothesis. Otherwise, \(H_j\) is referred to as a composite hypothesis.

The decision in favor or against \(H_0\) is made from the information available in the realization of a s.r.s. \((X_1,\ldots,X_n)\) of \(X.\)

Definition 6.3 (Test) A test or a hypothesis test (non-randomized) of \(H_0\) vs. \(H_1\) is a function \(\varphi:\mathbb{R}^n\rightarrow\{0,1\},\) where \(1\) stands for “reject \(H_0\)” and \(0\) for “do not reject \(H_0\)”, of the form \[ \varphi(x_1,\ldots,x_n)=\left\{\begin{array}{ll} 1 & \text{if}\ (x_1,\ldots,x_n)\in C,\\ 0 & \text{if}\ (x_1,\ldots,x_n)\in C^c, \end{array}\right. \] where \(C\) and \(C^c\) provide a partition of the sample space \(\mathbb{R}^n.\) The set \(C\) is called the critical region or rejection region and \(C^c\) is the acceptance region.

Then, a hypothesis test is entirely determined by the critical region \(C\) so, in principle, there exist infinitely many tests for testing a hypothesis at hand. The selection of a particular test is done according to the test reliability, that is, according to its “success rate”. The possible consequences of a test decision (with respect to the reality about \(H_0,\) which is unknown, as otherwise we would not need a hypothesis test) are given in the following table:

| Test decision \(\backslash\) Reality | \(H_0\) true | \(H_0\) false |
|---|---|---|
| Reject \(H_0\) | Type I error | Correct decision |
| Do not reject \(H_0\) | Correct decision | Type II error |

Then, the reliability of a hypothesis test is quantified and assessed in terms of the two possible types of errors:

| Error | Interpretation |
|---|---|
| Type I error | Reject \(H_0\) when it is true |
| Type II error | Do not reject \(H_0\) when it is false |

The classical procedure for selecting a test among all the available ones is the following:

  1. Fix a bound, \(\alpha,\) for the probability of making the type I error. This bound is the significance level of the hypothesis test.
  2. Exclude all the tests with critical regions \(C\) that do not respect the bound for the type I error, that is, that do not satisfy the condition \[ \mathbb{P}(\text{Reject}\ H_0|H_0\ \text{true}) =\mathbb{P}((X_1,\ldots,X_n)\in C|H_0 \ \text{true})\leq \alpha. \]
  3. Among the selected tests, choose the one that has a critical region \(C\) that minimizes the probability of the type II error, that is, the one that minimizes \[ \mathbb{P}(\text{Do not reject}\ H_0|H_1\ \text{true}) =\mathbb{P}((X_1,\ldots,X_n)\in C^c|H_1 \ \text{true}). \]

Instead of determining the critical region \(C\) directly as a subset of \(\mathbb{R}^n,\) it is simpler to compute a function of the sample (a statistic) and express the critical region as a subset of the range of that statistic. Then, a test statistic will determine the rejection region of the test.

Definition 6.4 (Test statistic) A test statistic of a hypothesis \(H_0\) versus \(H_1\) is a measurable function of the sample that under \(H_0\) has a completely known distribution.

A test statistic can usually be obtained by taking an estimator of the unknown parameter that is involved in \(H_0\) and transforming it in such a way that it possesses a known distribution under \(H_0.\)

Summarizing the above, the key elements of a hypothesis test are:

  • A null hypothesis \(H_0\) and an alternative \(H_1;\)
  • A significance level \(\alpha;\)
  • A test statistic;
  • A critical region \(C.\)

Example 6.2 A political poll is made in order to know the voting intentions of the electorate regarding two candidates, A and B, and, specifically, whether candidate A will win the elections. For that purpose, the number of voters \(Y\) who will vote for candidate A within a sample of \(n=15\) voters was recorded. The hypothesis test associated with this problem is \[ H_0:p=0.5\quad \text{vs.}\quad H_1:p<0.5, \] where \(p\) denotes the proportion of voters in favour of \(A.\) If \(Y\) is the test statistic and the rejection region is set as \(C=\{y\leq 2\},\) compute the probability of the type I error for the test, \(\alpha.\)

The probability of the type I error is \[ \alpha=\mathbb{P}(\text{Type I error})=\mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})=\mathbb{P}(Y\leq 2|p=0.5). \] Since \(Y\sim \mathrm{Bin}(n,0.5),\) the previous probability is \[ \alpha=\sum_{y=0}^2 \left(\begin{array}{c}15 \\ y\end{array}\right)(0.5)^{15}=0.004. \]
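The sum above is a binomial cumulative probability, so it can be evaluated directly in R. The following sketch uses the base functions `pbinom()` and `dbinom()`; the variable names are illustrative.

```r
# Type I error probability of the test with rejection region {Y <= 2},
# where Y ~ Bin(15, 0.5) under H0
alpha <- pbinom(2, size = 15, prob = 0.5)

# Same quantity, written as the explicit sum of the probability mass function
alpha_sum <- sum(dbinom(0:2, size = 15, prob = 0.5))

round(c(alpha, alpha_sum), 3)
```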

Example 6.3 Assume that the real proportion of voters for the candidate A is \(p=0.3\) (so \(H_0\) is false). What is the probability that in the test of Example 6.2 we obtain that candidate A will win (\(H_0\) is not rejected)?

The probability of the type II error is \[ \beta=\mathbb{P}(\text{Type II error})=\mathbb{P}(\text{Do not reject $H_0$}|\text{$H_0$ false}). \] In this case, we want to compute the value of \(\beta\) for \(p=0.3,\) that is, \[\begin{align*} \beta(0.3)&=\mathbb{P}(Y> 2|p=0.3)=1-\sum_{y=0}^2 \left(\begin{array}{c}15 \\ y\end{array}\right)(0.3)^y(0.7)^{15-y}\\ &=1-0.127=0.873. \end{align*}\] Then, if we employ that rejection region, the test would most likely conclude that candidate A will win, whether or not the candidate is actually going to win. So the test is not very useful.
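As a quick numerical check of the type II error probability, the upper tail of the binomial distribution can be computed with `pbinom()`. This is a sketch with an illustrative variable name.

```r
# Type II error probability at p = 0.3 for the rejection region {Y <= 2}:
# P(Y > 2 | p = 0.3), with Y ~ Bin(15, 0.3)
beta_03 <- pbinom(2, size = 15, prob = 0.3, lower.tail = FALSE)
round(beta_03, 3)
```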

6.2 Tests on a normal population

We assume in this section that the population r.v. \(X\) has distribution \(\mathcal{N}(\mu,\sigma^2),\) where both \(\mu\) and \(\sigma^2\) are unknown. We will test hypotheses about one of the two parameters from a s.r.s. \((X_1,\ldots,X_n)\) of \(X.\)

6.2.1 Tests about \(\mu\)

One-sided right test

We want to test the hypothesis

\[ H_0:\mu=\mu_0 \quad \text{vs.}\quad H_1:\mu> \mu_0. \]

As seen in the previous section, we must fix first the bound \(\alpha\) for the probability of type I error, that is,

\[\begin{align} \mathbb{P}(\text{Reject}\ H_0|H_0 \ \text{true})\leq \alpha.\tag{6.1} \end{align}\]

To obtain the critical region, we need a statistic \(T(X_1,\ldots,X_n).\) If we take \(\bar{X},\) whose distribution under \(H_0\) is \(\mathcal{N}(\mu_0,\sigma^2/n),\) then we should reject the null hypothesis for large values of \(\bar{X}.\) Therefore, the critical region has the form

\[ \{(x_1,\ldots,x_n)\in \mathbb{R}^n: \bar{X}>k\}, \] for a given value \(k.\) We can compute the constant \(k\) in such a way that it verifies (6.1), that is, such that

\[\begin{align} \mathbb{P}(\bar{X}>k|\mu=\mu_0)\leq \alpha.\tag{6.2} \end{align}\]

But there are infinitely many values \(k\) that satisfy the above relation. For example, if we consider a value \(k_1\) that is not the smallest one, then there will be another value \(k_2<k_1\) such that

\[ \mathbb{P}(\bar{X}>k_1|\mu=\mu_0)\leq \mathbb{P}(\bar{X}>k_2|\mu=\mu_0)\leq \alpha. \] Then, the probability of type II error of the test with critical region \(\{\bar{X}>k_1\}\) will be larger than the one for the test with critical region \(\{\bar{X}>k_2\}.\) Therefore, among the tests with critical region of the type \(\{\bar{X}>k\}\) that verify (6.2), the most efficient is the one with smallest \(k.\)

However, in this case it is not possible to determine the smallest \(k\) that verifies (6.2), since the distribution of \(\bar{X}\) is only partially known (\(\sigma^2\) is unknown). But if we estimate \(\sigma^2\) with \(S'^2,\) then a test statistic is

\[ T(X_1,\ldots,X_n)=\frac{\bar{X}-\mu_0}{S'/\sqrt{n}}\stackrel{H_0}{\sim} t_{n-1}. \] Determining the critical region from \(T\) is simple, since the range of \(T\) is \(\mathbb{R}\) and therefore the critical region is an interval of \(\mathbb{R}.\) The null hypothesis is rejected for large values of \(T,\) hence the critical region is of the form \(\{T(x_1,\ldots,x_n)>k\}.\) We must select the smallest \(k\) such that

\[ \mathbb{P}(T>k|\mu=\mu_0)\leq \alpha. \]

From here, it is deduced that \(k=t_{n-1;\alpha}.\) Therefore, the critical region is

\[\begin{align*} C &=\{(x_1,\ldots,x_n)\in\mathbb{R}^n: T(x_1,\ldots,x_n)>t_{n-1;\alpha}\} \\ &=\left\{(x_1,\ldots,x_n)\in\mathbb{R}^n: \frac{\bar{X}-\mu_0}{s'/\sqrt{n}}>t_{n-1;\alpha} \right\} \\ &=\left\{(x_1,\ldots,x_n)\in\mathbb{R}^n: \bar{X}>\mu_0+t_{n-1;\alpha}\frac{s'}{\sqrt{n}}\right\}. \end{align*}\]

One-sided left test

If the hypotheses to test are \[ H_0:\mu=\mu_0 \quad \text{vs.}\quad H_1:\mu< \mu_0, \] then the test statistic is exactly the same as before, but in this case we must reject the null hypothesis for small values of \(T.\) Therefore, the critical region has the form \(\{T<k\}.\) Fixing a significance level \(\alpha\) and selecting the largest \(k\) that verifies the relation \[ \mathbb{P}(T<k|\mu=\mu_0)\leq \alpha \] we get that the value of \(k\) is \(k=-t_{n-1;\alpha}\) and the critical region is \[ C=\{(x_1,\ldots,x_n)\in\mathbb{R}^n : T(x_1,\ldots,x_n)<-t_{n-1;\alpha}\}. \]

Two-sided test

Now we want to test the hypothesis

\[ H_0:\mu=\mu_0 \quad \text{vs.}\quad H_1:\mu\neq \mu_0. \] For the same test statistic, now we will reject for large absolute values of \(T,\) which indicate deviation from \(H_0.\) That is, the critical region has the form \[ \{(x_1,\ldots,x_n)\in\mathbb{R}^n: T(x_1,\ldots,x_n)\in(-\infty,k_1)\cup (k_2,\infty)\}. \] Now we must determine the value of the two constants \(k_1\) and \(k_2\) in such a way that \[ \mathbb{P}(T\in(-\infty,k_1)\cup (k_2,\infty)|\mu=\mu_0)\leq \alpha. \] Since the distribution of \(T\) under \(H_0\) is \(t_{n-1},\) which is symmetric, we split \(\alpha\) evenly between both tails, assigning \(\alpha/2\) to each. This gives \(k_2=t_{n-1;\alpha/2}\) and \(k_1=-t_{n-1;\alpha/2}.\) Then, the critical region is \[ C=\{(x_1,\ldots,x_n)\in\mathbb{R}^n : |T(x_1,\ldots,x_n)|>t_{n-1;\alpha/2}\}. \]

Example 6.4 Eight bullets made with a new type of gunpowder were fired from a gun, and their initial speeds were measured. The sample mean and the sample standard deviation of those speeds were \(\bar{X}=2959\) and \(S'=39.1\) (ft/s). The producer of the gunpowder claims that the new gunpowder delivers an average initial speed above \(3000\) ft/s. Is the sample providing evidence against such claim, at the significance level \(\alpha=0.025\)? Assume that the initial speeds follow a normal distribution.
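The three critical regions for tests about \(\mu\) can be collected in a single function. The following is a minimal sketch, not a replacement for R's built-in `t.test()`: the helper `t_test_decision()` and its arguments are hypothetical names, and the numbers in the usage line are made up for illustration only.

```r
# Hypothetical helper: decision of the t-test for mu from summary statistics
t_test_decision <- function(xbar, s, n, mu0, alpha,
                            alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  # Observed value of the test statistic T, distributed as t_{n-1} under H0
  t_obs <- (xbar - mu0) / (s / sqrt(n))
  reject <- switch(alternative,
    greater   = t_obs > qt(alpha, df = n - 1, lower.tail = FALSE),
    less      = t_obs < -qt(alpha, df = n - 1, lower.tail = FALSE),
    two.sided = abs(t_obs) > qt(alpha / 2, df = n - 1, lower.tail = FALSE)
  )
  list(statistic = t_obs, reject = reject)
}

# Made-up summary statistics, used only to show the call
res <- t_test_decision(xbar = 10.5, s = 2, n = 25, mu0 = 10, alpha = 0.05,
                       alternative = "greater")
res
```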

We want to test the hypothesis \[ H_0:\mu=3000\quad \text{vs.}\quad H_1:\mu<3000. \] The critical region is \[ C=\{T<-t_{7;0.025}=-2.365\}. \] The observed value of the statistic is \[ T=\frac{\bar{x}-\mu_0}{s'/\sqrt{n}}=\frac{2959-3000}{39.1/\sqrt{8}}=-2.966<-2.365. \] Then, the observed initial speeds are sufficiently low to question the claim of the gunpowder producer.

Recall that \(t_{7;0.025}\) can be computed as

qt(0.025, df = 7, lower.tail = FALSE)
## [1] 2.364624
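The whole of Example 6.4 can be reproduced from the summary statistics. This is a sketch with illustrative variable names; it uses \(s'=39.1,\) the value that appears in the displayed computation of \(T.\)

```r
# Summary statistics of Example 6.4
n <- 8
xbar <- 2959
s <- 39.1
mu0 <- 3000
alpha <- 0.025

# Observed test statistic and critical value of the one-sided left test
t_obs <- (xbar - mu0) / (s / sqrt(n))
crit <- -qt(alpha, df = n - 1, lower.tail = FALSE)

# TRUE means that H0 is rejected at level alpha
reject <- t_obs < crit
c(t_obs = t_obs, crit = crit)
```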

6.2.2 Tests about \(\sigma^2\)

We study in this section the tests for the following hypotheses:

  1. \(H_0:\sigma^2=\sigma_0^2\) vs. \(H_1:\sigma^2>\sigma_0^2;\)
  2. \(H_0:\sigma^2=\sigma_0^2\) vs. \(H_1:\sigma^2<\sigma_0^2;\)
  3. \(H_0:\sigma^2=\sigma_0^2\) vs. \(H_1:\sigma^2\neq\sigma_0^2.\)

An estimator of \(\sigma^2\) is the sample quasivariance, \(S'^2.\) Under the null hypothesis, the distribution of the following statistic is perfectly known: \[ U=\frac{(n-1)S'^2}{\sigma_0^2}\stackrel{H_0}\sim \chi_{n-1}^2. \] Therefore, \(U\) is a test statistic for the three tests above, which we refer to as a, b, and c, respectively.

When testing a, if the null hypothesis is not true, \(U\) will tend to have large values. Therefore, the critical region is of the form \(\{U>k\}.\) For obtaining the best value \(k,\) we select the significance level \(\alpha\) and take the smallest \(k\) such that \[ \mathbb{P}(U>k|\sigma^2=\sigma_0^2)\leq \alpha. \] Then, the best choice is \(k=\chi_{n-1;\alpha}^2,\) so the critical region is \[ C_a=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:U(x_1,\ldots,x_n)>\chi_{n-1;\alpha}^2\}. \] With an analogous reasoning, the critical region for b follows: \[ C_b=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:U(x_1,\ldots,x_n)<\chi_{n-1;1-\alpha}^2\}. \]

The critical region for c arises from splitting evenly the probability \(\alpha\) in the critical regions of a and b: \[ C_c=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:U(x_1,\ldots,x_n)>\chi_{n-1;\alpha/2}^2 \ \text{or}\ U(x_1,\ldots,x_n)<\chi_{n-1;1-\alpha/2}^2\} \]

Example 6.5 A company claims that the diameter of one of the parts of an engine it produces has a production variance not larger than \(0.0002\) squared inches. A s.r.s. of \(10\) pieces revealed a sample quasivariance \(0.0003.\) Assuming that the measurements of the diameter follow a normal distribution, test at a significance level \(\alpha=0.05\) \[ H_0:\sigma^2=0.0002\quad \text{vs.} \quad H_1:\sigma^2>0.0002. \]

The test statistic is \[ U=\frac{(n-1)S'^2}{\sigma_0^2}=\frac{(9)(0.0003)}{0.0002}=13.5 \] and the rejection region is \[ C=\{U>\chi_{9;0.05}^2=16.919\}. \]

Since \(U=13.5<16.919,\) the data do not provide enough evidence that the variance of the diameter is larger than \(0.0002,\) so the claim of the company is not contradicted.
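The computations of Example 6.5 can be reproduced as follows. This is a sketch with illustrative variable names, using the base function `qchisq()` for the critical value.

```r
# Summary statistics of Example 6.5
n <- 10
s2 <- 0.0003        # sample quasivariance
sigma2_0 <- 0.0002  # variance under H0
alpha <- 0.05

# Observed test statistic, distributed as chi-squared with n - 1 df under H0
u_obs <- (n - 1) * s2 / sigma2_0

# Critical value of the one-sided right test
crit <- qchisq(alpha, df = n - 1, lower.tail = FALSE)

# TRUE means that H0 is rejected at level alpha
reject <- u_obs > crit
c(u_obs = u_obs, crit = crit)
```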

6.3 Tests on two normal populations

We assume now two populations represented as two independent r.v.’s \(X_1\sim\mathcal{N}(\mu_1,\sigma_1^2)\) and \(X_2\sim\mathcal{N}(\mu_2,\sigma_2^2),\) with unknown means and variances. We will test hypotheses about the difference of means \(\mu_1-\mu_2,\) assuming \(\sigma_1^2=\sigma_2^2,\) and about the ratio of variances \(\sigma_1^2/\sigma_2^2,\) from two s.r.s.’s \((X_{11},\ldots,X_{1n_1})\) and \((X_{21},\ldots,X_{2n_2})\) of \(X_1\) and \(X_2,\) respectively.

6.3.1 Equality of means

We assume that \(\sigma_1^2=\sigma_2^2=\sigma^2.\) The hypotheses to test are of three types:

  1. \(H_0:\mu_1=\mu_2\) vs. \(H_1:\mu_1>\mu_2;\)
  2. \(H_0:\mu_1=\mu_2\) vs. \(H_1:\mu_1<\mu_2;\)
  3. \(H_0:\mu_1=\mu_2\) vs. \(H_1:\mu_1\neq \mu_2.\)

Denoting \(\theta=\mu_1-\mu_2,\) then the hypotheses can be rewritten as:

  1. \(H_0:\theta=0\) vs. \(H_1:\theta>0;\)
  2. \(H_0:\theta=0\) vs. \(H_1:\theta<0;\)
  3. \(H_0:\theta=0\) vs. \(H_1:\theta\neq 0.\)

An estimator of \(\theta\) is the difference of sample means, \[ \hat\theta=\bar{X}_1-\bar{X}_2\sim \mathcal{N}\left(\mu_1-\mu_2,\sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right). \] If we estimate \(\sigma^2\) by means of \[ S^2=\frac{(n_1-1)S_1'^2+(n_2-1)S_2'^2}{n_1+n_2-2}, \] then a test statistic is

\[ T=\frac{\bar{X}_1-\bar X_2}{S\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\stackrel{H_0}\sim t_{n_1+n_2-2}. \]

Then, the critical regions in each case are:

  1. \(C_a=\{T>t_{n_1+n_2-2;\alpha}\};\)
  2. \(C_b=\{T<-t_{n_1+n_2-2;\alpha}\};\)
  3. \(C_c=\{|T|>t_{n_1+n_2-2;\alpha/2}\}.\)

Example 6.6 Is there any evidence that either of the two training methods described in Example 5.5 works better? The average assembly times for the two groups of nine employees were \(\bar{X}_1=35.22\) and \(\bar{X}_2=31.56,\) and the quasivariances were \(S_1'^2=24.445\) and \(S_2'^2=20.027.\)

We want to test \[ H_0:\mu_1=\mu_2\quad \text{vs.}\quad H_1:\mu_1\neq \mu_2. \]

The observed value of the test statistic follows from the pooled estimation of the variance, \[ S^2=\frac{(9-1)(24.445)+(9-1)(20.027)}{9+9-2}=22.24, \]

which provides

\[ T=\frac{35.22-31.56}{4.71\sqrt{\frac{1}{9}+\frac{1}{9}}}=1.65. \]

Then, the critical region is \(C=\{|T|>t_{16;0.025}=2.12\}.\) Since \(|T|=1.65<2.12,\) the statistic does not belong to either of the two parts of the critical region. It is concluded that the data do not provide evidence that either of the two methods works better.
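The pooled two-sample computation of Example 6.6 can be sketched in R from the summary statistics. The variable names are illustrative; with the raw data available, `t.test(..., var.equal = TRUE)` would be the usual route.

```r
# Summary statistics of Example 6.6
n1 <- 9
n2 <- 9
xbar1 <- 35.22
xbar2 <- 31.56
s2_1 <- 24.445  # quasivariance of the first group
s2_2 <- 20.027  # quasivariance of the second group

# Pooled estimate of the common variance
s2_pooled <- ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)

# Observed test statistic, distributed as t_{n1 + n2 - 2} under H0
t_obs <- (xbar1 - xbar2) / (sqrt(s2_pooled) * sqrt(1 / n1 + 1 / n2))

# Two-sided critical value at alpha = 0.05
crit <- qt(0.025, df = n1 + n2 - 2, lower.tail = FALSE)
reject <- abs(t_obs) > crit
c(s2_pooled = s2_pooled, t_obs = t_obs, crit = crit)
```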

6.3.2 Equality of variances

We want to test the next hypotheses:

  1. \(H_0:\sigma_1^2=\sigma_2^2\) vs. \(H_1:\sigma_1^2>\sigma_2^2;\)
  2. \(H_0:\sigma_1^2=\sigma_2^2\) vs. \(H_1:\sigma_1^2<\sigma_2^2;\)
  3. \(H_0:\sigma_1^2=\sigma_2^2\) vs. \(H_1:\sigma_1^2\neq\sigma_2^2.\)

Denoting \(\theta=\sigma_1^2/\sigma_2^2,\) then the hypotheses can be rewritten as:

  1. \(H_0:\theta=1\) vs. \(H_1:\theta>1;\)
  2. \(H_0:\theta=1\) vs. \(H_1:\theta<1;\)
  3. \(H_0:\theta=1\) vs. \(H_1:\theta\neq 1.\)

An estimator of \(\theta\) is \(\hat\theta=S_1'^2/S_2'^2,\) but its distribution is unknown. However, we do know the distribution of \[ F=\frac{\frac{(n_1-1)S_1'^2}{\sigma_1^2}/(n_1-1)}{\frac{(n_2-1)S_2'^2}{\sigma_2^2}/(n_2-1)} =\frac{S_1'^2 \sigma_2^2}{S_2'^2\sigma_1^2}\sim \mathcal{F}_{n_1-1,n_2-1}. \] Besides, under \(H_0:\sigma_1^2=\sigma_2^2,\) it is verified that \[ F=\frac{S_1'^2}{S_2'^2}\sim \mathcal{F}_{n_1-1,n_2-1}, \] so \(F\) is a test statistic. The rejection regions are given by:

  1. \(C_a=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:F(x_1,\ldots,x_n)>\mathcal{F}_{n_1-1,n_2-1;\alpha}\};\)
  2. \(C_b=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:F(x_1,\ldots,x_n)<\mathcal{F}_{n_1-1,n_2-1;1-\alpha}\};\)
  3. \(C_c=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:F(x_1,\ldots,x_n)>\mathcal{F}_{n_1-1,n_2-1;\alpha/2} \ \text{or}\ \ F(x_1,\ldots,x_n)<\mathcal{F}_{n_1-1,n_2-1;1-\alpha/2}\}.\)

Example 6.7 An experiment for studying the pain threshold consists in applying small electric shocks to \(14\) men and \(10\) women and recording their pain thresholds. The experiment provides the following data:

| Statistic | Men | Women |
|---|---|---|
| \(n\) | \(14\) | \(10\) |
| \(\bar{X}\) | \(16.2\) | \(14.9\) |
| \(S'^2\) | \(12.7\) | \(26.4\) |

Assuming that the variable that measures the pain threshold for men and women is normally distributed, is there evidence of a different variability in the pain threshold between men and women?

We want to test

\[ H_0:\sigma_M^2=\sigma_W^2\quad \text{vs.}\quad H_1: \sigma_M^2\neq \sigma_W^2. \]

The test statistic is

\[ F=\frac{12.7}{26.4}=0.481. \]

The critical region is

\[ C=\{F>\mathcal{F}_{13,9;0.025}=3.83\ \text{or}\ F<1/\mathcal{F}_{9,13;0.025}=1/3.31=0.30\}. \]

Since \(F=0.481\) does not belong to the critical region, we conclude that the experiment does not provide enough evidence against the pain threshold being equally variable for both genders.
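The critical values of Example 6.7 can be obtained with the base function `qf()`. The sketch below uses the sample sizes in the table (\(14\) men and \(10\) women) and a two-sided test at level \(\alpha=0.05,\) which is the level implied by the \(0.025\) quantiles in the critical region; variable names are illustrative.

```r
# Summary statistics of Example 6.7
n_m <- 14
n_w <- 10
f_obs <- 12.7 / 26.4  # ratio of quasivariances (men over women)

# Upper and lower critical values of the two-sided test at alpha = 0.05
upper <- qf(0.025, df1 = n_m - 1, df2 = n_w - 1, lower.tail = FALSE)
lower <- qf(0.025, df1 = n_m - 1, df2 = n_w - 1)

# The lower critical value equals the reciprocal of the upper quantile
# of the F distribution with the degrees of freedom swapped
lower_alt <- 1 / qf(0.025, df1 = n_w - 1, df2 = n_m - 1, lower.tail = FALSE)

# TRUE means that H0 is rejected
reject <- f_obs > upper || f_obs < lower
c(f_obs = f_obs, lower = lower, upper = upper)
```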

6.4 Asymptotic tests

Assume that we want to test the hypotheses

  1. \(H_0:\theta=\theta_0\) vs. \(H_1:\theta>\theta_0;\)
  2. \(H_0:\theta=\theta_0\) vs. \(H_1:\theta<\theta_0;\)
  3. \(H_0:\theta=\theta_0\) vs. \(H_1:\theta\neq \theta_0.\)

If we know a test statistic that, under \(H_0,\) has an asymptotic normal distribution, that is \[ Z=\frac{\hat\theta-\theta_0}{\hat\sigma(\hat\theta)}\stackrel{d}{\longrightarrow} \mathcal{N}(0,1), \] then the asymptotic critical regions are given by \[ C_a=\{Z>z_{\alpha}\}, \quad C_b=\{Z<-z_{\alpha}\}, \quad C_c=\{|Z|>z_{\alpha/2}\}. \]

Example 6.8 Let \((X_1,\ldots,X_n)\) be a s.r.s. of a r.v. \(X\) with mean \(\mu\) and variance \(\sigma^2,\) both unknown. We want to test:

  1. \(H_0:\mu=\mu_0 \quad \text{vs.}\quad H_1:\mu> \mu_0;\)
  2. \(H_0:\mu=\mu_0 \quad \text{vs.}\quad H_1:\mu< \mu_0;\)
  3. \(H_0:\mu=\mu_0 \quad \text{vs.}\quad H_1:\mu\neq \mu_0.\)

For that, employing the CLT (Theorem 2.5) we know that under \(H_0:\mu=\mu_0,\)

\[ Z=\frac{\bar{X}-\mu_0}{S'/\sqrt{n}}\stackrel{d}{\longrightarrow} \mathcal{N}(0,1). \]

Therefore, \(Z\) is a test statistic and \(H_0\) is rejected if the observed value of \(Z\) belongs to the corresponding critical region (\(C_a,\) \(C_b,\) or \(C_c\)).

Example 6.9 A certain machine has to be repaired if more than \(10\%\) of the articles that it produces per day are defective. A s.r.s. of \(n=100\) articles of the daily production contains \(15\) that are defective and the foreman decides that the machine has to be repaired. Is the sample supporting his decision at a significance level \(\alpha=0.01\)?

Let \(Y\) be the number of defective articles that were found. Then \(Y\sim\mathrm{Bin}(n,p).\) We want to test \[ H_0: p=0.10\quad \text{vs.}\quad H_1:p>0.10. \]

Because of the CLT (Theorem 2.5), \(Y\) has an asymptotic normal distribution, so under \(H_0:p=p_0,\) and with \(\hat p=Y/n\) denoting the sample proportion, it follows that \[ Z=\frac{\hat p-p_0}{\sqrt{p_0(1- p_0)/n}}\stackrel{d}{\longrightarrow} \mathcal{N}(0,1). \]

Therefore, \(Z\) is a test statistic with observed value \[ Z=\frac{0.15-0.10}{\sqrt{(0.1)(0.9)/100}}=5/3. \]

The rejection region is \[ C=\{Z>z_{0.01}=2.33\}. \]

Since \(Z=5/3\cong1.67<2.33,\) the sample does not provide enough evidence supporting the foreman's decision.
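The asymptotic test of Example 6.9 can be reproduced with a few lines of R. This is a sketch with illustrative variable names.

```r
# Data of Example 6.9
n <- 100
y <- 15
p0 <- 0.10
alpha <- 0.01

# Sample proportion and observed value of the asymptotic test statistic
p_hat <- y / n
z_obs <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Critical value of the one-sided right test
crit <- qnorm(alpha, lower.tail = FALSE)

# TRUE means that H0 is rejected at level alpha
reject <- z_obs > crit
c(z_obs = z_obs, crit = crit)
```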

6.5 \(p\)-value of a test

Example 6.10 In Example 6.9:

  1. Will the data support the decision of the foreman at the significance level \(\alpha=0.05\)?

    In this case, the critical value of the normal is \(z_{0.05}=1.645\) and the observed value of the statistic was \(Z=5/3\cong 1.667>1.645.\) Therefore, at the significance level \(\alpha=0.05\) it is concluded that there is evidence in favor of the foreman's decision.

  2. What would be the lowest significance level for which the data will support the foreman's decision and, therefore, \(H_0: p=0.10\) would be rejected?

    The lowest significance level for which we would reject \(H_0\) is \[ \mathbb{P}(Z>5/3|p=0.10)\cong 0.0478. \]

    This probability is the so-called \(p\)-value of the test. Besides, the rejection decision can be expressed in terms of the \(p\)-value, since

    \[\begin{align*} p\text{-value}<\alpha & \implies Z>z_{\alpha} \implies \text{Reject $H_0,$}\\ p\text{-value}\geq\alpha & \implies Z\leq z_{\alpha} \implies \text{Do not reject $H_0.$} \end{align*}\]

Definition 6.5 (\(p\)-value) The \(p\)-value of a hypothesis test is defined as the lowest significance level for which the test would reject the null hypothesis.
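For the asymptotic test of Example 6.9, the \(p\)-value is an upper tail probability of the standard normal distribution, so it can be computed with `pnorm()`. The following sketch (illustrative variable names) also compares it with the two significance levels considered so far.

```r
# Observed value of the test statistic in Example 6.9
z_obs <- 5 / 3

# p-value of the one-sided right test: P(Z >= z_obs) under H0
p_value <- pnorm(z_obs, lower.tail = FALSE)

# H0 is rejected at level alpha if and only if the p-value is below alpha
c(reject_at_0.05 = p_value < 0.05, reject_at_0.01 = p_value < 0.01)
```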

Then, depending on the kind of hypothesis to test, the \(p\)-value is computed in a different way. We differentiate three cases:

  1. One-sided right tests; \(H_0: \theta=\theta_0\) vs. \(H_1:\theta>\theta_0.\) Assume that the test statistic is \(T(X_1,\ldots,X_n)\) and the critical region is \[ C=\{(x_1,\ldots,x_n):T(x_1,\ldots,x_n)>k\}. \] If the observed value of the test statistic is \(T=t,\) then the \(p\)-value is computed as \[ \text{$p$-value}=\mathbb{P}(T\geq t|\theta=\theta_0). \]

  2. One-sided left tests; \(H_0: \theta=\theta_0\) vs. \(H_1:\theta<\theta_0.\) In this case the critical region is \[ C=\{(x_1,\ldots,x_n):T(x_1,\ldots,x_n)<k\} \] and, therefore, the \(p\)-value is \[ \text{$p$-value}=\mathbb{P}(T\leq t|\theta=\theta_0). \]

  3. Two-sided tests; \(H_0: \theta=\theta_0\) vs. \(H_1:\theta\neq\theta_0.\) In this case, the critical region is of the form \[ C=\{(x_1,\ldots,x_n):T(x_1,\ldots,x_n)<k_1\ \text{or}\ T(x_1,\ldots,x_n)>k_2\}. \] The \(p\)-value is given by \[ \text{$p$-value}=\min\left\{2\mathbb{P}(T\leq t|\theta=\theta_0),2\mathbb{P}(T\geq t|\theta=\theta_0)\right\}. \] Notice that if the distribution of the test statistic under \(H_0\) is symmetric about zero, such as the standard normal or Student’s \(t\) distributions, the minimum is attained at the tail on the side of the observed value, and hence \[ \text{$p$-value}=2\mathbb{P}(T\geq |t||\theta=\theta_0)=2\mathbb{P}(T\leq -|t||\theta=\theta_0). \]
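The three cases can be gathered in a small helper. This is a sketch under the assumption that the test statistic is continuous, so that \(\mathbb{P}(T\geq t)=1-\mathbb{P}(T\leq t)\); the function `p_value()` and its arguments are hypothetical names, and the observed values in the usage lines are taken from Examples 6.9, 6.12, and 6.6.

```r
# Hypothetical helper: p-value from the observed statistic t_obs and the cdf
# of a continuous test statistic under H0 (e.g. pnorm, or pt with df = ...)
p_value <- function(t_obs, cdf,
                    alternative = c("greater", "less", "two.sided"), ...) {
  alternative <- match.arg(alternative)
  lower <- cdf(t_obs, ...)  # P(T <= t | H0)
  upper <- 1 - lower        # P(T >= t | H0), valid for continuous T
  switch(alternative,
    greater   = upper,
    less      = lower,
    two.sided = min(2 * lower, 2 * upper)
  )
}

p_right <- p_value(5 / 3, pnorm, "greater")        # Example 6.9
p_left <- p_value(-1.992, pt, "less", df = 119)    # Example 6.12
p_two <- p_value(1.65, pt, "two.sided", df = 16)   # Example 6.6
c(p_right = p_right, p_left = p_left, p_two = p_two)
```

For a discrete statistic, such as the binomial count of Example 6.2, the upper tail should be computed directly instead, since \(\mathbb{P}(T\geq t)\neq 1-\mathbb{P}(T\leq t)\) in that case.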

Example 6.11 Assume that in Example 6.2 it has been observed that \(Y=3\) of the \(n=15\) sampled voters support the candidate. Would that result indicate that the candidate is going to lose the elections (reject \(H_0:p=0.5\)) at significance level \(\alpha=0.05\)?

The hypothesis to test is

\[ H_0:p=0.5\quad \text{vs.}\quad H_1:p<0.5. \]

Since under \(H_0:p=0.5,\) \(Y\sim \mathrm{Bin}(n,0.5),\) then the \(p\)-value is given by

\[\begin{align*} \text{$p$-value} &=\mathbb{P}(Y\leq 3|p=0.5)=\sum_{y=0}^3 \left(\begin{array}{c}15\\ y\end{array}\right)(0.5)^{15} \\ &=(0.5)^{15}\left[1+15+\frac{(15)(14)}{2}+\frac{(15)(14)(13)}{6}\right]\\ &\cong0.018<\alpha=0.05. \end{align*}\]

Equivalently, it can be computed as:

pbinom(3, size = 15, prob = 0.5)
## [1] 0.01757813

Therefore, \(H_0: p=0.5\) is rejected in favor of \(H_1:p<0.5;\) that is, this result indicates that the candidate will not win the elections.

Example 6.12 It is estimated that a particular flight is profitable if the average occupation rate during a year is at least \(60\%.\) An airline is interested in determining whether it is profitable to keep a particular flight operative. For that, they record the occupation rates of \(120\) random flights scattered around the year, resulting in a mean occupation rate of \(58\%\) and a standard deviation of \(11\%.\) Considering that the occupation rates (in proportion) have an approximate normal distribution, is there enough evidence to cancel the flight because it is not profitable? Employ a significance level of \(\alpha=0.10.\)

Let \(\mu\) be the average occupation rate of the flight in one year. It is desired to test \[ H_0:\mu=0.6\quad \text{vs.}\quad H_1:\mu<0.6. \]

The test statistic is \[ T=\frac{\bar{X}-0.6}{S'/\sqrt{n}}=\frac{0.58-0.6}{0.11/\sqrt{120}}=-1.992. \]

Under \(H_0:\mu=0.6,\) the statistic is distributed as a \(t_{119},\) so the \(p\)-value is

\[ \mathbb{P}(T\leq -1.992|\mu=0.6)\cong0.0243<\alpha=0.10. \]

The last probability can be computed as

round(pt(-1.992, df = 119), digits = 4)
## [1] 0.0243

Therefore, \(H_0:\mu=0.6\) is rejected, that is, the sample indicates that the flight is not profitable.
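The whole computation of the example can be reproduced from the summary statistics. The following is a minimal sketch; the variable names (`x_bar`, `s`, `mu0`, `t_crit`) are chosen here for illustration.

```r
# Summary statistics reported in the example
n <- 120
x_bar <- 0.58
s <- 0.11
mu0 <- 0.60

# Observed value of the t statistic
t_obs <- (x_bar - mu0) / (s / sqrt(n))

# Left one-sided p-value under the t distribution with n - 1 degrees of freedom
p_value <- pt(t_obs, df = n - 1)

# Critical value at alpha = 0.10: H0 is rejected if t_obs falls below it
t_crit <- qt(0.10, df = n - 1)

c(t_obs = t_obs, p_value = p_value, t_crit = t_crit)
```

Rejecting when `p_value < 0.10` and rejecting when `t_obs < t_crit` are equivalent decision rules, so both comparisons lead to the same conclusion.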

6.6 Power of a test and Neyman–Pearson’s Lemma

Definition 6.6 (Power function) The power function of a test is the function \(\omega:\Theta\rightarrow[0,1]\) that gives the probability of rejecting \(H_0\) for a particular value of the parameter \(\theta\in\Theta,\) that is, \[ \omega(\theta)=\mathbb{P}(\text{Reject $H_0$}|\theta). \] Recall that for \(\theta=\theta_0\in\Theta_0,\) the power equals the significance level: \[ \omega(\theta_0)=\mathbb{P}(\text{Reject $H_0$}|\theta_0)=\alpha. \] In addition, for any value \(\theta=\theta_1\in\Theta_1,\) \[ \omega(\theta_1)=\mathbb{P}(\text{Reject $H_0$}|\theta_1)=1-\mathbb{P}(\text{Do not reject $H_0$}|\theta_1)=1-\beta(\theta_1), \] that is, the power at \(\theta_1\) equals the complement of the type II error probability for \(\theta_1.\)
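To make the definition concrete, the sketch below evaluates the power function of a binomial test in the setting of Example 6.11. The critical region \(C=\{y\leq 3\}\) with \(n=15\) is an assumption made here for illustration, and `power_fun` is a hypothetical helper name.

```r
# Power function of the test with critical region {Y <= k}, Y ~ Bin(n, p)
power_fun <- function(p, n = 15, k = 3) pbinom(k, size = n, prob = p)

# Size of the test: the power evaluated at the null value p = 0.5
power_fun(0.5)

# Power at some alternatives p < 0.5
p_alt <- c(0.1, 0.2, 0.3, 0.4)
round(power_fun(p_alt), 3)

# Type II error probabilities beta(p) = 1 - omega(p)
round(1 - power_fun(p_alt), 3)
```

Because \(Y\) is discrete, the size of this test is \(\mathbb{P}(Y\leq 3|p=0.5)\cong 0.018\) rather than exactly \(0.05,\) and the power increases as \(p\) moves away from \(0.5\) towards zero.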

The usual criterion for selecting among several tests for the same hypothesis consists in fixing the type I error probability, \(\alpha,\) and then selecting, among the tests with the same type I error probability, the one that presents the highest power for all \(\theta_1\in\Theta_1.\) This is the so-called Uniformly Most Powerful (UMP) test. However, the UMP test does not always exist.

Neyman–Pearson’s Lemma guarantees the existence of the UMP test for a simple null hypothesis against a simple alternative hypothesis, and provides the form of the critical region of such a test.

Theorem 6.1 (Neyman–Pearson’s Lemma) Let \(X\) be a r.v. with a distribution depending on an unknown parameter \(\theta.\) Assume that it is desired to test \[ H_0:\theta=\theta_0\quad \text{vs.}\quad H_1:\theta=\theta_1 \] using the information of a s.r.s. \((X_1,\ldots,X_n)\) of \(X.\) For the significance level \(\alpha,\) the test that maximizes the power in \(\theta_1\) has a critical region of the form \[ C=\left\{(x_1,\ldots,x_n):\frac{\mathcal{L}(\theta_0;x_1,\ldots,x_n)}{\mathcal{L}(\theta_1;x_1,\ldots,x_n)}<k\right\}. \]

Note that the lemma specifies only the form of the critical region, but not the specific value of \(k.\) However, \(k\) can be computed from the significance level \(\alpha\) and the distribution of the sample under \(H_0.\)

Example 6.13 Assume that \(X\) represents a single observation of the r.v. with p.d.f. \[ f(x;\theta)=\left\{\begin{array}{ll} \theta x^{\theta-1}& 0<x<1,\\ 0 & \text{otherwise}. \end{array}\right. \]

Find the UMP test at a significance level \(\alpha=0.05\) for testing \[ H_0: \theta=1\quad \text{vs.} \quad H_1:\theta=2. \]

Let us employ Theorem 6.1. In this case we have a s.r.s. of size one. Then, the likelihood is \[ \mathcal{L}(\theta;x)=\theta x^{\theta-1}. \] For computing the critical region of the UMP test, we obtain \[ \frac{\mathcal{L}(\theta_0;x)}{\mathcal{L}(\theta_1;x)}=\frac{\mathcal{L}(1;x)}{\mathcal{L}(2;x)}=\frac{1}{2x} \] and therefore \[ C=\left\{x:\frac{1}{2x}<k\right\}=\left\{x:x>\frac{1}{2k}\right\}=\{x:x>k'\}. \]

The value of \(k'\) can be determined from the significance level \(\alpha,\)

\[\begin{align*} \alpha&=\mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})=\mathbb{P}(X>k'|\theta=1)\\ &=\int_{k'}^1 f(x;1)\,\mathrm{d}x=\int_{k'}^{1}\,\mathrm{d}x=1-k'. \end{align*}\]

Therefore, \(k'=1-\alpha,\) so the critical region of the UMP test of size \(\alpha\) is

\[ C=\{x:x>1-\alpha\}. \]
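The size and the power of this test can be checked numerically. The sketch below uses the fact that the c.d.f. associated with the density \(\theta x^{\theta-1}\) on \((0,1)\) is \(F(x;\theta)=x^\theta;\) the helper names `cdf` and `reject_prob` are introduced here only for illustration.

```r
alpha <- 0.05

# c.d.f. of the density theta * x^(theta - 1) on (0, 1): F(x; theta) = x^theta
cdf <- function(x, theta) x^theta

# Probability of the critical region {x > 1 - alpha} for a given theta
reject_prob <- function(theta) 1 - cdf(1 - alpha, theta)

# Size of the test under H0: theta = 1
reject_prob(1)

# Power of the test under H1: theta = 2
reject_prob(2)
```

Under \(H_0\) the rejection probability is \(\alpha=0.05,\) as required, while under \(H_1\) it is \(1-0.95^2=0.0975.\) The power is low because the test relies on a single observation.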

When testing one-sided hypotheses of the type \[\begin{align} H_0:\theta=\theta_0\quad \text{vs.}\quad H_1:\theta>\theta_0,\tag{6.3} \end{align}\] Neyman–Pearson’s Lemma is not directly applicable, since \(H_1\) is a composite hypothesis.

However, if we fix a value \(\theta_1>\theta_0\) and we compute the critical region of the UMP test for \[ H_0:\theta=\theta_0\quad \text{vs.}\quad H_1:\theta=\theta_1, \] quite often the critical region obtained does not depend on the value \(\theta_1.\) Therefore, this very same test is the UMP test for testing (6.3)!

In addition, if we have a UMP test for \[ H_0:\theta=\theta_0\quad \text{vs.}\quad H_1:\theta>\theta_0, \]

then the same test is also the UMP test for

\[ H_0:\theta=\theta_0'\quad \text{vs.}\quad H_1:\theta>\theta_0, \] for any value \(\theta_0'<\theta_0,\) since any other test will have larger errors of the two types.

6.7 The likelihood ratio test

Definition 6.7 (Likelihood ratio test) Consider the composite hypotheses \[ H_0:\theta\in\Theta_0\quad \text{vs.}\quad H_1:\theta\in\Theta_1, \] where \(\Theta_0\) and \(\Theta_1\) are such that \(\Theta=\Theta_0\cup\Theta_1.\) Let \(\mathcal{L}(\hat\Theta_0;x_1,\ldots,x_n)\) be the maximum likelihood attained inside \(\Theta_0,\) that is, \[ \mathcal{L}(\hat\Theta_0;x_1,\ldots,x_n)=\sup_{\theta\in\Theta_0}\mathcal{L}(\theta;x_1,\ldots,x_n), \] and, with the same notation, let \(\mathcal{L}(\hat\Theta;x_1,\ldots,x_n)\) be the maximum likelihood attained inside \(\Theta.\) The likelihood ratio test is defined as the test with critical region of the form \[ C=\left\{(x_1,\ldots,x_n):\frac{\mathcal{L}(\hat\Theta_0;x_1,\ldots,x_n)}{\mathcal{L}(\hat\Theta;x_1,\ldots,x_n)}\leq k\right\}. \] The quantity \[ \lambda=\frac{\mathcal{L}(\hat\Theta_0;x_1,\ldots,x_n)}{\mathcal{L}(\hat\Theta;x_1,\ldots,x_n)} \] is referred to as the likelihood ratio statistic.

Theorem 6.2 (Asymptotic distribution of the likelihood ratio statistic) Under certain regularity conditions about the distribution of \(X,\) dependent on an unknown parameter \(\theta,\) and \(H_0,\) it is verified that \[ -2\log\lambda\stackrel{d}{\longrightarrow} \chi_p^2, \] where \(p=\dim\Theta-\dim\Theta_0\) is the number of specified parameters under \(H_0.\)

Example 6.14 A labor union registers the number of complaints that are filed per week by the workers of two different shifts in the production line of a factory. One hundred independent observations of the number of complaints for each shift gave the means \(\bar{X}=20\) and \(\bar{Y}=22.\) Assume that the number of complaints per week of the \(i\)-th shift has a \(\mathrm{Pois}(\theta_i)\) distribution, \(i=1,2.\) The labor union wants to test whether the average numbers of complaints per week of the two shifts are significantly different at a significance level \(\alpha=0.01.\)

The hypothesis to test is

\[ H_0:\theta_1=\theta_2\Leftrightarrow \theta_1-\theta_2=0\quad \text{vs.} \quad H_1:\theta_1\neq \theta_2 \Leftrightarrow \theta_1-\theta_2\neq 0. \]

Recall that the parametric space is

\[ \Theta=\{(\theta_1,\theta_2)\in\mathbb{R}^2: \theta_1\geq 0,\theta_2\geq 0\}, \]

and that the space \(\Theta_0\) that determines the null hypothesis is

\[ \Theta_0=\{(\theta_1,\theta_2)\in\mathbb{R}^2: \theta_1=\theta_2\}, \]

so the number of specified parameters is \(p=2-1=1.\)

The p.m.f. of the Poisson of mean \(\theta_i\) is

\[ p(x;\theta_i)=\frac{\theta_i^x e^{-\theta_i}}{x!},\quad x=0,1,2,\ldots \]

Two samples of size \(n=100\) were observed, one for each shift, and we denote them by

\[ \mathbf{x}=(x_1,\ldots,x_{100}),\quad \mathbf{y}=(y_1,\ldots,y_{100}). \]

The joint likelihood is

\[ \mathcal{L}(\theta_1,\theta_2;\mathbf{x},\mathbf{y})=\frac{\theta_1^{\sum_{i=1}^{100}x_i}\theta_2^{\sum_{i=1}^{100}y_i}e^{-n(\theta_1+\theta_2)}}{x_1!\ldots x_n!y_1!\ldots y_n!}. \]

We compute the maximum likelihood in \(\Theta_0.\) This is attained at the MLE restricted to \(\Theta_0.\) But, under \(H_0:\theta_1=\theta_2\triangleq \theta,\) the likelihood is

\[ \mathcal{L}(\theta;\mathbf{x},\mathbf{y})=\frac{\theta^{\sum_{i=1}^{100}x_i+\sum_{i=1}^{100}y_i}e^{-2n\theta}}{c}, \]

where \(c\) denotes a constant that does not depend on the parameters. Computing the maximum of this likelihood, we obtain the MLE restricted to \(\Theta_0.\) Taking the log-likelihood

\[ \ell(\theta;\mathbf{x},\mathbf{y})=\left(\sum_{i=1}^{100}x_i+\sum_{i=1}^{100}y_i\right)\log\theta -2n\theta-\log c \]

and differentiating and equating to zero, we have

\[ \hat\theta=\frac{\sum_{i=1}^{100}x_i+\sum_{i=1}^{100}y_i}{2n}=\frac{1}{2}(\bar{x}+\bar{y}). \]

The unrestricted MLE’s of \(\theta_1\) and \(\theta_2\) are obtained from the unrestricted log-likelihood, which is

\[ \ell(\theta_1,\theta_2;\mathbf{x},\mathbf{y})=\left(\sum_{i=1}^{100} x_i\right) \log\theta_1+\left(\sum_{i=1}^{100} y_i\right) \log\theta_2-n(\theta_1+\theta_2)-\log c. \]

Differentiating and equating to zero, we have

\[ \hat\theta_1=\bar{x},\quad \hat\theta_2=\bar{y}. \]

Then, the maximum likelihoods in \(\Theta_0\) and \(\Theta\) are, respectively,

\[\begin{align*} \mathcal{L}(\hat\Theta_0;\mathbf{x},\mathbf{y}) &=\frac{\left[\frac{1}{2}(\bar{x}+\bar{y})\right]^{n(\bar{x}+\bar{y})}e^{-2n(\bar{x}+\bar{y})/2}}{c},\\ \mathcal{L}(\hat\Theta;\mathbf{x},\mathbf{y})&=\frac{\bar{x}^{n\bar{x}}\bar{y}^{n\bar{y}}e^{-n(\bar{x}+\bar{y})}}{c}. \end{align*}\]

Therefore, the likelihood ratio statistic is

\[ \lambda=\frac{\mathcal{L}(\hat\Theta_0;\mathbf{x},\mathbf{y})}{\mathcal{L}(\hat\Theta;\mathbf{x},\mathbf{y})}=\frac{\left[\frac{1}{2}(\bar{x}+\bar{y})\right]^{n(\bar{x}+\bar{y})}}{\bar{x}^{n\bar{x}}\bar{y}^{n\bar{y}}}. \]

Substituting the observed means, we obtain

\[ \lambda=\frac{\left[\frac{1}{2}(20+22)\right]^{100(20+22)}}{20^{100(20)}22^{100(22)}}. \]

Then, \(-2\log\lambda=9.53>\chi_{1;0.01}^2=6.635.\) Therefore, \(H_0:\theta_1=\theta_2\) is rejected, that is, the data indicate that the average numbers of complaints in the two shifts differ significantly. In addition, since the \(p\)-value of the test is

pchisq(9.53, df = 1, lower.tail = FALSE)
## [1] 0.002021401

we will also reject for any significance level \(\alpha>0.002.\)
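The value of \(-2\log\lambda\) can be reproduced in R. The sketch below works on the logarithmic scale, since evaluating powers such as \(21^{4200}\) directly overflows in floating point; the variable names are chosen here for illustration.

```r
# Sample size per shift and observed sample means from the example
n <- 100
x_bar <- 20
y_bar <- 22

# MLE restricted to Theta_0 (common Poisson mean)
theta_hat <- (x_bar + y_bar) / 2

# log(lambda); the constant c and the exponential terms cancel in the ratio
log_lambda <- n * (x_bar + y_bar) * log(theta_hat) -
  n * x_bar * log(x_bar) - n * y_bar * log(y_bar)

# Likelihood ratio statistic, chi-squared critical value, and p-value
lrt_stat <- -2 * log_lambda
crit <- qchisq(0.99, df = 1)
p_value <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

c(lrt_stat = lrt_stat, crit = crit, p_value = p_value)
```

The statistic rounds to \(9.53\) and exceeds the critical value \(6.635,\) in agreement with the decision reached above.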


Exercise 6.1 The type I error probability \(\alpha\) in Examples 6.2 and 6.3 was very small; however, the type II error probability \(\beta\) was too large. This was due to the narrowness of the critical region. Given the critical region \(C=\{y\leq 5\},\) compute the new \(\alpha\) and \(\beta\) for \(p=0.3\) and compare them with the previous values obtained in the examples.

Exercise 6.2 A chemical process has produced, until the previous week, on average, \(800\) tons of a chemical product per day. The daily productions this week were \(785,\) \(805,\) \(790,\) \(793,\) and \(802\) tons. Do these data indicate that the average production has gone below \(800\) tons? Answer by carrying out the adequate hypothesis test at significance level \(\alpha=0.05,\) assuming that the daily productions follow a normal distribution.

Exercise 6.3 A poll revealed that the yearly gross wage of research professionals increased by \(8\%\) during the year 2000, attaining an average of \(18000\) euros. However, a s.r.s. of \(20\) physicists resulted in an average of \(19000\) euros with a quasistandard deviation of \(1500\) euros. Are physicists better paid? Compute the \(p\)-value of the hypothesis test and decide at the significance level \(\alpha=0.05.\)

Exercise 6.4 A study was conducted by the Florida Fish and Wildlife Conservation Commission for estimating the quantity of DDT found in the brain tissues of brown pelicans. Samples of \(n_1=10\) young pelicans and \(n_2=13\) baby pelicans gave the results indicated in the table below (measurements in parts per million).

\[ \begin{array}{ll} \text{Young pelicans} & \text{Baby pelicans} \\ n_1=10 & n_2=13 \\ \bar{X}_1=0.041 & \bar{X}_2=0.026 \\ S_1'^2=210 & S_2'^2=0.006 \end{array} \]
  1. Test the hypothesis of no difference in the average DDT quantities found in young and baby pelicans against the alternative that the young pelicans show a higher average, with \(\alpha=0.05.\)
  2. Is there evidence that the average quantity of DDT for young pelicans is larger than the one for baby pelicans in more than \(0.01\) parts per million?

Exercise 6.5 A researcher is convinced that his measuring device has a variability with a standard deviation of \(2.\) Seventeen measurements gave as a result \(S'^2=6.1.\) Assuming normality, do the data agree or disagree with his belief? Determine the \(p\)-value of the test and decide at the significance level \(\alpha=0.05.\)

Exercise 6.6 A local brewery intends to buy a new bottling machine and considers two models, A and B, manufactured by different companies. The decisive factor for buying one model or the other is the variability in the filling of the bottles, the machine with the lower variance being the preferred one. To infer the machines’ filling variances, \(10\) bottles are filled in each one, obtaining the results of the table below. Perform the three types of tests (right one-sided, left one-sided, and two-sided) at the significance level \(\alpha=0.05\) for the null hypothesis of equality of variances of the two machines. What conclusions can be drawn?

\[ \begin{array}{ll} n_A=10 & n_B=10 \\ \bar{X}_A=0.9 & \bar{X}_B=0.93 \\ S_A'^2=0.04 & S_B'^2=0.03 \end{array} \]

Exercise 6.7 The closing prices of two common stocks were recorded during a period of \(16\) days. The means and quasivariances were \[ \begin{array}{ll} \bar{X}_1=40.33, & \bar{X}_2=42.54, \\ S_1'^2=1.54, & S_2'^2=2.96. \end{array} \] Do these data show enough evidence of a difference in variability for the closing prices of the two stocks? Determine the \(p\)-value of the test. What would be concluded with \(\alpha=0.02\)? Assume normality.

Exercise 6.8 A psychological study was carried out in order to compare the response times (in seconds) of men and women with respect to a certain stimulus. Two independent s.r.s.’s of \(50\) men and \(50\) women were employed in the experiment. The results are shown in the table. Do the data show enough evidence to suggest a difference between the average response times for men and women? Employ \(\alpha=0.05.\)

\[ \begin{array}{ll} \text{Men} & \text{Women} \\ n_1=50 & n_2=50 \\ \bar{X}_1=3.6 & \bar{X}_2=3.8 \\ S_1'^2=0.18 & S_2'^2=0.14 \end{array} \]

Exercise 6.9 The daily salaries of a particular industry have a normal distribution with mean \(\mu=13.20\) (in dollars). If in that industry there is a company of \(n=40\) employees that pays an average of \(12.20\) per day and has a quasistandard deviation on the salaries of \(S'=2.5\$,\) can this company be accused of systematically paying inferior salaries? Employ the significance level \(\alpha=0.01.\)

  1. For the so-called two-sided tests, to be introduced in Section 6.2, \(\Theta_0=\{\theta_0\},\) \(\Theta_1=\Theta_0^c,\) and \(\Theta_2=\emptyset.\) However, for the so-called one-sided tests, \(\Theta_0=\{\theta_0\},\) \(\Theta_1=(\theta_0,+\infty),\) and \(\Theta_2=(-\infty, \theta_0)\) (or \(\Theta_1=(-\infty,\theta_0)\) and \(\Theta_2=(\theta_0,+\infty)\)).