## 6.3 Tests on two normal populations

We assume now two populations represented as two independent rv’s \(X_1\sim\mathcal{N}(\mu_1,\sigma_1^2)\) and \(X_2\sim\mathcal{N}(\mu_2,\sigma_2^2),\) with unknown means and variances. From two srs’s \((X_{11},\ldots,X_{1n_1})\) and \((X_{21},\ldots,X_{2n_2})\) of \(X_1\) and \(X_2,\) we will test hypotheses about the difference of means \(\mu_1-\mu_2,\) assuming \(\sigma_1^2=\sigma_2^2,\) and about the ratio of variances \(\sigma_1^2/\sigma_2^2.\) As in Section 6.2, the sampling distributions obtained in Section 2.2 for normal populations will be key for obtaining the critical regions of the forthcoming tests.

### 6.3.1 Equality of means

We assume that \(\sigma_1^2=\sigma_2^2=\sigma^2.\) The hypotheses to test are of three types:

- \(H_0:\mu_1=\mu_2\) vs. \(H_1:\mu_1>\mu_2;\)
- \(H_0:\mu_1=\mu_2\) vs. \(H_1:\mu_1<\mu_2;\)
- \(H_0:\mu_1=\mu_2\) vs. \(H_1:\mu_1\neq \mu_2.\)

Denoting \(\theta:=\mu_1-\mu_2,\) then the hypotheses can be rewritten as:

- \(H_0:\theta=0\) vs. \(H_1:\theta>0;\)
- \(H_0:\theta=0\) vs. \(H_1:\theta<0;\)
- \(H_0:\theta=0\) vs. \(H_1:\theta\neq 0.\)

An estimator of \(\theta\) is the difference of sample means,

\[\begin{align*} \hat{\theta}=\bar{X}_1-\bar{X}_2\sim \mathcal{N}\left(\mu_1-\mu_2,\sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right). \end{align*}\]

If we estimate \(\sigma^2\) using

\[\begin{align*} S^2=\frac{(n_1-1)S_1'^2+(n_2-1)S_2'^2}{n_1+n_2-2}, \end{align*}\]

then an adequate test statistic is

\[\begin{align*} T=\frac{\bar{X}_1-\bar{X}_2}{S\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\stackrel{H_0}\sim t_{n_1+n_2-2}. \end{align*}\]

It does not take much to realize that the critical regions can be completely recycled from those in Section 6.2.1. Therefore, the critical regions are:

- \(C_a=\{T>t_{n_1+n_2-2;\alpha}\};\)
- \(C_b=\{T<-t_{n_1+n_2-2;\alpha}\};\)
- \(C_c=\{|T|>t_{n_1+n_2-2;\alpha/2}\}.\)

**Example 6.8 **Is there any evidence that either of the two training methods described in Example 5.5 works better, at \(\alpha=0.05\)? The average assembly times for the two groups of nine employees were \(\bar{X}_1\approx35.22\) and \(\bar{X}_2\approx31.56,\) and the quasivariances were \(S_1'^2\approx24.445\) and \(S_2'^2\approx20.027.\)

We want to test

\[\begin{align*} H_0:\mu_1=\mu_2\quad \text{vs.}\quad H_1:\mu_1\neq \mu_2. \end{align*}\]

The observed value of the test statistic follows from the pooled estimation of the variance,

\[\begin{align*} S^2\approx\frac{(9-1)\times 24.445+(9-1)\times 20.027}{9+9-2}\approx22.24, \end{align*}\]

which provides

\[\begin{align*} T\approx\frac{35.22-31.56}{4.71\sqrt{\frac{1}{9}+\frac{1}{9}}}\approx1.65. \end{align*}\]

Then, the critical region is \(C=\{|T|>t_{16;0.025}\approx2.12\}.\) Since \(T\approx1.65<2.12,\) the statistic does not belong to either of the two parts of the critical region. It is concluded that the data does not provide evidence supporting that either of the two methods works better.
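These hand computations can be reproduced step by step in R. The sketch below uses the raw assembly times for the two groups (they reappear in Example 6.9); the variable names are merely illustrative:

```
# Assembly times for the two groups of nine employees (Example 5.5)
std <- c(32, 37, 35, 28, 41, 44, 35, 31, 34)
new <- c(35, 31, 29, 25, 34, 40, 27, 32, 31)
n1 <- length(std); n2 <- length(new)

# Pooled estimate of the common variance (var() gives the quasivariance)
S2 <- ((n1 - 1) * var(std) + (n2 - 1) * var(new)) / (n1 + n2 - 2)

# Observed test statistic and critical value for alpha = 0.05
T_obs <- (mean(std) - mean(new)) / sqrt(S2 * (1 / n1 + 1 / n2))
crit <- qt(0.025, df = n1 + n2 - 2, lower.tail = FALSE)
abs(T_obs) > crit  # FALSE: H0 is not rejected
```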

The R function `t.test()` implements the (two-sample) test of \(H_0:\mu_1=\mu_2\) against different alternatives. The main arguments of the function are as follows:

```
t.test(x, y, alternative = c("two.sided", "less", "greater"),
       var.equal = FALSE, paired = FALSE, ...)
```

The flag `var.equal` indicates if \(\sigma_1^2=\sigma_2^2.\) The table below shows the encoding of the `alternative` argument:

| `alternative` | `"two.sided"` | `"less"` | `"greater"` |
|---|---|---|---|
| \(H_1\) | \(\mu_1\neq\mu_2\) | \(\mu_1<\mu_2\) | \(\mu_1>\mu_2\) |

*Remark*. The `paired` argument serves to indicate if the srs’s \((X_{11},\ldots,X_{1n_1})\) and \((X_{21},\ldots,X_{2n_2})\) are *paired*. That is, if \(n_1=n_2\) and both samples are actually *dependent* between them because they correspond to measurements in the *same* individuals:

| \(X_1\) | \(X_2\) | \(Y:=X_1-X_2\) |
|---|---|---|
| \(X_{11}\) | \(X_{21}\) | \(Y_1:=X_{11}-X_{21}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(X_{1n}\) | \(X_{2n}\) | \(Y_n:=X_{1n}-X_{2n}\) |

In this case, `paired = TRUE` is the same as testing \(H_0:\mu_Y=0\) with the srs \((Y_1,\ldots,Y_n)\) (i.e., we are under the setting of Section 6.2.1).^{78} The prototypical example of a paired test is the measurement of a certain characteristic (e.g., blood pressure) of a group of patients before and after a drug is administered.
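This equivalence between the paired test and the one-sample test on the differences can be checked with simulated data; the "before/after" vectors below are made up purely for illustration:

```
# Simulated measurements on the same n = 10 patients, before and after
set.seed(42)
before <- rnorm(10, mean = 120, sd = 10)
after <- before - rnorm(10, mean = 2, sd = 3)

# Paired two-sample test vs. one-sample test on the differences
p1 <- t.test(x = before, y = after, paired = TRUE)$p.value
p2 <- t.test(x = before - after, mu = 0)$p.value
all.equal(p1, p2)  # TRUE: both tests perform exactly the same computation
```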

**Example 6.9 **The `t.test()` solution to Example 6.8 is very simple:

```
# Apply t.test() with equal variances and H1: mu1 != mu2
std <- c(32, 37, 35, 28, 41, 44, 35, 31, 34)
new <- c(35, 31, 29, 25, 34, 40, 27, 32, 31)
t.test(x = std, y = new, alternative = "two.sided", var.equal = TRUE,
       paired = FALSE)
##
## Two Sample t-test
##
## data: std and new
## t = 1.6495, df = 16, p-value = 0.1185
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.045706 8.379039
## sample estimates:
## mean of x mean of y
## 35.22222 31.55556
```

That the reported \(p\)-value is larger than \(\alpha=0.05\) indicates non-rejection of \(H_0,\) as seen in Section 6.5.

### 6.3.2 Equality of variances

We want to test the following hypotheses:

- \(H_0:\sigma_1^2=\sigma_2^2\) vs. \(H_1:\sigma_1^2>\sigma_2^2;\)
- \(H_0:\sigma_1^2=\sigma_2^2\) vs. \(H_1:\sigma_1^2<\sigma_2^2;\)
- \(H_0:\sigma_1^2=\sigma_2^2\) vs. \(H_1:\sigma_1^2\neq\sigma_2^2.\)

Denoting \(\theta:=\sigma_1^2/\sigma_2^2,\) then the hypotheses can be rewritten as:

- \(H_0:\theta=1\) vs. \(H_1:\theta>1;\)
- \(H_0:\theta=1\) vs. \(H_1:\theta<1;\)
- \(H_0:\theta=1\) vs. \(H_1:\theta\neq 1.\)

An estimator of \(\theta\) is \(\hat{\theta}=S_1'^2/S_2'^2,\) but its distribution is unknown as it will depend on \(\sigma_1^2\) and \(\sigma_2^2.\) However, we do know the distribution of

\[\begin{align*} F=\frac{\frac{(n_1-1)S_1'^2}{\sigma_1^2}/(n_1-1)}{\frac{(n_2-1)S_2'^2}{\sigma_2^2}/(n_2-1)}=\frac{S_1'^2 /\sigma_1^2}{S_2'^2/\sigma_2^2}\sim \mathcal{F}_{n_1-1,n_2-1}. \end{align*}\]

Besides, under \(H_0:\sigma_1^2=\sigma_2^2,\)

\[\begin{align*} F=\frac{S_1'^2}{S_2'^2}\stackrel{H_0}{\sim} \mathcal{F}_{n_1-1,n_2-1}, \end{align*}\]

so \(F\) is a test statistic. The rejection regions are given by:

- \(C_a=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:F(x_1,\ldots,x_n)>\mathcal{F}_{n_1-1,n_2-1;\alpha}\};\)
- \(C_b=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:F(x_1,\ldots,x_n)<\mathcal{F}_{n_1-1,n_2-1;1-\alpha}\};\)
- \(C_c=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:F(x_1,\ldots,x_n)>\mathcal{F}_{n_1-1,n_2-1;\alpha/2} \ \text{or}\ \ F(x_1,\ldots,x_n)<\mathcal{F}_{n_1-1,n_2-1;1-\alpha/2}\}.\)

**Example 6.10 **An experiment for studying the pain threshold consists in applying small electric shocks to \(14\) men and \(12\) women and recording their pain thresholds. The experiment provides the following data:

- Men: 16, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9, 12.5, 16.5, 16.5.
- Women: 5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5.

Assuming that the variable that measures the pain threshold for men and women is normally distributed, is there evidence of a different variability in the pain threshold between men and women at significance level \(\alpha=0.05\)?

We want to test

\[\begin{align*} H_0:\sigma_\mathrm{M}^2=\sigma_\mathrm{W}^2\quad \text{vs.}\quad H_1: \sigma_\mathrm{M}^2\neq \sigma_\mathrm{W}^2. \end{align*}\]

The test statistic is

\[\begin{align*} F=\frac{S_\mathrm{M}'^2}{S_\mathrm{W}'^2}\approx\frac{4.5277}{13.6855}\approx 0.3308. \end{align*}\]

The critical region is

\[\begin{align*} C=\{F>\mathcal{F}_{13,11;0.025}\ \text{or}\ F<\mathcal{F}_{13,11;0.975}\}. \end{align*}\]

\(\mathcal{F}_{13,11;0.025}\) and \(\mathcal{F}_{13,11;0.975}\) are computed in R as follows:

```
qf(0.025, df1 = 13, df2 = 11, lower.tail = FALSE)
## [1] 3.391728
qf(0.975, df1 = 13, df2 = 11, lower.tail = FALSE)
## [1] 0.3127447
```

Since \(F\approx0.3308\) does not belong to the critical region, we conclude that the experiment does not provide enough evidence against the pain threshold being equally variable for both genders.
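The value of \(F\) and the decision can be double-checked in R; the code below recomputes the quasivariances from the raw thresholds (the data vectors reappear in Example 6.11):

```
# Pain thresholds for the 14 men and 12 women (Example 6.10)
men <- c(16.0, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9,
         12.5, 16.5, 16.5)
wom <- c(5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5)

# Observed statistic: ratio of quasivariances
F_obs <- var(men) / var(wom)

# Critical values for alpha = 0.05
up <- qf(0.025, df1 = 13, df2 = 11, lower.tail = FALSE)
lo <- qf(0.975, df1 = 13, df2 = 11, lower.tail = FALSE)
(F_obs > up) | (F_obs < lo)  # FALSE: H0 is not rejected
```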

The R function `var.test()` implements the (two-sample) test of \(H_0:\sigma^2_1=\sigma^2_2\) against different alternatives.
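Mirroring the usage snippet shown for `t.test()`, its main arguments (abridged from the `stats` package documentation) are as follows:

```
var.test(x, y, ratio = 1, alternative = c("two.sided", "less", "greater"),
         conf.level = 0.95, ...)
```

Here `ratio` is the hypothesized value of \(\sigma_1^2/\sigma_2^2\) under \(H_0\) (the default `ratio = 1` corresponds to the tests above).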

The table below shows the encoding of the `alternative` argument:

| `alternative` | `"two.sided"` | `"less"` | `"greater"` |
|---|---|---|---|
| \(H_1\) | \(\sigma_1^2\neq\sigma_2^2\) | \(\sigma_1^2<\sigma_2^2\) | \(\sigma_1^2>\sigma_2^2\) |

**Example 6.11 **The `var.test()` solution to Example 6.10 is very simple:

```
# Apply var.test() with H1: sigma_1^2 != sigma_2^2
men <- c(16.0, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9,
         12.5, 16.5, 16.5)
wom <- c(5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5)
var.test(x = men, y = wom, alternative = "two.sided")
##
## F test to compare two variances
##
## data: men and wom
## F = 0.33084, num df = 13, denom df = 11, p-value = 0.06162
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.09754312 1.05785883
## sample estimates:
## ratio of variances
## 0.3308397
```

That the reported \(p\)-value is larger than \(\alpha=0.05\) indicates non-rejection of \(H_0,\) as seen in Section 6.5.

A good exercise is to check that this statement is true, that is, that the outcomes of `t.test(x, y, paired = TRUE)` and `t.test(x - y, mu = 0)` are equivalent.↩︎