6.3 Tests on two normal populations

We now assume two populations represented by two independent rv’s $X_1\sim\mathcal{N}(\mu_1,\sigma_1^2)$ and $X_2\sim\mathcal{N}(\mu_2,\sigma_2^2)$, with unknown means and variances. From two srs’s $(X_{11},\ldots,X_{1n_1})$ and $(X_{21},\ldots,X_{2n_2})$ of $X_1$ and $X_2$, we will test hypotheses about the difference of means $\mu_1-\mu_2$, assuming $\sigma_1^2=\sigma_2^2$, and about the ratio of variances $\sigma_1^2/\sigma_2^2$. As in Section 6.2, the sampling distributions obtained in Section 2.2 for normal populations will be key for obtaining the critical regions of the forthcoming tests.

6.3.1 Equality of means

We assume that $\sigma_1^2=\sigma_2^2=\sigma^2$. The hypotheses to test are of three types:

  1. $H_0:\mu_1=\mu_2$ vs. $H_1:\mu_1>\mu_2$;
  2. $H_0:\mu_1=\mu_2$ vs. $H_1:\mu_1<\mu_2$;
  3. $H_0:\mu_1=\mu_2$ vs. $H_1:\mu_1\neq\mu_2$.

Denoting $\theta:=\mu_1-\mu_2$, the hypotheses can be rewritten as:

  1. $H_0:\theta=0$ vs. $H_1:\theta>0$;
  2. $H_0:\theta=0$ vs. $H_1:\theta<0$;
  3. $H_0:\theta=0$ vs. $H_1:\theta\neq0$.

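An estimator of $\theta$ is the difference of sample means,

$$\hat\theta=\bar X_1-\bar X_2\sim\mathcal{N}\left(\mu_1-\mu_2,\sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right).$$

If we estimate $\sigma^2$ using

$$S^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2},$$

then an adequate test statistic is

$$T=\frac{\bar X_1-\bar X_2}{S\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\stackrel{H_0}{\sim}t_{n_1+n_2-2}.$$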

It does not take much to realize that the critical regions can be completely recycled from those in Section 6.2.1. Therefore, the critical regions are:

  1. $C_a=\{T>t_{n_1+n_2-2;\alpha}\}$;
  2. $C_b=\{T<-t_{n_1+n_2-2;\alpha}\}$;
  3. $C_c=\{|T|>t_{n_1+n_2-2;\alpha/2}\}$.
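That the two-sided region above rejects a true H0 with probability close to α can be checked by simulation. The following is a minimal sketch: the sample sizes, the common mean and standard deviation, and the number of replicates are illustrative assumptions, not values taken from the text.

```r
# Monte Carlo check of the two-sided pooled t-test under H0
# (all numeric settings below are assumed for illustration)
set.seed(42)
n1 <- 9; n2 <- 12        # sample sizes
mu <- 10; sigma <- 2     # common mean and common standard deviation
alpha <- 0.05            # significance level
M <- 5000                # number of simulated pairs of samples

reject <- replicate(M, {
  x1 <- rnorm(n1, mean = mu, sd = sigma)
  x2 <- rnorm(n2, mean = mu, sd = sigma)
  # Pooled estimator of the common variance
  s2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
  # Test statistic T
  t_stat <- (mean(x1) - mean(x2)) / sqrt(s2 * (1 / n1 + 1 / n2))
  # Does T fall in the two-sided critical region?
  abs(t_stat) > qt(alpha / 2, df = n1 + n2 - 2, lower.tail = FALSE)
})

# Empirical rejection rate; it should be close to alpha
rejection_rate <- mean(reject)
rejection_rate
```

Replacing the common mean with two different means in the calls to rnorm() would instead give an estimate of the power of the test under that alternative.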

Example 6.8 Is there any evidence, at significance level $\alpha=0.05$, that either of the two training methods described in Example 5.5 works better? The average assembly times for the two groups of nine employees were $\bar X_1\approx35.22$ and $\bar X_2\approx31.56$, and the quasivariances were $S_1^2\approx24.445$ and $S_2^2\approx20.027$.

We want to test

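$$H_0:\mu_1=\mu_2\quad\text{vs.}\quad H_1:\mu_1\neq\mu_2.$$

The observed value of the test statistic follows from the pooled estimation of the variance,

$$S^2\approx\frac{(9-1)\times24.445+(9-1)\times20.027}{9+9-2}\approx22.24,$$

which provides

$$T\approx\frac{35.22-31.56}{4.71\sqrt{\frac{1}{9}+\frac{1}{9}}}\approx1.65.$$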

Then, the critical region is $C=\{|T|>t_{16;0.025}\approx2.12\}$. Since $|T|\approx1.65<2.12$, the statistic does not belong to either of the two parts of the critical region. It is therefore concluded that the data do not provide evidence that either of the two methods works better.
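The arithmetic of this example can be reproduced directly from the reported summary statistics. The sketch below uses only the rounded means and quasivariances given above; the variable names are illustrative.

```r
# Summary statistics reported in Example 6.8 (rounded values)
n1 <- 9; n2 <- 9
xbar1 <- 35.22; xbar2 <- 31.56
s2_1 <- 24.445; s2_2 <- 20.027
alpha <- 0.05

# Pooled estimate of the common variance
s2_pooled <- ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)

# Observed test statistic and two-sided critical value t_{16;0.025}
t_obs <- (xbar1 - xbar2) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
t_crit <- qt(alpha / 2, df = n1 + n2 - 2, lower.tail = FALSE)

# Decision: reject H0 only if |T| exceeds the critical value
reject_H0 <- abs(t_obs) > t_crit
c(s2_pooled = s2_pooled, t_obs = t_obs, t_crit = t_crit)
reject_H0
```

Because the inputs are rounded, the computed statistic may differ from the one obtained with the raw data in the last decimals.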

The R function t.test() implements the (two-sample) test of H0:μ1=μ2 against different alternatives. The main arguments of the function are as follows:

t.test(x, y, alternative = c("two.sided", "less", "greater"),
       var.equal = FALSE, paired = FALSE, ...)

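The flag var.equal indicates whether $\sigma_1^2=\sigma_2^2$ is assumed. With the default var.equal = FALSE, t.test() runs Welch's test, which does not assume equal variances, so var.equal = TRUE is needed to obtain the test described above. The table below shows the encoding of the alternative argument: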

alternative    "two.sided"           "less"             "greater"
$H_1$          $\mu_1\neq\mu_2$      $\mu_1<\mu_2$      $\mu_1>\mu_2$

Remark. The paired argument serves to indicate whether the srs’s $(X_{11},\ldots,X_{1n_1})$ and $(X_{21},\ldots,X_{2n_2})$ are paired. That is, whether $n_1=n_2=n$ and the two samples are actually dependent because they correspond to measurements on the same individuals:

$X_1$       $X_2$       $Y:=X_1-X_2$
$X_{11}$    $X_{21}$    $Y_1:=X_{11}-X_{21}$
$\vdots$    $\vdots$    $\vdots$
$X_{1n}$    $X_{2n}$    $Y_n:=X_{1n}-X_{2n}$

In this case, paired = TRUE is the same as testing $H_0:\mu_Y=0$ with the srs $(Y_1,\ldots,Y_n)$ (i.e., we are under the setting of Section 6.2.1).[1] The prototypical example of a paired test is the measurement of a certain characteristic (e.g., blood pressure) of a group of patients before and after a drug is administered.
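The equivalence stated in this remark can be checked numerically. The sketch below uses made-up blood pressure measurements for eight hypothetical patients; the numbers are purely illustrative and do not come from any real study.

```r
# Hypothetical measurements on the same eight patients (made-up numbers)
before <- c(142, 138, 150, 145, 160, 155, 148, 152)
after  <- c(138, 135, 147, 146, 152, 150, 145, 149)

# Paired two-sample test
paired_fit <- t.test(x = before, y = after, paired = TRUE)

# One-sample test on the differences, H0: mu_Y = 0
diff_fit <- t.test(x = before - after, mu = 0)

# Statistic, degrees of freedom, and p-value coincide
all.equal(unname(paired_fit$statistic), unname(diff_fit$statistic))
all.equal(unname(paired_fit$parameter), unname(diff_fit$parameter))
all.equal(paired_fit$p.value, diff_fit$p.value)
```

Note that the degrees of freedom are $n-1$ (here 7), not $n_1+n_2-2$, since the paired test works with a single sample of differences.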

Example 6.9 The t.test() solution to Example 6.8 is very simple:

# Apply t.test() with equal variances and H1: mu1 != mu2
std <- c(32, 37, 35, 28, 41, 44, 35, 31, 34)
new <- c(35, 31, 29, 25, 34, 40, 27, 32, 31)
t.test(x = std, y = new, alternative = "two.sided", var.equal = TRUE,
       paired = FALSE)
## 
##  Two Sample t-test
## 
## data:  std and new
## t = 1.6495, df = 16, p-value = 0.1185
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.045706  8.379039
## sample estimates:
## mean of x mean of y 
##  35.22222  31.55556

The reported p-value (0.1185) is larger than $\alpha=0.05$, which indicates non-rejection of $H_0$, as explained in Section 6.5.
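The p-value in the output can be recovered from the observed statistic and the degrees of freedom. This is a minimal sketch that assumes the usual two-sided definition for a symmetric null distribution (twice the upper-tail probability of |T|) and takes t = 1.6495 and df = 16 from the output above.

```r
# Values copied from the t.test() output
t_obs <- 1.6495
df <- 16          # n1 + n2 - 2
alpha <- 0.05

# Two-sided p-value: probability of a statistic at least as extreme as
# the observed one, in either tail of the t distribution
p_value <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)
p_value

# H0 is not rejected when the p-value exceeds alpha
p_value > alpha
```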

6.3.2 Equality of variances

We want to test the following hypotheses:

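  1. $H_0:\sigma_1^2=\sigma_2^2$ vs. $H_1:\sigma_1^2>\sigma_2^2$;
  2. $H_0:\sigma_1^2=\sigma_2^2$ vs. $H_1:\sigma_1^2<\sigma_2^2$;
  3. $H_0:\sigma_1^2=\sigma_2^2$ vs. $H_1:\sigma_1^2\neq\sigma_2^2$.

Denoting $\theta:=\sigma_1^2/\sigma_2^2$, the hypotheses can be rewritten as:

  1. $H_0:\theta=1$ vs. $H_1:\theta>1$;
  2. $H_0:\theta=1$ vs. $H_1:\theta<1$;
  3. $H_0:\theta=1$ vs. $H_1:\theta\neq1$.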

An estimator of $\theta$ is $\hat\theta=S_1^2/S_2^2$, but its distribution cannot be used directly because it depends on the unknown $\sigma_1^2$ and $\sigma_2^2$. However, we do know the distribution of

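$$F=\frac{\frac{(n_1-1)S_1^2}{\sigma_1^2}/(n_1-1)}{\frac{(n_2-1)S_2^2}{\sigma_2^2}/(n_2-1)}=\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}\sim F_{n_1-1,n_2-1}.$$

Besides, under $H_0:\sigma_1^2=\sigma_2^2$,

$$F=\frac{S_1^2}{S_2^2}\stackrel{H_0}{\sim}F_{n_1-1,n_2-1},$$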

so F is a test statistic. The rejection regions are given by:

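  1. $C_a=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:F(x_1,\ldots,x_n)>F_{n_1-1,n_2-1;\alpha}\}$;
  2. $C_b=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:F(x_1,\ldots,x_n)<F_{n_1-1,n_2-1;1-\alpha}\}$;
  3. $C_c=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:F(x_1,\ldots,x_n)>F_{n_1-1,n_2-1;\alpha/2}\text{ or }F(x_1,\ldots,x_n)<F_{n_1-1,n_2-1;1-\alpha/2}\}$.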

Example 6.10 An experiment for studying the pain threshold consists in applying small electric shocks to 14 men and 12 women and recording their pain thresholds. The experiment provides the following data:

  • Men: 16, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9, 12.5, 16.5, 16.5.
  • Women: 5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5.

Assuming that the variable that measures the pain threshold for men and women is normally distributed, is there evidence of a different variability in the pain threshold between men and women at significance level $\alpha=0.05$?

We want to test

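$$H_0:\sigma_M^2=\sigma_W^2\quad\text{vs.}\quad H_1:\sigma_M^2\neq\sigma_W^2.$$

The test statistic is

$$F=\frac{S_M^2}{S_W^2}\approx\frac{4.5277}{13.6855}\approx0.3308.$$

The critical region is

$$C=\{F>F_{13,11;0.025}\text{ or }F<F_{13,11;0.975}\}.$$

$F_{13,11;0.025}$ and $F_{13,11;0.975}$ are computed in R as follows: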

qf(0.025, df1 = 13, df2 = 11, lower.tail = FALSE)
## [1] 3.391728
qf(0.975, df1 = 13, df2 = 11, lower.tail = FALSE)
## [1] 0.3127447

Since $F\approx0.3308$ does not belong to the critical region ($0.3127<0.3308<3.3917$), we conclude that the experiment does not provide enough evidence against the pain threshold being equally variable for both genders.
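The whole example can be reproduced from the raw data. The following sketch computes the quasivariances with var(), forms the statistic, and checks it against the two critical values; the variable names are illustrative.

```r
# Pain thresholds from Example 6.10
men <- c(16.0, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9,
         12.5, 16.5, 16.5)
wom <- c(5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5)
alpha <- 0.05

# Test statistic: ratio of quasivariances (var() divides by n - 1)
f_obs <- var(men) / var(wom)
df1 <- length(men) - 1
df2 <- length(wom) - 1

# Upper and lower critical values, F_{13,11;0.025} and F_{13,11;0.975}
f_upper <- qf(alpha / 2, df1 = df1, df2 = df2, lower.tail = FALSE)
f_lower <- qf(1 - alpha / 2, df1 = df1, df2 = df2, lower.tail = FALSE)

# Decision: reject H0 if F falls in either part of the critical region
reject_H0 <- f_obs > f_upper || f_obs < f_lower
c(f_obs = f_obs, f_lower = f_lower, f_upper = f_upper)
reject_H0
```

The statistic lies just above the lower critical value, so the non-rejection here is a close call at this significance level.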

The R function var.test() implements the (two-sample) test of $H_0:\sigma_1^2=\sigma_2^2$ against different alternatives. The main arguments of the function are as follows:

var.test(x, y, ratio = 1,
         alternative = c("two.sided", "less", "greater"), ...)

The table below shows the encoding of the alternative argument:

alternative    "two.sided"                      "less"                        "greater"
$H_1$          $\sigma_1^2\neq\sigma_2^2$       $\sigma_1^2<\sigma_2^2$       $\sigma_1^2>\sigma_2^2$

Example 6.11 The var.test() solution to Example 6.10 is very simple:

# Apply var.test() with H1: sigma_1^2 != sigma_2^2
men <- c(16.0, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9,
         12.5, 16.5, 16.5)
wom <- c(5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5)
var.test(x = men, y = wom, alternative = "two.sided")
## 
##  F test to compare two variances
## 
## data:  men and wom
## F = 0.33084, num df = 13, denom df = 11, p-value = 0.06162
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.09754312 1.05785883
## sample estimates:
## ratio of variances 
##          0.3308397

The reported p-value (0.06162) is larger than $\alpha=0.05$, which indicates non-rejection of $H_0$, as explained in Section 6.5.
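Since the F distribution is not symmetric, the two-sided p-value cannot be obtained by doubling a single fixed tail. A common convention, assumed in the sketch below, is to double the smaller of the two tail probabilities; with F = 0.3308397 and the degrees of freedom from the output above, it gives a value that agrees with the one reported.

```r
# Values copied from the var.test() output
f_obs <- 0.3308397
df1 <- 13
df2 <- 11
alpha <- 0.05

# Lower-tail probability of the observed statistic under H0
lower_tail <- pf(f_obs, df1 = df1, df2 = df2)

# Two-sided p-value: twice the smaller tail probability (assumed convention)
p_value <- 2 * min(lower_tail, 1 - lower_tail)
p_value

# H0 is not rejected when the p-value exceeds alpha
p_value > alpha
```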


  1. A good exercise is to check that this statement is true and that the outcomes of t.test(x, y, paired = TRUE) and t.test(x - y, mu = 0) are equivalent.