6.3 Tests on two normal populations
We assume now two populations represented as two independent rv’s $X_1\sim\mathcal{N}(\mu_1,\sigma_1^2)$ and $X_2\sim\mathcal{N}(\mu_2,\sigma_2^2)$, with unknown means and variances. From two srs’s $(X_{11},\ldots,X_{1n_1})$ and $(X_{21},\ldots,X_{2n_2})$ of $X_1$ and $X_2$, we will test hypotheses about the difference of means $\mu_1-\mu_2$, assuming $\sigma_1^2=\sigma_2^2$, and about the ratio of variances $\sigma_1^2/\sigma_2^2$. As in Section 6.2, the sampling distributions obtained in Section 2.2 for normal populations will be key for obtaining the critical regions of the forthcoming tests.
6.3.1 Equality of means
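We assume that $\sigma_1^2=\sigma_2^2=\sigma^2$. The hypotheses to test are of three types:
- $H_0:\mu_1=\mu_2$ vs. $H_1:\mu_1>\mu_2$;
- $H_0:\mu_1=\mu_2$ vs. $H_1:\mu_1<\mu_2$;
- $H_0:\mu_1=\mu_2$ vs. $H_1:\mu_1\neq\mu_2$.
Denoting $\theta:=\mu_1-\mu_2$, the hypotheses can be rewritten as:
- $H_0:\theta=0$ vs. $H_1:\theta>0$;
- $H_0:\theta=0$ vs. $H_1:\theta<0$;
- $H_0:\theta=0$ vs. $H_1:\theta\neq0$.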
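An estimator of $\theta$ is the difference of sample means,
$$\hat\theta=\bar X_1-\bar X_2\sim\mathcal{N}\left(\mu_1-\mu_2,\sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right).$$
If we estimate $\sigma^2$ using
$$S^2=\frac{(n_1-1)S_1'^2+(n_2-1)S_2'^2}{n_1+n_2-2},$$
then an adequate test statistic is
$$T=\frac{\bar X_1-\bar X_2}{S\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\stackrel{H_0}{\sim}t_{n_1+n_2-2}.$$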
It does not take much to realize that the critical regions can be completely recycled from those in Section 6.2.1, now with $n_1+n_2-2$ degrees of freedom. Therefore, the critical regions are:
- $C_a=\{T>t_{n_1+n_2-2;\alpha}\}$;
- $C_b=\{T<-t_{n_1+n_2-2;\alpha}\}$;
- $C_c=\{|T|>t_{n_1+n_2-2;\alpha/2}\}$.
Example 6.8 Is there any evidence, at significance level $\alpha=0.05$, that either of the two training methods described in Example 5.5 works better? The average assembly times for the two groups of nine employees were $\bar X_1\approx35.22$ and $\bar X_2\approx31.56$, and the quasivariances were $S_1'^2\approx24.445$ and $S_2'^2\approx20.027$.
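We want to test
$$H_0:\mu_1=\mu_2\quad\text{vs.}\quad H_1:\mu_1\neq\mu_2.$$
The observed value of the test statistic follows from the pooled estimation of the variance,
$$S^2\approx\frac{(9-1)\times24.445+(9-1)\times20.027}{9+9-2}\approx22.24,$$
which provides
$$T\approx\frac{35.22-31.56}{4.71\sqrt{\frac{1}{9}+\frac{1}{9}}}\approx1.65.$$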
Then, the critical region is $C=\{|T|>t_{16;0.025}\approx2.12\}$. Since $|T|\approx1.65<2.12$, the statistic does not belong to either of the two parts of the critical region. We conclude that the data do not provide evidence that either of the two methods works better.
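The arithmetic of Example 6.8 can be retraced in R from the reported summary statistics alone. The following is a minimal sketch, not the text's own code: the variable names (`s2_pooled`, `t_obs`, `t_crit`) are chosen here for illustration, and the inputs are the rounded values quoted in the example, so the last decimals may differ slightly from a computation on the raw data.

```r
# Summary statistics reported in Example 6.8 (rounded values)
n1 <- 9; n2 <- 9
xbar1 <- 35.22; xbar2 <- 31.56
s2_1 <- 24.445; s2_2 <- 20.027  # quasivariances
alpha <- 0.05

# Pooled estimate of the common variance sigma^2
s2_pooled <- ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)

# Observed value of the test statistic T
t_obs <- (xbar1 - xbar2) / (sqrt(s2_pooled) * sqrt(1 / n1 + 1 / n2))

# Upper alpha/2 critical value of the t distribution with n1 + n2 - 2 df
t_crit <- qt(alpha / 2, df = n1 + n2 - 2, lower.tail = FALSE)

# Two-sided decision: reject H0 only if |T| exceeds the critical value
reject <- abs(t_obs) > t_crit
c(s2_pooled = s2_pooled, t_obs = t_obs, t_crit = t_crit)
reject
```

The one-sided regions would use `qt(alpha, ...)` instead of `qt(alpha / 2, ...)` and compare `t_obs` itself, without the absolute value.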
The R function `t.test()` implements the (two-sample) test of $H_0:\mu_1=\mu_2$ against different alternatives. The main arguments of the function are as follows:
t.test(x, y, alternative = c("two.sided", "less", "greater"),
var.equal = FALSE, paired = FALSE, ...)
The flag `var.equal` indicates if $\sigma_1^2=\sigma_2^2$. The table below shows the encoding of the `alternative` argument:
`alternative` | `"two.sided"` | `"less"` | `"greater"` |
---|---|---|---|
$H_1$ | $\mu_1\neq\mu_2$ | $\mu_1<\mu_2$ | $\mu_1>\mu_2$ |
Remark. The `paired` argument serves to indicate if the srs’s $(X_{11},\ldots,X_{1n_1})$ and $(X_{21},\ldots,X_{2n_2})$ are paired, that is, if $n_1=n_2=n$ and the two samples are dependent because they correspond to measurements on the same individuals:
$X_1$ | $X_2$ | $Y:=X_1-X_2$ |
---|---|---|
$X_{11}$ | $X_{21}$ | $Y_1:=X_{11}-X_{21}$ |
⋮ | ⋮ | ⋮ |
$X_{1n}$ | $X_{2n}$ | $Y_n:=X_{1n}-X_{2n}$ |
In this case, `paired = TRUE` is the same as testing $H_0:\mu_Y=0$ with the srs $(Y_1,\ldots,Y_n)$ (i.e., we are under the setting of Section 6.2.1).[78] The prototypical example of a paired test is the measurement of a certain characteristic (e.g., blood pressure) of a group of patients before and after a drug is administered.
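The equivalence stated in the remark can be checked numerically. The sketch below uses made-up before/after measurements (the vectors `before` and `after` are hypothetical and do not come from the text) and compares the paired two-sample call with the one-sample call on the differences.

```r
# Made-up paired measurements, for illustration only (not from the text)
before <- c(142, 138, 150, 145, 160, 155, 148, 152)
after  <- c(138, 135, 147, 146, 152, 150, 145, 147)

# Paired two-sample test
paired_test <- t.test(x = before, y = after, paired = TRUE)

# One-sample test on the differences Y = X1 - X2
diff_test <- t.test(x = before - after, mu = 0)

# Both calls should give the same statistic, degrees of freedom, and p-value
all.equal(unname(paired_test$statistic), unname(diff_test$statistic))
all.equal(unname(paired_test$parameter), unname(diff_test$parameter))
all.equal(paired_test$p.value, diff_test$p.value)
```

Note that the degrees of freedom are $n-1$, not $2n-2$: once the samples are paired, only the $n$ differences carry information for the test.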
Example 6.9 The `t.test()` solution to Example 6.8 is very simple:
# Apply t.test() with equal variances and H1: mu1 != mu2
std <- c(32, 37, 35, 28, 41, 44, 35, 31, 34)
new <- c(35, 31, 29, 25, 34, 40, 27, 32, 31)
t.test(x = std, y = new, alternative = "two.sided", var.equal = TRUE,
paired = FALSE)
##
## Two Sample t-test
##
## data: std and new
## t = 1.6495, df = 16, p-value = 0.1185
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.045706 8.379039
## sample estimates:
## mean of x mean of y
## 35.22222 31.55556
That the reported p-value is larger than α=0.05 indicates non-rejection of H0, as seen in Section 6.5.
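To see how the `alternative` argument changes the outcome, the same data can be run through the three encodings. This is only an illustrative sketch: the name `p_values` is introduced here, and the one-sided alternatives are not part of the question posed in Example 6.8.

```r
# Data from Example 6.9
std <- c(32, 37, 35, 28, 41, 44, 35, 31, 34)
new <- c(35, 31, 29, 25, 34, 40, 27, 32, 31)

# p-values of the equal-variances test under each alternative
p_values <- sapply(c("two.sided", "less", "greater"), function(alt) {
  t.test(x = std, y = new, alternative = alt, var.equal = TRUE)$p.value
})
p_values

# Since the observed statistic is positive, the "greater" p-value is half the
# two-sided one, and the "less" and "greater" p-values add up to one
all.equal(unname(p_values["greater"]), unname(p_values["two.sided"]) / 2)
all.equal(unname(p_values["less"] + p_values["greater"]), 1)
```

The two-sided entry matches the p-value 0.1185 reported above.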
6.3.2 Equality of variances
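We want to test the following hypotheses:
- $H_0:\sigma_1^2=\sigma_2^2$ vs. $H_1:\sigma_1^2>\sigma_2^2$;
- $H_0:\sigma_1^2=\sigma_2^2$ vs. $H_1:\sigma_1^2<\sigma_2^2$;
- $H_0:\sigma_1^2=\sigma_2^2$ vs. $H_1:\sigma_1^2\neq\sigma_2^2$.
Denoting $\theta:=\sigma_1^2/\sigma_2^2$, the hypotheses can be rewritten as:
- $H_0:\theta=1$ vs. $H_1:\theta>1$;
- $H_0:\theta=1$ vs. $H_1:\theta<1$;
- $H_0:\theta=1$ vs. $H_1:\theta\neq1$.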
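An estimator of $\theta$ is $\hat\theta=S_1'^2/S_2'^2$, but its distribution cannot be used directly, as it depends on the unknown $\sigma_1^2$ and $\sigma_2^2$. However, we do know the distribution of
$$F=\frac{\frac{(n_1-1)S_1'^2}{\sigma_1^2}\big/(n_1-1)}{\frac{(n_2-1)S_2'^2}{\sigma_2^2}\big/(n_2-1)}=\frac{S_1'^2/\sigma_1^2}{S_2'^2/\sigma_2^2}\sim\mathcal{F}_{n_1-1,n_2-1}.$$
Besides, under $H_0:\sigma_1^2=\sigma_2^2$,
$$F=\frac{S_1'^2}{S_2'^2}\stackrel{H_0}{\sim}\mathcal{F}_{n_1-1,n_2-1},$$
so $F$ is a test statistic. The rejection regions are given by:
- $C_a=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:F(x_1,\ldots,x_n)>\mathcal{F}_{n_1-1,n_2-1;\alpha}\}$;
- $C_b=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:F(x_1,\ldots,x_n)<\mathcal{F}_{n_1-1,n_2-1;1-\alpha}\}$;
- $C_c=\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:F(x_1,\ldots,x_n)>\mathcal{F}_{n_1-1,n_2-1;\alpha/2}\text{ or }F(x_1,\ldots,x_n)<\mathcal{F}_{n_1-1,n_2-1;1-\alpha/2}\}$.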
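A quick Monte Carlo experiment can make the two-sided rejection region more tangible. The sketch below is an illustration under assumed settings, none of which come from the text: the sample sizes, the common standard deviation `sigma`, the seed, and the number of replicates `M` are arbitrary. It simulates the ratio of quasivariances under $H_0$ and checks that the proportion of simulated statistics falling in the rejection region is close to $\alpha$.

```r
set.seed(42)          # arbitrary seed, for reproducibility
n1 <- 14; n2 <- 12    # sample sizes (assumed for the illustration)
sigma <- 2            # common standard deviation under H0 (assumed)
alpha <- 0.05
M <- 1e4              # number of simulated pairs of samples

# Simulate the ratio of quasivariances under H0: sigma_1^2 = sigma_2^2
f_sim <- replicate(M, var(rnorm(n1, sd = sigma)) / var(rnorm(n2, sd = sigma)))

# Upper-tail critical values for alpha/2 and 1 - alpha/2
f_upper <- qf(alpha / 2, df1 = n1 - 1, df2 = n2 - 1, lower.tail = FALSE)
f_lower <- qf(1 - alpha / 2, df1 = n1 - 1, df2 = n2 - 1, lower.tail = FALSE)

# Empirical frequency of falling in the two-sided rejection region
reject_rate <- mean(f_sim > f_upper | f_sim < f_lower)
reject_rate
```

The call `var()` computes the quasivariance (denominator $n-1$), which is the quantity the statistic $F$ requires. Changing `sigma` should not affect the rejection rate, since the ratio is free of the common variance under $H_0$.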
Example 6.10 An experiment for studying the pain threshold consists in applying small electric shocks to 14 men and 12 women and recording their pain thresholds. The experiment provides the following data:
- Men: 16, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9, 12.5, 16.5, 16.5.
- Women: 5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5.
Assuming that the variable that measures the pain threshold for men and women is normally distributed, is there evidence of a different variability in the pain threshold between men and women at significance level $\alpha=0.05$?
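We want to test
$$H_0:\sigma_M^2=\sigma_W^2\quad\text{vs.}\quad H_1:\sigma_M^2\neq\sigma_W^2.$$
The test statistic is
$$F=\frac{S_M'^2}{S_W'^2}\approx\frac{4.5277}{13.6855}\approx0.3308.$$
The critical region is
$$C=\{F>\mathcal{F}_{13,11;0.025}\text{ or }F<\mathcal{F}_{13,11;0.975}\}.$$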
The critical values $\mathcal{F}_{13,11;0.025}$ and $\mathcal{F}_{13,11;0.975}$ are computed in R as follows:
qf(0.025, df1 = 13, df2 = 11, lower.tail = FALSE)
## [1] 3.391728
qf(0.975, df1 = 13, df2 = 11, lower.tail = FALSE)
## [1] 0.3127447
Since $F\approx0.3308$ lies between the two critical values ($0.3127<0.3308<3.3917$), it does not belong to the critical region. We conclude that the experiment does not provide enough evidence against the pain threshold being equally variable for both genders.
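The whole of Example 6.10 can be carried out by hand in R from the raw data. The following is a sketch with variable names chosen here for illustration (`f_obs`, `f_upper`, `f_lower`); it only combines `var()` and `qf()` as used above.

```r
# Data from Example 6.10
men <- c(16.0, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9,
         12.5, 16.5, 16.5)
wom <- c(5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5)
alpha <- 0.05
df1 <- length(men) - 1
df2 <- length(wom) - 1

# Observed statistic: ratio of quasivariances
f_obs <- var(men) / var(wom)

# Upper-tail critical values for alpha/2 and 1 - alpha/2
f_upper <- qf(alpha / 2, df1 = df1, df2 = df2, lower.tail = FALSE)
f_lower <- qf(1 - alpha / 2, df1 = df1, df2 = df2, lower.tail = FALSE)

# Two-sided decision: reject H0 if F falls in either tail
reject <- f_obs > f_upper | f_obs < f_lower
c(f_obs = f_obs, f_lower = f_lower, f_upper = f_upper)
reject
```

The observed ratio is close to the lower critical value, so the non-rejection is a narrow one at this significance level.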
The R function `var.test()` implements the (two-sample) test of $H_0:\sigma_1^2=\sigma_2^2$ against different alternatives. Its main arguments are the two samples, `x` and `y`, and `alternative`. The table below shows the encoding of the `alternative` argument:
`alternative` | `"two.sided"` | `"less"` | `"greater"` |
---|---|---|---|
$H_1$ | $\sigma_1^2\neq\sigma_2^2$ | $\sigma_1^2<\sigma_2^2$ | $\sigma_1^2>\sigma_2^2$ |
Example 6.11 The `var.test()` solution to Example 6.10 is very simple:
# Apply var.test() with H1: sigma_1^2 != sigma_2^2
men <- c(16.0, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9,
12.5, 16.5, 16.5)
wom <- c(5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5)
var.test(x = men, y = wom, alternative = "two.sided")
##
## F test to compare two variances
##
## data: men and wom
## F = 0.33084, num df = 13, denom df = 11, p-value = 0.06162
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.09754312 1.05785883
## sample estimates:
## ratio of variances
## 0.3308397
That the reported p-value is larger than α=0.05 indicates non-rejection of H0, as seen in Section 6.5.
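The reported p-value can be recovered from the $\mathcal{F}_{13,11}$ distribution. The sketch below assumes the usual convention for a two-sided test, doubling the smaller of the two tail probabilities, and checks that it agrees with the value returned by `var.test()`; the names `p_manual` and `p_builtin` are introduced here for illustration.

```r
# Data from Example 6.11
men <- c(16.0, 13.4, 17.7, 10.2, 13.1, 15.4, 15.9, 11.9, 13.9, 15.5, 15.9,
         12.5, 16.5, 16.5)
wom <- c(5.8, 6.4, 13.1, 7.2, 12.8, 9.8, 10.5, 18.9, 13.7, 13.7, 9.8, 11.5)

# Observed statistic and degrees of freedom
f_obs <- var(men) / var(wom)
df1 <- length(men) - 1
df2 <- length(wom) - 1

# Two-sided p-value: twice the smaller tail probability (assumed convention)
p_lower <- pf(f_obs, df1 = df1, df2 = df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# p-value reported by var.test()
p_builtin <- var.test(x = men, y = wom, alternative = "two.sided")$p.value

c(p_manual = p_manual, p_builtin = p_builtin)
all.equal(p_manual, p_builtin)
```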
[78] A good exercise is to check that this statement is true by verifying that the outcomes of `t.test(x, y, paired = TRUE)` and `t.test(x - y, mu = 0)` are equivalent.