3.5 Comparing Means from Different Populations
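Suppose you are interested in the means of two different populations, denote them $\mu_1$ and $\mu_2$. More specifically, you are interested in whether these population means differ from each other, and you plan to test this using independent sample data from both populations. A suitable pair of hypotheses is

$$H_0: \mu_1 - \mu_2 = d_0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 \neq d_0 \tag{3.6}$$

where $d_0$ denotes the hypothesized difference in means (so $d_0 = 0$ when the means are equal under the null hypothesis). The book teaches us that $H_0$ can be tested with the $t$-statistic

$$t = \frac{(\overline{Y}_1 - \overline{Y}_2) - d_0}{SE(\overline{Y}_1 - \overline{Y}_2)} \tag{3.7}$$

where

$$SE(\overline{Y}_1 - \overline{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

with $\overline{Y}_1, \overline{Y}_2$ the sample means, $s_1^2, s_2^2$ the sample variances and $n_1, n_2$ the sample sizes.

This is called a two sample $t$-test. For large $n_1$ and $n_2$, (3.7) is approximately standard normal under the null hypothesis. Analogously to the simple $t$-test, we can compute confidence intervals for the true difference in population means:

$$(\overline{Y}_1 - \overline{Y}_2) \pm 1.96 \times SE(\overline{Y}_1 - \overline{Y}_2)$$

is a $95\%$ confidence interval for $d = \mu_1 - \mu_2$.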
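To make (3.7) and the confidence interval concrete, the sketch below computes both by hand from simulated samples. The object names `d0`, `diff_means`, `SE_diff`, `t_stat`, `p_value` and `CI` are hypothetical and chosen only for illustration. The $p$-value and interval use the large-sample standard normal approximation described above. They are not the exact values `t.test()` reports.

```r
# simulate independent samples from two populations with equal means
set.seed(1)
sample_pop1 <- rnorm(100, 10, 10)
sample_pop2 <- rnorm(100, 10, 20)

# hypothesized difference in means under H0
d0 <- 0

# difference in sample means
diff_means <- mean(sample_pop1) - mean(sample_pop2)

# standard error of the difference: sqrt(s1^2 / n1 + s2^2 / n2)
SE_diff <- sqrt(var(sample_pop1) / length(sample_pop1) +
                var(sample_pop2) / length(sample_pop2))

# two sample t-statistic as in (3.7)
t_stat <- (diff_means - d0) / SE_diff

# two-sided p-value based on the standard normal approximation
p_value <- 2 * pnorm(-abs(t_stat))

# large-sample 95% confidence interval for d = mu1 - mu2
CI <- c(diff_means - 1.96 * SE_diff, diff_means + 1.96 * SE_diff)

t_stat
p_value
CI
```

By default `t.test()` performs a Welch test, which uses this same standard error. Its $t$-statistic should therefore coincide with `t_stat`. The reported $p$-value and interval may differ slightly, because `t.test()` uses a $t$ distribution with Welch-adjusted degrees of freedom instead of the normal approximation.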
In R, hypotheses as in (3.6) can be tested with t.test(), too. Note that t.test() sets $d_0 = 0$ by default. You can change this by setting the argument mu accordingly.
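The following minimal sketch shows how the mu argument changes the null hypothesis. The hypothesized difference of 2 is an arbitrary value chosen only for illustration, and `test_default` and `test_d0_2` are hypothetical object names.

```r
# simulate independent samples from two populations with equal means
set.seed(1)
sample_pop1 <- rnorm(100, 10, 10)
sample_pop2 <- rnorm(100, 10, 20)

# default: tests H0: mu1 - mu2 = 0
test_default <- t.test(sample_pop1, sample_pop2)

# tests H0: mu1 - mu2 = 2 instead (illustrative value for d0)
test_d0_2 <- t.test(sample_pop1, sample_pop2, mu = 2)

# hypothesized differences used by each test
test_default$null.value
test_d0_2$null.value

# t-statistics under the two null hypotheses
test_default$statistic
test_d0_2$statistic
```

Changing mu shifts the $t$-statistic and the $p$-value but leaves the confidence interval unchanged, because the interval depends only on the data.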
The subsequent code chunk demonstrates how to perform a two sample $t$-test in R using simulated data.
```r
# set random seed
set.seed(1)

# draw data from two different populations with equal mean
sample_pop1 <- rnorm(100, 10, 10)
sample_pop2 <- rnorm(100, 10, 20)

# perform a two sample t-test
t.test(sample_pop1, sample_pop2)
```

```
##
## Welch Two Sample t-test
##
## data: sample_pop1 and sample_pop2
## t = 0.872, df = 140.52, p-value = 0.3847
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.338012 6.028083
## sample estimates:
## mean of x mean of y
## 11.088874 9.243838
```
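We find that the two sample $t$-test does not reject the (true) null hypothesis that $d_0 = 0$ at the $5\%$ level. The $p$-value of $0.3847$ exceeds $0.05$, and equivalently the $95\%$ confidence interval $[-2.34, 6.03]$ contains $0$.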
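A single non-rejection is consistent with a correctly sized test, but it does not show the size. As an illustrative sketch, the code below repeats the experiment many times under the same true null hypothesis and records how often the test rejects at the $5\%$ level. The share of rejections should then be close to $0.05$. The number of replications (1000) and the names `n_reps`, `p_values` and `rejection_rate` are hypothetical choices made only for this illustration.

```r
# repeat the two sample t-test under a true null hypothesis
set.seed(1)
n_reps <- 1000

p_values <- replicate(n_reps, {
  # both populations have mean 10, so H0: mu1 - mu2 = 0 is true
  x <- rnorm(100, 10, 10)
  y <- rnorm(100, 10, 20)
  t.test(x, y)$p.value
})

# share of replications in which H0 is (wrongly) rejected at the 5% level
rejection_rate <- mean(p_values < 0.05)
rejection_rate
```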