3.5 Comparing Means from Different Populations
Suppose you are interested in the means of two different populations, denoted \(\mu_1\) and \(\mu_2\). More specifically, you want to know whether these population means differ from each other and plan to use a hypothesis test to investigate this on the basis of independent samples from both populations. A suitable pair of hypotheses is
\[\begin{equation} H_0: \mu_1 - \mu_2 = d_0 \ \ \text{vs.} \ \ H_1: \mu_1 - \mu_2 \neq d_0 \tag{3.6} \end{equation}\]where \(d_0\) denotes the hypothesized difference in means (so \(d_0=0\) when the null hypothesis states that the means are equal). The book teaches us that \(H_0\) can be tested with the \(t\)-statistic
\[\begin{equation} t=\frac{(\overline{Y}_1 - \overline{Y}_2) - d_0}{SE(\overline{Y}_1 - \overline{Y}_2)} \tag{3.7} \end{equation}\]where
\[\begin{equation} SE(\overline{Y}_1 - \overline{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}. \end{equation}\]This is called a two-sample \(t\)-test. For large \(n_1\) and \(n_2\), (3.7) is approximately standard normally distributed under the null hypothesis. Analogously to the simple \(t\)-test, we can compute confidence intervals for the true difference in population means:
\[ (\overline{Y}_1 - \overline{Y}_2) \pm 1.96 \times SE(\overline{Y}_1 - \overline{Y}_2) \]
is a \(95\%\) confidence interval for the true difference \(d = \mu_1 - \mu_2\).
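The formulas above translate directly into a few lines of R. The following is a minimal sketch on simulated data; the sample sizes, means, and standard deviations are arbitrary illustrative choices, not taken from the text. It computes the standard error, the \(t\)-statistic (3.7), a two-sided p-value based on the large-sample standard normal approximation, and the \(95\%\) confidence interval.

```r
# illustrative data: two independent normal samples (parameters are arbitrary)
set.seed(123)
y1 <- rnorm(80, mean = 5, sd = 2)
y2 <- rnorm(120, mean = 5, sd = 4)

# hypothesized difference in means under H0
d0 <- 0

# difference in sample means and its standard error (var() uses n - 1)
diff_means <- mean(y1) - mean(y2)
se_diff <- sqrt(var(y1) / length(y1) + var(y2) / length(y2))

# two-sample t-statistic as in (3.7)
t_stat <- (diff_means - d0) / se_diff

# two-sided p-value from the large-sample standard normal approximation
p_value <- 2 * pnorm(-abs(t_stat))

# large-sample 95% confidence interval for d = mu_1 - mu_2
ci_95 <- diff_means + c(-1.96, 1.96) * se_diff

t_stat
p_value
ci_95
```

The statistic coincides with the one reported by `t.test(y1, y2)`. The p-value and interval may differ slightly, because `t.test()` uses a \(t\) distribution with approximate degrees of freedom instead of the standard normal.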
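In R, hypotheses as in (3.6) can also be tested with t.test(). Note that t.test() sets \(d_0 = 0\) by default; this can be changed via the argument mu. By default, t.test() performs Welch's test (var.equal = FALSE), which uses the standard error given above but computes p-values and confidence intervals from a \(t\) distribution with approximate degrees of freedom rather than from the standard normal distribution.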
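To illustrate the mu argument, the sketch below tests \(H_0: \mu_1 - \mu_2 = 2\) on simulated data. The sample sizes, population parameters, and the value \(d_0 = 2\) are arbitrary choices for demonstration. The last lines recompute (3.7) by hand with \(d_0 = 2\) to show that it matches the statistic reported by t.test().

```r
# illustrative data (parameters chosen for demonstration only)
set.seed(42)
x <- rnorm(150, mean = 12, sd = 5)
y <- rnorm(150, mean = 10, sd = 8)

# test H0: mu_1 - mu_2 = 2 against a two-sided alternative
res <- t.test(x, y, mu = 2)
res

# recompute the statistic (3.7) by hand with d_0 = 2
se_diff <- sqrt(var(x) / length(x) + var(y) / length(y))
t_manual <- (mean(x) - mean(y) - 2) / se_diff
t_manual
```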
The following code chunk demonstrates how to perform a two-sample \(t\)-test in R using simulated data.
# set random seed
set.seed(1)
# draw 100 observations from each of two normal populations with equal mean 10 (standard deviations 10 and 20)
sample_pop1 <- rnorm(100, 10, 10)
sample_pop2 <- rnorm(100, 10, 20)
# perform a two sample t-test
t.test(sample_pop1, sample_pop2)
##
## Welch Two Sample t-test
##
## data: sample_pop1 and sample_pop2
## t = 0.872, df = 140.52, p-value = 0.3847
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.338012 6.028083
## sample estimates:
## mean of x mean of y
## 11.088874 9.243838
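With a p-value of 0.3847, the two-sample \(t\)-test does not reject the (true) null hypothesis \(H_0: \mu_1 - \mu_2 = 0\) at conventional significance levels; accordingly, the \(95\%\) confidence interval contains zero.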
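The test decision can also be made programmatically, since t.test() returns an object of class "htest" whose components include p.value and conf.int. The sketch below reproduces the example above and checks the 5% decision in two equivalent ways: via the p-value and via whether \(0\) lies outside the \(95\%\) confidence interval. The variable names reject and zero_outside_ci are illustrative.

```r
# reproduce the example above
set.seed(1)
sample_pop1 <- rnorm(100, 10, 10)
sample_pop2 <- rnorm(100, 10, 20)
res <- t.test(sample_pop1, sample_pop2)

# p-value and 95% confidence interval stored in the "htest" object
res$p.value
res$conf.int

# reject H0 at the 5% level if the p-value is below 0.05
reject <- res$p.value < 0.05

# equivalent decision: reject if 0 lies outside the 95% confidence interval
zero_outside_ci <- res$conf.int[1] > 0 || res$conf.int[2] < 0

reject
zero_outside_ci
```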