Introduction to Econometrics with R

3.5 Comparing Means from Different Populations

Suppose you are interested in the means of two different populations, denoted $\mu_1$ and $\mu_2$. More specifically, you are interested in whether these population means differ from each other and plan to use a hypothesis test to check this on the basis of independent sample data from both populations. A suitable pair of hypotheses is

$\begin{equation} H_0: \mu_1 - \mu_2 = d_0 \ \ \text{vs.} \ \ H_1: \mu_1 - \mu_2 \neq d_0 \tag{3.6} \end{equation}$

where $d_0$ denotes the hypothesized difference in means (so $d_0 = 0$ if the null hypothesis states that the means are equal). The book teaches us that $H_0$ can be tested with the $t$-statistic

$\begin{equation} t=\frac{(\overline{Y}_1 - \overline{Y}_2) - d_0}{SE(\overline{Y}_1 - \overline{Y}_2)} \tag{3.7} \end{equation}$

where

$\begin{equation} SE(\overline{Y}_1 - \overline{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}. \end{equation}$

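Here $s_1^2$ and $s_2^2$ are the sample variances and $n_1$ and $n_2$ the sample sizes of the two samples. This is called a two-sample $t$-test. For large $n_1$ and $n_2$, the statistic in (3.7) is approximately standard normally distributed under the null hypothesis. Analogously to the simple $t$-test, we can compute confidence intervals for the true difference in population means: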

$(\overline{Y}_1 - \overline{Y}_2) \pm 1.96 \times SE(\overline{Y}_1 - \overline{Y}_2)$

is a $95\%$ confidence interval for the difference in population means, $\mu_1 - \mu_2$.
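
To make the formulas concrete, the following sketch computes the standard error, the $t$-statistic from (3.7), a two-sided $p$-value based on the normal approximation, and the $95\%$ confidence interval by hand. The data and the object names (y1, y2, se_diff and so on) are assumptions chosen purely for illustration.

```r
# illustrative data (assumption): two independent samples with equal means
set.seed(123)
y1 <- rnorm(200, mean = 5, sd = 2)
y2 <- rnorm(150, mean = 5, sd = 3)

# hypothesized difference in means under the null
d0 <- 0

# difference in sample means and its standard error
diff_means <- mean(y1) - mean(y2)
se_diff <- sqrt(var(y1) / length(y1) + var(y2) / length(y2))

# t-statistic as in (3.7)
t_stat <- (diff_means - d0) / se_diff

# two-sided p-value using the large-sample normal approximation
p_value <- 2 * pnorm(-abs(t_stat))

# 95% confidence interval for mu_1 - mu_2
ci <- diff_means + c(-1, 1) * 1.96 * se_diff

t_stat
p_value
ci
```

Both the $p$-value and the interval rely on the large-sample normal approximation, so they are only trustworthy when $n_1$ and $n_2$ are reasonably large.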
In R, hypotheses as in (3.6) can be tested with t.test(), too. Note that t.test() chooses $d_0 = 0$ by default. This can be changed by setting the argument mu accordingly.
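
As a minimal sketch of the mu argument, the following example tests the null hypothesis that the difference in means equals $2$. The data, the value $d_0 = 2$ and the object names are assumptions made for illustration only.

```r
# illustrative data (assumption): population means differ by 2
set.seed(1)
x <- rnorm(100, mean = 12, sd = 10)
y <- rnorm(100, mean = 10, sd = 20)

# test H_0: mu_1 - mu_2 = 2 against the two-sided alternative
res_d2 <- t.test(x, y, mu = 2)

# the hypothesized difference is stored in the result
res_d2$null.value

# the t-statistic now measures the distance of the estimate from d_0 = 2
res_d2$statistic
```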

The subsequent code chunk demonstrates how to perform a two-sample $t$-test in R using simulated data.

# set random seed
set.seed(1)

# draw data from two different populations with equal mean
sample_pop1 <- rnorm(100, 10, 10)
sample_pop2 <- rnorm(100, 10, 20)

# perform a two sample t-test
t.test(sample_pop1, sample_pop2)

## 
##  Welch Two Sample t-test
## 
## data:  sample_pop1 and sample_pop2
## t = 0.872, df = 140.52, p-value = 0.3847
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.338012  6.028083
## sample estimates:
## mean of x mean of y 
## 11.088874  9.243838

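We find that the two-sample $t$-test does not reject the (true) null hypothesis that $\mu_1 - \mu_2 = 0$: the $p$-value of about $0.38$ is well above the $5\%$ significance level, and the $95\%$ confidence interval contains $0$.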
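
The object returned by t.test() can also be inspected programmatically. The sketch below regenerates the samples from above, extracts the main components and rebuilds the reported confidence interval. Note that t.test() uses quantiles of a $t$ distribution with the Welch degrees of freedom rather than the normal critical value $1.96$, which is why its interval is slightly wider than the large-sample one. The names res, se, ci_welch and reject are assumptions introduced for this illustration.

```r
# regenerate the samples used above
set.seed(1)
sample_pop1 <- rnorm(100, 10, 10)
sample_pop2 <- rnorm(100, 10, 20)

# store the test result instead of printing it
res <- t.test(sample_pop1, sample_pop2)

res$statistic   # t-statistic
res$parameter   # Welch degrees of freedom
res$p.value     # p-value
res$conf.int    # 95% confidence interval

# rebuild the interval: estimate +/- t-quantile * standard error
diff_means <- mean(sample_pop1) - mean(sample_pop2)
se <- sqrt(var(sample_pop1) / 100 + var(sample_pop2) / 100)
ci_welch <- as.numeric(diff_means + c(-1, 1) * qt(0.975, df = res$parameter) * se)

# decision at the 5% significance level
reject <- res$p.value < 0.05
reject
```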