Topic 7 Independent samples t-test

Before, with the one-sample t-test, we wanted to test whether the population mean was equal to a hypothesized value, using the mean of a single sample.

Now, we are interested in testing whether the population mean of one group is equal to the population mean of another group.

This is called the independent samples t-test.

We use this test to examine the means of one continuous variable (we call it the dependent variable) across two different groups defined by a categorical variable (we call it the independent variable).

7.1 Formula

Note that in the formulas below the subscripts 1 and 2 refer, respectively, to groups 1 and 2 defined by the categorical variable you are interested in.

Your t-calculated is now defined by:

\[t = \frac{\bar {x}_1 - \bar{x}_2}{se_{diff}}\]

where

\[se_{diff} = \sqrt{\frac{{s}^{2}_p}{n_1}+\frac{{s}^{2}_p}{n_2}}\]

and \({s}^{2}_p\), which is the pooled variance, is:

\[{s}^{2}_p = \frac{SS_1 + SS_2}{df_1 +df_2} \]

where \(df_1 = n_1 - 1\) and \(df_2 = n_2 - 1\). The resulting t-statistic is compared against a t-distribution with \(df_1 + df_2 = n_1 + n_2 - 2\) degrees of freedom.

Note that \(SS\) is the sum of squares for each group and can be calculated by:

\[SS_1 = \sum f \cdot {(x_1 - \bar {x}_1)}^2\] \[SS_2 = \sum f \cdot {(x_2 - \bar {x}_2)}^2\]

You can read those values directly off the table you build to calculate the mean and standard deviation. See below:

| \(x\) | \(f\) | \(fx\) | \(x - \bar{x}\) | \((x - \bar{x})^2\) | \(f(x - \bar{x})^2\) |
|---|---|---|---|---|---|
| … | … | … | … | … | … |
| Total | \(\sum f\) | \(\sum fx\) | | | \(\sum f(x - \bar{x})^2 = SS\) |

Therefore, to hand-calculate the independent samples t-test, you need to draw two of the above tables (one for each group). After that, you simply apply the formulas above.
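
If you want to double-check a hand calculation, here is a minimal sketch of the same steps in Python. The two frequency tables below are made-up illustrative numbers, not data from the exercise.

```python
import math

# group 1 and group 2: value x -> frequency f (illustrative numbers only)
group1 = {10: 2, 12: 3, 15: 1}
group2 = {8: 1, 11: 4, 13: 2}

def summarize(freq_table):
    """Return n, mean, and sum of squares SS from a value -> frequency table."""
    n = sum(freq_table.values())
    mean = sum(f * x for x, f in freq_table.items()) / n
    ss = sum(f * (x - mean) ** 2 for x, f in freq_table.items())
    return n, mean, ss

n1, mean1, ss1 = summarize(group1)
n2, mean2, ss2 = summarize(group2)

# pooled variance: (SS1 + SS2) / (df1 + df2), with df_i = n_i - 1
df = (n1 - 1) + (n2 - 1)
s2_pooled = (ss1 + ss2) / df

# standard error of the difference in means
se_diff = math.sqrt(s2_pooled / n1 + s2_pooled / n2)

# t statistic, compared against a t-distribution with n1 + n2 - 2 df
t = (mean1 - mean2) / se_diff
print(f"t = {t:.3f} with {df} degrees of freedom")
```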

7.2 Assumptions

For the independent samples t-test to work, three assumptions need to hold:

  1. Normality: The variable of interest should be normally distributed within each group.
  • We usually do not perform formal statistical tests to check for normality.
  • Looking at measures of central tendency and histograms is probably the best way to do so (see the sketch after this list).
  2. Independence: The observations in one group must be unrelated to the observations in the other group.
  • This is not simple to test.
  • Usually, we assume it holds if each group is composed of observations from different units. In other words, no individual may belong to both groups.
  3. Equality of variances: The population variances of the two groups must not be statistically different.
  • If the variances are equal, we say we have homoskedasticity.
  • If the variances are not equal, we say we have heteroskedasticity.
  • We can test it!
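
For the normality check in assumption (1), the sketch below shows what “looking at central tendency and histograms” could look like outside SPSS. It assumes a pandas DataFrame with placeholder columns “wage” and “married”; the data are simulated, not the course data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# simulated stand-in data; "wage" and "married" are placeholder names
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "wage": rng.normal(20, 5, 200),
    "married": rng.integers(0, 2, 200),
})

# measures of central tendency (and spread) by group
print(df.groupby("married")["wage"].describe())

# one histogram per group, overlaid
for name, group in df.groupby("married"):
    plt.hist(group["wage"], alpha=0.5, label=f"married = {name}")
plt.xlabel("wage")
plt.legend()
plt.show()
```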

7.3 Levene’s test for equality of variances

We test the following:

  • Null hypothesis: population variances are equal
  • Alternative hypothesis: population variances are not equal

This is an application of the F-test (we will cover it in the following weeks). Do not worry about the math behind it for now.

Notes:

  • When you perform your independent samples t-test, SPSS performs Levene’s test automatically!
  • Homogeneity of variance test = Levene’s test
  • The factor variable is the variable that defines your groups (our independent variable).

Interpretation

As before, we will look at the p-value:

  • If \(p \leq \alpha\), we reject the null.
  • If \(p > \alpha\), we fail to reject the null.

Conclusion

If we reject the null, then the population variances are not equal, and assumption (3) for the independent samples t-test does not hold. In that case, read the “Equal variances not assumed” row of the SPSS t-test output.
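
If you ever need to run Levene’s test outside SPSS, a minimal sketch using scipy.stats.levene looks like this; the two wage lists are placeholder numbers, not the course data.

```python
from scipy import stats

# placeholder wage lists for two groups (not the course data)
wages_married = [18.2, 22.5, 19.8, 25.0, 21.3, 23.7]
wages_single = [15.4, 17.9, 30.2, 12.6, 24.8, 16.1]

stat, p = stats.levene(wages_married, wages_single)

alpha = 0.05
if p <= alpha:
    # reject the null: variances differ, so assumption (3) does not hold
    print(f"Levene p = {p:.3f}: variances are not equal")
else:
    print(f"Levene p = {p:.3f}: no evidence the variances differ")
```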

7.4 Interpretation of the independent samples t-test

Once again, we look at the p-value:

  • If \(p \leq \alpha\), we reject the null.
  • If \(p > \alpha\), we fail to reject the null.

Conclusion

If we reject the null, we conclude that the population means of the two groups are not equal.
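
For reference, here is a sketch of the same test outside SPSS using scipy.stats.ttest_ind, again with placeholder data. The equal_var argument switches between the pooled version (equal variances assumed) and the Welch version (equal variances not assumed).

```python
from scipy import stats

# placeholder wage lists for two groups (not the course data)
wages_married = [18.2, 22.5, 19.8, 25.0, 21.3, 23.7]
wages_single = [15.4, 17.9, 20.2, 12.6, 14.8, 16.1]

# equal_var=True is the pooled test (equal variances assumed);
# equal_var=False is the Welch test (equal variances not assumed)
t_stat, p = stats.ttest_ind(wages_married, wages_single, equal_var=True)

alpha = 0.05
if p <= alpha:
    print(f"t = {t_stat:.2f}, p = {p:.3f}: reject the null; the group means differ")
else:
    print(f"t = {t_stat:.2f}, p = {p:.3f}: fail to reject the null")
```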

7.5 Exercise

First, I will illustrate the procedure by looking at differences in wages between married and non-married individuals.

Second, using the “earnings_data.sav” file, perform the necessary procedures to check whether college graduates earn higher wages than non-college graduates.

  1. What is your null hypothesis?
  2. What is the alternative hypothesis?
  3. What is your alpha?
  4. Interpret your p-value.