Topic 7 Independent samples t-test
Previously, with the one-sample t-test, we tested whether the population mean was equal to our hypothesized value, using the sample mean as evidence.
Now, we are interested in testing if the population mean of a group is equal to the population mean of another group.
This is called the independent samples t-test.
We use this test to examine the means of one continuous variable (we call it the dependent variable) across two different groups defined by a categorical variable (we call it the independent variable).
7.1 Formula
Note that in the formulas below the subscripts 1 and 2 refer, respectively, to groups 1 and 2 defined by the categorical variable you are interested in.
Your t-calculated is now defined by:
\[t = \frac{\bar {x}_1 - \bar{x}_2}{se_{diff}}\]
where
\[se_{diff} = \sqrt{\frac{{s}^{2}_p}{n_1}+\frac{{s}^{2}_p}{n_2}}\]
and \({s}^{2}_p\), which is the pooled variance, is:
\[{s}^{2}_p = \frac{SS_1 + SS_2}{df_1 + df_2} \]
where \(df_1 = n_1 - 1\) and \(df_2 = n_2 - 1\); the degrees of freedom for the test are \(df_1 + df_2 = n_1 + n_2 - 2\).
Note that \(SS\) is the sum of squares for each group and can be calculated by:
\[SS_1 = \sum f \cdot {(x_1 - \bar {x}_1)}^2\] \[SS_2 = \sum f \cdot {(x_2 - \bar {x}_2)}^2\]
You can get these values easily from the table you already write out to calculate the mean and standard deviation. See below:
| \(x\) | \(f\) | \(fx\) | \(x - \bar{x}\) | \((x - \bar{x})^2\) | \(f(x - \bar{x})^2\) |
|---|---|---|---|---|---|
| … | … | … | … | … | … |
|  | \(\sum f\) | \(\sum fx\) |  |  | \(\sum f(x - \bar{x})^2 = SS\) |
Therefore, to hand-calculate the independent samples t-test, you need to draw two of the above tables (one for each group). After that, you simply apply the formulas above.
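To make the arithmetic concrete, here is a minimal Python sketch of the same hand calculation; the two samples are made up purely for illustration and are not from any course dataset:

```python
# Minimal sketch of the hand calculation with two small made-up samples
# (numbers are illustrative only, not from the course data).

group1 = [12, 15, 14, 10, 13]   # e.g. wages in group 1
group2 = [9, 11, 8, 12, 10]     # e.g. wages in group 2

def sum_of_squares(data):
    """SS: sum of squared deviations from the group mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

n1, n2 = len(group1), len(group2)
mean1, mean2 = sum(group1) / n1, sum(group2) / n2

ss1, ss2 = sum_of_squares(group1), sum_of_squares(group2)
df1, df2 = n1 - 1, n2 - 1

pooled_var = (ss1 + ss2) / (df1 + df2)                 # s^2_p
se_diff = (pooled_var / n1 + pooled_var / n2) ** 0.5   # standard error of the difference
t_calculated = (mean1 - mean2) / se_diff

print(f"t = {t_calculated:.3f} with df = {df1 + df2}")
```

The resulting t-value is then compared against the critical t with \(df_1 + df_2 = n_1 + n_2 - 2\) degrees of freedom, just as in the one-sample case.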
7.2 Assumptions
For the independent samples t-test to be valid, three assumptions need to hold:
- Normality: The variable of interest should be normally distributed within each group.
- We usually do not perform mathematical tests to check for normality.
- Looking at measures of central tendency and histograms is probably the best way to do so (see the sketch after this list).
- Independence: The observations in one group must be unrelated to the observations in the other group.
- This is not simple to test.
- Usually, we assume it holds if each group is composed of observations from different units. In other words, there must be no individual that belongs to both groups.
- Equality of variances: The population variances of the two groups must not be statistically different.
- If variances are equal, we say we have homoskedasticity.
- If variances are not equal, we say we have heteroskedasticity.
- We can test it!
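As noted under the normality assumption, a quick visual check with histograms can also be done outside SPSS. Below is a minimal sketch assuming matplotlib is installed; the data are made up for illustration:

```python
# Quick visual normality check per group (a sketch; assumes matplotlib is
# installed; data are made up for illustration).
import matplotlib.pyplot as plt

group1 = [12, 15, 14, 10, 13, 14, 12, 11, 13, 15]
group2 = [9, 11, 8, 12, 10, 9, 10, 11, 8, 12]

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
axes[0].hist(group1, bins=5)
axes[0].set_title("Group 1")
axes[1].hist(group2, bins=5)
axes[1].set_title("Group 2")
plt.show()
```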
7.3 Levene’s test for equality of variances
We test the following:
- Null hypothesis: population variances are equal
- Alternative hypothesis: population variances are not equal
This is an application of the F-test (we will cover it in the following weeks). Do not worry about the math behind it for now.
Notes:
- When you perform your independent samples t-test, SPSS runs Levene’s test automatically!
- Homogeneity of variance test = Levene’s test
- The factor variable is the variable that defines your groups (our independent variable).
Interpretation
As before, we will look at the p-value:
- If \(p \leq \alpha\), we reject the null
- If \(p > \alpha\), we fail to reject the null
Conclusion
If we reject the null, we conclude that the population variances are not equal. Therefore, assumption (3) of the independent samples t-test does not hold.
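For reference, Levene’s test is also available outside SPSS. Below is a minimal sketch using scipy.stats.levene with made-up data:

```python
# Sketch of Levene's test outside SPSS, using scipy.stats.levene
# (data are made up for illustration).
from scipy import stats

group1 = [12, 15, 14, 10, 13]
group2 = [9, 11, 8, 12, 10]

levene_stat, levene_p = stats.levene(group1, group2)
print(f"Levene's W = {levene_stat:.3f}, p = {levene_p:.3f}")
# If p <= alpha, reject the null of equal variances (heteroskedasticity).
```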
7.4 Interpretation of the independent samples t-test
Once again, we look at the p-value:
- If \(p \leq \alpha\), we reject the null
- If \(p > \alpha\), we fail to reject the null
Conclusion
If we reject the null, we conclude that the population means of the two groups are not equal.
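For reference, the same test can be run outside SPSS with scipy.stats.ttest_ind. Here is a minimal sketch with made-up data; its equal_var argument switches between the pooled-variance formula from section 7.1 (True) and the unequal-variances correction (False):

```python
# Sketch of the independent samples t-test with scipy (made-up data).
from scipy import stats

group1 = [12, 15, 14, 10, 13]
group2 = [9, 11, 8, 12, 10]

# equal_var=True matches the pooled-variance formula in section 7.1;
# use equal_var=False (Welch's t-test) if Levene's test rejected equal variances.
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```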
7.5 Exercise
First, I will illustrate by looking at differences in wages between married and non-married individuals.
Second, using the “earnings_data.sav” file, perform the necessary procedures to check whether college graduates earn higher wages than non-college graduates (a Python sketch follows the questions below).
- What is your null hypothesis?
- What is the alternative hypothesis?
- What is your alpha?
- Interpret your p-value.
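If you also want to try the exercise outside SPSS, here is a heavily hedged Python sketch. The column names “wage” and “college” and their 0/1 coding are hypothetical placeholders; check the actual variable names and value labels in earnings_data.sav before running anything:

```python
# Hedged sketch for the exercise: the column names "wage" and "college" and the
# 0/1 coding are hypothetical -- check the actual variables in earnings_data.sav.
# Reading .sav files with pandas requires the pyreadstat package; the one-sided
# 'alternative' option needs a recent scipy version.
import pandas as pd
from scipy import stats

df = pd.read_spss("earnings_data.sav")

grads = df.loc[df["college"] == 1, "wage"]      # hypothetical coding: 1 = graduate
non_grads = df.loc[df["college"] == 0, "wage"]  # hypothetical coding: 0 = not a graduate

# One-sided alternative: college graduates earn higher wages.
t_stat, p_value = stats.ttest_ind(grads, non_grads, equal_var=True,
                                  alternative="greater")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```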