Chapter 1 The independent samples t-test
When we have two independent groups and we want to know whether they are significantly different from each other with regard to a certain characteristic, we can use the independent-samples t-test. For example, let's consider again our example from the last topic: the data set called heartattack
from the R package called datarium
(Kassambara 2019) contains cholesterol measurements of 72 patients. As well as cholesterol levels, the data set also contains a categorical variable called risk
which indicates whether the patients were at low or high risk of heart attack.
Suppose we wanted to know if there was a significant difference in average cholesterol levels between patients in the 'low risk' and 'high risk' groups. We could propose this question in the form of the following hypotheses:
H0:μ1=μ2versusH1:μ1≠μ2, where:
- μ1 denotes the population mean cholesterol level of patients in the high risk group
- μ2 denotes the population mean cholesterol level of patients in the low risk group.
Note: if μ1=μ2, this means that the difference between μ1 and μ2 is zero. So the above hypothesis could equivalently be written as: H0:μ1−μ2=0versusH1:μ1−μ2≠0.
What does it mean to have two independent groups, as we need to have to carry out an independent-samples t-test? One way of thinking of it would be that individuals can only be in one group or the other: not both. Considering our example, a patient can only be categorised as 'high risk' OR 'low risk' - not both - meaning these two groups are independent, and appropriate for the independent-samples t-test.
What type of variables are required for the independent samples t-test?
An independent samples t-test will always involve two variables:
- The dependent variable, sometimes also called the response variable. This should be a numeric, continuous variable.
- The independent variable. This should be a categorical variable with only two categories.
Your turn
- In the cholesterol example, which variable is the dependent variable?
- In the cholesterol example, which variable is the independent variable?
- Cholesterol level
- Group ('low risk' / 'high risk')