Chapter 1 The independent samples \(t\)-test
When we have two independent groups and we want to know whether they are significantly different from each other with regard to a certain characteristic, we can use the independent-samples \(t\)-test. For example, let's consider again our example from the last topic: the data set called heartattack
from the R package called datarium
(Kassambara 2019) contains cholesterol measurements of 72 patients. As well as cholesterol levels, the data set also contains a categorical variable called risk
which indicates whether the patients were at low or high risk of heart attack.
Suppose we wanted to know if there was a significant difference in average cholesterol levels between patients in the 'low risk' and 'high risk' groups. We could propose this question in the form of the following hypotheses:
\[H_0:\mu_1 = \mu_2\;\;\text{versus}\;\;H_1:\mu_1 \neq \mu_2,\] where:
- \(\mu_1\) denotes the population mean cholesterol level of patients in the high risk group
- \(\mu_2\) denotes the population mean cholesterol level of patients in the low risk group.
Note: if \(\mu_1 = \mu_2\), this means that the difference between \(\mu_1\) and \(\mu_2\) is zero. So the above hypothesis could equivalently be written as: \(H_0:\mu_1 - \mu_2 = 0\;\;\text{versus}\;\;H_1:\mu_1 - \mu_2 \neq 0.\)
What does it mean to have two independent groups, as we need to have to carry out an independent-samples \(t\)-test? One way of thinking of it would be that individuals can only be in one group or the other: not both. Considering our example, a patient can only be categorised as 'high risk' OR 'low risk' - not both - meaning these two groups are independent, and appropriate for the independent-samples \(t\)-test.
What type of variables are required for the independent samples \(t\)-test?
An independent samples \(t\)-test will always involve two variables:
- The dependent variable, sometimes also called the response variable. This should be a numeric, continuous variable.
- The independent variable. This should be a categorical variable with only two categories.
Your turn
- In the cholesterol example, which variable is the dependent variable?
- In the cholesterol example, which variable is the independent variable?
- Cholesterol level
- Group ('low risk' / 'high risk')