Chapter 1 The independent-samples \(t\)-test

When we have two independent groups and we want to know whether they are significantly different from each other with regard to a certain characteristic, we can use the independent-samples \(t\)-test. Consider again our example from the last topic: the data set called heartattack from the R package called datarium (Kassambara 2019) contains cholesterol measurements of 72 patients. As well as cholesterol levels, the data set contains a categorical variable called risk, which indicates whether each patient was at low or high risk of heart attack.
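Before testing anything, it can help to look at the two variables involved. The following is a minimal sketch, assuming the datarium package is installed and that the relevant columns are named cholesterol and risk, as described above. It loads the data, counts the patients in each risk group, and computes the sample mean and standard deviation of cholesterol within each group.

```r
# Assumes the datarium package is installed: install.packages("datarium")
data("heartattack", package = "datarium")

# Inspect the structure of the data set: variable names and types
str(heartattack)

# Number of patients in each risk group
table(heartattack$risk)

# Sample mean and standard deviation of cholesterol within each risk group
aggregate(cholesterol ~ risk, data = heartattack, FUN = mean)
aggregate(cholesterol ~ risk, data = heartattack, FUN = sd)
```

The group means are only sample estimates. Whether the gap between them is larger than we would expect from chance alone is the question the \(t\)-test addresses.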

Suppose we wanted to know whether there was a significant difference in average cholesterol levels between patients in the 'low risk' and 'high risk' groups. We could pose this question in the form of the following hypotheses:

\[H_0:\mu_1 = \mu_2\;\;\text{versus}\;\;H_1:\mu_1 \neq \mu_2,\] where:

  • \(\mu_1\) denotes the population mean cholesterol level of patients in the high risk group
  • \(\mu_2\) denotes the population mean cholesterol level of patients in the low risk group.

Note: if \(\mu_1 = \mu_2\), then the difference between \(\mu_1\) and \(\mu_2\) is zero. So the above hypotheses could equivalently be written as: \(H_0:\mu_1 - \mu_2 = 0\;\;\text{versus}\;\;H_1:\mu_1 - \mu_2 \neq 0.\)
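These hypotheses can be tested in R with the t.test() function from the built-in stats package. The sketch below is one possible call rather than a definitive analysis. It assumes the datarium package is installed and that the columns are named cholesterol and risk. Note that t.test() computes the difference as the mean of the first factor level minus the mean of the second, so it is worth checking the level order to see which group plays the role of \(\mu_1\). For a two-sided test this affects only the sign of the test statistic, not the \(p\)-value.

```r
# Assumes the datarium package is installed: install.packages("datarium")
data("heartattack", package = "datarium")

# Check the order of the groups: the difference is computed as
# (mean of first level) - (mean of second level)
levels(factor(heartattack$risk))

# Two-sided test of H0: mu_1 - mu_2 = 0 versus H1: mu_1 - mu_2 != 0.
# By default t.test() does not assume equal variances (Welch's version);
# adding var.equal = TRUE would request the pooled-variance version instead.
result <- t.test(cholesterol ~ risk, data = heartattack,
                 alternative = "two.sided", mu = 0)
result

result$statistic  # observed t statistic
result$p.value    # two-sided p-value
```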

What does it mean to have two independent groups, as the independent-samples \(t\)-test requires? One way of thinking of it is that each individual can be in only one group or the other, not both, so the measurements in one group are not linked to the measurements in the other. In our example, a patient can only be categorised as 'high risk' OR 'low risk', not both. This means the two groups are independent, and appropriate for the independent-samples \(t\)-test.

What types of variables are required for the independent-samples \(t\)-test?

An independent-samples \(t\)-test will always involve two variables:

  1. The dependent variable, sometimes also called the response variable. This should be a numeric, continuous variable.
  2. The independent variable. This should be a categorical variable with only two categories.

Your turn

  1. In the cholesterol example, which variable is the dependent variable?
  2. In the cholesterol example, which variable is the independent variable?

Answers

  1. Cholesterol level (numeric, continuous)
  2. Risk group (categorical: 'low risk' / 'high risk')
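The variable requirements for the test can also be checked directly in R. This is a minimal sketch, assuming the datarium package is installed and that the columns are named cholesterol and risk. It confirms that the dependent variable is numeric and that the independent variable has exactly two categories.

```r
# Assumes the datarium package is installed: install.packages("datarium")
data("heartattack", package = "datarium")

# Dependent variable: should be numeric
is.numeric(heartattack$cholesterol)

# Independent variable: should be categorical with exactly two categories
risk <- factor(heartattack$risk)
nlevels(risk)
levels(risk)
```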

References

Kassambara, Alboukadel. 2019. Datarium: Data Bank for Statistical Analysis and Visualization. https://CRAN.R-project.org/package=datarium.