# Chapter 1 The independent samples $$t$$-test

When we have two independent groups and we want to know whether they are significantly different from each other with regard to a certain characteristic, we can use the independent-samples $$t$$-test. For example, let's consider again our example from the last topic: the data set called heartattack from the R package called datarium contains cholesterol measurements of 72 patients. As well as cholesterol levels, the data set also contains a categorical variable called risk which indicates whether the patients were at low or high risk of heart attack.

Suppose we wanted to know if there was a significant difference in average cholesterol levels between patients in the 'low risk' and 'high risk' groups. We could propose this question in the form of the following hypotheses:

$H_0:\mu_1 = \mu_2\;\;\text{versus}\;\;H_1:\mu_1 \neq \mu_2,$ where:

• $$\mu_1$$ denotes the population mean cholesterol level of patients in the high risk group
• $$\mu_2$$ denotes the population mean cholesterol level of patients in the low risk group.

Note: if $$\mu_1 = \mu_2$$, this means that the difference between $$\mu_1$$ and $$\mu_2$$ is zero. So the above hypothesis could equivalently be written as: $$H_0:\mu_1 - \mu_2 = 0\;\;\text{versus}\;\;H_1:\mu_1 - \mu_2 \neq 0.$$

What does it mean to have two independent groups, as we need to have to carry out an independent-samples $$t$$-test? One way of thinking of it would be that individuals can only be in one group or the other: not both. Considering our example, a patient can only be categorised as 'high risk' OR 'low risk' - not both - meaning these two groups are independent, and appropriate for the independent-samples $$t$$-test.

What type of variables are required for the independent samples $$t$$-test?

An independent samples $$t$$-test will always involve two variables:

1. The dependent variable, sometimes also called the response variable. This should be a numeric, continuous variable.
2. The independent variable. This should be a categorical variable with only two categories.