# Chapter 1 The independent samples \(t\)-test

When we have **two independent groups** and we want to know whether they are significantly different from each other with regard to a certain characteristic, we can use the independent-samples \(t\)-test. For example, let's consider again our example from the last topic: the data set called `heartattack` from the R package called `datarium` (Kassambara 2019) contains cholesterol measurements of 72 patients. As well as cholesterol levels, the data set also contains a categorical variable called `risk`, which indicates whether the patients were at low or high risk of heart attack.

Suppose we wanted to know if there was a significant difference in average cholesterol levels between patients in the 'low risk' and 'high risk' groups. We could propose this question in the form of the following hypotheses:

\[H_0:\mu_1 = \mu_2\;\;\text{versus}\;\;H_1:\mu_1 \neq \mu_2,\] where:

- \(\mu_1\) denotes the true average cholesterol level of patients in the high risk group
- \(\mu_2\) denotes the true average cholesterol level of patients in the low risk group.

Note: if \(\mu_1 = \mu_2\), this means that the difference between \(\mu_1\) and \(\mu_2\) is zero. So the above hypotheses could equivalently be written as: \(H_0:\mu_1 - \mu_2 = 0\;\;\text{versus}\;\;H_1:\mu_1 - \mu_2 \neq 0.\)
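As a rough sketch of how hypotheses of this form could be tested in R, the block below runs a two-sided test of \(H_0:\mu_1 - \mu_2 = 0\) with the base R function `t.test()`. The data are simulated stand-ins, not the real `heartattack` measurements: the data frame name `patients`, the column name `cholesterol`, the even 36/36 split and the group means are all assumptions made purely for illustration.

```r
# Simulated stand-in data (NOT the real heartattack measurements).
# The 36/36 split and the means/SDs below are hypothetical.
set.seed(1)
patients <- data.frame(
  risk = factor(rep(c("high", "low"), each = 36)),
  cholesterol = c(rnorm(36, mean = 6.0, sd = 1),
                  rnorm(36, mean = 5.0, sd = 1))
)

# Sample mean of cholesterol in each risk group.
# Factor levels are alphabetical, so "high" is group 1 and "low" is group 2.
group_means <- tapply(patients$cholesterol, patients$risk, mean)

# Two-sided test of H0: mu_1 - mu_2 = 0 versus H1: mu_1 - mu_2 != 0.
# By default t.test() uses Welch's version, which does not assume
# equal variances in the two groups.
result <- t.test(cholesterol ~ risk, data = patients,
                 mu = 0, alternative = "two.sided")

group_means
result$p.value
```

With the real data, the same call should work once `patients` is replaced by the `heartattack` data frame, provided the cholesterol column is indeed named `cholesterol`.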

What does it mean to have **two independent groups**, as the independent-samples \(t\)-test requires? One way of thinking of it is that each individual can only be in one group or the other, not both. Considering our example, a patient can only be categorised as 'high risk' OR 'low risk' (not both), meaning these two groups are *independent*, and appropriate for the independent-samples \(t\)-test.

**What type of variables are required for the independent samples \(t\)-test?**

An independent samples \(t\)-test will always involve two variables:

- The *dependent* variable, sometimes also called the *response* variable. This should be a numeric, continuous variable.
- The *independent* variable. This should be a categorical variable with only *two categories*.
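The two requirements above can be checked programmatically before running a test. The following is a minimal sketch: `check_t_test_variables()` is a hypothetical helper written for this illustration (it is not part of any package), and the `toy` data frame contains made-up values. Note that `is.numeric()` can only confirm that a variable is numeric; whether it is genuinely continuous still has to be judged from what was measured.

```r
# Hypothetical helper: reports whether two columns of a data frame
# have the variable types described above.
check_t_test_variables <- function(data, dependent, independent) {
  outcome <- data[[dependent]]
  groups <- factor(data[[independent]])
  c(
    dependent_is_numeric = is.numeric(outcome),
    independent_has_two_categories = nlevels(groups) == 2
  )
}

# Made-up toy data: a numeric outcome and a two-category grouping variable.
toy <- data.frame(
  reaction_time = c(0.51, 0.63, 0.48, 0.70),
  condition = c("control", "treated", "control", "treated")
)

check_t_test_variables(toy, dependent = "reaction_time",
                       independent = "condition")
```

Both checks pass for the toy data; swapping the two column names in the call makes both fail, since `condition` is not numeric and `reaction_time` has more than two distinct values.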

**Your turn**

- In the cholesterol example, which variable is the *dependent* variable?
- In the cholesterol example, which variable is the *independent* variable?

- Cholesterol level
- Group ('low risk' / 'high risk')

### References

Kassambara, Alboukadel. 2019. *Datarium: Data Bank for Statistical Analysis and Visualization*. https://CRAN.R-project.org/package=datarium.