## 15.1 Introduction

In Sect. 14.6,
the NHANES data
(Centers for Disease Control and Prevention (CDC) 1988--1994)
were numerically summarised.
The *sample mean* direct HDL cholesterol concentration
was different for
smokers (\(\bar{x} = 1.31\) mmol/L) and for
non-smokers (\(\bar{x} = 1.39\) mmol/L).

What does this difference between the **sample** means
imply about the **population** means?

Two reasons could explain why the sample means are different:

- The *population* means are the *same*. The *sample* means are different because every sample is likely to be different (each possible sample includes different people), so sometimes the sample means are different by chance. This is called **sampling variation**.
- Alternatively, the *population* means are *different*, and the sample means simply reflect this.
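
The first explanation can be illustrated with a small simulation. The following Python sketch is not based on the NHANES data: the population (mean 1.35 mmol/L, standard deviation 0.4 mmol/L), the sample size of 200 and the function name `sample_mean` are all assumptions made purely for illustration. Two samples are drawn from the *same* population, yet their sample means are not identical.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

# Hypothetical population of HDL values (mmol/L) for 100,000 people.
# The mean (1.35) and standard deviation (0.4) are made-up values,
# not estimates from the NHANES data.
population = [random.gauss(1.35, 0.4) for _ in range(100_000)]


def sample_mean(n):
    """Return the mean of a simple random sample of n people."""
    return statistics.mean(random.sample(population, n))


# Two samples taken from the SAME population
mean_a = sample_mean(200)
mean_b = sample_mean(200)

print(f"Sample A mean: {mean_a:.3f} mmol/L")
print(f"Sample B mean: {mean_b:.3f} mmol/L")
print(f"Difference:    {mean_a - mean_b:.3f} mmol/L")
```

Any difference printed here is due to sampling variation alone, since both samples come from one population with one population mean.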

Similarly,
in Sect. 14.6
the *odds* of being diabetic were different for
smokers (0.181) and
non-smokers (0.084).
What does this difference between the **sample** odds imply about the **population** odds?

Again, two possible reasons could explain why the sample odds are different:

- The *population* odds are the same. The *sample* odds are different because every sample is likely to be different (each possible sample includes different people), so sometimes the sample odds are different by chance. This is called ‘sampling variation.’
- Alternatively, the odds are different in the *population*, and the sample odds simply reflect this.
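
The same idea applies to odds, and repeating the sampling many times shows how large a chance difference can be. In the sketch below, the population proportion of diabetics (0.10), the group size of 500, the 1000 repetitions and the function name `sample_odds` are hypothetical choices, not values from the NHANES data. Both groups are given the *same* population odds, so every difference between the two sample odds is due to sampling variation.

```python
import random

random.seed(2)  # fixed seed so that the run is repeatable

# Hypothetical population proportion of diabetics, the SAME in both groups.
P_DIABETIC = 0.10


def sample_odds(n, p):
    """Odds of diabetes in a random sample of n people:
    (number with diabetes) / (number without diabetes)."""
    cases = sum(random.random() < p for _ in range(n))
    return cases / (n - cases)


# Repeat the study many times, recording the difference in sample odds
diffs = []
for _ in range(1000):
    odds_smokers = sample_odds(500, P_DIABETIC)
    odds_nonsmokers = sample_odds(500, P_DIABETIC)
    diffs.append(odds_smokers - odds_nonsmokers)

print(f"Smallest difference in sample odds: {min(diffs):.3f}")
print(f"Largest difference in sample odds:  {max(diffs):.3f}")
```

The differences fall on both sides of zero even though the population odds are equal. The question for a real study is whether the observed difference is larger than sampling variation alone would reasonably produce.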

In both situations (means; odds), the two possible explanations (‘hypotheses’) have special names:

- There is *no difference* between the population parameters: this is the *null hypothesis*, or \(H_0\).
- There is *a difference* between the population parameters: this is the *alternative hypothesis*, or \(H_1\).

(The word hypothesis just means ‘a possible explanation.’)
A decision needs to be made about which of these two explanations is the more likely.
However,
because a sample is studied,
conclusions about the *population* are never certain.