27.3 Sampling distribution: Expectation

A RQ is answered using data (this is partly what is meant by evidence-based research). Fortunately, for the body-temperature study, data are available from a comprehensive American study (Shoemaker 1996).

Summarising the data is important, because the data are the means by which the RQ is answered (data below).

A graphical summary (Fig. 27.1) shows that the internal body temperature of individuals varies from person to person: this is natural variation. A numerical summary (from software) shows that:

  • The sample mean is $\bar{x} = 36.8051^\circ$C;
  • The sample standard deviation is $s = 0.40732^\circ$C;
  • The sample size is $n = 130$.

The sample mean is less than the assumed value of $\mu = 37^\circ$C… The question is why: can the difference reasonably be explained by sampling variation, or not?

A 95% CI can also be computed (using software or manually): the 95% CI for $\mu$ is from $36.73^\circ$C to $36.88^\circ$C. This CI is narrow, implying that $\mu$ has been estimated precisely, so detecting even small deviations of $\mu$ from $37^\circ$C should be possible.
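The manual calculation can be sketched in Python as follows. This is a minimal sketch that works only from the summary statistics reported above. It assumes the approximate multiplier of 2 suggested by the 68–95–99.7 rule (software may use a slightly different multiplier, so later decimal places can differ), and the variable names are illustrative only.

```python
import math

# Summary statistics reported for the body-temperature sample
x_bar = 36.8051   # sample mean (degrees C)
s = 0.40732       # sample standard deviation (degrees C)
n = 130           # sample size

# Standard error of the sample mean: s / sqrt(n)
se = s / math.sqrt(n)

# ASSUMPTION: approximate 95% multiplier of 2 (68-95-99.7 rule)
multiplier = 2
ci_lower = x_bar - multiplier * se
ci_upper = x_bar + multiplier * se

print(f"Approximate 95% CI: {ci_lower:.2f} to {ci_upper:.2f} degrees C")
```

Rounded to two decimal places, the limits agree with the interval quoted above.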

FIGURE 27.1: The histogram of the body temperature data

The decision-making process assumes that the population mean temperature is $\mu = 37.0^\circ$C, as stated in the null hypothesis. Because of sampling variation, the value of $\bar{x}$ would sometimes be smaller than $37.0^\circ$C and sometimes greater than $37.0^\circ$C.

How much variation in the value of $\bar{x}$ could be expected, simply due to sampling variation, when $\mu = 37.0^\circ$C? This variation is described by the sampling distribution.

The sampling distribution of $\bar{x}$ was discussed in Sect. 22.2 (and Def. 22.1 specifically). From this, if $\mu$ really were $37.0^\circ$C and if certain conditions are met, the possible values of the sample means can be described using:

  • An approximate normal distribution;
  • With mean $37.0^\circ$C (from $H_0$);
  • With standard deviation of $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}} = \dfrac{0.40732}{\sqrt{130}} = 0.035724$. This is the standard error of the sample means.
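As a check on the arithmetic, the standard error can be recomputed from the summary statistics. The sketch below also lists the values lying 1, 2 and 3 standard errors either side of the mean assumed in the null hypothesis. The variable names are illustrative assumptions, not taken from the original study.

```python
import math

s = 0.40732   # sample standard deviation (degrees C)
n = 130       # sample size
mu_0 = 37.0   # population mean assumed in the null hypothesis (degrees C)

# Standard error of the sample mean: s / sqrt(n)
se = s / math.sqrt(n)
print(f"s.e. = {se:.6f}")

# Values 1, 2 and 3 standard errors either side of the assumed mean
for k in (1, 2, 3):
    lower = mu_0 - k * se
    upper = mu_0 + k * se
    print(f"{k} s.e.: {lower:.3f} to {upper:.3f} degrees C")
```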

A picture of this sampling distribution (Fig. 27.2) shows how the sample mean varies when $n = 130$, simply due to sampling variation, when $\mu = 37^\circ$C. This enables questions to be asked about the likely values of $\bar{x}$ that would be found in the sample, when the population mean is $\mu = 37^\circ$C.

FIGURE 27.2: The distribution of sample mean body temperatures, if the population mean is $37^\circ$C and $n = 130$. The grey vertical lines are 1, 2 and 3 standard deviations from the mean.
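The idea behind this sampling distribution can also be explored by simulation. The following is a rough sketch only, and it rests on assumptions that are not part of the original study: individual body temperatures are taken to be normally distributed, and the population standard deviation is set equal to the sample value $s$. Under those assumptions, many samples of size $n = 130$ are drawn from a population with mean $37^\circ$C, and the mean of each sample is recorded.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

mu_0 = 37.0         # population mean assumed in the null hypothesis (degrees C)
sigma = 0.40732     # ASSUMPTION: population sd set equal to the sample sd
n = 130             # sample size
num_samples = 5000  # number of simulated studies (arbitrary choice)

# ASSUMPTION: individual temperatures follow a normal distribution.
# For each simulated study, draw n temperatures and record the sample mean.
sample_means = [
    statistics.fmean(random.gauss(mu_0, sigma) for _ in range(n))
    for _ in range(num_samples)
]

print(f"Mean of the sample means:    {statistics.fmean(sample_means):.3f}")
print(f"Std dev of the sample means: {statistics.stdev(sample_means):.4f}")
```

The standard deviation of the simulated sample means should be close to the standard error $s/\sqrt{n}$, and their mean close to $37^\circ$C.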

Think 27.1 (Values of $\bar{x}$) Given the sampling distribution shown in Fig. 27.2, use the 68–95–99.7 rule to determine how often $\bar{x}$ will be larger than $37.036^\circ$C just because of sampling variation, if $\mu$ really is $37^\circ$C.

References

Shoemaker AL. What’s normal? – temperature, gender, and heart rate. Journal of Statistics Education [Internet]. 1996;4(2). Available from: http://jse.amstat.org/v4n2/datasets.shoemaker.html.