23.7 Confidence intervals: Mean differences
The confidence interval (CI) for the mean difference has the same form as the CI for a single mean (Chap. 22), so an approximate 95% CI for \(\mu_d\) is
\[
\bar{d} \pm 2 \times \text{s.e.}(\bar{d}).
\]
This is the same as the CI for a single mean \(\mu\) when the differences are treated as the data.
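A minimal Python sketch of this formula, assuming the paired observations are held in two equal-length lists. The function name `approx_ci_mean_difference` is hypothetical (not from any library). It forms the differences, then applies \(\bar{d} \pm 2 \times \text{s.e.}(\bar{d})\) with \(\text{s.e.}(\bar{d}) = s_d/\sqrt{n}\), where \(s_d\) is the sample standard deviation of the differences.

```python
import math
import statistics


def approx_ci_mean_difference(first, second):
    """Approximate 95% CI for the population mean difference mu_d (hypothetical helper).

    Each difference is computed as first - second for one pair; the interval
    is d_bar +/- 2 * s.e.(d_bar), with s.e.(d_bar) = s_d / sqrt(n).
    """
    if len(first) != len(second):
        raise ValueError("paired data need the same number of values in each list")
    diffs = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(diffs)
    d_bar = statistics.mean(diffs)                  # sample mean difference
    s_d = statistics.stdev(diffs)                   # sample s.d. of the differences (n - 1 divisor)
    se = s_d / math.sqrt(n)                         # standard error of d_bar
    return d_bar - 2 * se, d_bar + 2 * se
```

The order of the two lists fixes the direction of the differences, so it should match how the differences are defined in the study (for example, energy use before minus after insulation gives the energy saving).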
For the insulation data:
\[
0.54 \pm (2 \times 0.3211784),
\]
or \(0.54\pm 0.642\).
This CI is from \(0.54 - 0.642 = -0.102\) up to \(0.54 + 0.642 = 1.182\).
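The arithmetic above can be reproduced in a few lines of Python. This is only a sketch: \(\bar{d} = 0.54\) and \(\text{s.e.}(\bar{d}) = 0.3211784\) are copied from the text, and the variable names are chosen for illustration.

```python
d_bar = 0.54           # sample mean energy saving (MWh), from the text
se_d_bar = 0.3211784   # s.e.(d_bar), from the text

margin = 2 * se_d_bar  # half-width of the approximate 95% CI
lower, upper = d_bar - margin, d_bar + margin
print(f"{d_bar} +/- {margin:.3f}: from {lower:.3f} to {upper:.3f} MWh")
# prints: 0.54 +/- 0.642: from -0.102 to 1.182 MWh
```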
We write:
Based on the sample, an approximate 95% CI for the population mean energy saving after adding the wall cavity insulation is from \(-0.10\) to \(1.18\) MWh.
The negative number is not an energy consumption value; it is a negative mean amount of energy saved. Saving a negative amount is the same as using more energy. So the 95% CI says that we are reasonably confident that, after adding the insulation, the mean energy-use difference is somewhere between \(0.10\) MWh more energy used and \(1.18\) MWh less energy used. Equivalently, the plausible values for the population mean energy saving are from \(-0.10\) to \(1.18\) MWh.
Example 23.1 (COVID lockdown) A study of \(n = 213\) Spanish health students (Romero-Blanco et al. 2020) measured (among other things) the number of minutes of vigorous physical activity (PA) performed by students before and during the COVID-19 lockdown (from March to April 2020 in Spain).
Since the amount of vigorous PA was measured both before and during lockdown on each participant, the data are paired. The data are summarised below.
| | Mean (minutes) | Standard deviation (minutes) |
|---|---|---|
| Before | 28.47 | 54.13 |
| During | 30.66 | 30.04 |
| Difference | -2.68 | 51.30 |
Notice that the differences are defined as Before minus During. A positive difference therefore means the Before value is higher, so the differences tell us how much more time each student spent doing vigorous PA before the COVID lockdown. Similarly, a negative value means that the During value is higher.
In this situation, the parameter of interest is the population mean difference \(\mu_d\): the mean amount of time that students spent in vigorous PA before the lockdown minus the time spent during the lockdown.
Also notice that the standard deviation of the differences (\(s_d = 51.30\)) is not \(54.13 - 30.04\), or \(30.04 - 54.13\). Those calculations would find the difference between the two standard deviations, not the standard deviation of the list of differences.
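The point can be checked numerically. The sketch below uses four hypothetical students (made-up values for illustration only, not the Romero-Blanco et al. data) and compares the standard deviation of the differences with the difference between the two standard deviations.

```python
import statistics

# Hypothetical minutes of vigorous PA for four students (illustration only;
# these are NOT the study data).
before = [10, 0, 30, 60]
during = [20, 5, 25, 40]

differences = [b - d for b, d in zip(before, during)]  # Before minus During

sd_before = statistics.stdev(before)
sd_during = statistics.stdev(during)
sd_diff = statistics.stdev(differences)  # s.d. of the list of differences

print(f"s.d. of the differences: {sd_diff:.2f}")                       # 13.23
print(f"difference of the two s.d.s: {sd_before - sd_during:.2f}")    # 12.02
```

The two numbers disagree because \(s_d\) also depends on how each student's two values vary together, which the two separate standard deviations cannot capture.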
Every sample would contain different students, and hence would produce different pre- and during-COVID mean amounts of PA, so each of those means would have a standard error.
Likewise, the mean of the individual differences would vary from sample to sample, so the mean difference also has a standard error:
\[
\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{51.30}{\sqrt{213}} = 3.515018.
\]
The approximate 95% CI for the population mean difference is from
\[
-2.68 - (2 \times 3.515018) = -9.710036
\]
to
\[
-2.68 + (2 \times 3.515018) = 4.350036,
\]
so the approximate 95% CI for the population mean difference is from \(-9.71\) to \(4.35\) minutes.
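As a check on the arithmetic, here is a minimal Python sketch using the summary values from the table (\(n = 213\), \(\bar{d} = -2.68\), \(s_d = 51.30\)); the variable names are illustrative.

```python
import math

n = 213        # number of students
d_bar = -2.68  # sample mean difference, Before minus During (minutes)
s_d = 51.30    # sample standard deviation of the differences (minutes)

se = s_d / math.sqrt(n)  # s.e.(d_bar) = s_d / sqrt(n)
lower = d_bar - 2 * se
upper = d_bar + 2 * se
print(f"s.e. = {se:.6f}; approximate 95% CI from {lower:.2f} to {upper:.2f} minutes")
# prints: s.e. = 3.515018; approximate 95% CI from -9.71 to 4.35 minutes
```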
Notice that one of the CI limits is negative. This does not mean a negative amount of PA (which would make no sense); the CI is for the population mean difference. So, a negative value means that, on average, the During values are higher than the Before values.
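Because the sign depends only on how the differences are defined, reversing the direction simply mirrors the interval. A small sketch, assuming the rounded limits from the text: defining the differences as During minus Before negates each limit and swaps their order.

```python
# Approximate 95% CI limits for mu_d with differences defined as
# Before minus During (rounded values from the text).
before_minus_during = (-9.71, 4.35)

# During minus Before negates every difference, so the limits are
# negated and swap places; the interpretation is unchanged.
during_minus_before = (-before_minus_during[1], -before_minus_during[0])
print(during_minus_before)  # (-4.35, 9.71)
```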
So, the CI means:
We are reasonably confident that, in the population, the mean difference in the amount of vigorous PA by Spanish health students is between 9.71 minutes more during lockdown and 4.35 minutes more before lockdown.