## 23.7 Confidence intervals: Mean differences

The CI for the mean difference has the same form as for a single mean (Chap. 22), so an approximate 95% confidence interval (CI) for \(\mu_d\) is

\[ \bar{d} \pm 2 \times\text{s.e.}(\bar{d}). \]
This is the same as the CI for a single mean, using \(\bar{x}\), if the differences are treated as the data.
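The idea of treating the differences as a single sample can be sketched in Python. This is a minimal illustration only: the function name `approx_ci_mean` and the paired measurements are hypothetical and do not come from the text.

```python
import math
import statistics

def approx_ci_mean(values, multiplier=2):
    """Approximate 95% CI for a mean: mean +/- multiplier * standard error."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample sd / sqrt(n)
    margin = multiplier * std_err
    return mean - margin, mean + margin

# Hypothetical paired measurements (illustrative only, not from the text)
before = [12.1, 10.4, 11.8, 9.9, 13.0]
after = [11.5, 10.6, 10.9, 9.1, 12.2]

# Reduce each pair to one difference, then treat the differences as the data
differences = [b - a for b, a in zip(before, after)]
lower, upper = approx_ci_mean(differences)
print(f"Approximate 95% CI for the mean difference: {lower:.2f} to {upper:.2f}")
```

Once the differences are formed, nothing in the calculation depends on the original two columns; the single-mean procedure is applied unchanged.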

For the insulation data:

\[
0.54 \pm (2 \times 0.3211784),
\]
or \(0.54\pm 0.642\).
This CI is equivalent to
\(0.54 - 0.642 = -0.102\),
up to
\(0.54 + 0.642 = 1.182\).
We write:

Based on the sample, an *approximate* 95% CI for the population mean energy *saving* after adding the wall cavity insulation is from \(-0.10\) to \(1.18\) MWh.

The negative number is *not* an energy consumption value;
it is a negative mean amount of energy *saved*.
Saving a *negative* amount is like using *more* energy.
So the 95% CI is saying that
we are reasonably confident
that, after adding the insulation,
the mean energy-use difference is between using
\(0.10\) MWh *more* energy and using \(1.18\) MWh *less* energy.
Alternatively,
the plausible values for the mean energy saving are between \(-0.10\) and \(1.18\) MWh.
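As a rough check on the arithmetic and the interpretation, the insulation interval can be reproduced in Python from the summary values quoted above. The helper names `approx_ci` and `describe` are hypothetical, introduced only for this sketch.

```python
def approx_ci(estimate, std_err, multiplier=2):
    """Approximate 95% CI: estimate +/- multiplier * standard error."""
    margin = multiplier * std_err
    return estimate - margin, estimate + margin

def describe(bound):
    """Put a bound on the mean energy *saving* into words.

    A negative saving corresponds to using more energy.
    """
    if bound < 0:
        return f"{abs(bound):.2f} MWh more energy used"
    return f"{bound:.2f} MWh less energy used"

mean_saving = 0.54     # sample mean energy saving (MWh), from the text
se_saving = 0.3211784  # standard error of the mean difference, from the text

lower, upper = approx_ci(mean_saving, se_saving)
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f} MWh")
print(f"From {describe(lower)} to {describe(upper)}")
```

The `describe` helper makes the sign convention explicit: the interval is for the mean *saving*, so the negative lower limit is read as extra energy used.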

**Example 23.1 (COVID lockdown)** A study of \(n = 213\) Spanish health students (Romero-Blanco et al. 2020)
measured (among other things)
the number of minutes of vigorous physical activity (PA) performed by students
*before* and *during* the COVID-19 lockdown (from March to April 2020 in Spain).

Since the *before* and *during* lockdown measurements were both taken on *each* participant,
the data are *paired*.
The data are summarised below.

| | Mean (minutes) | Standard deviation (minutes) |
|---|---|---|
| Before | 28.47 | 54.13 |
| During | 30.66 | 30.04 |
| Difference | -2.68 | 51.30 |

Notice that the *differences* are defined as *Before* minus *During*.
A *positive* difference therefore means the *Before* value is higher;
hence,
the differences tell us how much longer the student spent doing vigorous PA *before* the COVID lockdown.
Similarly,
a *negative* value means that the *During* value is higher.

In this situation, the *parameter* of interest is the population mean difference \(\mu_d\):
the mean amount of extra time that students spent in vigorous PA *before* the lockdown compared to *during* the lockdown.

Also notice that the standard deviation of the differences (\(s_d = 51.30\)) is **not**
\(54.13 - 30.04\), or \(30.04 - 54.13\).
Those calculations would find the difference between the two standard deviations,
not the standard deviation of the list of differences.
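A small sketch in Python may make the distinction concrete. The five pairs of values below are hypothetical (the study's raw data are not given in the text); they are chosen only to show that the two calculations give different answers.

```python
import statistics

# Hypothetical minutes of vigorous PA for five students (illustrative only)
before = [30, 0, 60, 45, 10]
during = [20, 15, 40, 60, 5]

# One difference per student, defined as Before minus During
differences = [b - d for b, d in zip(before, during)]

sd_before = statistics.stdev(before)
sd_during = statistics.stdev(during)
sd_diff = statistics.stdev(differences)  # standard deviation of the differences

print(f"Difference between the two sds: {sd_before - sd_during:.2f}")
print(f"Standard deviation of the differences: {sd_diff:.2f}")
```

The standard deviation of the differences depends on how each student's two values vary *together*, which is why it cannot be recovered from the two separate standard deviations alone.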

Every sample would contain different students, and hence would produce different *before* and *during* lockdown mean amounts of PA, so each of those means has a standard error.

Likewise, each individual's *difference* would vary from sample to sample,
so the *mean difference* would also vary from sample to sample and hence has a standard error:

\[
\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{51.30}{\sqrt{213}} = 3.515018.
\]
The approximate 95% CI for the population mean *difference* is from

\[
-2.68 - (2 \times 3.515018) = -9.710036
\]
to
\[
-2.68 + (2 \times 3.515018) = 4.350036,
\]
so the approximate 95% CI for the population mean *difference* is
from -9.71 to 4.35 minutes.
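The calculation can be checked with a short Python sketch that uses only the summary values given in the table for this example. The variable names are illustrative.

```python
import math

n = 213            # number of students, from the text
mean_diff = -2.68  # sample mean difference (Before minus During), in minutes
sd_diff = 51.30    # standard deviation of the differences, in minutes

# Standard error of the mean difference: s_d / sqrt(n)
std_err = sd_diff / math.sqrt(n)

# Approximate 95% CI: mean difference +/- 2 standard errors
margin = 2 * std_err
lower, upper = mean_diff - margin, mean_diff + margin

print(f"s.e. of the mean difference: {std_err:.6f}")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f} minutes")
```

Note that the same standard error is used for both limits of the interval; only the sign of the margin changes.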

Notice that one of the values is *negative*.
This does **not** mean a negative amount of PA (which would make no sense);
the CI is for the population mean *difference*.
So, a negative value means that the *During* values are higher than the *Before* values on average.

So, the CI means:

In the population, the mean difference between the amount of vigorous PA by Spanish health students is between 9.71 minutes more *during* lockdown, and 4.35 minutes more *before* lockdown.