23 CIs for mean differences (paired data)

So far, you have learnt to ask a RQ, identify different ways of obtaining data, design the study, collect the data, describe the data, summarise data graphically and numerically, and understand the tools of inference.

In this chapter, you will learn about confidence intervals for mean differences (i.e., for paired data). You will learn to:

  • produce a confidence interval for a mean difference.
  • determine whether the conditions for using the confidence interval apply in a given situation.
  • compute sample size estimates in this situation.

23.1 Mean differences

House insulation is important for saving energy, particularly in cold climates.

Consider a study to estimate the average energy savings made by using a new type of house insulation. Different study designs could be used to address this.

One approach is to take a sample of homes, and measure the energy consumption before adding the insulation, and then after adding the insulation for the same houses. Each home gets two observations: the energy consumption before and after adding the insulation. This would be comparing within individuals.

This is a descriptive RQ: the Outcome is the mean energy saving, and the response variable is the energy saving for each house. There is no Comparison between individuals: units of analysis that have been treated differently are not compared.

Alternatively, the researchers could take a sample of homes without the insulation, and measure their energy consumption; then take a different sample of homes with the insulation, and measure their energy consumption. This would be comparing between individuals.

This is a relational RQ: the Outcome is the mean energy consumption, and the response variable is the energy consumption for each house. The Comparison is between units of analysis with the insulation, and units of analysis without the insulation.

Either study is possible, and each has advantages and disadvantages.396 Here the first (Descriptive) design would seem superior (why?). In the first design, each home gets a pair of energy consumption measurements: this is paired data, which is the subject of this chapter. The second (Relational) design requires the means of two different groups of homes to be compared, which is the topic of the next chapter.

Definition 23.1 (Paired data) Data are paired when two observations about the same variable are recorded for each unit of analysis.

Paired data come from within individual comparisons.

Since each unit of analysis has two observations about energy consumption, the change (or the difference, or the reduction) in energy consumption can be computed for each house.

Then, questions can be asked about the population mean difference, which is not the same as the difference between two separate population means (the subject of the next chapter). In paired data, finding the difference between the two measurements for each individual unit of analysis makes sense: each unit of analysis (each house) has two related observations.

Which of these are paired situations?

  1. The mean difference between blood pressure for 36 people, before and after taking a drug.
  2. The difference between the mean HDL cholesterol levels for 22 males and 19 females.
  3. The mean protein levels were compared in sea turtles before and after being rehabilitated.397

Situations 1 and 3 are paired situations.

23.2 Mean differences: An example

The Electricity Council in Bristol wanted to determine if a certain type of wall-cavity insulation reduced energy consumption in winter.398 Their (Descriptive) RQ was:

What is the mean reduction in energy consumption after adding home insulation?

The parameter is \(\mu_d\), the population mean reduction in energy consumption.

For the collected data (shown below) the same variable (energy consumption) is measured twice for each unit of analysis (the house): energy consumption before adding insulation and after adding insulation.

Finding the difference in energy consumption for each house seems sensible, as the data are paired. Once the differences are computed, the process for computing a CI is the same as in Chap. 22, where these changes (or differences) are used as the data.

Be clear about how the differences are computed. Differences could be computed as Before minus After (the energy consumption saving), or After minus Before (the energy consumption increase).

Either is fine, as long as you are consistent throughout. The meaning of any conclusions will be the same.

Here, discussing energy savings seems most natural, so we compute the differences as energy savings: Before minus After.

One energy saving value is negative. This does not mean negative energy usage: the values are differences (more specifically, energy reductions or savings).

The differences are computed as Before minus After, so a negative value means that the After value is greater than the Before value: an increase in energy consumption.
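As a minimal sketch of how the differences are formed and read, the Python snippet below computes Before minus After for four houses. The values are hypothetical, made up purely for illustration (they are not the insulation data), and the variable names are assumptions.

```python
# Hypothetical energy consumption (MWh) for four houses: illustrative values only,
# NOT the insulation data from the study.
before = [12.1, 11.0, 13.4, 10.2]
after = [11.3, 11.2, 12.5, 9.9]

# Differences as Before minus After: positive values are energy savings.
savings = [round(b - a, 2) for b, a in zip(before, after)]

for house, d in enumerate(savings, start=1):
    meaning = "saving" if d > 0 else "increase" if d < 0 else "no change"
    print(f"House {house}: difference = {d:+.2f} MWh ({meaning})")
```

In this made-up example, the second house has a negative difference: its After value is larger than its Before value, so its energy consumption increased.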

As always, begin by understanding the data: producing appropriate graphical and numerical summaries.

For this situation, what graphs would be suitable for displaying these data?

  • Boxplot

  • A histogram

  • A histogram of the differences (such as the energy savings) for each house

  • A case-profile plot

23.3 Notation: Mean differences

The notation used for paired data reflects how we work with the differences (Table 23.1). Apart from that, the notation is similar to that used in Chap. 22.

TABLE 23.1: The notation used for mean differences (paired data) compared to the notation used for one sample mean
| | One sample mean | Mean of paired data |
|---|---|---|
| The observations | Values: \(x\) | Differences: \(d\) |
| Sample mean | \(\bar{x}\) | \(\bar{d}\) |
| Standard deviation | \(s\) | \(s_d\) |
| Standard error of sample mean | \(\displaystyle\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}\) | \(\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}\) |
| Sample size | Number of observations: \(n\) | Number of differences: \(n\) |

23.4 Graphical summaries: Mean differences

Since the data are the differences (quantitative), the appropriate graph is a histogram (or a dot plot, or a stem-and-leaf plot) of the differences (Fig. 23.1).

Graphing the Before and After data may also be useful, but a graph of the differences is crucial, as the RQ is about the differences.

A case-profile plot (Sect. 12.7.2) is also useful, but is sometimes harder to produce in software, and difficult to read when the sample size is large (the graph contains a line for each unit of analysis).

FIGURE 23.1: A plot of the energy savings from the insulation data. Left panel: A histogram (the vertical grey line represents no energy saving). Right panel: Case-profile plot (a dashed line represents an energy increase)

23.5 Numerical summaries: Mean differences

Since the data are differences, a numerical summary must summarise the differences. Summarising the Before and After data is useful too, but summarising the differences is crucial because the RQ is about the differences (see below).

For the house insulation data, the appropriate numerical summary for paired data summarises the differences using means, standard deviations, and so on, as appropriate.

A mean or a median may be appropriate for describing the data.

However, the CI is about the mean of the data, and not about the data itself.

Since the sampling distribution for the sample mean (under certain conditions) has a normal distribution, the mean is appropriate for describing the sampling distribution.

A numerical summary of the energy savings from a calculator (Statistics Mode) or computer software gives the sample mean of the differences as \(\bar{d} = 0.54\), and the standard deviation of the differences as \(s_d = 1.015655\).

A formal numerical summary table is shown in Table 23.2.

TABLE 23.2: The mean, median, standard deviation and IQR for the energy consumption data (in MWh)
| | Mean | Median | Std dev | IQR |
|---|---|---|---|---|
| Before | 12.49 | 12.45 | 1.68 | 2.28 |
| After | 11.95 | 12.20 | 1.96 | 2.45 |
| Energy savings | 0.54 | 0.35 | 1.02 | 0.80 |

23.6 Sampling distribution: Mean differences

The study concerns the mean energy saving (the mean difference). Every sample of \(n = 10\) houses is likely to comprise different houses, and hence different before and after energy consumptions will be recorded, and hence different energy savings will be recorded. As a result, the sample mean energy differences will vary from sample to sample. That is, the mean differences have a sampling distribution, and a standard error.

Since the differences are like a single sample of data (Chap. 22), the sample mean difference \(\bar{d}\) will have a sampling distribution similar to that of the mean of a single sample \(\bar{x}\) (provided the conditions are met; Sect. 23.9).

Definition 23.2 (Sampling distribution of a sample mean difference) The sampling distribution of a sample mean difference is described by:

  • an approximate normal distribution;
  • centred around \(\mu_d\) (the population mean difference);
  • with a standard deviation of \(\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}\),

when certain conditions are met, where \(n\) is the size of the sample, and \(s_d\) is the standard deviation of the individual differences in the sample.
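The idea behind this definition can be explored with a small simulation. The sketch below assumes a hypothetical population of differences (normal, with mean 0.5 and standard deviation 1.0; these values are invented for illustration and are not estimates from any study), repeatedly draws samples of \(n = 10\) differences, and looks at how the sample mean differences vary. Because the population standard deviation is known in a simulation, it is used in place of \(s_d\) for the comparison.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of differences: normal, mean 0.5, standard deviation 1.0.
# These values are assumptions for illustration, not estimates from the study.
MU_D, SIGMA_D, N = 0.5, 1.0, 10

def sample_mean_difference():
    """Draw one sample of N differences and return its mean."""
    sample = [random.gauss(MU_D, SIGMA_D) for _ in range(N)]
    return statistics.mean(sample)

# Repeat the "study" many times to see how the sample mean difference varies.
mean_differences = [sample_mean_difference() for _ in range(5000)]

centre = statistics.mean(mean_differences)
spread = statistics.stdev(mean_differences)
theoretical_se = SIGMA_D / N ** 0.5

print(f"Centre of the sample mean differences: {centre:.3f}")
print(f"Spread of the sample mean differences: {spread:.3f}")
print(f"sigma_d / sqrt(n):                     {theoretical_se:.3f}")
```

The centre of the simulated sample mean differences should be close to the population mean difference, and their spread should be close to the standard deviation divided by \(\sqrt{n}\).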

For the home insulation data, the variation in the sample mean differences \(\bar{d}\) can be described by

  • an approximate normal distribution;
  • centred around \(\mu_d\);
  • with a standard deviation of \(\displaystyle\text{s.e.}(\bar{d}) = \frac{1.015655}{\sqrt{10}} = 0.3211784\), called the standard error of the mean difference.

Notice that many decimal places are used in the working here; results will be rounded when reported.
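The standard error calculation above can be sketched in Python as follows. The function name `standard_error` is an assumption chosen for illustration; the inputs are the values of \(s_d\) and \(n\) reported for the insulation data.

```python
import math

def standard_error(s_d, n):
    """Standard error of the sample mean difference: s_d / sqrt(n)."""
    return s_d / math.sqrt(n)

# Insulation data: s_d = 1.015655 from n = 10 differences.
se_insulation = standard_error(1.015655, 10)
print(f"s.e. of the mean difference: {se_insulation:.4f}")
```

The last decimal places may differ slightly from hand working, depending on how many decimal places of \(s_d\) are carried through.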

23.7 Confidence intervals: Mean differences

The CI for the mean difference has the same form as for a single mean (Chap. 22), so an approximate 95% confidence interval (CI) for \(\mu_d\) is

\[ \bar{d} \pm 2 \times\text{s.e.}(\bar{d}). \]

This is the same as the CI for a single population mean if the differences are considered as the data. For the insulation data:

\[ 0.54 \pm (2 \times 0.3211784), \]

or \(0.54\pm 0.642\). This CI is equivalent to \(0.54 - 0.642 = -0.102\), up to \(0.54 + 0.642 = 1.182\). We can write:

Based on the sample, an approximate 95% CI for the population mean energy saving after adding the wall cavity insulation is from \(-0.10\) to \(1.18\) MWh.

The negative number is not an energy consumption value; it is a negative mean amount of energy saved. Saving a negative amount is like using more energy.

So the 95% CI is saying that we are reasonably confident that, after adding the insulation, the mean energy-use difference is between using \(0.10\) MWh more energy and using \(1.18\) MWh less energy. Alternatively, the plausible values for the mean energy savings are between \(-0.10\) and \(1.18\) MWh.
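The approximate CI for the insulation data can be reproduced with a few lines of Python. This is only a sketch: the function name `approx_ci_95` and its arguments are assumptions made for illustration, and the multiplier of 2 is the approximate value used throughout this chapter.

```python
def approx_ci_95(mean_diff, std_error, multiplier=2):
    """Approximate 95% CI: mean difference +/- (multiplier x standard error)."""
    margin = multiplier * std_error
    return mean_diff - margin, mean_diff + margin

# Insulation data: sample mean difference 0.54 MWh, standard error 0.3211784 MWh.
lower, upper = approx_ci_95(0.54, 0.3211784)
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f} MWh")
```

A negative lower limit here corresponds to a mean increase in energy use, because the differences are defined as Before minus After.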

Example 23.1 (COVID lockdown) A study of \(n = 213\) Spanish health students399 measured (among other things) the number of minutes of vigorous physical activity (PA) performed by students before and during the COVID-19 lockdown (from March to April 2020 in Spain).

Since vigorous PA before and during lockdown was measured on each participant, the data are paired. The data are summarised below.

| | Mean (minutes) | Standard deviation (minutes) |
|---|---|---|
| Before | 28.47 | 54.13 |
| During | 30.66 | 30.04 |
| Difference | -2.68 | 51.30 |

Notice that the differences are defined as Before minus During. A positive difference therefore means the Before value is higher; hence, the differences tell us how much longer the student spent doing vigorous PA before the COVID lockdown. Similarly, a negative value means that the During value is higher.

The parameter of interest is the population mean difference \(\mu_d\), the mean amount that students spent in vigorous PA before the lockdown compared to during the lockdown.

Also notice that the standard deviation of the differences (\(s_d = 51.30\)) is not \(54.13 - 30.04\), or \(30.04 - 54.13\). Those calculations would find the difference between the two standard deviations, not the standard deviation of the list of differences.
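A small numerical illustration may help here. The sketch below uses hypothetical minutes of vigorous PA for five students (invented values, not the study data) to show that the standard deviation of the differences generally differs from the difference between the two standard deviations.

```python
import statistics

# Hypothetical minutes of vigorous PA for five students (illustrative values only).
before = [30, 0, 60, 20, 45]
during = [25, 30, 20, 40, 45]

# Differences defined as Before minus During, one per student.
differences = [b - d for b, d in zip(before, during)]

sd_before = statistics.stdev(before)
sd_during = statistics.stdev(during)
sd_differences = statistics.stdev(differences)

print(f"SD of Before:       {sd_before:.2f}")
print(f"SD of During:       {sd_during:.2f}")
print(f"Difference of SDs:  {sd_before - sd_during:.2f}")
print(f"SD of differences:  {sd_differences:.2f}")
```

For these made-up values the two quantities are clearly different, because the standard deviation of the differences depends on how each student's two measurements vary together, not just on the spread of each column.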

Every sample would contain different students, and hence would produce different before- and during-lockdown mean amounts of PA, so each of those means would have a standard error. Likewise, the individuals' differences would vary from sample to sample, so the mean difference would vary and hence have a standard error:

\[ \text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{51.30}{\sqrt{213}} = 3.515018. \]

The approximate 95% CI for the population mean difference is from

\[ -2.68 - (2 \times 3.515018) = -9.710036 \]

to

\[ -2.68 + (2 \times 3.515018) = 4.350036, \]

so the approximate 95% CI for the population mean difference is from \(-9.71\) to \(4.35\) minutes.

Notice that one of the values is negative. This does not mean a negative amount of PA (which would make no sense); the CI is for the population mean difference. So, a negative value means that the During values are higher than the Before values on average.

So, the CI means:

In the population, the mean difference in the amount of vigorous PA by Spanish health students is between 9.71 minutes more during lockdown and 4.35 minutes more before lockdown.

23.8 Using software: CIs for mean differences

Software (such as jamovi or SPSS) can produce exact 95% CIs, which may be slightly different than the approximate 95% CI (the 68--95--99.7 rule is an approximation, so the multiplier of 2 is approximate).

The approximate and exact 95% CIs are similar when the sample size is not small; here the sample size is small (\(n = 10\)). From the jamovi (Fig. 23.2) or SPSS output (Fig. 23.3):

Based on the sample, a 95% CI for the population mean energy saving because of the wall cavity insulation is from \(-0.19\) to \(1.27\) MWh.

FIGURE 23.2: The insulation data: jamovi output

FIGURE 23.3: The insulation data: SPSS output

As expected, this 95% CI is slightly different than the CI computed by hand, since the sample size is small. For our purposes, however, using the approximate multiplier of 2 is sufficient when not using software.
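To see where the difference between the two intervals comes from, the sketch below finds the exact multiplier numerically. It assumes, as is standard for a CI for a mean, that the software uses a \(t\)-distribution with \(n - 1\) degrees of freedom. The helper names (`t_pdf`, `t_cdf`, `t_multiplier`) are hypothetical, and the numerical method (Simpson's rule plus bisection) is just one simple way to obtain the multiplier using only the Python standard library; it is not necessarily how jamovi or SPSS compute it.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density on [0, x] by Simpson's rule."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiplier(df, level=0.95):
    """Two-sided t-multiplier, found by bisection on the CDF."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Insulation data: n = 10 differences, mean 0.54 MWh, standard error 0.3211784 MWh.
n = 10
d_bar, se = 0.54, 0.3211784
multiplier = t_multiplier(n - 1)
lower, upper = d_bar - multiplier * se, d_bar + multiplier * se

print(f"t-multiplier with {n - 1} df: {multiplier:.3f}")
print(f"Exact 95% CI: {lower:.2f} to {upper:.2f} MWh")
```

With only nine degrees of freedom the multiplier is noticeably larger than 2, which is why the exact interval is wider than the approximate one; for large samples the multiplier is close to 2.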

23.9 Statistical validity conditions: Mean differences

As with any inferential procedure, these results apply under certain conditions. The conditions under which the CI is statistically valid for paired data are similar to those for one sample mean, rephrased for differences.

The CI computed above is statistically valid if one of these conditions is true:

  1. The sample size of differences is at least 25; or
  2. The sample size of differences is smaller than 25, and the population of differences has an approximate normal distribution.

The sample size of 25 is a rough figure here, and some books give other (similar) values (such as 30). This condition ensures that the sample means have an approximate normal distribution, so that the 68--95--99.7 rule can be used. Provided the sample size is larger than about 25, this will be approximately true even if the distribution of the individuals in the population does not have a normal distribution. That is, when \(n > 25\) the sample means generally have an approximate normal distribution, even if the data themselves don't have a normal distribution.

In addition to the statistical validity condition, the CI will be

Example 23.2 (Statistical validity) For the insulation data, the sample size is small, so we require that the differences in the population follow a normal distribution. We don't know the distribution of the population, but the sample data graphed in Fig. 23.1 do not raise any obvious doubts. So the CI is possibly statistically valid, but we aren't sure.

In this case then, the results may not be valid; that is, the CI limits that we calculated will be approximately correct only. (This doesn't mean the CI is useless!)
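The two statistical validity conditions can be written as a simple decision rule. The sketch below is only an illustration: the function `ci_validity` and its arguments are hypothetical, the threshold of 25 is the rough figure used in this chapter, and the judgement about the population of differences still has to come from the researcher (for example, from a histogram of the sample differences).

```python
def ci_validity(n_differences, population_approx_normal=None, threshold=25):
    """Give a short verdict on the statistical validity of a CI for a mean difference.

    population_approx_normal is True, False, or None when unknown. It is a
    hypothetical flag standing in for the researcher's judgement.
    """
    if n_differences >= threshold:
        return "valid: sample size condition met"
    if population_approx_normal is True:
        return "valid: small sample, but population of differences approximately normal"
    if population_approx_normal is False:
        return "not valid: small sample, population of differences not approximately normal"
    return "uncertain: small sample, distribution of population differences unknown"

print(ci_validity(10))    # a small sample, such as the insulation data
print(ci_validity(141))   # a sample larger than 25
```

For a small sample with an unknown population distribution, the function returns "uncertain", mirroring the conclusion of Example 23.2.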

23.10 Example: Blood pressure

A US study400 examined how CHD risk factors were assessed among parts of the population with diabetes. Subjects reported to the clinic on multiple occasions. Consider this RQ:

What is the mean difference in diastolic blood pressure from the first to the second visit?

Each person has a pair of diastolic blood pressure (DBP) measurements: One each from their first and second visits. The data (shown below) are from the 141 people for whom both measurements are available (some data are missing). The differences could be computed as:

  • The first visit DBP minus the second visit DBP: the reduction in DBP; or
  • The second visit DBP minus the first visit DBP: the increase in DBP.

Either way is fine, provided the order is used consistently. Here, the first-visit observation minus the second-visit observation will be used, so that the differences represent the reduction in DBP from the first to second visit.

The parameter is \(\mu_d\), the population mean reduction in DBP.

Since the data set is large, the appropriate graphical summary is a histogram of differences (Fig. 23.4). The numerical summary can summarise both the first and second visit observations, but must summarise the differences. Numerical summaries can be computed using software, then reported in a suitable table (Table 23.3).

FIGURE 23.4: Histogram of the decrease in DBP between the first and second visits

TABLE 23.3: The numerical summary for the diabetes data (in mm Hg). The differences are the first visit value minus the second visit value: the decreases in diastolic blood pressure from the first to second visit

| | Mean | Standard deviation | Standard error | Sample size |
|---|---|---|---|---|
| DBP: First visit | 94.48 | 11.473 | 0.966 | 141 |
| DBP: Second visit | 92.52 | 11.555 | 0.973 | 141 |
| Decrease in DBP | 1.95 | 8.026 | 0.676 | 141 |

The standard error of the sample mean difference is

\[ \text{s.e.}(\bar{d})=\frac{s_d}{\sqrt{n}} = \frac{8.02614}{\sqrt{141}} = 0.67592. \] Using an approximate multiplier of 2, the margin of error is:

\[ 2 \times 0.67592 = 1.3518, \] so an approximate 95% CI for the decrease in DBP is

\[ 1.9504\pm 1.3518, \] or from \(0.60\) to \(3.30\) mm Hg, after rounding sensibly. We write:

Based on the sample, an approximate 95% CI for the mean decrease in DBP is from \(0.60\) to \(3.30\) mm Hg.

The exact 95% CI from jamovi (Fig. 23.5) or SPSS (Fig. 23.6), using an exact \(t\)-multiplier rather than an approximate multiplier of 2, is similar since the sample size is large. After rounding, write:

Based on the sample, an exact 95% CI for the decrease in DBP is from \(0.61\) to \(3.29\) mm Hg.

The wording ('for the decrease in DBP') implies which reading is the higher reading on average: the first.

FIGURE 23.5: jamovi output for the blood pressure data, including the exact 95% CI

FIGURE 23.6: SPSS output for the blood pressure data, including the exact 95% CI

Be clear in your conclusion about how the differences are computed.

The CI is statistically valid as the sample size is larger than 25. (The data do not need to follow a normal distribution.)

Is there a mean difference in DBP in the population?

Be careful: The RQ is about the mean difference in the population... but we only have the mean difference from one of the many possible samples. So it is difficult to be certain.

23.11 Example: The stress of surgery

The concentration of beta-endorphins in the blood is a sign of stress. In one study, the beta-endorphin concentration was measured for 19 patients about to undergo surgery. Each patient had their beta-endorphin concentration measured 12--14 hours before surgery (BeforeHours), and also 10 minutes before surgery (BeforeMins). The jamovi output in Fig. 23.7 can be used to construct an approximate 95% CI for the increase in stress as surgery gets closer.

FIGURE 23.7: jamovi output for the surgery-stress data

The mean increase in beta-endorphin concentration is 7.70 fmol/mol, and the standard error for this increase is 3.10 fmol/mol. The approximate 95% CI is

\[\begin{align*} \bar{d} &\pm (2\times \text{s.e.}(\bar{d}))\\ 7.70 &\pm (2\times 3.10)\\ 7.70 &\pm 6.20, \end{align*}\] or from \(1.50\) to \(13.90\) fmol/mol.

Using jamovi, the exact 95% CI is from 1.62 to 13.78 fmol/mol, very similar to the approximate CI computed manually.

Also note that the standard error in the output can be computed by hand as

\[ \text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{13.52}{\sqrt{19}} = 3.1017..., \] the same as in the output.

The sample size is \(n = 19\), just less than 25, so the results may not be statistically valid (but shouldn't be too bad).

23.12 Quick review questions

  1. True or false: For paired data, the mean of the differences is treated like the mean of a single variable.
  2. True or false: The appropriate graph for displaying paired data is often a histogram of the differences.
  3. True or false: The population mean difference is denoted by \(\mu_d\).
  4. True or false: The standard error of the sample mean difference is denoted by \(s_d\).

23.13 Exercises

Selected answers are available in Sect. D.22.

Exercise 23.1 People often struggle to eat the recommended intake of vegetables. In one study exploring ways to increase vegetable intake in teens,401 teens rated the taste of raw broccoli, and raw broccoli served with a specially-made dip.

Each teen (\(n = 101\)) had a pair of measurements: the taste rating of the broccoli with and without dip. Taste was assessed using a '100 mm visual analog scale', where a higher score means a better taste. In summary:

  • For raw broccoli, the mean taste rating was \(56.0\) (with a standard deviation of \(26.6\));
  • For raw broccoli served with dip, the mean taste rating was \(61.2\) (with a standard deviation of \(28.7\)).

Because the data are paired, the differences are the best way to describe the data. The mean difference in the ratings was \(5.2\), with \(\text{s.e.}(\bar{d}) = 3.06\). From this information:

  1. Construct a suitable numerical summary table.
  2. Compute the approximate 95% CI for the mean difference in taste ratings.

Exercise 23.2 In a study of hypertension,402 15 patients were given a drug (Captopril) and their systolic blood pressure measured immediately before and two hours after being given the drug.

  1. Explain why it is sensible to compute differences as the Before minus the After measurements. What do the differences mean when computed this way?
  2. Compute the differences.
  3. Compute an approximate 95% CI for the mean difference.
  4. Write down the exact 95% CI using the computer output (jamovi: Fig. 23.8; SPSS: Fig. 23.9).
  5. Why are the two CIs different?
TABLE 23.4: The Captopril data: before and after systolic blood pressures (in mm Hg)

| Before | After | Before | After |
|---|---|---|---|
| 210 | 201 | 173 | 147 |
| 169 | 165 | 146 | 136 |
| 187 | 166 | 174 | 151 |
| 160 | 157 | 201 | 168 |
| 167 | 147 | 198 | 179 |
| 176 | 145 | 148 | 129 |
| 185 | 168 | 154 | 131 |
| 206 | 180 | | |
FIGURE 23.8: jamovi output for the Captopril data

FIGURE 23.9: The Captopril data: SPSS output

Exercise 23.3 A study403 examined the effect of exercise on smoking. Men and women were assessed on a range of measures, including the 'intention to smoke'.

'Intention to smoke' was assessed both before and after exercise for each subject, using the 10-item quantitative Questionnaire of Smoking Urges -- Brief scale,404 and the quantitative Minnesota Nicotine Withdrawal Scale.405

Smokers (people smoking at least five cigarettes per day) aged 18 to 40 were enrolled for the study. For the 23 women in the study, the mean intention to smoke after exercise reduced by 0.66 (with a standard error of 0.37).

  1. Find a 95% confidence interval for the population mean reduction in intention to smoke for women after exercising.
  2. Is this CI statistically valid?

Exercise 23.4

## 
##  Paired t-test
## 
## data:  ANCB$After and ANCB$Before
## t = 2.2156, df = 28, p-value = 0.03502
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.2268902 5.7869029
## sample estimates:
## mean difference 
##        3.006897