27 CIs for mean differences (paired data)

So far, you have learnt to ask a RQ, design a study, classify and summarise the data, and form confidence intervals. In this chapter, you will learn to

  • construct confidence intervals for a mean difference.
  • determine whether the conditions for using the confidence interval apply in a given situation.

27.1 Introduction: students starting university

What happens to students' eating habits when they start university? Many students will be responsible for their own meals for the first time, so perhaps these students forgo healthy foods for convenient, but less healthy, foods. Alternatively, perhaps they cannot afford to purchase sufficient or healthy food.

One approach to studying this is to take a sample of students who are beginning university and measure their weight, and then take a different sample of students at some later time and measure their weight. This is comparing between individuals. This between-individuals design compares the means of two different groups of students, the topic of the next chapter.

Another approach is to record some students' weights as they begin university, and then obtain the same students' weight at some later time. The comparison is within individuals (Sect. 2.5), and we have a repeated-measures RQ. Each student has a pair of weight measurements, and the study produces paired data, the topic of this chapter.

This second approach was used to answer this question (D. A. Levitsky, Halbmaier, and Mrdjenovic (2004), D. Levitsky (n.d.)):

For Cornell University students, what is the mean weight change in students after \(12\) weeks at university?

27.2 Paired data

Some repeated-measures RQs (Sect. 2.4) have a within-individuals comparison between two states. Then, computing the differences between the pairs of observations makes sense. The two groups are not independent (Sect. 19.7).

Pairing data, when appropriate, is useful because individuals can vary substantially. Pairing the data means that extraneous variables (potentially, confounding variables) are held constant for those paired observations. In this sense, pairing is a form of blocking (Sect. 7.2). Pairing is a good design strategy when the individuals in the pair are similar for many extraneous variables.

Definition 27.1 (Paired data) Paired data occurs when every observation in one group is related to, or can be matched sensibly to, one unique observation in another group.

Paired data arises in within-individuals comparisons:

  • Blood pressure is recorded on the same individuals before and after receiving a drug. The change in blood pressure is recorded for each person.
  • The number of visitors is recorded at many national parks (the 'individuals') on the first weekend in summer, and on the first weekend in winter. The change in visitor numbers for each national park between these time points is recorded.
  • The body temperature of dogs (the 'individuals') is measured using two types of thermometers. The difference between the two recorded temperatures from the thermometers is recorded.
  • Height is measured for each twin in a pair (the twin-pair is the 'individual'). Pairing the heights for each twin is reasonable given the shared genetics (and probably environments also). The difference between the heights of the twins can be recorded for each pair.

Many of these examples can be extended beyond two measurements. For instance, blood pressures can be recorded every thirty minutes for four hours, or temperatures can be compared using three different types of thermometers. We only study pairs of measurements, and only for quantitative variables.

27.3 Summarising data

Consider the student-eating study described above. Weight is measured for the same students at the start of university and after \(12\) weeks at university. Each student receives two measurements, and the change in weight for each individual can be recorded (data below).

Since the data are paired, an appropriate graph is a histogram of the weight gains (Chap. 15). A boxplot comparing students' weights at Week \(1\) and at Week \(12\) (that is, not treating the data as paired) shows that the distribution of weights, and the median weights, are very similar (Fig. 27.1, left panel). Any difference is difficult to see and detect. In addition, the link between the weights of students in Week \(1\) and Week \(12\) has been lost.

The histogram of the weight gains makes the change in weight easier to see (Fig. 27.1, right panel). It is also easy to see that some students lost weight from Week \(1\) to Week \(12\). Graphing the Week \(1\) and Week \(12\) data may also be useful, but a graph of the differences is crucial, as the RQ is about the differences. A case-profile plot (Sect. 15.2.2) is also appropriate, but is difficult to read here as the sample size is large (a line is needed for each unit of analysis).


FIGURE 27.1: Plots of the weight-loss data. Left: Treating the data incorrectly as not paired. Right: A histogram of weight changes (the vertical grey line represents no change in weight).

Since the RQ is about weight change, a numerical summary of the differences is essential; the Week \(1\) and the Week \(12\) data can also be summarised. Since the weights and the differences are quantitative, the appropriate numerical summary includes means, standard deviations, and so on, as appropriate (found using software); see Table 27.1. Notice that the standard deviation of the differences is not the difference between the standard deviations for the Week \(1\) and Week \(12\) data. (The same applies for the standard error and the median.) Instead, the standard deviation is computed from the individual differences (i.e., from the column Weight gain in the data table). The standard error of the mean difference is
\[ \text{s.e.}(\bar{d}) = \frac{0.956}{\sqrt{68}} = 0.116. \] Most statistics are slightly different in Weeks \(1\) and \(12\) (the medians agree to one decimal place); in particular, a slight mean weight gain is seen.

TABLE 27.1: The mean, median, standard deviation and standard error for the weight-gain data
| | Mean | Median | Standard deviation | Standard error |
|---|---|---|---|---|
| Week 1 weight (in kg) | \(61.24\) | \(60.3\) | \(10.970\) | \(1.330\) |
| Week 12 weight (in kg) | \(62.10\) | \(60.3\) | \(11.073\) | \(1.343\) |
| Weight gain (in kg) | \(0.86\) | \(0.9\) | \(0.956\) | \(0.116\) |
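The warning above, that the standard deviation of the differences is not the difference between the two standard deviations, can be checked with a short sketch. The five pairs of weights below are hypothetical (they are not the study data), and the variable names are illustrative only.

```python
import math
import statistics

# Hypothetical paired weights (kg) for five students; NOT the study data.
week1 = [58.0, 61.5, 70.2, 55.3, 66.0]
week12 = [59.1, 61.0, 71.5, 56.4, 66.8]

# Differences computed as Week 12 minus Week 1 (weight gain).
gains = [after - before for before, after in zip(week1, week12)]

# Standard deviation of the differences (what the paired analysis needs).
sd_of_gains = statistics.stdev(gains)

# Difference between the two standard deviations (NOT the same quantity).
diff_of_sds = statistics.stdev(week12) - statistics.stdev(week1)

# Standard error of the mean difference: s_d / sqrt(n).
se_gain = sd_of_gains / math.sqrt(len(gains))

print(f"mean gain         = {statistics.mean(gains):.3f} kg")
print(f"sd of gains       = {sd_of_gains:.3f} kg")
print(f"difference of sds = {diff_of_sds:.3f} kg")
print(f"s.e. of mean gain = {se_gain:.3f} kg")
```

For these made-up values the two quantities are clearly different, which is why the differences themselves must be summarised.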

27.4 Mean differences

The parameter of interest is \(\mu_d\), the population mean weight gain (in kg). The subscript \(d\) is a reminder that we are working with differences between Week \(1\) and Week \(12\) weights. Using these differences, the process for computing a CI is the same as in Chap. 25, where these differences are treated as the data. Either weight gain or weight loss could be used as the differences.

Be clear about how differences are computed. Differences could be computed as Week \(1\) minus Week \(12\) (weight loss), or Week \(12\) minus Week \(1\) (weight gain).

Either is fine: provided you are consistent throughout, the meaning of any conclusions will be the same. Here, weight gain is used.

Some weight gains are negative. This does not mean a negative weight. Since the differences are computed as Week \(12\) minus Week \(1\), a negative difference means the Week \(1\) weight is greater than the Week \(12\) weight value (i.e., a weight loss).

27.5 Notation

The notation used for paired data reflects that we work with the differences (Table 27.2). Otherwise, the notation is similar to that used in Chap. 25.

TABLE 27.2: The notation used for mean differences (paired data) compared to the notation used for one sample mean
| | One sample mean | Mean difference |
|---|---|---|
| The observations | Values: \(x\) | Differences: \(d\) |
| Sample mean | \(\bar{x}\) | \(\bar{d}\) |
| Standard deviation | \(s\) | \(s_d\) |
| Standard error | \(\displaystyle\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}\) | \(\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}\) |
| Sample size | Number of observations: \(n\) | Number of differences: \(n\) |

27.6 Describing sampling distribution

The study concerns the mean weight change (specifically, weight gains). Every possible sample of \(n = 68\) students comprises different students, and hence produces different Week \(1\) and Week \(12\) weights, and hence different weight gains. That is, the sample mean weight gains vary from sample to sample, and have a sampling distribution.

Since the differences are like a single sample of data (Chap. 25), the sample mean difference \(\bar{d}\) has a sampling distribution similar to that of \(\bar{x}\) (provided the conditions are met; Sect. 27.9).

Definition 27.2 (Sampling distribution of a sample mean difference) The sampling distribution of a sample mean difference is (when certain conditions are met (Sect. 27.9)) described by:

  • an approximate normal distribution,
  • centred around the sampling mean whose value is the population mean difference \(\mu_d\),
  • with a standard deviation, called the standard error of the difference, of \(\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}\),

where \(n\) is the number of differences, and \(s_d\) is the standard deviation of the individual differences in the sample.

A mean or a median may be appropriate for describing the data. However, the sampling distribution describes the distribution of the sample means, not the data. Since the sampling distribution (under certain conditions) has a symmetric normal distribution, the mean is appropriate for describing the sampling distribution.

For the weight-gain data, the sample mean differences \(\bar{d}\) are described by (Fig. 27.2):

  • an approximate normal distribution,
  • with a sampling mean whose value is \(\mu_{{d}}\),
  • with a standard error of \(\text{s.e.}(\bar{d}) = 0.1159764\).

Many decimal places are shown here; results will be rounded when reported.


FIGURE 27.2: The sampling distribution is a normal distribution; it describes how the sample mean weight gain varies in samples of size \(n = 68\)
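A small simulation can illustrate what this sampling distribution describes. This is only a sketch under stated assumptions: the population of weight gains is taken to be normal with mean \(0.86\) kg and standard deviation \(0.956\) kg (the sample values, used as stand-ins for the unknown population values), and the seed and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, so the sketch is repeatable

# ASSUMED population of weight gains (kg): the true population is unknown,
# so the sample mean and standard deviation are used as stand-ins.
POP_MEAN = 0.86
POP_SD = 0.956
N = 68       # number of differences in each simulated sample
REPS = 5000  # number of simulated samples (arbitrary)

sample_means = []
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# The sample means vary from sample to sample; their standard deviation
# should be close to the standard error, POP_SD / sqrt(N).
print(f"mean of sample means = {statistics.mean(sample_means):.3f}")
print(f"sd of sample means   = {statistics.stdev(sample_means):.3f}")
print(f"POP_SD / sqrt(N)     = {POP_SD / math.sqrt(N):.3f}")
```

The simulated sample means should cluster around the assumed population mean, with a spread close to the standard error.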

27.7 Computing confidence intervals

The CI for the mean difference has the same form as for a single mean (Chap. 25). An approximate \(95\)% confidence interval (CI) for \(\mu_d\) is
\[ \bar{d} \pm (2 \times\text{s.e.}(\bar{d})). \] This has the same form as the CI based on \(\bar{x}\) when the differences are treated as the data. For the weight-gain data:
\[ 0.8618 \pm (2 \times 0.1159764), \] or \(0.862\pm 0.232\) (so the margin of error is \(0.232\)). Equivalently, the CI is from \(0.862 - 0.232 = 0.630\), up to \(0.862 + 0.232 = 1.094\). We write:

The mean weight gain from Week \(1\) to \(12\) is \(0.86\) kg (\(\text{s.e.} = 0.116\); \(n = 68\)), with an approximate \(95\)% CI from \(0.63\) kg to \(1.09\) kg.

The CI means that the plausible values for the population mean weight gain are between \(0.63\) kg and \(1.09\) kg. Alternatively, we are \(95\)% confident that, between Weeks \(1\) and \(12\), the population mean weight gain is between \(0.63\) kg and \(1.09\) kg. A weight gain of this size, though, may not have practical importance.
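The arithmetic for the approximate CI can be sketched in a few lines. The function name `approx_ci` is hypothetical (it is not from any library); the inputs are the values reported above.

```python
def approx_ci(mean_diff, std_error, multiplier=2):
    """Return (lower, upper) for mean_diff +/- multiplier * std_error."""
    margin = multiplier * std_error
    return mean_diff - margin, mean_diff + margin


# Values reported in the text for the weight-gain data (in kg).
d_bar = 0.8618
se_d = 0.1159764

lower, upper = approx_ci(d_bar, se_d)
print(f"approx. 95% CI: {lower:.2f} kg to {upper:.2f} kg")
# prints: approx. 95% CI: 0.63 kg to 1.09 kg
```

The multiplier is left as an argument because \(2\) applies only to an approximate \(95\)% CI.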

27.8 Using software

Statistical software produces exact \(95\)% CIs, which may be slightly different than the approximate \(95\)% CI (the \(68\)--\(95\)--\(99.7\) rule gives approximate multipliers). For the eating data, the approximate and exact \(95\)% CIs are the same to two decimal places (Fig. 27.3). We write:

The mean weight gain from Week \(1\) to Week \(12\) is \(0.86\) kg (\(\text{s.e.} = 0.116\); \(n= 68\)), with a \(95\)% CI from \(0.63\) kg to \(1.09\) kg.


FIGURE 27.3: The weight-gain data: jamovi output

27.9 Statistical validity conditions

As with any confidence interval, these results apply under certain conditions. The conditions under which the CI is statistically valid for paired data are similar to those for one sample mean, rephrased for differences.

The CI computed above is statistically valid if one of these conditions is true:

  1. The sample size of differences is at least \(25\); or
  2. The sample size of differences is smaller than \(25\), and the population of differences has an approximate normal distribution.

The sample size of \(25\) is a rough figure; some books give other (similar) values (such as \(30\)). This condition ensures that the distribution of the sample means has an approximate normal distribution (so the \(68\)--\(95\)--\(99.7\) rule can be used). Provided the sample size is at least about \(25\), this will be approximately true even if the distribution of the differences in the population does not have a normal distribution. That is, when \(n \ge 25\) the sample means generally have an approximate normal distribution, even if the data themselves don't have a normal distribution.

Example 27.1 (Statistical validity) For the eating data, the sample size is \(n = 68\), so the results are statistically valid. Neither the differences in the population, nor the weights in Week \(1\) and Week \(12\), need to follow a normal distribution.
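The two validity conditions amount to a simple decision rule, sketched below. The function name and its arguments are hypothetical, and the threshold of \(25\) is only the rough figure used in this chapter.

```python
def ci_is_statistically_valid(n_differences, population_approx_normal=False,
                              min_n=25):
    """Apply the rough validity rule for a CI for a mean difference.

    min_n is the rough threshold used in this chapter (an assumption:
    other books use similar values, such as 30).
    """
    if n_differences >= min_n:
        return True
    # Small samples need the population of differences to be roughly normal.
    return population_approx_normal


print(ci_is_statistically_valid(68))                                 # eating data
print(ci_is_statistically_valid(15))                                 # small sample
print(ci_is_statistically_valid(15, population_approx_normal=True))
```

Whether a population of differences is approximately normal cannot be read from a single number; in practice it is judged from a graph of the sample differences or from subject knowledge.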

27.10 Example: invasive plants

Skypilot is an alpine wildflower native to the Colorado Rocky Mountains (USA). In recent years, a willow shrub has been encroaching on skypilot territory and, because willow often flowers early, researchers (Kettenbach et al. 2017) are concerned that the willow may 'negatively affect pollination regimes of resident alpine wildflower species' (p. 6965). One RQ was:

In the Colorado Rocky Mountains, what is the mean difference between first-flowering day for the native skypilot and the encroaching willow?

Data for both species were collected at \(25\) different sites, so the data are paired by site (Sect. 27.2). The unit of analysis is the site, and the unit of observation is the plant. The data are shown in the table below. The 'first-flowering day' is the number of days since the start of the year (e.g., January \(12\) is 'day \(12\)') when flowers were first observed.

The parameter is \(\mu_d\), the population mean difference between the day of first flowering for skypilot, less the day of first flowering for willow. Hence, a positive value for the difference means that the skypilot values are larger, and hence that willow flowered first.

Explaining how the differences are computed is important. The differences here are skypilot minus willow first-flowering days.

The data are summarised graphically (Fig. 15.4) and numerically (Table 15.4), using software (Fig. 15.3).

The standard error of the mean difference is \(\text{s.e.}(\bar{d}) = 0.940\) (Fig. 15.3; Table 15.4). The approximate \(95\)% CI is \(1.36 \pm (2\times 0.940)\), or from \(-0.52\) to \(3.24\) days. Figure 15.3 gives the exact \(95\)% CI as \(-0.58\) to \(3.30\) days. Remembering that positive differences mean willow flowers earlier, we write (using the approximate CI):

From the sample, the mean difference in the day of first flowering is \(1.36\) days earlier for the willow (\(\text{s.e.} = 0.940\); \(n = 25\)), with an approximate \(95\)% CI from \(0.52\) days earlier for skypilot to \(3.24\) days earlier for willow.

The CI is statistically valid since \(n = 25\).

Be clear in your conclusion about how the differences are computed. Make sure to interpret the CI consistent with how the differences are defined.
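The gap between the approximate and exact intervals for the flowering data comes from the multiplier. The sketch below assumes that the software's interval uses a multiplier from the \(t\)-distribution with \(n - 1 = 24\) degrees of freedom (about \(2.064\) in standard \(t\)-tables); the chapter does not state how the software computes its CI, so treat this as an assumption that happens to reproduce the reported values.

```python
d_bar = 1.36  # mean difference (skypilot minus willow), in days
se_d = 0.940  # standard error of the mean difference, in days

# Approximate multiplier from the 68-95-99.7 rule.
approx_mult = 2
# ASSUMPTION: 0.975 quantile of the t-distribution with 24 degrees of
# freedom, taken from standard t-tables (rounded to three decimals).
t_mult = 2.064

intervals = {}
for label, mult in [("approximate", approx_mult), ("t-based", t_mult)]:
    margin = mult * se_d
    intervals[label] = (d_bar - margin, d_bar + margin)
    lower, upper = intervals[label]
    print(f"{label}: {lower:.2f} to {upper:.2f} days")
```

With a sample of only \(25\) differences the \(t\)-based multiplier is noticeably larger than \(2\), so the interval is slightly wider; for larger samples the two multipliers are closer.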

27.11 Example: chamomile tea

A study of patients with Type 2 diabetes mellitus (T2DM) randomly allocated \(32\) patients into a control group (who drank hot water), and \(32\) to receive chamomile tea (Rafraf, Zemestani, and Asghari-Jafarabadi (2015), p. 164):

The study was blinded so that the allocation of the intervention or control group was concealed from the researchers and statistician [...] The intervention group (\(n = 32\)) consumed one cup of chamomile tea [...] three times a day immediately after meals (breakfast, lunch, and dinner) for \(8\) weeks. The control group (\(n = 32\)) consumed an equivalent volume of warm water during the \(8\)-week period...

The total glucose (TG) was measured for each individual both before the intervention and after eight weeks on the intervention, in both the control and treatment groups. The data are not available, so no graphical summary of the data can be produced; however, the article gives a data summary (motivating Table 27.3). The following RQs can be asked:

  • For patients with T2DM, what is the mean reduction in TG after eight weeks drinking chamomile tea?
  • For patients with T2DM, what is the mean reduction in TG after eight weeks drinking hot water?

For the tea group, the standard error of the reduction in TG is \(\text{s.e.}(\bar{d}) = 30.37/\sqrt{32} = 5.37\), so an approximate \(95\)% CI for the reduction in TG is
\[ 38.62\pm (2\times 5.37), \text{or from $27.88$ to $49.36$ mg.dl$^{-1}$}. \] For the control group, the standard error of the reduction in TG is \(\text{s.e.}(\bar{d}) = 36.66/\sqrt{32} = 6.48\), so an approximate \(95\)% CI for the reduction in TG is
\[ -7.12\pm (2\times 6.48), \text{or from $-20.08$ to $5.84$ mg.dl$^{-1}$}. \] (A negative reduction means an increase in TG.) The chamomile tea appears to reduce TG, but not the hot water. Is the difference between the two treatments due to sampling variation? This is studied further in Sect. 34.9.

TABLE 27.3: The total glucose (in mg.dl\(^{-1}\))
| | \(n\) | Before: Mean | Before: Std. dev. | After: Mean | After: Std. dev. | Reduction: Mean | Reduction: Std. dev. |
|---|---|---|---|---|---|---|---|
| Chamomile tea | \(32\) | \(203.00\) | \(54.96\) | \(164.37\) | \(50.70\) | \(38.62\) | \(30.37\) |
| Control | \(32\) | \(178.25\) | \(53.06\) | \(185.37\) | \(52.59\) | \(-7.12\) | \(36.66\) |
| Difference | | \(24.75\) | | \(21.00\) | | \(45.74\) | |

We write:

The mean reduction in TG for those drinking chamomile tea is \(38.62\) mg.dl\(^{-1}\) (approx. \(95\)% CI: \(27.88\) to \(49.36\) mg.dl\(^{-1}\)), and \(-7.12\) mg.dl\(^{-1}\) for those drinking water (approx. \(95\)% CI: \(-20.08\) to \(5.84\) mg.dl\(^{-1}\)).

The intervals have a \(95\)% chance of straddling the population mean reduction in TG. The sample sizes are larger than \(25\), so the results are statistically valid.
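When only summary statistics are available, as here, both intervals can be computed from the mean and standard deviation of the reductions. The sketch below uses the values from Table 27.3; the function name `approx_ci_from_summary` is hypothetical.

```python
import math


def approx_ci_from_summary(mean_diff, sd_diff, n, multiplier=2):
    """Return (std_error, lower, upper) for an approximate CI."""
    std_error = sd_diff / math.sqrt(n)
    margin = multiplier * std_error
    return std_error, mean_diff - margin, mean_diff + margin


# (mean reduction, std. dev. of reductions, n), in mg/dl, from Table 27.3.
groups = {
    "Chamomile tea": (38.62, 30.37, 32),
    "Control": (-7.12, 36.66, 32),
}

results = {}
for name, (mean_diff, sd_diff, n) in groups.items():
    results[name] = approx_ci_from_summary(mean_diff, sd_diff, n)
    se, lower, upper = results[name]
    print(f"{name}: s.e. = {se:.2f}, CI from {lower:.2f} to {upper:.2f}")
```

The code does not round the standard error before forming the interval, yet the limits agree with the hand calculation to two decimal places.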

27.12 Chapter summary

To compute a confidence interval (CI) for a mean difference, compute the sample mean difference, \(\bar{d}\), and identify the sample size \(n\). Then compute the standard error, which quantifies how much the value of \(\bar{d}\) varies across all possible samples:
\[ \text{s.e.}(\bar{d}) = \frac{ s_d }{\sqrt{n}}, \] where \(s_d\) is the sample standard deviation of the differences. The margin of error is (Multiplier\(\times\)standard error), where the multiplier is \(2\) for an approximate \(95\)% CI (using the \(68\)--\(95\)--\(99.7\) rule). Then the CI is:
\[ \bar{d} \pm \left( \text{Multiplier}\times\text{standard error} \right). \] The statistical validity conditions should also be checked.

27.13 Quick review questions

Are the following statements true or false?

  1. For paired data, the mean of the differences is treated like the mean of a single variable.
  2. An appropriate graph for displaying paired data is often a histogram of the differences.
  3. The population mean difference is denoted \(\mu_d\).
  4. The standard error of the sample mean difference is denoted \(s_d\).

27.14 Exercises

Selected answers are available in App. E.

Exercise 27.1 Which of these scenarios are paired?

  1. Heart rate is measured for each individual when sitting and when standing. (Some individuals have their heart rate recorded first while sitting, and some first while standing.) Each person receives a pair of measurements, and the difference in heart rate between sitting and standing is recorded.
  2. The mean protein concentrations were compared in sea turtles before and after being rehabilitated (March et al. 2018).

Exercise 27.2 Which of these scenarios are paired?

  1. Heart rate was recorded for \(36\) people, both before and after exercise, to determine how much the average heart rate increases.
  2. The mean HDL cholesterol concentration is recorded for \(22\) males and \(19\) females, and the means compared.

Exercise 27.3 [Dataset: Fruit] The effect of rainfall on growing Chayote squash (Sechium edule) was studied (Mukherjee, Deb, and Devy 2019) by comparing the size of the fruit in a year with normal rainfall (2015) to the size in a dry year (2014) on \(24\) farms:

For Chayote squash grown in Bangalore, what is the mean difference in fruit weight between a normal and dry year?

Ten fruits were gathered from each farm in both years, and the average (mean) weight recorded for the farm. Since the same farms are used in both years, the data are paired (see above). Data is missing for Farm 20 in the dry year (2014), so there are \(n = 23\) differences.

TABLE 27.4: The weight of fruits (in g) in two different years. One observation is missing for Farm 20.
| Farm | Dry (2014) | Normal (2015) | Change (in g) |
|---|---|---|---|
| \(1\) | \(367.75\) | \(371.05\) | \(3.30\) |
| \(2\) | \(238.25\) | \(218.85\) | \(-19.40\) |
| \(3\) | \(271.25\) | \(217.55\) | \(-53.70\) |
| \(4\) | \(286.27\) | \(221.70\) | \(-64.57\) |
| \(5\) | \(259.20\) | \(268.95\) | \(9.75\) |
| \(6\) | \(196.23\) | \(194.85\) | \(-1.38\) |
| \(7\) | \(283.70\) | \(293.00\) | \(9.30\) |
| \(8\) | \(252.05\) | \(264.15\) | \(12.10\) |
| \(9\) | \(253.70\) | \(218.45\) | \(-35.25\) |
| \(10\) | \(279.80\) | \(225.40\) | \(-54.40\) |
| \(11\) | \(206.05\) | \(225.90\) | \(19.85\) |
| \(12\) | \(222.00\) | \(222.85\) | \(0.85\) |
| \(13\) | \(285.50\) | \(282.25\) | \(-3.25\) |
| \(14\) | \(171.50\) | \(266.00\) | \(94.50\) |
| \(15\) | \(186.75\) | \(206.20\) | \(19.45\) |
| \(16\) | \(219.55\) | \(194.60\) | \(-24.95\) |
| \(17\) | \(198.15\) | \(346.75\) | \(148.60\) |
| \(18\) | \(248.10\) | \(304.55\) | \(56.45\) |
| \(19\) | \(231.55\) | \(263.20\) | \(31.65\) |
| \(20\) | | \(223.70\) | |
| \(21\) | \(257.50\) | \(258.75\) | \(1.25\) |
| \(22\) | \(230.70\) | \(248.95\) | \(18.25\) |
| \(23\) | \(260.50\) | \(155.95\) | \(-104.55\) |
| \(24\) | \(231.85\) | \(219.30\) | \(-12.55\) |

  1. What is the unit of analysis? What is the unit of observation?
  2. Create a numerical summary table for the data (Fig. 27.4).
  3. Create a suitable graph to display the data.
  4. Construct an approximate \(95\)% CI for the mean difference in fruit weight.

FIGURE 27.4: jamovi output for the fruit data

Exercise 27.4 [Dataset: Captopril] In a study of hypertension (Hand et al. 1996; MacGregor et al. 1979), \(15\) patients were given a drug (Captopril) and their systolic blood pressure measured (in mm Hg) immediately before and two hours after being given the drug (Table 15.6).

  1. Explain why it is sensible to compute differences as the Before minus the After measurements. What do the differences mean when computed this way?
  2. Compute an approximate \(95\)% CI for the mean difference.
  3. Write down the exact \(95\)% CI using the computer output (Fig. 27.5).
  4. Why are the two CIs different?

FIGURE 27.5: jamovi output for the Captopril data

Exercise 27.5 Some people struggle to eat the recommended intake of vegetables. One study explored ways to increase vegetable intake in teens (Fritts et al. 2018). Teens rated the taste of raw broccoli, and raw broccoli served with a specially-made dip.

Each teen (\(n = 100\)) had a pair of measurements: the taste rating of the broccoli with and without dip. Taste was assessed using a '\(100\) mm visual analog scale', where a higher score means a better taste. In summary:

  • For raw broccoli, the mean taste rating was \(56.0\) (with \(s = 26.6\));
  • For raw broccoli served with dip, the mean taste rating was \(61.2\) (with \(s = 28.7\)).

Because the data are paired, differences are the best way to describe the data. The mean difference in the ratings was \(5.2\), with \(\text{s.e.}(\bar{d}) = 3.06\). From this information:

  1. Construct a suitable numerical summary table.
  2. Compute the approximate \(95\)% CI for the mean difference in taste ratings.

Exercise 27.6 A study (Allen et al. 2018) examined the effect of exercise on smoking. Men and women were assessed on their 'intention to smoke', both before and after exercise for each subject (using two quantitative questionnaires). Smokers ('smoking at least five cigarettes per day') aged \(18\) to \(40\) were enrolled for the study. For the \(23\) women in the study, the mean intention to smoke after exercise reduced by \(0.66\) (with a standard error of \(0.37\)).

  1. Find an approximate \(95\)% confidence interval for the population mean reduction in intention to smoke for women after exercising.
  2. Is this CI statistically valid?

Exercise 27.7 [Dataset: Anorexia] Young girls (\(n = 29\)) with anorexia received cognitive behavioural treatment (Hand et al. (1996), Dataset 285), and their weights before and after treatment were recorded. In summary:

  • Before the treatment, the mean weight was \(82.69\) pounds (\(s = 4.85\) pounds);
  • After the treatment, the mean weight was \(85.7\) pounds (\(s = 8.35\) pounds).

If the standard deviation of the weight loss was \(7.31\) pounds, find an approximate \(95\)% CI for the population mean weight loss. Do you think the treatment had any impact on the mean weight of the girls?

Exercise 27.8 [Dataset: Stress] The concentration of beta-endorphins in the blood is a sign of stress. One study (Hand et al. (1996), Dataset 232; Hoaglin, Mosteller, and Tukey (2011)) measured the beta-endorphin concentration for \(19\) patients about to undergo surgery.

Each patient had their beta-endorphin concentrations measured \(12\)--\(14\) hours before surgery, and also \(10\) minutes before surgery. A numerical summary (from the jamovi output) is in Table 15.8.

  1. Use the jamovi output in Fig. 27.6 to construct an approximate \(95\)% CI for the increase in stress as surgery gets closer.
  2. Use the jamovi output in Fig. 27.6 to write down the exact \(95\)% CI for the increase in stress as surgery gets closer.
  3. Why is there a difference between the two CIs?
  4. Is the CI statistically valid?

FIGURE 27.6: jamovi output for the surgery-stress data

Exercise 27.9 A study of \(n = 213\) Spanish health students (Romero-Blanco et al. 2020) measured (among other things) the number of minutes of vigorous physical activity (PA) performed by students before and during the COVID-19 lockdown (from March to April 2020 in Spain). Since PA before and during the lockdown was measured on each participant, the data are paired (within individuals). The data are summarised in Table 27.5.

  1. Explain what the differences mean.
  2. Compute the standard error of the differences.
  3. Compute the approximate \(95\)% CI, and interpret what it means.

TABLE 27.5: Summary information for the COVID-lockdown exercise data for \(n = 213\) Spanish students
| | Mean (mins) | Std. dev. (mins) |
|---|---|---|
| Before | \(28.47\) | \(54.13\) |
| During | \(30.66\) | \(30.04\) |
| Increase | \(2.68\) | \(51.30\) |