14 Comparing quantitative data within individuals

So far, you have learnt to ask a RQ, design a study, collect the data, describe the data and summarise the data. In this chapter, you will learn to:

  • summarise within-individual changes in quantitative data using appropriate graphs.
  • summarise within-individual changes using summary tables.

14.1 Introduction

Sometimes the same quantitative variable is measured on each individual more than once (i.e., within-individual changes for each unit of analysis) but only a small number of times. Examples of this type of data include:

  • Measurements of household water consumption for many households, before and after installing water-saving devices.
  • Blood pressure recorded for people at \(8\)am, \(1\)pm and \(8\)pm each day.

In both cases, the same variable is measured multiple times for each individual. Within-individuals changes require different methods than between-individuals comparisons. This chapter considers how to summarise within-individuals changes in quantitative variables.

14.2 Summarising the comparison: mean differences

The best way to compare two observations for each individual is to summarise the differences between the two observations across the individuals; for example, using the mean difference. A numerical summary table can then be constructed that summarises both sets of observations, plus the differences.

When more than two observations are made for each individual, often the changes from the first observation can be computed (as the first observation is often a pre-intervention observation, or a benchmark observation); for example, using the mean change.

Example 14.1 (Within-individual comparisons) Lothian, Grey, and Lands (2006) studied children with atopic asthma, and measured the immunoglobulin E concentrations (IgE) before and after an intervention for each child (Table 14.1). The child is the individual.

For the IgE data, the numerical summary table is shown in Table 14.2. The direction of the difference is implied by the word 'reduction'.

TABLE 14.1: The IgE before and after an intervention, and the change in IgE (in micrograms/L).
IgE (before) in micrograms/L IgE (after) in micrograms/L IgE reduction in micrograms/L
\(\phantom{0}\phantom{0}83\) \(\phantom{0}\phantom{0}83\) \(\phantom{0}\phantom{0}\phantom{0}0\)
\(\phantom{0}292\) \(\phantom{0}292\) \(\phantom{0}\phantom{0}\phantom{0}0\)
\(\phantom{0}293\) \(\phantom{0}292\) \(\phantom{0}\phantom{0}\phantom{0}1\)
\(\phantom{0}623\) \(\phantom{0}542\) \(\phantom{0}\phantom{0}81\)
\(\phantom{0}792\) \(\phantom{0}709\) \(\phantom{0}\phantom{0}83\)
\(1543\) \(1000\) \(\phantom{0}543\)
\(1668\) \(1000\) \(\phantom{0}668\)
\(1960\) \(1626\) \(\phantom{0}334\)
\(2877\) \(2502\) \(\phantom{0}375\)
\(2961\) \(2711\) \(\phantom{0}250\)
\(5504\) \(4504\) \(1000\)
TABLE 14.2: A numerical summary of the IgE data (in \(\mu\)g/L).
Mean Std. dev. Sample size
Before \(\phantom{0}1690.5\) \(1615.53\) \(11\)
After \(\phantom{0}1387.4\) \(1354.28\) \(11\)
Reduction \(\phantom{0}\phantom{0}303.2\) \(\phantom{0}325.28\) \(11\)
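
The entries in Table 14.2 can be computed directly from the raw data in Table 14.1. The following is a minimal sketch in Python using only the standard library; the language and the variable names are illustrative choices, not the software used to produce the table. The differences are computed as 'before minus after', so positive values are reductions. The printed values should agree with Table 14.2 up to rounding.

```python
from statistics import mean, stdev

# IgE concentrations (micrograms/L) from Table 14.1, one pair per child
before = [83, 292, 293, 623, 792, 1543, 1668, 1960, 2877, 2961, 5504]
after = [83, 292, 292, 542, 709, 1000, 1000, 1626, 2502, 2711, 4504]

# Reduction for each child: before minus after
reduction = [b - a for b, a in zip(before, after)]

for label, values in [("Before", before), ("After", after), ("Reduction", reduction)]:
    # stdev() is the sample standard deviation (divisor n - 1)
    print(f"{label:<10}{mean(values):>8.1f}{stdev(values):>9.2f}{len(values):>4}")
```

The summary of the reductions is computed from the individual differences, not from the 'Before' and 'After' rows of the table.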

14.3 Graphs

For within-individual changes for a quantitative variable, options for plotting include:

  • Histograms of differences (Sect. 14.3.1): useful for changes in pairs of measurements or observations.
  • Case-profile plots (Sect. 14.3.2): useful when the same individuals are measured or observed a small number of times.

14.3.1 Histogram of differences

When the same variable is measured twice on each unit of analysis, the change (or difference) for each individual can be computed, and a histogram of the changes or differences can be constructed. The direction of the differences should be clear (e.g., first measurement minus second, or second measurement minus first).

Issues relevant for constructing histograms (Sect. 11.3.1), such as bin widths and boundary values, also apply here.

The axis displaying the counts (or percentages) should start from zero, since the height of the bars visually implies the frequency of those differences (see Example 17.3).

Example 14.2 (Within-individual comparisons) For the IgE data (Table 14.1), the reduction in IgE for each child can be shown using a histogram (Fig. 14.1, top panel).

FIGURE 14.1: The IgE data. Top: a histogram of the differences. Bottom: a case-profile plot. Each line represents one subject, joining that person's pre-intervention score to their post-intervention score.
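
The counts behind a histogram of differences can be sketched as follows, using the reductions from Table 14.1. This is only an illustration in Python: the bin width of 250 and the boundary rule (each bin includes its lower boundary) are assumptions made here, and may differ from those used to draw Fig. 14.1.

```python
# IgE reductions (micrograms/L) from Table 14.1: before minus after
reduction = [0, 0, 1, 81, 83, 543, 668, 334, 375, 250, 1000]

bin_width = 250  # assumed bin width; the width used in Fig. 14.1 may differ
start = 0        # lower boundary of the first bin

# Each bin includes its lower boundary and excludes its upper boundary,
# so a reduction of exactly 250 is counted in the bin from 250 to under 500.
counts = {}
for d in reduction:
    lower = start + ((d - start) // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# Text version of the histogram: every bar starts from a count of zero
for lower in range(start, max(reduction) + bin_width, bin_width):
    n = counts.get(lower, 0)
    print(f"{lower:>5} to under {lower + bin_width:<5} {'#' * n} ({n})")
```

Changing the boundary rule moves values that fall exactly on a boundary (here, 250 and 1000) into a different bin, which is why the rule should be stated.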

14.3.2 Case-profile plots

Sometimes the variable is measured or recorded more than twice, and so a single set of differences cannot be produced. In these cases, the values for each individual can be plotted using a case-profile plot. A case-profile plot is still useful for paired data, of course.

The axis displaying the values of the variable need not start from zero, since the distance from the axis to the lines does not visually imply any quantity of interest. Rather, the relative changes represented by the lines display the quantity of interest.

Example 14.3 (Case-profile plot) For the IgE data (Table 14.1), the measurements of IgE for each child at both times can be shown in a case-profile plot (Fig. 14.1, bottom panel). Each line corresponds to a unit of analysis (i.e., a child).

Example 14.4 (Case-profile plot) Runners use wearable devices to measure many performance indicators, including vertical oscillation (VO). VO contributes to running economy and injury risk, so reliable VO measurements are crucial. Smith et al. (2022) compared four devices, and obtained data from video analysis for \(n = 15\) athletes; that is, each participant had the same runs measured using five methods. The case-profile plot (Fig. 14.2) shows the means for each method using a solid point. NOVA and Footpod give smaller VO measurements in general.

FIGURE 14.2: Vertical oscillation measured using five methods for \(15\) runners. The solid black points represent the means for each method. Left: a line is plotted for each individual. Right: only the means are shown, with vertical lines from the minimum value to the maximum value.

As in Example 14.4, the case-profile plot is hard to read with large numbers of individuals, and so sometimes the mean (or median, as appropriate) is shown, with some measure of the variation of the observations (Fig. 14.2 shows the minimum and maximum values for each method, for instance).
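
The values needed for such a summarised plot (a mean, with a line from the minimum to the maximum, at each measurement occasion) can be sketched as follows. Since the running data are not listed in this chapter, the sketch uses the IgE data from Table 14.1 instead; the Python code and its names (`occasions`, `summary`) are illustrative only.

```python
from statistics import mean

# Wide layout: one list per measurement occasion, individuals in the same order.
# Values are the IgE data (micrograms/L) from Table 14.1.
occasions = {
    "Before": [83, 292, 293, 623, 792, 1543, 1668, 1960, 2877, 2961, 5504],
    "After": [83, 292, 292, 542, 709, 1000, 1000, 1626, 2502, 2711, 4504],
}

# For each occasion: the point to plot (mean) and the ends of the vertical line
summary = {
    name: {"mean": mean(values), "min": min(values), "max": max(values)}
    for name, values in occasions.items()
}

for name, s in summary.items():
    print(f"{name:<7} mean = {s['mean']:.1f}, line from {s['min']} to {s['max']}")
```

The median could replace the mean, and another measure of variation could replace the minimum and maximum, when more appropriate for the data.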

14.4 Example: invasive plants

Skypilot (Polemonium viscosum) is a native alpine wildflower growing in the Colorado Rocky Mountains (USA). In recent years, a willow shrub (Salix) has been encroaching on skypilot territory and, because willow often flowers early, researchers (Kettenbach et al. 2017) are concerned that the willow may 'negatively affect pollination regimes of resident alpine wildflower species' (p. 6965). One RQ was:

In the Colorado Rocky Mountains, what is the mean difference between first-flowering day for the native skypilot and the encroaching willow?

Data for both species were collected at \(25\) different sites. The site is the individual; the data are paired (Sect. 26.1), a form of blocking (Sect. 7.2). The data are shown in Table 14.3. The 'first-flowering day' is the number of days since the start of the year (e.g., January 12 is 'day 12') when flowers were first observed.

TABLE 14.3: The day of the year of first flowering by encroaching willow and native skypilot.
Site Willow Skypilot
\(\phantom{0}1\) \(201\) \(201\)
\(\phantom{0}2\) \(178\) \(179\)
\(\phantom{0}3\) \(189\) \(189\)
\(\phantom{0}4\) \(189\) \(189\)
\(\phantom{0}5\) \(196\) \(203\)
\(\phantom{0}6\) \(207\) \(203\)
\(\phantom{0}7\) \(199\) \(199\)
\(\phantom{0}8\) \(178\) \(182\)
\(\phantom{0}9\) \(178\) \(178\)
\(10\) \(191\) \(191\)
\(11\) \(187\) \(192\)
\(12\) \(190\) \(197\)
\(13\) \(190\) \(190\)
\(14\) \(209\) \(209\)
\(15\) \(221\) \(221\)
\(16\) \(179\) \(188\)
\(17\) \(174\) \(179\)
\(18\) \(172\) \(166\)
\(19\) \(196\) \(196\)
\(20\) \(173\) \(173\)
\(21\) \(180\) \(173\)
\(22\) \(181\) \(179\)
\(23\) \(186\) \(186\)
\(24\) \(194\) \(209\)
\(25\) \(197\) \(197\)

Since the raw data are available, the data should be summarised graphically (Fig. 14.4) and numerically (Table 14.4), using software output (Fig. 14.3).

FIGURE 14.3: Software output for the flowering-day data.

TABLE 14.4: The day of first flowering for encroaching willow and native skypilot.
Mean Std. dev. Sample size
Willow (encroaching) \(189.40\) \(12.200\) \(25\)
Skypilot (native) \(190.76\) \(13.062\) \(25\)
Differences \(\phantom{0}\phantom{0}1.36\) \(\phantom{0}4.698\) \(25\)

FIGURE 14.4: The flowering-day data. Left: a histogram of the difference between the first-flowering days (skypilot minus willow). Right: a case-profile plot of days of first flowering (unfilled points and dashed lines indicate earlier dates (smaller values) for willow).
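
The 'Differences' row of Table 14.4 can be computed from the data in Table 14.3. The following Python sketch (standard library only; the variable names are illustrative) computes each difference as 'skypilot minus willow', the direction used in Fig. 14.4, and also counts the sites where each species flowered first.

```python
from statistics import mean, stdev

# First-flowering day for each of the 25 sites (Table 14.3)
willow = [201, 178, 189, 189, 196, 207, 199, 178, 178, 191, 187, 190, 190,
          209, 221, 179, 174, 172, 196, 173, 180, 181, 186, 194, 197]
skypilot = [201, 179, 189, 189, 203, 203, 199, 182, 178, 191, 192, 197, 190,
            209, 221, 188, 179, 166, 196, 173, 173, 179, 186, 209, 197]

# Difference for each site: skypilot minus willow.
# A positive difference means willow flowered first at that site.
diffs = [s - w for s, w in zip(skypilot, willow)]

print(f"Mean difference: {mean(diffs):.2f} days")
print(f"Std. dev. of differences: {stdev(diffs):.3f} days")

# Number of sites where willow was first, both flowered on the same day,
# or skypilot was first
willow_first = sum(d > 0 for d in diffs)
same_day = sum(d == 0 for d in diffs)
skypilot_first = sum(d < 0 for d in diffs)
print(willow_first, same_day, skypilot_first)
```

Reversing the direction (willow minus skypilot) changes the sign of the mean difference but not the standard deviation, which is why the direction must be stated.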

14.5 Example: pain-relieving tape

A study examined the effect of using Kinesio Tape (Naugle et al. 2021) to alleviate pain in athletes. Pain was measured by applying a slow constant rate of pressure on the left arm, and subjects pressed a button when the sensation moved from pressure to pain. The pressure when this occurred was recorded. This was repeated \(5\) mins before applying the tape, \(5\) mins after applying the tape, and again \(15\)--\(20\) mins after applying the tape.

Figure 14.5 shows the pain thresholds for \(16\) subjects. A numerical summary is shown in Table 14.5. The mean pain threshold increases slightly as time progresses.

FIGURE 14.5: Pain threshold (left arm) at three time points when using Kinesio Tape, without applying tension, for \(n = 16\) subjects. The black points represent the means for each time point.

TABLE 14.5: A numerical summary of the Tape data: pain thresholds in kPa.
| | Mean (in kPa) | Std. dev. (in kPa) | Sample size | Mean increase | SD increase |
|---|---|---|---|---|---|
| Pre: 5 mins | \(446.5\) | \(175.18\) | \(16\) | | |
| Post: 5 mins | \(479.6\) | \(199.61\) | \(16\) | \(33.1\) | \(73.93\) |
| Post: 15-20 mins | \(506.9\) | \(214.36\) | \(16\) | \(60.4\) | \(102.72\) |
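
As a check on Table 14.5, each mean increase equals the difference between the corresponding mean and the pre-tape mean, because all rows are computed from the same \(16\) subjects. The symbols used here (\(\bar{x}\) for a sample mean and \(\bar{d}\) for a mean increase) are notation introduced only for this illustration:

\[
\bar{d}_{\text{5 mins}} = \bar{x}_{\text{post, 5 mins}} - \bar{x}_{\text{pre}} = 479.6 - 446.5 = 33.1\ \text{kPa},
\qquad
\bar{d}_{\text{15--20 mins}} = \bar{x}_{\text{post, 15--20 mins}} - \bar{x}_{\text{pre}} = 506.9 - 446.5 = 60.4\ \text{kPa}.
\]

The same shortcut does not apply to the standard deviations: the standard deviation of the increases must be computed from the individual increases, and cannot be found by subtracting the standard deviations in the table.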

14.6 Chapter summary

Quantitative data measured within individuals can be summarised using a histogram of differences when the variable is measured (or observed) twice, or a case-profile plot (with two or more measurements or observations). A summary table should show the numerical summaries for the quantitative variable at each measurement or observation and for appropriate changes.

14.7 Quick review questions

Are the following statements true or false?

  1. A histogram of the differences is only appropriate for showing changes for two measurements or observations.
  2. A case-profile plot is only appropriate for showing changes for two measurements or observations.
  3. The median and IQR are not appropriate for summarising differences.
  4. Explaining how the differences are computed is important.

14.8 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 14.1 [Dataset: Insulation] The Electricity Council in Bristol wanted to determine if a certain type of wall-cavity insulation reduced average energy consumption in winter (The Open University (1983), Hand et al. (1996)). Their RQ was:

In Bristol homes, what is the mean reduction in energy consumption after adding home insulation?

  1. What are the individuals (units of analysis)?
  2. Explain why this study uses a within-individuals comparison.
  3. Use the collected data (shown below) to sketch a case-profile plot.
  4. Use the data to sketch a histogram of the differences.
  5. Use software or a calculator to prepare a summary table.

Exercise 14.2 [Dataset: Captopril] In a study of hypertension (Hand et al. 1996; MacGregor et al. 1979), \(15\) patients were given a drug (Captopril) and their systolic blood pressure measured (in mm Hg) immediately before and two hours after being given the drug (Table 14.6).

  1. Explain why this study uses a within-individuals comparison.
  2. Construct a histogram of the differences.
  3. Construct a case-profile plot for the data.
TABLE 14.6: The Captopril data: before and after systolic blood pressures (in mm Hg).
| Before | After | Before | After |
|---|---|---|---|
| \(210\) | \(201\) | \(173\) | \(147\) |
| \(169\) | \(165\) | \(146\) | \(136\) |
| \(187\) | \(166\) | \(174\) | \(151\) |
| \(160\) | \(157\) | \(201\) | \(168\) |
| \(167\) | \(147\) | \(198\) | \(179\) |
| \(176\) | \(145\) | \(148\) | \(129\) |
| \(185\) | \(168\) | \(154\) | \(131\) |
| \(206\) | \(180\) | | |

Exercise 14.3 [Dataset: PainRelief] Augustino et al. (2023) measured the reported pain of new mothers in Dodoma (Tanzania) at four times: near giving birth, then \(20\), \(40\) and \(60\) mins after giving birth. Mothers were administered either paracetamol or a cold pack as pain relief. Pain was recorded using a 'numeric rating scale represented by the horizontal line marked from zero to ten', where higher scores mean greater pain.

Since the number of individuals is large (\(n = 912\)), use the summary data in Table 14.7 to sketch a plot of the means and the range, like that in the right panel of Fig. 14.2.

TABLE 14.7: A summary table of reported pain for mothers after giving birth.
| | | At birth | After 20 mins | After 40 mins | After 60 mins |
|---|---|---|---|---|---|
| Paracetamol (\(n = 456\)) | Mean | \(7.44\) | \(6.89\) | \(4.69\) | \(2.84\) |
| | Std. deviation | \(2.01\) | \(1.83\) | \(1.49\) | \(1.19\) |
| | Minimum | \(2.00\) | \(2.00\) | \(2.00\) | \(0.00\) |
| | Maximum | \(10.00\) | \(10.00\) | \(9.00\) | \(7.00\) |
| Cold pack (\(n = 455\)) | Mean | \(8.63\) | \(5.67\) | \(3.19\) | \(0.99\) |
| | Std. deviation | \(1.40\) | \(2.03\) | \(1.63\) | \(0.99\) |
| | Minimum | \(4.00\) | \(0.00\) | \(0.00\) | \(0.00\) |
| | Maximum | \(10.00\) | \(9.00\) | \(6.00\) | \(4.00\) |

Exercise 14.4 [Dataset: Stress] The concentration of beta-endorphins in the blood is a sign of stress. One study (Hand et al. (1996), Dataset 232; Hoaglin, Mosteller, and Tukey (2011)) measured the beta-endorphin concentration for \(19\) patients about to undergo surgery.

Each patient had their beta-endorphin concentrations measured \(12\)--\(14\) hours before surgery, and also \(10\) minutes before surgery. A numerical summary (from software output) is in Table 14.8.

TABLE 14.8: The numerical summary for the presurgical stress data.
Mean Std deviation Sample size
12--14 hours before surgery \(\phantom{0}8.35\) \(\phantom{0}4.397\) \(19\)
10 minutes before surgery \(16.05\) \(12.509\) \(19\)
Increase \(\phantom{0}7.70\) \(13.519\) \(19\)
  1. Explain why this study uses a within-individuals comparison.
  2. Construct a histogram of the differences.
  3. Construct a case-profile plot for the data.

Exercise 14.5 Romero-Blanco et al. (2020) measured (among other things) the number of minutes of vigorous physical activity (PA) performed by Spanish health students before and during the COVID-19 lockdown (from March to April 2020 in Spain). Since PA was measured both before and during the lockdown for each participant, the data are paired (within individuals). The data are summarised in Table 14.9.

  1. Construct a histogram of the differences.
  2. Construct a case-profile plot for the data.
TABLE 14.9: Summary information for the COVID-lockdown exercise data for \(n = 214\) Spanish students.
Mean (mins) Std. dev. (mins)
Before \(28.47\) \(54.13\)
During \(30.66\) \(30.04\)
Increase \(\phantom{0}2.68\) \(51.30\)

Exercise 14.6 [Dataset: Running] Create a summary table for the data in Example 14.4.