Processing math: 100%

13 Comparing quantitative data within individuals

So far, you have learnt to ask an RQ, design a study, collect the data, describe the data and summarise the data. In this chapter, you will learn to:

  • summarise within-individual changes in quantitative data using appropriate graphs.
  • summarise within-individual changes using summary tables.

13.1 Introduction

Sometimes the same quantitative variable is measured on each individual more than once (i.e., within-individual changes for each unit of analysis) but only a small number of times. Examples of this type of data include:

  • measurements of weekly household water consumption for many households, before and after installing water-saving devices.
  • blood pressure recorded for people at 8am, 1pm and 8pm each day.

In both cases, the same variable is measured multiple times for each individual. This chapter studies summarising within-individuals changes in quantitative variables.

13.2 Numerical summary: mean differences

When each individual has two observations, the difference between the observations can be computed for each individual. Then, the appropriate numerical summary is to summarise these differences over all individuals; for example, using the mean of these differences.

When more than two observations are made for each individual, the changes from the first observation (the ) may be computed (e.g., if the first observation is pre-intervention, or a benchmark, observation); for example, using the mean change (see Sect. 13.5).

Example 13.1 (Within-individual comparisons) Lothian, Grey, and Lands (2006) studied children with atopic asthma, and measured the immunogoblin E concentrations (IgE) before and after an intervention for each child (Table 13.1), plus the reduction in IgE for each child. The child is the individual.

For the IgE data, the numerical summary table is shown in Table 13.2. The direction of the difference is implied by the word 'reduction'.

In the numerical summary table, the information for the differences is not found by subtracting the information in one row from the other. In Table 13.2, for example, the number of differences is not 1111=0; the standard deviation of the differences is not 1615.531354.4=261.13\ensuremathμg.L1. These statistics are computed from the differences (i.e., the Reductions in Table 13.1).

TABLE 13.1: The IgE before and after an intervention, and the change in IgE (in micrograms/L).
IgE (before) in micrograms/L IgE (after) in micrograms/L IgE reduction in micrograms/L
0083 0083 0000
0292 0292 0000
0293 0292 0001
0623 0542 0081
0792 0709 0083
1543 1000 0543
1668 1000 0668
1960 1626 0334
2877 2502 0375
2961 2711 0250
5504 4504 1000
TABLE 13.2: A numerical summary of the IgE data (in μg/L).
Mean Std dev. Sample size
Before 01690.5 1615.53 11
After 01387.4 1354.28 11
Reduction 00303.2 0325.28 11

13.3 Graphs for the differences

For within-individual changes for a quantitative variable, options for plotting include:

  • histograms of differences (Sect. 13.3.1), which are useful for changes in pairs of measurements or observations for each individual.
  • case-profile plots (Sect. 13.3.2), which are useful when the same individuals are measured or observed a small number of times.

13.3.1 Histogram of differences

Sometimes the same variable is measured on each unit of analysis twice, when the changes (or differences) for each individual can be produced, and a histogram of the changes or differences can be constructed. The direction of the differences should be clear (e.g., first measurement minus second, or second measurement minus first).

Issues relevant for constructing histograms (Sect. 11.3.1), such as bin widths and boundary values, also apply here.

The axis displaying the counts (or percentages) should start from zero, since the height of the bars visually implies the frequency of those differences (see Example 17.3).

Example 13.2 (Within-individual comparisons) For the IgE data (Table 13.1), the reduction in IgE for each child can be shown using a histogram (Fig. 13.1, top panel).

The IgE data. Top: a histogram of the differences. Bottom: a case-profile plot. Each line represents one subject, joining that person's pre-intervention score to their post-intervention score.

FIGURE 13.1: The IgE data. Top: a histogram of the differences. Bottom: a case-profile plot. Each line represents one subject, joining that person's pre-intervention score to their post-intervention score.

13.3.2 Case-profile plots

Sometimes the variable is measured or recorded more than twice, and so a single set of differences cannot be produced. In these cases, the values for each individual can be plotted using a case-profile plot: the measurements are shown on one axis (usually the vertical), and the various points at which measurements are taken are shown on the other axis. A case-profile plot is still useful for paired data, of course.

The axis displaying the counts (or percentages) need not start from zero, since the distance from the axis to the lines do not visually imply any quantity of interest. Rather, the relative changes represented by the lines display the quantity of interest.

Example 13.3 (Case-profile plot) For the IgE data (Table 13.1), the measurements of IgE for each child at both times can be shown in a case-profile plot (Fig. 13.1, bottom panel). Each line corresponds to a unit of analysis (i.e., a child).

Example 13.4 (Case-profile plot) Runners use wearable devices to measure many performance indicators, including vertical oscillation (VO). VO contributes to running economy and injury risk, so reliable VO measurements are crucial. Smith et al. (2022) compared four devices, and obtained data from video analysis for n=150 athletes; that is, each participant had the same runs measured using five methods. The case-profile plot (Fig. 13.2) shows the means for each method using a solid point; each line represents one runner. NOVA and Footpod give smaller VO measurements in general.

Vertical oscillation (VO) measured using five methods for $15$ runners. The solid black points represent the means for each method. Left: a line is plotted for each individual. Right: only the means are shown, with vertical lines from the minimum value to the maximum value for each method.

FIGURE 13.2: Vertical oscillation (VO) measured using five methods for 15 runners. The solid black points represent the means for each method. Left: a line is plotted for each individual. Right: only the means are shown, with vertical lines from the minimum value to the maximum value for each method.

As in Example 13.4, the case-profile plot is hard to read with large numbers of individuals, and so sometimes the mean (or median, as appropriate) is shown, with some measure of the variation of the observations (Fig. 13.2 shows the minimum and maximum values for each method, for instance).

13.4 Example: invasive plants

Skypilot (Polemonium viscosum) is a native alpine wildflower growing in the Colorado Rocky Mountains (USA). In recent years, a willow shrub (Salix) has been encroaching on skypilot territory and, because willow often flowers early, researchers (Kettenbach et al. 2017) are concerned that the willow may 'negatively affect pollination regimes of resident alpine wildflower species' (p. 6965). One RQ was:

In the Colorado Rocky Mountains, what is the mean difference between first-flowering day for the native skypilot and the encroaching willow?

Data for both species was collected at 25 different sites (Fig 13.3). The site is the individual; the data are paired (Sect. 29.1), a form of blocking (Sect. 7.2). The 'first-flowering day' is the number of days since the start of the year (e.g., January 12 is 'day 12') when flowers were first observed.

FIGURE 13.3: The cyclone data.

Since the data are available, the data should be summarised graphically (Fig. 13.5) and numerically (Table 13.3), using software output (Fig. 13.4).

Software output for the flowering-day data.

FIGURE 13.4: Software output for the flowering-day data.

TABLE 13.3: The day of first flowering for encroaching willow and native skypilot.
Mean Std dev. Sample size
Willow (encroaching) 189.40 12.200 25
Skypilot (native) 190.76 13.062 25
Differences 001.36 04.698 25
The flowering-day data. Left: a histogram of the difference between the first-flowering days (skypilot minus willow). Right: a case-profile plot of days of first flowering (unfilled points and dashed lines indicate earlier or same dates (smaller or equal values) for willow).

FIGURE 13.5: The flowering-day data. Left: a histogram of the difference between the first-flowering days (skypilot minus willow). Right: a case-profile plot of days of first flowering (unfilled points and dashed lines indicate earlier or same dates (smaller or equal values) for willow).

13.5 Example: pain-relieving tape

Naugle et al. (2021) studied the effect of using Kinesio Tape to alleviate pain in athletes. Pain was measured by applying a slow constant rate of pressure on the left arm, and subjects pressed a button when the sensation moved from pressure to pain. The pressure at which this occurred was recorded. This was repeated 5mins before applying the tape, 5mins after applying the tape, and again 15--20mins after applying the tape.

Figure 13.6 shows the reported pain for 16 subjects. A summary table is shown in Table 13.4. The pain thresholds are increasing slightly as time progresses.

Pain threshold (left arm) at three time points when using Kinesio Tape, without applying tension, for $n = 16$ subjects. The filled, black points represent the means for each time point.

FIGURE 13.6: Pain threshold (left arm) at three time points when using Kinesio Tape, without applying tension, for n=16 subjects. The filled, black points represent the means for each time point.

TABLE 13.4: A numerical summary of the tape data: pain thresholds in kPa.
Mean (in kPa) Std dev. (in kPa) Sample size Mean increase SD increase
Pre: 5mins 446.5 175.18 16
Post: 5mins 479.6 199.61 16 33.1 073.93
Post: 1520mins 506.9 214.36 16 60.4 102.72

13.6 Chapter summary

Quantitative data measured within individuals can be summarised using a histogram of differences when the variable is measured or observed twice, or a case-profile plot (with two or more measurement or observations per individual). A summary table should show the numerical summaries for the quantitative variable at each measurement or observation and for appropriate changes.

13.7 Quick review questions

Are the following statements true or false?

  1. A histogram of the differences is only appropriate for with two within-individuals measurements or observations.
  2. A case-profile plot is only appropriate for showing changes for two within-individuals measurements or observations.
  3. The median and IQR are not appropriate for summarising differences.
  4. Explaining how the differences are computed is important.
0 of 4 correct

13.8 Exercises

Answers to odd-numbered exercises are given at the end of the book.

Exercise 13.1 [Dataset: Insulation] The Electricity Council in Bristol wanted to determine if a certain type of wall-cavity insulation reduced average energy consumption in winter (The Open University 1983; Hand et al. 1996):

In Bristol homes, what is the mean reduction in energy consumption after adding home insulation?

  1. What are the individuals (units of analysis)?
  2. Explain why this study uses a within-individuals comparison.
  3. Use the collected data (shown below) to sketch a case-profile plot.
  4. Use the data to sketch a histogram of the differences.
  5. Use software or a calculator to prepare a summary table.

Exercise 13.2 [Dataset: Captopril] In a study of hypertension (Hand et al. 1996; MacGregor et al. 1979), 15 patients were given a drug (Captopril) and their systolic blood pressure measured (in mm Hg) immediately before and two hours after being given the drug (Table 13.5).

  1. Explain why this study uses a within-individuals comparison.
  2. Construct a histogram of the differences.
  3. Construct a case-profile plot for the data.
TABLE 13.5: The Captopril data: before after after systolic blood pressures (in mm Hg).
Before After Before After
210 201 173 147
169 165 146 136
187 166 174 151
160 157 201 168
167 147 198 179
176 145 148 129
185 168 154 131
206 180

Exercise 13.3 [Dataset: PainRelief] Augustino et al. (2023) measured the reported pain of new mothers in Dodoma (Tanzania) at four times: near giving birth, then 2040 and 60mins after giving birth. Mothers were administered either paracetamol or a cold pack as pain relief. Pain was recorded using a 'numeric rating scale represented by the horizontal line marked from zero to ten', where higher scores mean greater pain.

Since the number of individuals is large (n=912), use the summary data in Table 13.6 to sketch a plot of the means and the range, like that in Figure 13.6.

TABLE 13.6: A summary table of reported pain for mothers after giving birth.
At birth After 20 mins After 40 mins After 60 mins
Paracetamol Mean 07.44 06.89 4.69 2.84
(n=456) Standard deviation 02.01 01.83 1.49 1.19
Minimum 02.00 02.00 2.00 0.00
Maximum 10.00 10.00 9.00 7.00
Cold pack Mean 08.63 05.67 3.19 0.99
(n=455) Standard deviation 01.40 02.03 1.63 0.99
Minimum 04.00 00.00 0.00 0.00
Maximum 10.00 09.00 6.00 4.00

Exercise 13.4 [Dataset: Stress] The concentration of beta-endorphins in the blood is a sign of stress. One study (Hand et al. (1996), Dataset 232; Hoaglin, Mosteller, and Tukey (2011)) measured the beta-endorphin concentration for 19 patients about to undergo surgery.

Each patient had their beta-endorphin concentrations measured 12--14h before surgery, and also 10mins before surgery. A numerical summary (from software output) is in Table 13.7.

TABLE 13.7: The numerical summary for the presurgical stress data.
Mean Std deviation Sample size
12--14 h before surgery 08.35 04.397 19
10 min before surgery 16.05 12.509 19
Increase 07.70 13.519 19
  1. Explain why this study uses a within-individuals comparison.
  2. Explain why the standard deviation for the increase is not the difference between the two individuals time-point standard deviations.
  3. Using the data file and software, construct a histogram of the differences.
  4. Using the data file and software, construct a case-profile plot for the data.

Exercise 13.5 [Dataset: Running] Create a summary table for the data in Example 13.4.

Exercise 13.6 [Dataset: WCTennis] Alberca et al. (2022) recorded the push-time for French wheelchair tennis players, while holding and not holding a racquet (Table 13.8; Alberca (2022)).

  1. What do the differences mean (as given in the table)?
  2. Create a plot of the data.
  3. Create a numerical summary table for the data.
TABLE 13.8: The wheelchair-tennis data. One observation is missing.
Push-time (in s)
Person With racquet Without racquet Difference (in s)
001 00.2625 00.1833 000.0792
002 00.2375 00.2250 000.0125
003 00.2583 00.2042 000.0542
004 00.1917 00.1875 000.0042
005 00.1875 00.1708 000.0167
006 00.2542 00.1750 000.0792
007 00.2333 00.1917 000.0417
008 00.1917 00.1708 000.0208
009 00.2208 00.2208 000.0000
010 00.2583 00.2750 00.0167
011 00.2083 00.1750 000.0333
012 00.2042
013 00.2208 00.2292 00.0083

Exercise 13.7 [Dataset: Jumping] Hébert-Losier, Boswell-Smith, and Hanzlı́ková (2023) recorded double-legged jumping distance for 80 healthy people, when they wore shoes and were barefoot (data below).

  1. What do the differences mean (as given in the table)?
  2. Create a plot of the data.
  3. Create a numerical summary table for the data.