23.1 Mean differences

House insulation is important for saving energy, particularly in cold climates.

Consider a study to estimate the average energy savings made by using a new type of house insulation. Different study designs could be used to address this.

One approach is to take a sample of homes and measure the energy consumption of each home twice: once before adding the insulation, and again after adding it. Each home then has two observations: the energy consumption before, and the energy consumption after, adding the insulation.

This is a descriptive RQ: the Outcome is the mean energy saving, and the response variable is the energy saving for each house. There is no Comparison: units of analysis that have been treated differently are not compared.

Alternatively, the researchers could take a sample of homes without the insulation, and measure their energy consumption; then take a different sample of homes with the insulation, and measure their energy consumption.

This is a relational RQ: the Outcome is the mean energy consumption, and the response variable is the energy consumption for each house. The Comparison is between units of analysis with the insulation, and units of analysis without the insulation.

Either study is possible, and each has advantages and disadvantages (Zimmerman 1997). Here the first (Descriptive) design would seem superior (why?). In the first design, each home gets a pair of energy consumption measurements: this is paired data, which is the subject of this chapter. The second (Relational) design requires the means of two different groups of homes to be compared, which is the topic of the next chapter.

Definition 23.1 (Paired data) Data are paired when two observations about the same variable are recorded for each unit of analysis.

Since each unit of analysis has two observations about energy consumption, the change (or the difference, or the reduction) in energy consumption can be computed for each house. Then, questions can be asked about the population mean difference, which is not the same as the difference between two separate population means (the subject of the next chapter). In paired data, finding the difference between the two measurements for each individual unit of analysis makes sense: each unit of analysis (each house) has two related observations.
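As a minimal sketch of this idea, the following Python code computes the energy saving for each house and then the sample mean of those savings. The consumption values, the units and the variable names (`before`, `after`, `savings`) are all invented for illustration; they do not come from any real study.

```python
from statistics import mean, stdev

# Hypothetical energy consumption for five houses (made-up values,
# in arbitrary units), measured on the SAME houses before and after
# the insulation was added. Position i in both lists is house i.
before = [12.1, 10.6, 13.4, 13.8, 15.0]
after = [10.4, 9.9, 11.2, 12.5, 13.1]

# Paired data: one difference per house.
# Here, difference = before - after, so a positive value is a saving.
savings = [b - a for b, a in zip(before, after)]

# The sample mean difference estimates the population mean difference.
mean_saving = mean(savings)

# The variation in the differences (not in the raw measurements)
# describes how much the savings vary from house to house.
sd_saving = stdev(savings)

print("Savings per house:", [round(s, 1) for s in savings])
print("Mean saving:", round(mean_saving, 2))
print("Std. dev. of savings:", round(sd_saving, 2))
```

The pairing is what makes the line computing `savings` meaningful: each subtraction uses two measurements from the same house. With two separate samples of houses, no such house-by-house subtraction would be possible.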

Think 23.1 (Paired situations) Which of these are paired situations?

  1. The mean difference between blood pressure for 36 people, before and after taking a drug.
  2. The difference between the mean HDL cholesterol levels for 22 males and 19 females.
  3. The mean protein levels were compared in sea turtles before and after being rehabilitated (March et al. 2018).
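
Only situations 1 and 3 are paired: each person (or each turtle) is measured twice. In situation 2, the males and the females are different individuals, so the means of two separate groups are compared.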


March DT, Vinette-Herrin K, Peters A, Ariel E, Blyde D, Hayward D, et al. Hematologic and biochemical characteristics of stranded green sea turtles. Journal of Veterinary Diagnostic Investigation. 2018;
Zimmerman DW. A note on the interpretation of the paired-samples \(t\)-test. Journal of Educational and Behavioral Statistics. 1997;22(3):349–60.