So far, you have learnt about the research process, including analysing data using confidence intervals and hypothesis tests. Specifically, you have learnt to construct confidence intervals, and perform hypothesis tests, for one group and for comparing two separate groups.
In this chapter, you will learn about relationships between two quantitative variables. You will learn to:
- describe the relationships between two quantitative variables.
So far, RQs about single variables and RQs for comparing two groups have been studied. Comparing the mean value of a quantitative variable in two groups was studied in Sects. 24 and 32. Comparing the percentage of times an outcome of interest appears in two groups was studied in Sects. 25 and 33. In this chapter (and the next two), the relationship between two quantitative variables is studied.
Our main example in the next three chapters is a study (Holgate 1965) that examined the relationship between the age of \(n = 78\) male red deer and the weight of their molars; the data are shown below. The data comprise two quantitative variables.
For the red deer data, both variables are quantitative, so the appropriate graphical summary (Sect. 12.6) is a scatterplot (Fig. 35.2). The response variable is graphed on the vertical axis, and denoted \(y\); the explanatory variable is graphed on the horizontal axis, and denoted \(x\). In some cases, when only a relationship is being explored, which variable is \(x\) and which is \(y\) is not important (for example, see Example 36.7).
Since the explanatory variable (potentially) influences the response variable, in this example:
- The explanatory variable (\(x\)) is the age of the deer (in years), and
- The response variable (\(y\)) is the weight of molars (in grams).
In other words, the age of the deer may influence the weight of the molars. (Supposing that the weight of the molars may influence the age of the deer is silly.)
Each row in the dataset (and each point on the scatterplot) corresponds to a single deer (the units of analysis); two quantitative variables (age; molar weight) are measured on each deer.
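The convention above (explanatory variable on the horizontal axis, response variable on the vertical axis) can be sketched in Python. This is a minimal sketch only, assuming the `matplotlib` library is installed; the function name `scatterplot` and the five (age, molar weight) pairs are hypothetical illustrations, and are not the Holgate (1965) data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def scatterplot(x, y, x_label, y_label):
    """Draw a scatterplot: explanatory variable x horizontally, response y vertically."""
    if len(x) != len(y):
        raise ValueError("x and y must have one value each per unit of analysis")
    fig, ax = plt.subplots()
    ax.scatter(x, y)          # one point per unit of analysis
    ax.set_xlabel(x_label)    # explanatory variable
    ax.set_ylabel(y_label)    # response variable
    return fig, ax


# Hypothetical values for illustration only (NOT the real red deer data)
ages = [2, 4, 6, 8, 10]                    # explanatory variable, in years
molar_weights = [4.9, 4.2, 3.8, 3.1, 2.5]  # response variable, in grams

fig, ax = scatterplot(ages, molar_weights, "Age (years)", "Molar weight (g)")
```

Calling `fig.savefig(...)` with a file name of your choice writes the plot to a file. Each point corresponds to one deer, matching one row of the dataset.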
The purpose of a graph is to help understand the data (Sect. 12.1). For a scatterplot, the form, direction, and variation in the relationship (or the strength of the relationship) are described:
- Form: The overall form or structure of the relationship (e.g., linear; curved upwards; etc.).
- Direction: The direction of the relationship (sometimes not relevant if the relationship is non-linear):
  - The variables are positively associated if high values of one variable accompany high values of the other variable, in general.
  - The variables are negatively associated if high values of one variable accompany low values of the other variable, in general.
- Variation: The amount of variation in the relationship. A small amount of variation in the response variable for given values of the explanatory variable means the relationship is strong; a lot of variation in the response variable for given values of the explanatory variable means the relationship is less strong.
These three features explain the type of relationship (form; direction), and the strength of that relationship (variation). Anything unusual or noteworthy should also be discussed.
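The idea of direction can be made concrete with a short calculation. The sketch below is one possible approach, not a method given in this chapter: it checks whether above-average values of \(x\) tend to accompany above-average values of \(y\) (positive) or below-average values of \(y\) (negative), by summing the products of the deviations from each mean. The function name `association_direction` and the data values are hypothetical, and the result is only meaningful when the scatterplot shows that a single direction makes sense (for example, an approximately linear form).

```python
from statistics import mean


def association_direction(x, y):
    """Classify the direction of a relationship from paired observations.

    Sums (x - mean of x) * (y - mean of y) over all units of analysis:
    a positive sum suggests a positive association, a negative sum
    suggests a negative association.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired, with at least two observations")
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no clear direction"


# Hypothetical values for illustration only (NOT data from any study cited here)
ages = [2, 4, 6, 8, 10]
molar_weights = [4.9, 4.2, 3.8, 3.1, 2.5]

direction = association_direction(ages, molar_weights)
print(direction)
```

This calculation says nothing about form or variation; those are still judged from the scatterplot itself.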
Example 35.1 (Describing scatterplots) A study (Tager et al. 1979; Kahn 2005) measured the lung capacity of children in Boston (using the forced expiratory volume, FEV). The scatterplot (Fig. 35.3) is curved (form), where older children have larger FEVs, in general (direction). The variation gets larger for taller youth.
- Form: may start off straight-ish, but then seems hard to assess.
- Direction: biomass increases as age increases (on average).
- Variation: small-ish for small ages; large-ish for older trees (after about 60 years old).
Example 35.2 (Scatterplots) For the red deer data (Fig. 35.2), the relationship is approximately linear (form) with a negative direction (older deer generally have lighter teeth); the variation is perhaps moderate.
A scatterplot displays the relationship between two quantitative variables (the response denoted \(y\); the explanatory denoted \(x\)). The relationship is described by the form (linear, or otherwise), the direction of the relationship (sometimes not relevant if the graph is not linear), and the variation in the relationship (or the strength of the relationship).
Selected answers are available in Sect. D.32.
Exercise 35.2 A study examined the mandible length and gestational age for 167 foetuses from the 12th week of gestation onward (Royston and Altman 1994). Describe the relationship (Fig. 35.6, right panel).
Exercise 35.3 A study (Wright et al. 2021) of 25 gorillas recorded information about their chest beating and their size (measured by the breadth of the gorillas' backs). Describe the relationship (Fig. 35.7, left panel).