34.4 Hypothesis testing

34.4.1 Introduction

For the red deer data (Sect. 33.2; Sect. 34.1), the population correlation coefficient between the weight of molars \(y\) and age of the deer \(x\) is unknown and denoted by \(\rho\). The sample correlation coefficient is \(r = -0.584\), but the value of \(r\) varies from sample to sample (there is sampling variation).

The size of the sampling variation is measured with a standard error. However, there is a complication for correlation coefficients¹⁶; so we will not produce CIs for the correlation coefficient.

34.4.2 Hypothesis testing details

As usual, questions can be asked about the relationship between the variables, as measured by the unknown population correlation coefficient:

Is the population correlation coefficient zero, or not?

In the context of the red deer data:

In male red deer, is there a correlation between age and the weight of molars?

The RQ is about the population parameter \(\rho\). Clearly, the sample correlation coefficient \(r\) is not zero, and the RQ is asking whether this could be attributed to sampling variation. The null hypotheses is:

\(H_0\): \(\rho = 0\)

The parameter is \(\rho\), the population correlation between the age and molar weight in the red deer.

This is the usual ‘no relationship’ position, which proposes that the population correlation coefficient is zero. The alternative hypothesis is:

\(H_1\): \(\rho \ne 0\)

This is a two-tailed test here, based on the RQ.

The approach is to assume that \(\rho=0\) (from \(H_0\)), then describe what values of \(r\) could be expected, under that assumption, just through sampling variation. Then the observed value of \(r\) is compared to the expected values to determine if the valuew of \(r\) supports or contradicts the assumption.

Software is used to test the hypotheses; the output in Figs. 34.5 (jamovi) and 34.6 (SPSS) contains the relevant \(P\)-value (twice in the SPSS output!). The two-tailed \(P\)-value for the test (labelled Sig. by SPSS) is less than 0.001 (0.000 in SPSS). That is, the \(P\)-value is zero to three decimal places, so there is very strong evidence to support \(H_1\) (that the correlation in the population is not zero). We write:

The sample presents very strong evidence (two-tailed \(P < 0.001\)) of a correlation between molar weight and the age of the male red deer (\(r = -0.584\); \(n = 78\)) in the population.

Notice the three features of writing conclusions again: An answer to the RQ; evidence to support the conclusion (‘two-tailed \(P < 0.001\)’; no test statistic is given); and some sample summary information (‘\(r = -0.584\); \(n = 78\)’).

The evidence suggests that the correlation is not zero (in the population). However, a non-zero correlation doesn’t necessarily mean a strong correlation.

The correlation may be weak in the population (as estimated by the value of \(r\)), but there is evidence that it is not zero in the population.

This may be a useful analogy: If a rain forecast says ‘there is a very high chance of rain tomorrow,’ it doesn’t mean there will be a lot of rain, just a high chance of some rain.

34.4.3 Statistical validity conditions

As usual, these results hold under certain conditions to be met. The conditions for which the test is statistically valid are:

The relationship is approximately linear.
The variation in the response variable is approximately constant for all values of the explanatory variable.
The sample size is at least 25.

The sample size of 25 is a rough figure here, and some books give other values.

In addition to the statistical validity condition, the test will be externally valid if the the sample is a simple random sample from the population. The test will also be internally valid if the study was well designed.

Example 34.5 (Statistical validity) For the red deer data, the scatterplot (Fig. 33.2) shows that the relationship is approximately linear, and the variation in molar weights doesn’t seem to be obviously getting larger or smaller for older deer, so correlations are sensible. The sample size is also greater than 25.

The test in Sect. 34.4 will be statistically valid.

Example 34.6 (Statistical validity) A study (Schepaschenko et al. 2017; Dunn and Smyth 2018) examined the foliage biomass of small-leaved lime trees. A plot of the foliage biomass against diameter (Fig. 34.7) shows that the relationship is non-linear. In addition, the variation in foliage biomass increases for larger diameters (for values of \(x\) near 10, the values of \(y\) do not vary much at all, but for values of \(x\) near 30, the values of \(y\) vary greatly).

Both of these issues mean that correlations are not appropriate. A hypothesis test similar to that in Sect. 34.4 is inappropriate.

FIGURE 34.7: Foliage biomass plotted against diameter for small-leaved lime trees

Example 34.7 (Phu Quoc ridgeback dogs) A study of Phu Quoc Ridgeback dogs (Canis familiaris) recorded many measurements of the dogs, including body length and body height (Quan et al. 2017).

The scatterplot (Fig. 34.8) shows an approximate linear relationship. We know that each sample could produce a different sample correlation coefficient. We expect that taller dogs would also be longer, so we may ask:

For these dogs, are longer dogs also taller dogs, in general?

The hypotheses are:

\(H_0\): \(\rho = 0\)
\(H_1\): \(\rho > 0\) (i.e., one-tailed)

The correlation co-efficient is \(r = 0.837\) and software notes that the two-tailed \(P < 0.001\), based on \(n = 30\) dogs.

We write:

There is very strong evidence that longer Phu Quoc ridgeback dogs are also taller (\(r = 0.837\); one-tailed \(P<0.001\); \(n = 30\)).

Since (a) the sample size is larger than 25; (b) the relationship is approximately linear; and (c) the variation in heights do not seem to differ for different lengths, the test is statistically valid.

FIGURE 34.8: Scatterplot of the body height vs body length for Phu Quoc ridgeback dogs

Example 34.8 (Drug calculations) A study of \(n = 30\) paramedicine students examined (among other things) the relationship between the amount of stress experienced (measured using the State–Trait Anxiety Inventory (STAI) while performing drug-dose calculation, and length of work experience (LeBlanc et al. 2005).

The hypotheses are:

\(H_0\): \(\rho = 0\)
\(H_1\): \(\rho \ne 0\)

The correlation co-efficient is given as \(r = 0.346\) and \(P = 0.18\).

No scatterplot is provided, so the test is statistically valid only if the relationship is approximately linear and that the variation in STAI scores does not vary for different levels of work experience. The sample size is larger than 25, however.

We write:

There is no evidence (\(r = 0.346\); two-tailed \(P = 0.18\)) that the length of work experience is associated with STAI stress levels when performing drug-dose calculations.

References

Dunn PK, Smyth GK. Generalized linear models with examples in R. Springer; 2018.

LeBlanc VR, MacDonald RD, McArthur B, King K, Lepine T. Paramedic performance in calculating drug dosages following stressful scenarios in a human patient simulator. Prehospital Emergency Care. 2005;9(4):439–44.

Quan Q-D, Tran H-D, Chung A-D. The relation of body score (body height/body length) and haplotype E on Phu Quoc Ridgeback dogs (Canis familiaris). Journal of Entomology and Zoology Studies. 2017;5:388–94.

Schepaschenko D, Shvidenko A, Usoltsev VA, Lakyda P, Luo Y, Vasylyshyn R, et al. A dataset of forest biomass structure for Eurasia. Scientific Data. 2017;4:1–1.

For those who want to know: the value of \(r\) only varies between \(-1\) and \(1\), so the sampling distribution is not a normal distribution. Instead, a transformation of the correlation coefficient has an approximate normal distribution and standard error. Software automatically does this transforming.↩︎