19 Correlation Analysis
We again work with the father/son heights dataset collected by Pearson.
load("data/father_son.rda")
attach(father_son)
19.1 Scatterplot
We saw that a scatterplot is an appropriate plot for paired numerical data.
plot(father, son, pch = 16, xlab = "father's height", ylab = "son's height", asp = 1, cex = 0.5)
19.2 Sample correlations
We compute the Pearson, Spearman, and Kendall sample correlations. All three are positive, consistent with the upward trend seen in the scatterplot.
cor(father, son, method = "pearson")
[1] 0.5012473
cor(father, son, method = "spearman")
[1] 0.505671
cor(father, son, method = "kendall")
[1] 0.3526375
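To make the three definitions concrete, here is a minimal sketch that recomputes each coefficient from first principles. It uses simulated data (the vectors x and y and their parameters are made up for illustration and are not the Pearson dataset) and assumes there are no ties, which holds almost surely for continuous simulated values.

```r
# Simulated paired data (hypothetical; not the father/son dataset)
set.seed(1)
n <- 200
x <- rnorm(n, mean = 68, sd = 2.7)
y <- 34 + 0.5 * x + rnorm(n, sd = 2.4)

# Pearson: covariance scaled by the two standard deviations
r_pearson <- cov(x, y) / (sd(x) * sd(y))

# Spearman: Pearson correlation computed on the ranks
r_spearman <- cor(rank(x), rank(y))

# Kendall: (concordant pairs - discordant pairs) / number of pairs
# (this simple form assumes no ties)
s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
r_kendall <- sum(s[upper.tri(s)]) / choose(n, 2)

# Each hand computation should agree with cor()
all.equal(r_pearson, cor(x, y, method = "pearson"))
all.equal(r_spearman, cor(x, y, method = "spearman"))
all.equal(r_kendall, cor(x, y, method = "kendall"))
```

With tied observations, cor() applies a tie correction for Kendall's coefficient, so the simple pair count above would no longer match exactly.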
19.3 Correlation tests
Although it is pretty clear from the scatterplot that the heights of a father and his son are positively correlated (or more generally, monotonically associated), for pedagogical reasons we perform the corresponding tests. (Refer to the help page ?cor.test for details on how the p-values are computed.)
cor.test(father, son, method = "pearson")
Pearson's product-moment correlation
data: father and son
t = 19.002, df = 1076, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4551622 0.5446541
sample estimates:
cor
0.5012473
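As a check on the output above, the t statistic and the confidence interval can be recovered from the sample correlation alone. The sketch below plugs in the reported values; it relies on the standard t transformation of the correlation and on Fisher's z transformation for the interval. The sample size is inferred from the reported degrees of freedom (n - 2).

```r
r  <- 0.5012473   # sample correlation reported above
df <- 1076        # degrees of freedom reported above
n  <- df + 2      # number of father/son pairs

# t statistic for testing zero correlation
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval via Fisher's z transformation
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * half_width)

t_stat   # compare with t = 19.002
ci       # compare with 0.4551622 and 0.5446541
```

Small discrepancies in the last digits come from using the rounded value of the correlation.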
cor.test(father, son, method = "spearman")
Spearman's rank correlation rho
data: father and son
S = 103209762, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.505671
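The statistic S reported above is tied to rho by a simple identity: S = (n^3 - n)(1 - rho)/6, which in the absence of ties is the sum of squared rank differences. The sketch below first checks the identity on simulated data (the vectors x and y are hypothetical) and then applies it to the reported rho, with n = 1078 inferred from the degrees of freedom of the Pearson test.

```r
# Part 1: on simulated data without ties, S is the sum of squared rank differences
set.seed(2)
x <- rnorm(50)
y <- x + rnorm(50)
d2 <- sum((rank(x) - rank(y))^2)
S_sim <- unname(cor.test(x, y, method = "spearman")$statistic)
all.equal(d2, S_sim)

# Part 2: recover S for the father/son data from the reported rho
rho <- 0.505671   # reported above (rounded)
n <- 1078         # inferred from df = 1076 in the Pearson test
S <- (n^3 - n) * (1 - rho) / 6
S                 # compare with S = 103209762
```

The recovered value differs from the reported one only through the rounding of rho.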
cor.test(father, son, method = "kendall")
Kendall's rank correlation tau
data: father and son
z = 17.161, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.3526375
19.4 Distance covariance (and test)
We also apply the distance covariance test from the energy package. (The function returns a Monte Carlo permutation p-value based on R replicates, here R = 1000.)
library(energy)
dcov.test(father, son, R = 1e3)
dCov independence test (permutation test)
data: index 1, replicates 1000
nV^2 = 742.69, p-value = 0.000999
sample estimates:
dCov
0.8300339
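To illustrate what such a test does, here is a minimal base R sketch of a permutation test built on the squared sample distance covariance. It is not the energy package's implementation: the helper names double_center and dcov2 are hypothetical, the data are simulated, and the p-value convention (1 + number of permuted statistics at least as large as the observed one) / (R + 1) is an assumption, although it is consistent with the reported 0.000999, which equals 1/1001.

```r
# Double-centre a distance matrix: subtract row and column means, add the grand mean
double_center <- function(d) {
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

# Squared sample distance covariance (average of products of centred distances)
dcov2 <- function(x, y) {
  a <- double_center(as.matrix(dist(x)))
  b <- double_center(as.matrix(dist(y)))
  mean(a * b)
}

# Simulated data (hypothetical): dependent, but with little linear correlation
set.seed(3)
n <- 100
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.5)

# Permuting y breaks any dependence while keeping both marginals fixed
R <- 199
stat <- n * dcov2(x, y)
perm <- replicate(R, n * dcov2(x, sample(y)))
p_value <- (1 + sum(perm >= stat)) / (R + 1)
p_value

# Consistency checks against the reported output
1 / (1000 + 1)        # smallest attainable p-value with R = 1000
1078 * 0.8300339^2    # n * dCov^2, compare with nV^2 = 742.69
```

The last line assumes n = 1078, inferred from the degrees of freedom of the Pearson test, and shows that the test statistic is n times the square of the reported dCov estimate.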