Practical 2 - Understanding correlation and further exploring relationships
1 Intro
1.1 Intended Learning Outcomes
After attending this lab, you should be able to use R to:
- calculate least squares estimates of model parameters using vector-matrix formulation;
- calculate and interpret the sample correlation coefficient;
- perform hypothesis tests on the population correlation and interpret the decision.
1.2 Introduction
In the lectures we learned how to assess the strength of a linear relationship between random variables using the correlation coefficient. The population correlation measures the strength and direction of the linear relationship between two random variables X and Y, and is defined as
$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}},$$
and can be estimated by replacing each of Cov(X,Y), Var(X) and Var(Y) by their unbiased estimators to give
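$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}},$$
the sample correlation coefficient (-1 ≤ r ≤ 1).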
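As a minimal sketch of this calculation in R, the code below computes r directly from the sums of squares and cross-products and compares it with the built-in `cor()` function. The vectors `x` and `y` are made-up illustrative values, not data from this practical.

```r
# Hypothetical illustrative data (not taken from the practical)
x <- c(1.2, 2.4, 3.1, 4.8, 5.5, 6.9)
y <- c(2.0, 2.9, 3.7, 5.1, 5.4, 7.2)

# Corrected sums of squares and cross-products
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)

# Sample correlation coefficient from the definition
r_manual <- Sxy / sqrt(Sxx * Syy)

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)

# TRUE if the two values agree up to numerical tolerance
all.equal(r_manual, r_builtin)
```

The factors of 1/(n-1) in the estimators of the covariance and the variances cancel, which is why r can be written using Sxy, Sxx and Syy alone.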
Given a sample of data, we can assess whether the observed correlation between two variables is statistically significant, that is, whether it provides evidence of a correlation in the wider population. To do this we perform a hypothesis test (more on this in Chapter 2.2).
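One possible way to carry out such a test in R is sketched below. It assumes the test in question is the usual t-test of the null hypothesis that the population correlation is zero, which is what `cor.test()` performs for the Pearson correlation. The data are simulated purely for illustration, and the sample size and slope are arbitrary choices.

```r
# Simulated illustrative data (hypothetical, not from the practical)
set.seed(1)
n <- 30
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Two-sided test of H0: rho = 0 against H1: rho != 0
test <- cor.test(x, y, alternative = "two.sided", method = "pearson")

test$estimate   # sample correlation coefficient r
test$statistic  # t statistic
test$parameter  # degrees of freedom, n - 2
test$p.value    # p-value for the test

# The same t statistic computed by hand from r
r <- cor(x, y)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
all.equal(unname(test$statistic), t_manual)
```

A small p-value (for example, below 0.05) would lead us to reject the null hypothesis of zero population correlation at that significance level.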