1 Intro


1.1 Intended Learning Outcomes

After attending this lab, you should be able to use R to:

  • calculate least squares estimates of model parameters using vector-matrix formulation;
  • calculate and interpret the sample correlation coefficient;
  • perform hypothesis tests on the population correlation and interpret the decision.

1.2 Introduction

In the lectures we learned how to assess the strength of a linear relationship between random variables using the correlation coefficient. The population correlation measures the strength and direction of the linear relationship between two random variables X and Y, and is defined as

$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}},$$

and can be estimated by replacing each of Cov(X,Y), Var(X) and Var(Y) by their unbiased estimators to give

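$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}},$$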

the sample correlation coefficient ($-1 \le r \le 1$).
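
As a quick check of the definition, the sample correlation coefficient can be computed directly from the sums of squares and cross-products and compared with R's built-in cor() function. This is a minimal sketch: the vectors x and y are made-up toy data, and the names Sxy, Sxx, Syy, r_manual and r_builtin are chosen here for illustration only.

```r
# Hypothetical toy data, invented purely for illustration
x <- c(1.2, 2.3, 3.1, 4.8, 5.0, 6.7)
y <- c(2.1, 2.9, 3.8, 5.1, 5.9, 7.2)

# Corrected sums of squares and cross-products
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)

# Sample correlation coefficient from the definition
r_manual <- Sxy / sqrt(Sxx * Syy)

# Built-in Pearson correlation, for comparison
r_builtin <- cor(x, y)

# The two values should agree up to floating-point error
all.equal(r_manual, r_builtin)
```

Note that the divisor n - 1 in the unbiased estimators of the covariance and variances cancels between numerator and denominator, which is why it does not appear in the calculation.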

Given a sample of data, we can assess whether an observed sample correlation provides statistically significant evidence of a correlation between the variables in the wider population. To do this we perform a hypothesis test (more on this in Chapter 2.2).
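
As a preview, one way such a test might be run in R is with the built-in cor.test() function. The sketch below assumes the usual two-sided t-test of the null hypothesis that the population correlation is zero, which may differ in detail from the procedure developed later in these notes. The data are made-up toy values and the object names (test, t_manual, p_manual) are illustrative.

```r
# Hypothetical toy data, invented purely for illustration
x <- c(1.2, 2.3, 3.1, 4.8, 5.0, 6.7)
y <- c(2.1, 2.9, 3.8, 5.1, 5.9, 7.2)

# Two-sided test of H0: rho = 0 using Pearson's correlation
test <- cor.test(x, y, alternative = "two.sided", method = "pearson")

test$estimate   # sample correlation coefficient r
test$statistic  # t statistic
test$parameter  # degrees of freedom (n - 2)
test$p.value    # p-value of the test

# The same statistic and p-value computed by hand,
# assuming t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom
n <- length(x)
r <- cor(x, y)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)
```

A small p-value would be read as evidence against the null hypothesis of no linear relationship in the population.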