1 Intro


1.1 Intended Learning Outcomes

After attending this lab, you should be able to use R to:

  • calculate least squares estimates of model parameters using vector-matrix formulation;
  • calculate and interpret the sample correlation coefficient;
  • perform hypothesis tests on the population correlation and interpret the decision.

1.2 Introduction

In the lectures we learned how to assess the strength of a linear relationship between random variables using the correlation coefficient. The population correlation measures the strength and direction of the linear relationship between two random variables \(X\) and \(Y\), and is defined as

\[\begin{equation} \rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}, \tag{1.1} \end{equation}\]

and can be estimated by replacing each of \(\text{Cov}(X,Y)\), \(\text{Var}(X)\) and \(\text{Var}(Y)\) by their unbiased estimators (the common factor of \(1/(n-1)\) cancels) to give

\[\begin{equation} r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}=\frac{\sum^{n}_{i=1}(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum^{n}_{i=1}(x_i-\overline{x})^2\sum^{n}_{i=1}(y_i-\overline{y})^2}}, \tag{1.2} \end{equation}\]

the sample correlation coefficient, which satisfies \(-1 \le r \le 1\).
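
As a minimal sketch, the sample correlation coefficient in (1.2) can be computed directly from the sums of squares and cross-products and compared with R's built-in `cor()`. The data here are simulated purely for illustration and are not part of the lab; the names `x`, `y`, `Sxy`, `Sxx` and `Syy` are assumptions chosen to mirror the notation above.

```r
# Simulated data for illustration only (an assumption, not the lab's data)
set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)

# Sums of squares and cross-products about the sample means
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)

# Sample correlation coefficient, following equation (1.2)
r <- Sxy / sqrt(Sxx * Syy)
r

# The built-in function should give the same value
cor(x, y)
```

The two values should agree up to floating-point error, which is a useful check on a hand calculation.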

Given a sample of data, we can assess whether the observed sample correlation provides statistically significant evidence of a correlation between the variables in the wider population. To do this we perform a hypothesis test (more on this in Chapter 2.2).
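
As a rough illustration of what such a test might look like in R, the sketch below uses `cor.test()` from the base `stats` package. It assumes the test in question is the usual two-sided test of \(H_0: \rho = 0\) against \(H_1: \rho \neq 0\) based on Pearson's correlation; the details covered in Chapter 2.2 may differ. The data are again simulated for illustration only.

```r
# Simulated data for illustration only (an assumption, not the lab's data)
set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)

# Two-sided test of H0: rho = 0 using Pearson's correlation
test <- cor.test(x, y, alternative = "two.sided", method = "pearson")
test$estimate   # sample correlation r
test$statistic  # t statistic
test$p.value    # p-value of the test

# The same t statistic by hand: t = r * sqrt((n - 2) / (1 - r^2)),
# compared with a t distribution on n - 2 degrees of freedom
n <- length(x)
r <- cor(x, y)
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
```

A small p-value (for example, below a chosen significance level such as 0.05) would lead us to reject \(H_0\) and conclude that there is evidence of a non-zero population correlation.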