Typical scatterplots

Before discussing specific examples, let us look at some typical scatterplots of two variables and their correlations.

  • Plot A shows a perfect positive correlation.

  • Plot B shows a perfect negative correlation.

  • Plot C shows an extremely strong positive correlation.

  • Plot D shows an extremely strong negative correlation.

  • Plot E shows a correlation coefficient that is very close to zero.

  • Plot F shows a correlation coefficient that is very close to zero.

  • Plot G shows a correlation coefficient that is very close to zero.

  • Plot H shows a correlation coefficient that is very close to zero.
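The sketch below computes the correlation coefficient \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\) for small datasets shaped like plots A to D. The data are hypothetical: two exact straight lines stand in for plots A and B, and the same lines with a small, fixed "noise" pattern (an assumption standing in for measurement error) stand in for plots C and D. The helper name `correlation` is also hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


x = list(range(1, 11))
# Hypothetical small, fixed deviations that play the role of noise
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]

plot_a = [2 * xi + 1 for xi in x]                      # exact line, positive slope
plot_b = [-3 * xi + 5 for xi in x]                     # exact line, negative slope
plot_c = [2 * xi + 1 + e for xi, e in zip(x, noise)]   # nearly linear, positive slope
plot_d = [-3 * xi + 5 + e for xi, e in zip(x, noise)]  # nearly linear, negative slope

for name, y in [("A", plot_a), ("B", plot_b), ("C", plot_c), ("D", plot_d)]:
    print(name, round(correlation(x, y), 4))
```

Points lying exactly on a line give \(r = \pm 1\), with the sign matching the slope; small deviations from the line pull \(r\) slightly toward zero while keeping it close to \(\pm 1\).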

In practice, you should never expect to see a perfect correlation in real data. A correlation of zero indicates only that there is no linear association; it does not mean that the variables are unrelated. This is a weakness of relying solely on the correlation coefficient to assess the relationship between two variables, and it shows why scatterplots are important. While plot E contains randomly scattered points, plot F shows a strong quadratic relationship. Because the correlation coefficient measures only linear association, it does not detect the quadratic relationship.

Likewise, plots G and H show the relationship between two variables when one of them is nearly constant, and in both cases the correlation coefficient is close to \(0\). Intuitively this makes sense: the value of the non-constant variable has no bearing on the nearly constant variable. If the nearly constant variable were actually constant, the correlation coefficient would be undefined. To see why, recall the formula \(r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}\). If all the \(x_i\)’s have the same value, then every \(x_i = \bar{x}\), so \(S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = 0\) and \(S_{xx} = \sum (x_i - \bar{x})^2 = 0\). The formula then reduces to \(0/0\), so \(r\) is undefined.
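As a minimal sketch of the two failure modes described above, the code below uses the standard formula \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\) on two hypothetical datasets. The first follows \(y = x^2\) exactly over \(x\) values symmetric about zero, like plot F. The second has a constant \(x\). The helper name `correlation`, and the choice to return `None` when \(r\) is undefined, are illustrative assumptions rather than any library's behavior.

```python
import math


def correlation(xs, ys):
    """Return r = S_xy / sqrt(S_xx * S_yy), or None when r is undefined."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        # A constant variable makes the formula reduce to 0/0
        return None
    return s_xy / math.sqrt(s_xx * s_yy)


# Plot F style: y depends perfectly on x, but not linearly
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
print(correlation(xs, ys))  # 0.0: r misses the quadratic pattern

# Constant x: every x_i equals x_bar, so S_xx = S_xy = 0
print(correlation([5, 5, 5, 5], [1, 3, 2, 4]))  # None: r is undefined
```

For the quadratic data the positive and negative products \((x_i - \bar{x})(y_i - \bar{y})\) cancel exactly, so \(S_{xy} = 0\) even though \(y\) is completely determined by \(x\).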