5.4 Covariance and correlation

Quantities like expected value and variance summarize characteristics of the marginal distribution of a single random variable. When there are multiple random variables, their joint distribution is of interest. Covariance summarizes in a single number a characteristic of the joint distribution of two random variables, namely, the degree to which they “co-deviate from their respective means”.

Recall from Section 2.12 that the covariance of random variables \(X\) and \(Y\) is the long run average of the product of the paired deviations from their respective means

\[ \text{Covariance($X$, $Y$)} = \text{Average of } \left((X - \text{Average of }X)(Y - \text{Average of }Y)\right) \] It turns out that covariance is equivalent to the average of the product minus the product of the averages.

\[ \text{Covariance($X$, $Y$)} = \text{Average of } XY - (\text{Average of }X)(\text{Average of }Y) \]

Translating the long run averages into expected values yields the following definition.

Definition 5.5 The covariance between two random variables \(X\) and \(Y\) is \[\begin{align*} \textrm{Cov}(X,Y) & = \textrm{E}\left[\left(X-\textrm{E}[X]\right)\left(Y-\textrm{E}[Y]\right)\right]\\ & = \textrm{E}(XY) - \textrm{E}(X)\textrm{E}(Y) \end{align*}\]

The first line above defines covariance as the long run average product of paired deviations from the mean. The second provides an equivalent formula that simplifies computation. Namely, covariance is the expected value of the product minus the product of expected values. (Remember that in general \(\textrm{E}(XY)\) is not equal to \(\textrm{E}(X)\textrm{E}(Y)\).)

Example 5.21 Consider the probability space corresponding to two rolls of a fair four-sided die. Let \(X\) be the sum of the two rolls, \(Y\) the larger of the two rolls, \(W\) the number of rolls equal to 4, and \(Z\) the number of rolls equal to 1.

  1. Specify the joint pmf of \(X\) and \(Y\). (These first few parts are review of things we have covered before.)
  2. Find the marginal pmf of \(X\) and \(\textrm{E}(X)\).
  3. Find the marginal pmf of \(Y\) and \(\textrm{E}(Y)\).
  4. Find \(\textrm{E}(XY)\). Is it equal to \(\textrm{E}(X)\textrm{E}(Y)\)?
  5. Find \(\textrm{Cov}(X, Y)\). Why is the covariance positive?
  6. Is \(\textrm{Cov}(X, W)\) positive, negative, or zero? Why?
  7. Is \(\textrm{Cov}(X, Z)\) positive, negative, or zero? Why?
  8. Let \(V=W+Z\). Is \(\textrm{Cov}(X, V)\) positive, negative, or zero? Why?
  9. Is \(\textrm{Cov}(W, Z)\) positive, negative, or zero? Why?

Solution. to Example 5.21

Show/hide solution
  1. See Example 2.23 for the joint and marginal distributions.
  2. \(\textrm{E}(X)= 5\).
  3. \(\textrm{E}(Y) = 1(1/16) + 2(3/16) + 3(5/16) + 4(7/16) = 3.125\).
  4. \(\textrm{E}(XY)=(2)(1)(1/16) + (3)(2)(2/16) + \cdots + (8)(4)(1/16) = 16.875\). This is not equal to \(\textrm{E}(X)\textrm{E}(Y) = (5)(3.125) = 15.625.\)
  5. \(\textrm{Cov}(X, Y) = \textrm{E}(XY) - \textrm{E}(X)\textrm{E}(Y) = 16.875 - 15.625 = 1.25.\) There is an overall positive association; above average values of \(X\) tend to be associated with above average values of \(Y\) (e.g., (7, 4), (8, 4)), and below average values of \(X\) tend to be associated with below average values of \(Y\) (e.g., (2, 1), (3, 2)).
  6. \(\textrm{Cov}(X, W)>0\). If \(W\) is large (roll many 4s) then \(X\) (sum) tends to be large.
  7. \(\textrm{Cov}(X, Z)<0\). If \(Z\) is large (roll many 1s) then \(X\) (sum) tends to be small.
  8. \(\textrm{Cov}(X, W + Z)=0\). Basically, the positive association between \(X\) and \(W\) cancels out with the negative association between \(X\) and \(Z\). \(V\) is large when there are many 1s or many 4s, or some mixture of 1s and 4s. So knowing that \(V\) is large doesn’t really tell you anything about the sum. An exact enumeration check of these computations appears after this list.
  9. \(\textrm{Cov}(W, Z)<0\). There is a fixed number of rolls. If you roll lots of 4s (\(W\) is large) then there must be few rolls of 1s (\(Z\) is small).
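
These exact values can be verified by brute-force enumeration over the 16 equally likely outcomes of the two rolls. The following plain Python sketch (added here for illustration; it is not Symbulate code) computes \(\textrm{E}(X)\), \(\textrm{E}(Y)\), \(\textrm{E}(XY)\), and \(\textrm{Cov}(X, Y)\) directly from the definition.

from itertools import product

# All 16 equally likely outcomes of two rolls of a fair four-sided die
outcomes = list(product([1, 2, 3, 4], repeat=2))
p = 1 / len(outcomes)  # probability of each outcome

E_X = sum((a + b) * p for a, b in outcomes)               # 5
E_Y = sum(max(a, b) * p for a, b in outcomes)             # 3.125
E_XY = sum((a + b) * max(a, b) * p for a, b in outcomes)  # 16.875

E_XY - E_X * E_Y  # Cov(X, Y) = 1.25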

The sign of the covariance indicates the overall direction of the association between \(X\) and \(Y\).

  • \(\textrm{Cov}(X,Y)>0\) (positive association): above average values of \(X\) tend to be associated with above average values of \(Y\)
  • \(\textrm{Cov}(X,Y)<0\) (negative association): above average values of \(X\) tend to be associated with below average values of \(Y\)
  • \(\textrm{Cov}(X,Y)=0\) indicates that the random variables are uncorrelated: there is no overall positive or negative association. But be careful: if \(X\) and \(Y\) are uncorrelated there can still be a relationship between \(X\) and \(Y\); there is just no overall positive or negative association. We will see examples that demonstrate that being uncorrelated does not necessarily imply that random variables are independent.

Example 5.22 In Example 2.57, how could we use simulation to approximate \(\textrm{Cov}(X, Y)\)? Donny Dont’s Symbulate code is below. Explain to Donny in words the correct simulation process, and why his code does not reflect that process. What is the correct code?

P = DiscreteUniform(1, 4) ** 2
X = RV(P, sum)
Y = RV(P, max)

x = X.sim(10000)
y = Y.sim(10000)

(x * y).mean() - x.mean() * y.mean()

Solution. to Example 5.22

Show/hide solution

Donny’s code attempts to simulate values of \(X\) and values of \(Y\) separately, from each of their marginal distributions. However, to approximate the covariance we need to simulate \((X, Y)\) pairs from the joint distribution. See the code below. (Otherwise, Donny’s code is fine.)

Since \(\textrm{Cov}(X, Y)\) is based on expected values, it can be approximated by simulating appropriate long run averages.

  • Simulate an \((X, Y)\) pair from the joint distribution.
  • Find the value of the product \(XY\) for the simulated pair.
  • Repeat many times, simulating many \((X, Y)\) pairs and finding their product \(XY\).
  • Average the simulated values of the product \(XY\) to approximate \(\textrm{E}(XY)\).
  • Average the simulated values of \(X\) to approximate \(\textrm{E}(X)\).
  • Average the simulated values of \(Y\) to approximate \(\textrm{E}(Y)\).
  • \(\textrm{Cov}(X, Y)\) is approximately the average of the product minus the product of the averages.

The following code illustrates how to approximate covariance via simulation in the context of the previous example. The key is to first simulate \((X, Y)\) pairs with the proper joint distribution.

P = DiscreteUniform(1, 4) ** 2
X = RV(P, sum)
Y = RV(P, max)

xy = (X & Y).sim(10000)

xy
Index   Result
0       (3, 2)
1       (2, 1)
2       (6, 3)
3       (6, 4)
4       (6, 4)
5       (4, 3)
6       (4, 3)
7       (7, 4)
8       (7, 4)
...     ...
9999    (4, 3)
plt.figure()
xy.plot('tile')
plt.show()

The simulated \((X, Y)\) pairs are stored as xy, a matrix with two columns (one for \(X\) and one for \(Y\)) and a row for each repetition of the simulation. The simulated \(X\) values in the first column of xy can be extracted with x = xy[0]. (Remember Python’s zero-based indexing.) The simulated \(Y\) values in the second column of xy can be extracted with y = xy[1]. The product x * y computes \(XY\) for each simulated \((X, Y)\) pair, row by row.


x = xy[0]
y = xy[1]

x * y
Index   Result
0       6
1       2
2       18
3       24
4       24
5       12
6       12
7       28
8       28
...     ...
9999    12

We can then approximate the covariance with the average of the product minus the product of the averages.


(x * y).mean() - x.mean() * y.mean()
## 1.2525323400000001

The .cov() command performs the above calculation.


xy.cov()
## 1.2525323399999988

Example 5.23 Recall Example ??. Spin the Uniform(1, 4) spinner twice and let \(X\) be the sum and \(Y\) the larger of the two spins. Sketch a plot of the joint distribution; is \(\textrm{Cov}(X, Y)\) positive, negative, or zero? Then use simulation to approximate the covariance.

Show/hide solution

The covariance is positive because above average values of \(X\) tend to be associated with above average values of \(Y\). Simulation shows that the covariance is about 0.75.

P = Uniform(1, 4) ** 2
X = RV(P, sum)
Y = RV(P, max)

xy = (X & Y).sim(10000)

plt.figure()
xy.plot('density')
plt.show()

xy.cov()
## 0.7417211569275983

For two continuous random variables, \(\textrm{E}(XY)\) involves a double integral over possible \((x, y)\) pairs of the product \(xy\) weighted by the joint density. We will see later some strategies for computing \(\textrm{E}(XY)\) and \(\textrm{Cov}(X, Y)\) that avoid double integration.
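
For instance, in the Uniform(1, 4) spinner example above, \(\textrm{E}(XY)\) can be written as a double integral of \((u + v)\max(u, v)\) against the joint density of the two spins, which is constant \(1/9\) on \([1, 4]\times[1, 4]\). The sketch below (added here for illustration, using scipy.integrate.dblquad rather than Symbulate) evaluates the integrals numerically and recovers a covariance of 0.75, consistent with the simulation above.

from scipy import integrate

# Joint density of two independent Uniform(1, 4) spins: constant 1/9 on [1, 4] x [1, 4]
f = 1 / 9

# X = u + v (the sum) and Y = max(u, v) (the larger spin)
E_XY, _ = integrate.dblquad(lambda v, u: (u + v) * max(u, v) * f, 1, 4, lambda u: 1, lambda u: 4)
E_X, _ = integrate.dblquad(lambda v, u: (u + v) * f, 1, 4, lambda u: 1, lambda u: 4)
E_Y, _ = integrate.dblquad(lambda v, u: max(u, v) * f, 1, 4, lambda u: 1, lambda u: 4)

E_XY - E_X * E_Y  # 0.75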

Example 5.24 What is another name for \(\textrm{Cov}(X, X)\)?

Show/hide solution

The variance of \(X\) is the covariance of \(X\) with itself. \[ \textrm{Cov}(X, X) = \textrm{E}\left((X-\textrm{E}(X))(X-\textrm{E}(X))\right) = \textrm{E}\left((X-\textrm{E}(X))^2\right) = \textrm{Var}(X) \]

5.4.1 Correlation

Example 5.25 The covariance between height in inches and weight in pounds for football players is 96.

  1. What are the measurement units of the covariance?
  2. Suppose height were measured in feet instead of inches. Would the shape of the joint distribution change? Would the strength of the association between height and weight change? Would the value of covariance change?
Show/hide solution
  1. Covariance deals with products, so the covariance is 96 inches\(\times\)pounds.
  2. A linear rescaling does not change the shape of the distribution or the strength of the association. However, a linear rescaling does relabel the axes and change the measurement units. A covariance of 96 inches\(\times\)pounds corresponds to a covariance of \(96/12 = 8\) feet\(\times\)pounds.

The numerical value of the covariance depends on the measurement units of both variables, so interpreting it can be difficult. Covariance is a measure of joint association between two random variables that has many nice theoretical properties, but the correlation coefficient is often a more practical measure. (We saw a similar idea with variance and standard deviation. Variance has many nice theoretical properties. However, standard deviation is often a better practical measure of variability.)

Definition 5.6 The correlation (coefficient) between random variables \(X\) and \(Y\) is

\[\begin{align*} \textrm{Corr}(X,Y) & = \textrm{Cov}\left(\frac{X-\textrm{E}(X)}{\textrm{SD}(X)},\frac{Y-\textrm{E}(Y)}{\textrm{SD}(Y)}\right)\\ & = \frac{\textrm{Cov}(X, Y)}{\textrm{SD}(X)\textrm{SD}(Y)} \end{align*}\]

The correlation for two random variables is the covariance between the corresponding standardized random variables. Therefore, correlation is a standardized measure of the association between two random variables.

Subtracting the means does not change the covariance; it merely shifts the center of the joint distribution. Dividing each deviation by the corresponding standard deviation then scales the covariance by \(1/(\textrm{SD}(X)\textrm{SD}(Y))\). Therefore, correlation is the covariance divided by the product of the standard deviations.

A correlation coefficient has no units and is measured on a universal scale. Regardless of the original measurement units of the random variables \(X\) and \(Y\) \[ -1\le \textrm{Corr}(X,Y)\le 1 \]

  • \(\textrm{Corr}(X,Y) = 1\) if and only if \(Y=aX+b\) for some \(a>0\)
  • \(\textrm{Corr}(X,Y) = -1\) if and only if \(Y=aX+b\) for some \(a<0\)

Therefore, correlation is a standardized measure of the strength of the linear association between two random variables.
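
For the two rolls of a fair four-sided die in Example 5.21, the correlation can be approximated by simulation just like the covariance. The sketch below assumes that the simulation results support a .corr() method analogous to the .cov() method used earlier.

P = DiscreteUniform(1, 4) ** 2
X = RV(P, sum)
Y = RV(P, max)

xy = (X & Y).sim(10000)

xy.corr()  # roughly 0.85, since Cov(X, Y) is about 1.25 and SD(X)SD(Y) is about 1.47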

The following plots display a few Bivariate Normal distributions with different correlations.


Figure 5.5: Bivariate Normal distributions

Example 5.26 The covariance between height in inches and weight in pounds for football players is 96. Heights have mean 74 and SD 3 inches. Weights have mean 250 pounds and SD 45 pounds. Find the correlation between weight and height.

Show/hide solution

\[ \textrm{Corr}(X, Y) = \frac{\textrm{Cov}(X, Y)}{\textrm{SD}(X)\textrm{SD}(Y)} = \frac{96}{(3)(45)} = 0.71 \]

(The means are useful to have, but they don’t affect the correlation.)

Because correlation is computed between standardized random variables, correlation is not affected by a linear rescaling of either variable. One standard deviation above the mean is one standard deviation above the mean, whether that’s measured in feet or inches or meters.
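
The following sketch illustrates this numerically with simulated heights and weights. A bivariate normal distribution with the means, SDs, and covariance from the football player example is assumed purely for illustration; only numpy is used.

import numpy as np

rng = np.random.default_rng(12345)

# Hypothetical bivariate normal heights (inches) and weights (pounds)
# using the mean/SD/covariance values from Example 5.26
mean = [74, 250]
cov = [[3 ** 2, 96],
       [96, 45 ** 2]]
height_in, weight = rng.multivariate_normal(mean, cov, size=100000).T

height_ft = height_in / 12  # linear rescaling: inches to feet

np.cov(height_in, weight)[0, 1]       # about 96 (inches x pounds)
np.cov(height_ft, weight)[0, 1]       # about 8 (feet x pounds)
np.corrcoef(height_in, weight)[0, 1]  # about 0.71
np.corrcoef(height_ft, weight)[0, 1]  # about 0.71; unchanged by the rescaling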

In many problems we are given the correlation directly and need to find the covariance. Covariance is the correlation times the product of the standard deviations.

\[ \textrm{Cov}(X, Y) = \textrm{Corr}(X, Y)\textrm{SD}(X)\textrm{SD}(Y) \]

5.4.2 Independent versus uncorrelated

The condition for independence of random variables is fairly strict. Random variables are independent only if the joint distribution is equal to the product of the marginal distributions for all possible values of the random variables. Otherwise, the random variables are not independent. However, you can imagine that there are different degrees of dependence. For example, the mosaic plot on the right in Figure 4.43 displays a situation where \(X\) and \(Y\) are “not too dependent”.

We wish to measure degree of dependence in a single number. Unfortunately, there are many different ways two random variables can be related, and there is no single measure that works well in all cases.

The most commonly used measure of the degree of an association between two random variables is covariance, or its standardized counterpart, correlation. While covariance does have many uses and nice properties, it does NOT measure whether two random variables are independent.

Example 5.27 Find \(\textrm{Cov}(X,Y)\) in each of the following situations. Notice that the marginal distribution of \(X\) is the same for each part, and similarly for \(Y\).

  1. \(p_{X, Y}(x, y)\)
     \(x\) \ \(y\)     -1      0      1     \(p_X(x)\)
     0                1/6    1/6    1/6    1/2
     1                1/6    1/6    1/6    1/2
     \(p_Y(y)\)        1/3    1/3    1/3
  2. \(p_{X, Y}(x, y)\)
     \(x\) \ \(y\)     -1      0      1     \(p_X(x)\)
     0                5/24   2/24   5/24   1/2
     1                3/24   6/24   3/24   1/2
     \(p_Y(y)\)        1/3    1/3    1/3
  3. In which of the previous parts are \(X\) and \(Y\) independent? Why?

Solution. to Example 5.27

Show/hide solution
  1. \(\textrm{Cov}(X, Y) = \textrm{E}(XY) - \textrm{E}(X)\textrm{E}(Y) = \textrm{E}(XY)\) since \(\textrm{E}(Y)=0\). \(\textrm{E}(XY) = 0 + (1)(1)(1/6) + (1)(-1)(1/6) = 0\). So the covariance is 0.
  2. Again, \(\textrm{Cov}(X, Y) = \textrm{E}(XY)\) since \(\textrm{E}(Y)=0\). \(\textrm{E}(XY) = 0 + (1)(1)(3/24) + (1)(-1)(3/24) = 0\). So the covariance is 0.
  3. \(X\) and \(Y\) are independent in part 1 because the joint pmf is the product of the marginal pmfs. However, \(X\) and \(Y\) in part 2 are not independent even though the covariance is 0.

If \(X\) and \(Y\) are independent then \(X\) and \(Y\) are uncorrelated, that is, \(\textrm{Cov}(X, Y) = 0\). However, the converse is not true in general: there are many situations in which \(\textrm{Cov}(X, Y) = 0\) but \(X\) and \(Y\) are not independent. Independence means there is no probabilistic relationship between two random variables. Covariance only measures a particular kind of probabilistic relationship, namely: how closely do the \((X, Y)\) pairs tend to follow a single straight line with non-zero slope?
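
The second table in Example 5.27 makes this concrete: the covariance is zero, yet the joint pmf does not factor into the product of the marginals. The short plain Python check below (added here for illustration) verifies both facts.

# Joint pmf from part 2 of Example 5.27: keys are (x, y) pairs
pmf = {(0, -1): 5/24, (0, 0): 2/24, (0, 1): 5/24,
       (1, -1): 3/24, (1, 0): 6/24, (1, 1): 3/24}

# Marginal pmfs of X and Y
p_x = {x: sum(p for (a, b), p in pmf.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (a, b), p in pmf.items() if b == y) for y in (-1, 0, 1)}

E_X = sum(x * p for x, p in p_x.items())
E_Y = sum(y * p for y, p in p_y.items())
E_XY = sum(x * y * p for (x, y), p in pmf.items())

E_XY - E_X * E_Y  # 0: X and Y are uncorrelated

# The joint pmf is not the product of the marginals, so X and Y are not independent
all(abs(pmf[(x, y)] - p_x[x] * p_y[y]) < 1e-9 for (x, y) in pmf)  # False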