8  Joint Distributions and Correlation

8.1 Joint distributions

  • The joint distribution of random variables \(X\) and \(Y\) is a probability distribution on \((x, y)\) pairs.

Example 8.1 Roll a fair four-sided die twice. Let \(X\) be the sum of the two dice, and let \(Y\) be the larger of the two rolls (or the common value if both rolls are the same).

  1. Construct a two-way table displaying the joint distribution of \(X\) and \(Y\).




  2. Construct a spinner representing the joint distribution of \(X\) and \(Y\).




  3. Starting with the two-way table, how could you obtain the marginal distribution of \(X\)? of \(Y\)?




  4. Starting with the marginal distribution of \(X\) and the marginal distribution of \(Y\), could you necessarily construct the two-way table of the joint distribution? Explain.




  • The joint distribution of two random variables summarizes the possible pairs of values and their relative likelihoods.
  • It is possible to obtain marginal distributions from a joint distribution.
  • In general, marginal distributions alone are not enough to determine a joint distribution. (The exception is when random variables are independent.)
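The bullet points above can be checked by direct enumeration. The sketch below (our own code, not part of the exercise) builds the two-way table for Example 8.1 and collapses it to get the marginals:

```python
from itertools import product

# Enumerate all 16 equally likely outcomes of two rolls of a fair
# four-sided die; tabulate the joint distribution of
# X = sum of the rolls and Y = larger of the rolls.
joint = {}
for r1, r2 in product(range(1, 5), repeat=2):
    x, y = r1 + r2, max(r1, r2)
    joint[(x, y)] = joint.get((x, y), 0) + 1 / 16

# Marginal distributions: "collapse"/"aggregate" the joint table
# over the other variable.
marg_x, marg_y = {}, {}
for (x, y), p in joint.items():
    marg_x[x] = marg_x.get(x, 0) + p
    marg_y[y] = marg_y.get(y, 0) + p

for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p:.4f}")
print("Marginal of X:", {x: round(p, 4) for x, p in sorted(marg_x.items())})
print("Marginal of Y:", {y: round(p, 4) for y, p in sorted(marg_y.items())})
```

Note that the reverse direction fails: many different joint tables collapse to these same two marginals, which is the point of question 4.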

Example 8.2 Regina and Cady plan to meet for lunch between noon and 1:00, but they are not sure of their arrival times. Let \(R\) be the random variable representing Regina’s arrival time (minutes after noon), and let \(Y\) represent Cady’s. Assume that they each arrive uniformly at random at a time between noon and 1:00, independently of each other. Let \(T = \min(R, Y)\) be the time at which the first person arrives, and \(W=|R-Y|\) be the amount of time the first person waits for the second to arrive.

  1. Use simulation to approximate the joint distribution of \(T\) and \(W\), and sketch a plot representing the joint distribution.




  2. Sketch the marginal distribution of \(T\).




  3. Is \(W\) the same random variable as \(T\)? Do \(W\) and \(T\) have the same marginal distribution?




  4. Do \(W\) and \(T\) have a positive association, negative association, or no association? Explain what that means.




  • Simulated values of a continuous random variable are usually plotted in a scatter plot, or in a two-dimensional histogram which groups the observed values into rectangular “bins” and plots densities or frequencies for each bin.
  • The joint distribution of two continuous random variables can be described by a probability density function, for which volumes under the surface determine probabilities. The “density” height is whatever it needs to be so that volumes under the surface represent appropriate probabilities.
  • Marginal distributions can be obtained from a joint distribution by “stacking”/“collapsing”/“aggregating” out the other variable.
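A minimal simulation sketch for Example 8.2, assuming times are measured in minutes after noon (a scatter plot or two-dimensional histogram of the simulated \((T, W)\) pairs would display the joint distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Independent Uniform(0, 60) arrival times (minutes after noon).
R = rng.uniform(0, 60, n)
Y = rng.uniform(0, 60, n)

T = np.minimum(R, Y)   # time the first person arrives
W = np.abs(R - Y)      # waiting time of the first person

# T and W are different random variables (they disagree on most
# simulated pairs), yet their marginal distributions coincide:
# each has mean 60/3 = 20 minutes.
print("mean of T:", T.mean(), " mean of W:", W.mean())
print("correlation of T and W:", np.corrcoef(T, W)[0, 1])
```

The negative correlation in the last line quantifies the association asked about in question 4: when the first arrival is late (large \(T\)), the wait \(W\) is forced to be short.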

8.2 Correlation

  • Quantities like long run average, variance, and standard deviation summarize characteristics of the marginal distribution of a single random variable.
  • Covariance and correlation summarize in a single number a characteristic of the joint distribution of two random variables, namely, the degree to which they “co-deviate from their respective means”.
  • The covariance of random variables \(X\) and \(Y\) is defined as the long run average of the product of the paired deviations from the respective means \[ \text{Covariance}(X, Y) = \text{Average of } \big((X - \text{Average of }X)(Y - \text{Average of }Y)\big) \]
  • A positive covariance indicates an overall positive association: above average values of \(X\) tend to be associated with above average values of \(Y\).
  • A negative covariance indicates an overall negative association: above average values of \(X\) tend to be associated with below average values of \(Y\).
  • A covariance of zero indicates that the random variables are uncorrelated: there is no overall positive or negative association. But be careful: if \(X\) and \(Y\) are uncorrelated there can still be a relationship between \(X\) and \(Y\). We will see examples later that demonstrate that being uncorrelated does not necessarily imply that random variables are independent.
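The uncorrelated-but-dependent caveat can be seen in a quick simulation. The particular pair below (\(X\) uniform on \((-1, 1)\) and \(Y = X^2\)) is our own assumed example: \(Y\) is completely determined by \(X\), yet the covariance is zero.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200_000)
y = x ** 2  # completely determined by x, so clearly NOT independent

# Covariance as the long run average of the product of deviations
# from the respective means.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print("covariance of X and Y =", cov_xy)  # approximately 0: uncorrelated
```

By symmetry the positive and negative deviations cancel, so the covariance is zero even though there is a perfect (nonlinear) relationship.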

Example 8.3 Consider the probability space corresponding to two rolls of a fair four-sided die. Let

  • \(X\) be the sum of the two rolls
  • \(Y\) the larger of the two rolls
  • \(W\) the number of rolls equal to 4
  • \(Z\) the number of rolls equal to 1.

Without doing any calculations, determine if the covariance between each of the following pairs of variables is positive, negative, or zero. Explain your reasoning conceptually.

  1. \(X\) and \(Y\)




  2. \(X\) and \(W\)




  3. \(X\) and \(Z\)




  4. \(X\) and \(V\)




  5. \(W\) and \(Z\)
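After reasoning conceptually, you can check your sign predictions by computing the covariances exactly via enumeration. This sketch (our own code) covers the four variables defined above:

```python
from itertools import product

# All 16 equally likely outcomes of two rolls of a fair four-sided die.
outcomes = list(product(range(1, 5), repeat=2))

def rv(f):
    """List the value of a random variable on each outcome."""
    return [f(a, b) for a, b in outcomes]

def cov(u, v):
    """Covariance as the average product of paired deviations."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / len(u)

X = rv(lambda a, b: a + b)                 # sum of the rolls
Y = rv(lambda a, b: max(a, b))             # larger roll
W = rv(lambda a, b: (a == 4) + (b == 4))   # number of rolls equal to 4
Z = rv(lambda a, b: (a == 1) + (b == 1))   # number of rolls equal to 1

print("Cov(X, Y) =", cov(X, Y))  # positive
print("Cov(X, W) =", cov(X, W))  # positive
print("Cov(X, Z) =", cov(X, Z))  # negative
print("Cov(W, Z) =", cov(W, Z))  # negative
```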




  • The numerical value of the covariance depends on the measurement units of both variables, so interpreting it can be difficult.
  • Covariance is a measure of joint association between two random variables that has many nice theoretical properties, but the correlation (coefficient) is often a more practical measure.
  • The correlation for two random variables is the covariance between the corresponding standardized random variables. \[ \text{Correlation}(X, Y) = \text{Covariance}\left(\frac{X- \text{Average of }X}{\text{SD of }X}, \frac{Y- \text{Average of }Y}{\text{SD of }Y}\right) \]
  • When standardizing, subtracting the means doesn’t change the scale of the possible pairs of values; it merely shifts the center of the joint distribution. Therefore, correlation is the covariance divided by the product of the standard deviations. \[ \text{Correlation}(X, Y) = \frac{\text{Covariance}(X, Y)}{(\text{SD of }X)(\text{SD of }Y)} \]
  • A correlation coefficient has no units and is measured on a universal scale. Regardless of the original measurement units of the random variables \(X\) and \(Y\), \[ -1\le \textrm{Correlation}(X,Y)\le 1 \]
  • \(\textrm{Correlation}(X,Y) = 1\) if and only if \(Y=aX+b\) for some \(a>0\).
  • \(\textrm{Correlation}(X,Y) = -1\) if and only if \(Y=aX+b\) for some \(a<0\).
  • Therefore, correlation is a standardized measure of the strength of the linear association between two random variables.
  • The closer the correlation is to 1 or \(-1\), the closer the joint distribution of \((X, Y)\) pairs hugs a straight line, with positive or negative slope.
  • Because correlation is computed between standardized random variables, correlation is not affected by a linear rescaling of either variable (e.g., a change in measurement units from minutes to seconds)
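These properties can be checked numerically. The particular pair below (\(Y = 2X + \text{noise}\)) is just an assumed example:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# A pair with a moderate linear association (assumed example).
x = rng.normal(0, 1, n)
y = 2 * x + rng.normal(0, 1, n)

def corr(u, v):
    # Correlation is the covariance of the standardized variables,
    # i.e. covariance divided by the product of the SDs.
    return np.mean((u - u.mean()) * (v - v.mean())) / (u.std() * v.std())

r1 = corr(x, y)
# A linear rescaling (e.g. minutes -> seconds, plus a shift)
# leaves the correlation unchanged.
r2 = corr(60 * x + 5, 60 * y - 10)
print("correlation:", r1, " after rescaling:", r2)
```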

Example 8.4 Try the Guessing Correlations Applet!






8.3 Joint Normal distributions

Example 8.5 In the meeting problem, suppose that arrival times \(R\) and \(Y\) each follow a marginal Normal(30, 10) distribution.

  1. Assume that the two people arrive independently of each other. Explain how you can use the standard Normal spinner to simulate an \((R, Y)\) pair of arrival times. Simulate many \((R, Y)\) pairs and sketch a plot of the joint distribution.




  2. Now suppose that the two people have coordinated their meeting time so that they tend to arrive around the same time. In particular, suppose that the \((R, Y)\) pair follows a Bivariate Normal distribution with correlation 0.7 (and still with marginal mean 30 and standard deviation 10 for each person). Simulate many \((R, Y)\) pairs and sketch a plot of the joint distribution.




  3. Consider the waiting time random variable \(W=|R-Y|\); in particular, consider \(\text{P}(W < 15)\) and \(\text{SD}(W)\). Comparing the independent arrival times case in part 1 and the positively correlated arrival times in part 2, in which case is \(\text{P}(W < 15)\) larger? What about \(\text{SD}(W)\)? Explain conceptually, and then use simulation to approximate the distribution of \(W\) and \(\text{P}(W < 15)\) and \(\text{SD}(W)\) in each of the two cases.




  • Just as Normal distributions are commonly assumed for marginal distributions of individual random variables, joint Normal distributions are often assumed for joint distributions of several random variables.
  • A “Bivariate Normal” distribution is a joint distribution for two random variables which has five parameters: the two means, the two standard deviations, and the correlation.
  • A marginal Normal distribution is a “bell-shaped curve”; a Bivariate Normal distribution is a “mound-shaped surface” — imagine a pile of sand.
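The simulations in Example 8.5 can be sketched as follows. The mixing formula \(Y = \mu + \sigma\,(\rho Z_1 + \sqrt{1-\rho^2}\, Z_2)\) used for part 2 is one standard way to generate Bivariate Normal pairs from two independent standard Normal spins:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
mu, sd, rho = 30, 10, 0.7

# Two independent standard Normal "spins".
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)

# Part 1: independent arrivals, each marginally Normal(30, 10).
R_ind = mu + sd * z1
Y_ind = mu + sd * z2

# Part 2: Bivariate Normal with correlation 0.7 and the same
# marginals, built by mixing the two independent spins.
R_cor = mu + sd * z1
Y_cor = mu + sd * (rho * z1 + np.sqrt(1 - rho**2) * z2)

for label, R, Y in [("independent", R_ind, Y_ind), ("corr 0.7", R_cor, Y_cor)]:
    W = np.abs(R - Y)
    print(f"{label}: P(W < 15) = {np.mean(W < 15):.3f}, SD(W) = {W.std():.2f}")
```

Coordinating arrival times shrinks the typical gap between them, so the positively correlated case has the larger \(\text{P}(W < 15)\) and the smaller \(\text{SD}(W)\), as the printed output confirms.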