## 2.3 Joint distributions

Often we are working with more than one random variable. For discrete random variables, the joint distribution is described by a joint probability mass function, which generalizes the single-variable probability mass function:

$p(x,y) = P(X=x, Y=y)$

Read this as: “the probability that $$X$$ takes the value of $$x$$ and $$Y$$ takes the value of $$y$$.” This function of two arguments provides probabilities for every combination of values.

In general, a joint distribution for many (discrete) variables can be written as

$p(x_1,\ldots,x_m) = P(X_1 = x_1,\ldots,X_m = x_m)$

For two variables $$X$$ and $$Y$$, we can create a table showing all the values of the joint probabilities. Conventionally, the last column is the marginal probability of $$X$$ and the last row is the marginal probability of $$Y$$. Notice that the rows and columns of the joint probabilities add up to the marginals:

$P(X = x ) = \sum_{all\; y} P(X=x,Y=y)$

$P(Y = y ) = \sum_{all\; x} P(X=x,Y=y)$

Here’s a table showing joint and marginal probability values for two example discrete random variables, $$X$$ and $$Y$$. Based on this table, what values can each of these variables take on, and with what probability?

|      | Y=0   | Y=1   | p(X)  |
|------|-------|-------|-------|
| X=0  | 0.125 | 0.000 | 0.125 |
| X=1  | 0.250 | 0.125 | 0.375 |
| X=2  | 0.125 | 0.250 | 0.375 |
| X=3  | 0.000 | 0.125 | 0.125 |
| p(Y) | 0.500 | 0.500 | 1.000 |
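To see the marginal formulas at work on this table, here is a minimal sketch in Python. It assumes NumPy is available; the array layout (rows for $$X$$, columns for $$Y$$) and the variable names are just choices made for this illustration.

```python
import numpy as np

# Joint probabilities from the table: rows are X = 0..3, columns are Y = 0..1.
joint = np.array([
    [0.125, 0.000],
    [0.250, 0.125],
    [0.125, 0.250],
    [0.000, 0.125],
])

# Summing across each row (over all y) gives P(X = x).
p_x = joint.sum(axis=1)

# Summing down each column (over all x) gives P(Y = y).
p_y = joint.sum(axis=0)

print("P(X = x):", p_x)
print("P(Y = y):", p_y)
print("Total probability:", joint.sum())
```

The printed marginals should match the last column and last row of the table, and the total should be 1.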

We previously developed some tools for efficiently describing how a random variable behaves when considered on its own: the mean and variance (moments) of its distribution. But what if we have two random variables that are somehow related to each other? What tools can we use to talk about their distributions?

### 2.3.1 Covariance and correlation: joint moments

The covariance between two random variables is a measure of the joint variability between the two. It is the expected product of the deviations from the means:

$Cov(X,Y) = E((X - \mu_X)(Y - \mu_Y))$

Remember that $$\mu_X$$ is just shorthand for $$E(X)$$ and that parameters are constants. So, for example, $$E(\mu) = \mu$$ and $$E(\beta_1)= \beta_1$$.

The “parameters are constants” thing is a nice side effect of frequentist philosophy. Because parameters (as opposed to estimates) have a fixed, true value, which is written on a mountain somewhere even though you will never see it, they can be treated as constants – like, a fixed number – not random variables. For example, the true slope coefficient $$\beta_1$$ is a number. You don’t know which number, but that’s not important here; you just keep calling it $$\beta_1$$.

The expected value of a constant is…itself (for example, the expected value of the number 3 is 3). The variance of a constant is 0, since constants don’t vary (which is why we call ’em “constants”).

Useful Fact: $$E(aX+bY) = aE(X) + bE(Y)$$. You can prove this some other time, but here’s the first line:

\begin{align*} E(aX+bY) &= \sum_{all\;x}\sum_{all\; y}(ax + by)P(X = x, Y=y) \end{align*}
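This is not a proof, but you can check the Useful Fact numerically. The sketch below evaluates the double sum directly on the example joint table from earlier in this section and compares it with $$aE(X) + bE(Y)$$. The constants `a = 2` and `b = -3` are arbitrary values picked for illustration, and NumPy is assumed.

```python
import numpy as np

# Values and joint probabilities from the example table (rows: X, columns: Y).
x_vals = np.array([0, 1, 2, 3])
y_vals = np.array([0, 1])
joint = np.array([
    [0.125, 0.000],
    [0.250, 0.125],
    [0.125, 0.250],
    [0.000, 0.125],
])

# Arbitrary constants, chosen only for this illustration.
a, b = 2.0, -3.0

# Left side: the double sum of (a*x + b*y) * P(X = x, Y = y).
lhs = sum(
    (a * x + b * y) * joint[i, j]
    for i, x in enumerate(x_vals)
    for j, y in enumerate(y_vals)
)

# Right side: a*E(X) + b*E(Y), with each mean computed from its marginal.
e_x = np.sum(x_vals * joint.sum(axis=1))
e_y = np.sum(y_vals * joint.sum(axis=0))
rhs = a * e_x + b * e_y

print(lhs, rhs)
```

Both sides should print the same number; try other values of `a` and `b` to convince yourself it is not a coincidence.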

Covariance is often written in another form:

\begin{align*} Cov(X,Y) &= E[(X-\mu_X)(Y-\mu_Y)] \\ &= E(XY) - E(X\mu_Y) - E(Y\mu_X) + E(\mu_X\mu_Y)\\ &= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X\mu_Y\\ &= E(XY) - \mu_Y\mu_X - \mu_X\mu_Y + \mu_X\mu_Y\\ &= E(XY) - \mu_X\mu_Y \end{align*}
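As a sanity check on the derivation, the following sketch computes the covariance of the example $$X$$ and $$Y$$ from the joint table both ways: from the definition and from the shortcut form $$E(XY) - \mu_X\mu_Y$$. It assumes NumPy; the variable names are made up for this example.

```python
import numpy as np

# Values and joint probabilities from the example table (rows: X, columns: Y).
x_vals = np.array([0, 1, 2, 3])
y_vals = np.array([0, 1])
joint = np.array([
    [0.125, 0.000],
    [0.250, 0.125],
    [0.125, 0.250],
    [0.000, 0.125],
])

# Reshape so that x varies down the rows and y varies across the columns.
x = x_vals[:, None]
y = y_vals[None, :]

# Means, computed as probability-weighted sums over the whole table.
mu_x = np.sum(x * joint)
mu_y = np.sum(y * joint)

# Definition: expected product of the deviations from the means.
cov_definition = np.sum((x - mu_x) * (y - mu_y) * joint)

# Shortcut form: E(XY) - mu_X * mu_Y.
e_xy = np.sum(x * y * joint)
cov_shortcut = e_xy - mu_x * mu_y

print(cov_definition, cov_shortcut)
```

For this table both versions come out to 0.25.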

What is the covariance of $$X$$ with itself?

So what about correlation? Well, that’s the standardized covariance, scaled according to the spread of each variable:

$Cor(X,Y) = \frac{Cov(X,Y)}{SD(X)SD(Y)}$
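Here is a small sketch that applies this formula to the example joint table from earlier in the section. It assumes NumPy, and it computes each standard deviation from the corresponding marginal distribution; the names are illustrative only.

```python
import numpy as np

# Values and joint probabilities from the example table (rows: X, columns: Y).
x_vals = np.array([0, 1, 2, 3])
y_vals = np.array([0, 1])
joint = np.array([
    [0.125, 0.000],
    [0.250, 0.125],
    [0.125, 0.250],
    [0.000, 0.125],
])

# Marginal distributions, means, and standard deviations.
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)
mu_x = np.sum(x_vals * p_x)
mu_y = np.sum(y_vals * p_y)
sd_x = np.sqrt(np.sum((x_vals - mu_x) ** 2 * p_x))
sd_y = np.sqrt(np.sum((y_vals - mu_y) ** 2 * p_y))

# Covariance from the joint table, then scale by both standard deviations.
cov_xy = np.sum(np.outer(x_vals - mu_x, y_vals - mu_y) * joint)
cor_xy = cov_xy / (sd_x * sd_y)

print("Cov(X, Y):", cov_xy)
print("Cor(X, Y):", cor_xy)
```

For this table the correlation is about 0.577. Unlike the covariance, that number would not change if you rescaled $$X$$ or $$Y$$ by a positive constant (say, by switching units), because the standard deviations in the denominator rescale by the same factor.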

Response moment: Using this formula, what is the correlation of $$X$$ with itself? Why does that make sense?

What does positive covariance/correlation look like? Negative? Zero?