2.5 Random vectors and matrices

Let \(\boldsymbol{y}\) be a vector of \(n\) random variables \((y_1,y_2,\dots,y_n)\). Note that \(\boldsymbol{y}\) is always a column vector; we’d write \(\boldsymbol{y}'\) or \(\boldsymbol{y}^T\) to talk about a row vector.

Now, if we’re bothering to write these random variables together in one vector, we are presumably interested in how they behave when considered all together. That is, we want to talk about the joint distribution.

2.5.1 Joint moments for vectors and matrices

The expectation of a random vector is defined the way you’d want it to be (as expectations usually are): elementwise, as the vector of the individual expectations.

\[ E(\boldsymbol{y}) = \begin{pmatrix} E(y_1)\\ E(y_2)\\ E(y_3)\\ \vdots\\ E(y_n) \end{pmatrix}= \begin{pmatrix} \mu_1\\ \mu_2\\ \mu_3\\ \vdots \\ \mu_n \end{pmatrix}=\boldsymbol{\mu}\]

where \(\mu_i\) is the expected value of \(y_i\).

What about variance? We can talk about the variance of each of the individual RVs, but to know about their joint distribution, we also have to specify how they vary together – their covariance.

\[Var(\boldsymbol{y}) = \begin{pmatrix} \sigma_{11}&\sigma_{12}&\ldots&\sigma_{1n}\\ \sigma_{21}&\sigma_{22}&\ldots&\sigma_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ \sigma_{n1}&\sigma_{n2}&\ldots&\sigma_{nn} \end{pmatrix}\]

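where \(\sigma_{ij}=\sigma_{ji} = Cov(y_i,y_j)\) and \(\sigma_{ii}=\sigma_{i}^2 = Var(y_i)\). So the matrix is symmetric, with the individual variances on the diagonal and the covariances off the diagonal.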
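For reference, the same matrix can be written compactly as the expectation of an outer product. This is a standard identity rather than something derived in this section, using \(\boldsymbol{\mu} = E(\boldsymbol{y})\) as defined earlier:

\[Var(\boldsymbol{y}) = E\left[(\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})'\right], \qquad \text{so that} \qquad \sigma_{ij} = E\left[(y_i-\mu_i)(y_j-\mu_j)\right]\]

Since \(\boldsymbol{y}-\boldsymbol{\mu}\) is an \(n \times 1\) column and its transpose is a \(1 \times n\) row, the product is \(n \times n\), which matches the dimensions of the matrix written out element by element.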

This is called a variance-covariance matrix or just a covariance matrix. There is a lot of information in there!
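To make the two moments concrete, here is a minimal sketch that estimates a mean vector and a covariance matrix from data. Everything in it is an assumption made for illustration: the five observations are made up, the helper names `mean_vector` and `covariance_matrix` are hypothetical, and the code computes the usual sample versions (dividing by one less than the number of observations), which only estimate \(\boldsymbol{\mu}\) and \(Var(\boldsymbol{y})\).

```python
# Hypothetical data: 5 joint observations of a random vector y = (y_1, y_2, y_3).
# Each inner list is one observation of the whole vector.
observations = [
    [2.0, 1.0, 4.0],
    [4.0, 2.0, 2.0],
    [6.0, 2.0, 0.0],
    [8.0, 4.0, 4.0],
    [10.0, 6.0, 0.0],
]


def mean_vector(rows):
    """Elementwise sample mean: an estimate of mu = E(y)."""
    m = len(rows)        # number of observations
    n = len(rows[0])     # length of the random vector
    return [sum(row[i] for row in rows) / m for i in range(n)]


def covariance_matrix(rows):
    """Sample covariance matrix: entry (i, j) estimates sigma_ij = Cov(y_i, y_j)."""
    m = len(rows)
    n = len(rows[0])
    mu = mean_vector(rows)
    return [
        [
            sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in rows) / (m - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


mu_hat = mean_vector(observations)
sigma_hat = covariance_matrix(observations)

print("estimated mean vector:", mu_hat)
for row in sigma_hat:
    print(row)
```

The printed matrix is symmetric, its diagonal holds the sample variances of the three components, and the sign of each off-diagonal entry says whether the two components tend to move together or in opposite directions.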

In the regression equation, we specify the structure of the variance-covariance matrix of the error vector \(\boldsymbol{\varepsilon}\). If \(\boldsymbol{\varepsilon} \sim N(\boldsymbol{0},\sigma^2 \boldsymbol{I})\), that is, if the individual \(\varepsilon_i\) are iid \(N(0,\sigma^2)\), that means

\[Var(\boldsymbol{\varepsilon}) = \begin{pmatrix} \sigma^2&0&\ldots&0\\ 0&\sigma^2&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\ldots&\sigma^2 \end{pmatrix}\]

Each \(\varepsilon_i\) has the same variance \(\sigma^2\) considered individually, and the covariance between any two different errors is zero.
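A quick simulation can illustrate this structure. The sketch below is only an illustration under assumed values: the standard deviation `sigma = 2.0`, the vector length, the number of draws, and the helper `sample_cov` are all hypothetical choices, not quantities from the text. It draws many error vectors with independent normal components and estimates their covariance matrix, which should come out close to \(\sigma^2 \boldsymbol{I}\) (here, roughly 4 on the diagonal and roughly 0 elsewhere).

```python
import random

random.seed(0)   # fixed seed so repeated runs give the same numbers

sigma = 2.0      # hypothetical error standard deviation, so sigma^2 = 4
n = 3            # length of each error vector
draws = 20000    # number of simulated error vectors

# Each draw is one error vector whose components are independent N(0, sigma^2).
errors = [[random.gauss(0.0, sigma) for _ in range(n)] for _ in range(draws)]


def sample_cov(rows, i, j):
    """Sample covariance between components i and j of the simulated vectors."""
    m = len(rows)
    mean_i = sum(r[i] for r in rows) / m
    mean_j = sum(r[j] for r in rows) / m
    return sum((r[i] - mean_i) * (r[j] - mean_j) for r in rows) / (m - 1)


# Estimated variance-covariance matrix of the error vector.
cov_hat = [[sample_cov(errors, i, j) for j in range(n)] for i in range(n)]

for row in cov_hat:
    print([round(v, 2) for v in row])
```

The estimates will not be exactly 4 and 0 because they come from a finite sample, but they get closer as `draws` grows.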

Response moment: That last sentence should remind you of a couple of the conditions for linear regression. Which ones?