2.6 General matrix fax and trix
2.6.1 Math for variances
Here is a general fact about the variance of a random vector: Var(\boldsymbol{y})=E[(\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})'] That looks like a vector-y version of the moment definition we saw for scalars, Var(y)=E[(y-\mu)^2], hmm?
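If you want to see that definition in action, here's a minimal numpy sketch (the distribution, covariance values, and sample size are made up purely for illustration): simulate many draws of a random vector, average the matrices (\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})' across draws, and check that the result is close to the covariance matrix numpy reports.

```python
import numpy as np

# Illustrative sketch: estimate Var(y) = E[(y - mu)(y - mu)'] by averaging the
# matrices (y - mu)(y - mu)' over many simulated draws of y.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])                      # made-up mean vector
true_cov = np.array([[2.0, 0.3, 0.0],
                     [0.3, 1.0, -0.4],
                     [0.0, -0.4, 0.5]])              # made-up covariance matrix

draws = rng.multivariate_normal(mu, true_cov, size=100_000)  # each row is one y
centered = draws - draws.mean(axis=0)                         # y minus (estimated) mu
var_hat = (centered[:, :, None] * centered[:, None, :]).mean(axis=0)

print(np.round(var_hat, 2))                                   # close to true_cov
print(np.allclose(var_hat, np.cov(draws, rowvar=False), atol=1e-3))  # True
```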
\boldsymbol{y}\boldsymbol{y}' is called an outer product of \boldsymbol{y} and \boldsymbol{y}. This is different from the inner product \boldsymbol{y}'\boldsymbol{y}. (What is the dimension of each of those objects?)
The outer product is: \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} \left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) = \begin{pmatrix}y_1^2&y_1y_2&\ldots&y_1y_n\\ y_2y_1&y_2^2&\ldots&y_2y_n\\ \vdots&\vdots&\ddots&\vdots\\ y_ny_1&y_ny_2&\ldots&y_n^2 \end{pmatrix}
The inner product is:
\left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} = \sum_{i=1}^n y_i^2
Hey, that second one looks familiar! The inner product is a sum of squares; the outer product is a matrix of all products of y_i and y_j. Double-check which objects are row or column vectors to be sure you’re doing the right one.
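Here's the same distinction in numpy (the vector is made up just for illustration): np.outer gives the n \times n matrix of all products y_i y_j, while the inner product collapses to a single number, the sum of squares.

```python
import numpy as np

# Made-up 4-vector, purely for illustration.
y = np.array([1.0, 2.0, 3.0, 4.0])

outer = np.outer(y, y)        # (4, 4) matrix with entries y_i * y_j
inner = y @ y                 # scalar: sum of y_i^2

print(outer.shape)            # (4, 4)
print(inner)                  # 30.0  (1 + 4 + 9 + 16)
print(np.isclose(inner, np.sum(y ** 2)))   # True
```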
Look back at the regression equation: the vectors there are \boldsymbol{y}, \boldsymbol{X\beta}, and \boldsymbol{\varepsilon}. Each is of length n. We may want to use these vector ideas on those objects….
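As a concrete sketch of those shapes (all the numbers here are invented for illustration), we can build \boldsymbol{y} = \boldsymbol{X\beta} + \boldsymbol{\varepsilon} in numpy and confirm that \boldsymbol{y}, \boldsymbol{X\beta}, and \boldsymbol{\varepsilon} are each length-n vectors, while \boldsymbol{X} is n \times (k+1).

```python
import numpy as np

# Illustrative sizes only: n observations, k predictors (plus an intercept column).
rng = np.random.default_rng(1)
n, k = 50, 2

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1), intercept first
beta = np.array([10.0, 2.0, -1.5])                          # length k+1 (made-up values)
eps = rng.normal(scale=0.5, size=n)                         # length n

y = X @ beta + eps                                          # length n

print(X.shape, beta.shape, eps.shape, y.shape)              # (50, 3) (3,) (50,) (50,)
```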
2.6.2 Handy linear algebra concepts
What about \boldsymbol{X} itself? That’s a matrix, so let’s look at some relevant definitions:
Column space of a matrix: The column space Col(\boldsymbol{X}) of \boldsymbol{X} is the span of its column vectors:
Col(\boldsymbol{X}) = \{\boldsymbol{Xc} : \boldsymbol{c} \in \mathbb{R}^{k+1}\} It’s the space of “all vectors you could possibly get by multiplying \boldsymbol{X} and some vector.”
I like to use k to refer to the number of predictors in a regression model. Other sources prefer p. There’s no real difference, though.
Check yourself: if you have k predictors in your model, why are there k+1 columns in the \boldsymbol{X} matrix?
This is a subspace of \mathbb{R}^n (each column of \boldsymbol{X}, and hence each \boldsymbol{Xc}, has n entries), and its dimension is at most k+1, the number of columns in your matrix.
We refer to it as k+1 because that matches up with the idea of having k predictor columns plus an intercept column.
Important note! The column space doesn’t necessarily have dimension k+1. For example, what if the whole last column of \boldsymbol{X} is 0s? Then it doesn’t actually matter what the last element of \boldsymbol{c} is – it’s always going to be multiplied by 0. So you’re operating in one fewer dimension than you thought!
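Here's a small numpy sketch of that zero-column situation (the matrix is made up): with a last column of zeros, the last entry of \boldsymbol{c} never matters, and the rank confirms you've lost a dimension.

```python
import numpy as np

# Made-up X: intercept column, one real predictor, and a last column of all zeros.
X = np.array([[1.0, 2.0, 0.0],
              [1.0, 3.0, 0.0],
              [1.0, 5.0, 0.0],
              [1.0, 7.0, 0.0]])

c1 = np.array([1.0, 1.0, 0.0])
c2 = np.array([1.0, 1.0, 99.0])   # only the last element differs

print(np.allclose(X @ c1, X @ c2))   # True: the last element of c never matters
print(np.linalg.matrix_rank(X))      # 2, not 3 -- one fewer dimension than you thought
```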
Rank of a matrix: The rank of any matrix \boldsymbol{A} is the dimension of the row space of \boldsymbol{A} (the space spanned by the rows of \boldsymbol{A}), which equals the dimension of the column space of \boldsymbol{A} (the space spanned by the columns of \boldsymbol{A}).
The rank is the number of linearly independent columns of a matrix. (What does “linearly independent columns” mean in terms of a predictor matrix?)
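To make that parenthetical question concrete, here's a hypothetical predictor matrix where one column is an exact linear combination of the others (think of the same measurement recorded in two different units), so the columns are not linearly independent and the rank drops below the number of columns.

```python
import numpy as np

# Hypothetical predictors: x1, and x2 = 2 * x1 + 3 (the same quantity, rescaled).
rng = np.random.default_rng(2)
x1 = rng.normal(size=20)
x2 = 2 * x1 + 3                              # exactly determined by the intercept and x1

X = np.column_stack([np.ones(20), x1, x2])   # 20 x 3 predictor matrix

print(np.linalg.matrix_rank(X))              # 2: only two linearly independent columns
```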
Norm of a vector: For a vector \boldsymbol{x}\in\mathbb{R}^n, the Euclidean norm is the length of the vector:
||\boldsymbol{x}|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
This is what people probably mean if they just say “the norm,” but technically this is the L_2 or Euclidean norm. The fact that we are using this specific definition of the length of a vector underlies all the math we are about to do. Ask me about my research sometime!
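A quick check of that definition in numpy (the vector values are invented): np.linalg.norm computes the Euclidean (L_2) norm by default, which matches the square root of the sum of squares.

```python
import numpy as np

# Made-up vector, just to check the definition.
x = np.array([3.0, 4.0, 12.0])

print(np.linalg.norm(x))            # 13.0
print(np.sqrt(np.sum(x ** 2)))      # 13.0, same thing
```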