2.6 General matrix fax and trix

2.6.1 Math for variances

Here is a general fact about the variance of a random vector: Var(\boldsymbol{y})=E[(\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})'] That looks like a vector-y version of the moment definition we saw for scalars, Var(y)=E[(y-\mu)^2], hmm?
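
To make that fact concrete, here's a quick numpy sketch (mine, not from the notes): simulate many draws of a random vector with a known variance matrix, average the matrices (\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})', and check that the average lands close to the truth. The mean vector, variance matrix, and number of draws are all made up for illustration.

```python
import numpy as np

# Simulation check of Var(y) = E[(y - mu)(y - mu)'].
# Everything here (seed, mu, Sigma, number of draws) is invented for illustration.
rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])                # "true" mean vector
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])            # "true" variance matrix

draws = rng.multivariate_normal(mu, Sigma, size=100_000)   # each row is one draw of y

# Average the matrices (y - mu)(y - mu)' over the draws:
centered = draws - mu
var_hat = centered.T @ centered / draws.shape[0]

print(np.round(var_hat, 2))   # should land close to Sigma
```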

\boldsymbol{y}\boldsymbol{y}' is called an outer product of \boldsymbol{y} and \boldsymbol{y}. This is different from the inner product \boldsymbol{y}'\boldsymbol{y}. (What is the dimension of each of those objects?)

The outer product is: \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} \left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) = \begin{pmatrix}y_1^2&y_1y_2&\ldots&y_1y_n\\ y_2y_1&y_2^2&\ldots&y_2y_n\\ \vdots&\vdots&\ddots&\vdots\\ y_ny_1&y_ny_2&\ldots&y_n^2 \end{pmatrix}

The inner product is:

\left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} = \sum_{i=1}^{n} y_i^2

Hey, that second one looks familiar! The inner product is a sum of squares; the outer product is a matrix of all products of y_i and y_j. Double-check which objects are row or column vectors to be sure you’re doing the right one.
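
If you want to check those shapes in code, here's a tiny numpy sketch (the vector values are invented): np.outer builds the n \times n matrix of products y_i y_j, while the dot product collapses to the scalar sum of squares.

```python
import numpy as np

# Illustrative vector; the values are made up.
y = np.array([1.0, 2.0, 3.0, 4.0])

outer = np.outer(y, y)   # n x n matrix with entries y_i * y_j
inner = y @ y            # scalar: the sum of squares, sum(y_i^2)

print(outer.shape)       # (4, 4)
print(inner)             # 30.0, same as np.sum(y**2)
```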

Look back at the regression equation: the vectors there are \boldsymbol{y}, \boldsymbol{X\beta}, and \boldsymbol{\varepsilon}. Each is of length n. We may want to use these vector ideas on those objects….

2.6.2 Handy linear algebra concepts

What about \boldsymbol{X} itself? That’s a matrix, so let’s look at some relevant definitions:

Column space of a matrix: The column space Col(\boldsymbol{X}) of \boldsymbol{X} is the span of its column vectors:

Col(\boldsymbol{X}) = \{\boldsymbol{Xc} : \boldsymbol{c} \in \mathbb{R}^{k+1}\} It’s the space of “all vectors you could possibly get by multiplying \boldsymbol{X} and some vector.”

I like to use k to refer to the number of predictors in a regression model. Other sources prefer p. There’s no real difference, though.

Check yourself: if you have k predictors in your model, why are there k+1 columns in the \boldsymbol{X} matrix?

This is a subspace of \mathbb{R}^{n}, and its dimension is at most k+1, the number of columns in your matrix.

We refer to it as k+1 because that matches up with the idea of having k predictor columns plus an intercept column.

Important note! The column space doesn’t necessarily have all k+1 of those dimensions. For example, what if the whole last column of \boldsymbol{X} is 0s? Then it doesn’t actually matter what the last element of \boldsymbol{c} is – it’s always going to be multiplied by 0. So you’re operating in one fewer dimension than you thought!
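
Here’s a small numpy illustration of that point (the matrix and the zero column are invented): an \boldsymbol{X} with an intercept column, two real predictors, and a column of zeros. Changing the last entry of \boldsymbol{c} never changes \boldsymbol{Xc}, so the column space is effectively three-dimensional, not four.

```python
import numpy as np

# Hypothetical design matrix: intercept column, two predictors, and an all-zero last column.
# The numbers are invented for illustration.
X = np.array([[1.0, 2.0, 5.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [1.0, 4.0, 2.0, 0.0],
              [1.0, 5.0, 7.0, 0.0],
              [1.0, 6.0, 3.0, 0.0]])

c1 = np.array([1.0, 2.0, -1.0, 10.0])
c2 = np.array([1.0, 2.0, -1.0, -99.0])   # only the last entry differs

# Each Xc lives in R^n (here n = 5), and the last entry of c never matters:
print(X @ c1)
print(X @ c2)   # identical to X @ c1
```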

Rank of a matrix: The rank of any matrix \boldsymbol{A} is the dimension of the row space of \boldsymbol{A} (the space spanned by the rows of \boldsymbol{A}), which equals the dimension of the column space of \boldsymbol{A} (the space spanned by the columns of \boldsymbol{A}).

The rank is the number of linearly independent columns of a matrix. (What does “linearly independent columns” mean in terms of a predictor matrix?)
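
A quick numpy check of that idea (the predictors are made up): stick in a column that’s just a multiple of another predictor, and the rank comes out lower than the number of columns.

```python
import numpy as np

# Hypothetical predictors; the last column is just 2 * x1, so it adds no new information.
x1 = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([5.0, 1.0, 2.0, 7.0, 3.0])
X = np.column_stack([np.ones(5), x1, x2, 2 * x1])

print(X.shape)                      # (5, 4): four columns...
print(np.linalg.matrix_rank(X))     # ...but rank 3: the columns are not linearly independent
```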

Norm of a vector: For a vector \boldsymbol{x}\in\mathbb{R}^n, the Euclidean norm is the length of the vector:
\|\boldsymbol{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}

This is what people probably mean if they just say “the norm,” but technically this is the L_2 or Euclidean norm. The fact that we are using this specific definition of the length of a vector underlies all the math we are about to do. Ask me about my research sometime!
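
To close the loop in code (the vector is made up, chosen so the answer is a whole number): np.linalg.norm computes exactly this Euclidean length by default.

```python
import numpy as np

# Made-up vector; its Euclidean length works out to a whole number.
x = np.array([3.0, 4.0, 12.0])

print(np.linalg.norm(x))          # 13.0 = sqrt(3^2 + 4^2 + 12^2)
print(np.sqrt(np.sum(x ** 2)))    # same calculation written out
```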