2.6 General matrix fax and trix
2.6.1 Math for variances
Here is a general fact about the variance of a random vector: Var(\boldsymbol{y})=E[(\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})'] That looks like a vector-y version of the moment definition we saw for scalars, Var(y)=E[(y-\mu)^2], hmm?
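If you want to see that definition in action, here's a minimal numpy sketch (the distribution, covariance values, and sample size are made up purely for illustration): simulate many draws of a random vector, average the matrices (\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})' across draws, and check that the result is close to the covariance matrix numpy reports.

```python
import numpy as np

# Illustrative sketch: estimate Var(y) = E[(y - mu)(y - mu)'] by averaging the
# matrices (y - mu)(y - mu)' over many simulated draws of y.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])                      # made-up mean vector
true_cov = np.array([[2.0, 0.3, 0.0],
                     [0.3, 1.0, -0.4],
                     [0.0, -0.4, 0.5]])              # made-up covariance matrix

draws = rng.multivariate_normal(mu, true_cov, size=100_000)  # each row is one y
centered = draws - draws.mean(axis=0)                         # y minus (estimated) mu
var_hat = (centered[:, :, None] * centered[:, None, :]).mean(axis=0)

print(np.round(var_hat, 2))                                   # close to true_cov
print(np.allclose(var_hat, np.cov(draws, rowvar=False), atol=1e-3))  # True
```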
\boldsymbol{y}\boldsymbol{y}' is called an outer product of \boldsymbol{y} and \boldsymbol{y}. This is different from the inner product \boldsymbol{y}'\boldsymbol{y}. (What is the dimension of each of those objects?)
The outer product is: \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} \left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) = \begin{pmatrix}y_1^2&y_1y_2&\ldots&y_1y_n\\ y_2y_1&y_2^2&\ldots&y_2y_n\\ \vdots&\vdots&\ddots&\vdots\\ y_ny_1&y_ny_2&\ldots&y_n^2 \end{pmatrix}
The inner product is:
\left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} = \sum_{i=1}^n y_i^2
Hey, that second one looks familiar! The inner product is a sum of squares; the outer product is a matrix of all products of y_i and y_j. Double-check which objects are row or column vectors to be sure you’re doing the right one.
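Here's the same distinction in numpy (the vector is made up just for illustration): np.outer gives the n \times n matrix of all products y_i y_j, while the inner product collapses to a single number, the sum of squares.

```python
import numpy as np

# Made-up 4-vector, purely for illustration.
y = np.array([1.0, 2.0, 3.0, 4.0])

outer = np.outer(y, y)        # (4, 4) matrix with entries y_i * y_j
inner = y @ y                 # scalar: sum of y_i^2

print(outer.shape)            # (4, 4)
print(inner)                  # 30.0  (1 + 4 + 9 + 16)
print(np.isclose(inner, np.sum(y ** 2)))   # True
```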
Look back at the regression equation: the vectors there are \boldsymbol{y}, \boldsymbol{X\beta}, and \boldsymbol{\varepsilon}. Each is of length n. We may want to use these vector ideas on those objects….
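As a concrete sketch of those shapes (all the numbers here are invented for illustration), we can build \boldsymbol{y} = \boldsymbol{X\beta} + \boldsymbol{\varepsilon} in numpy and confirm that \boldsymbol{y}, \boldsymbol{X\beta}, and \boldsymbol{\varepsilon} are each length-n vectors, while \boldsymbol{X} is n \times (k+1).

```python
import numpy as np

# Illustrative sizes only: n observations, k predictors (plus an intercept column).
rng = np.random.default_rng(1)
n, k = 50, 2

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1), intercept first
beta = np.array([10.0, 2.0, -1.5])                          # length k+1 (made-up values)
eps = rng.normal(scale=0.5, size=n)                         # length n

y = X @ beta + eps                                          # length n

print(X.shape, beta.shape, eps.shape, y.shape)              # (50, 3) (3,) (50,) (50,)
```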
2.6.2 Handy linear algebra concepts
What about \boldsymbol{X} itself? That’s a matrix, so let’s look at some relevant definitions:
Column space of a matrix: The column space Col(\boldsymbol{X}) of \boldsymbol{X} is the span of its column vectors:
Col(\boldsymbol{X}) = \{\boldsymbol{Xc} : \boldsymbol{c} \in \mathbb{R}^{k+1}\} It’s the space of “all vectors you could possibly get by multiplying \boldsymbol{X} and some vector.”
I like to use k to refer to the number of predictors in a regression model. Other sources prefer p. There’s no real difference, though.
Check yourself: if you have k predictors in your model, why are there k+1 columns in the \boldsymbol{X} matrix?
This is a subspace of \mathbb{R}^n (each column of \boldsymbol{X}, and hence each \boldsymbol{Xc}, has n entries), and its dimension is at most k+1, the number of columns in your matrix.
We refer to it as k+1 because that matches up with the idea of having k predictor columns plus an intercept column.
Important note! The column space doesn’t necessarily have dimension k+1. For example, what if the whole last column of \boldsymbol{X} is 0s? Then it doesn’t actually matter what the last element of \boldsymbol{c} is – it’s always going to be multiplied by 0. So you’re operating in one fewer dimension than you thought!
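Here's a small numpy sketch of that zero-column situation (the matrix is made up): with a last column of zeros, the last entry of \boldsymbol{c} never matters, and the rank confirms you've lost a dimension.

```python
import numpy as np

# Made-up X: intercept column, one real predictor, and a last column of all zeros.
X = np.array([[1.0, 2.0, 0.0],
              [1.0, 3.0, 0.0],
              [1.0, 5.0, 0.0],
              [1.0, 7.0, 0.0]])

c1 = np.array([1.0, 1.0, 0.0])
c2 = np.array([1.0, 1.0, 99.0])   # only the last element differs

print(np.allclose(X @ c1, X @ c2))   # True: the last element of c never matters
print(np.linalg.matrix_rank(X))      # 2, not 3 -- one fewer dimension than you thought
```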
Rank of a matrix: The rank of any matrix \boldsymbol{A} is the dimension of the row space of \boldsymbol{A} (the space spanned by the rows of \boldsymbol{A}), which equals the dimension of the column space of \boldsymbol{A} (the space spanned by the columns of \boldsymbol{A}).
The rank is the number of linearly independent columns of a matrix. (What does “linearly independent columns” mean in terms of a predictor matrix?)
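To make that parenthetical question concrete, here's a hypothetical predictor matrix where one column is an exact linear combination of the others (think of the same measurement recorded in two different units), so the columns are not linearly independent and the rank drops below the number of columns.

```python
import numpy as np

# Hypothetical predictors: x1, and x2 = 2 * x1 + 3 (the same quantity, rescaled).
rng = np.random.default_rng(2)
x1 = rng.normal(size=20)
x2 = 2 * x1 + 3                              # exactly determined by the intercept and x1

X = np.column_stack([np.ones(20), x1, x2])   # 20 x 3 predictor matrix

print(np.linalg.matrix_rank(X))              # 2: only two linearly independent columns
```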
Norm of a vector: For a vector \boldsymbol{x}\in\mathbb{R}^n, the Euclidean norm is the length of the vector:
||\boldsymbol{x}|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
This is what people probably mean if they just say “the norm,” but technically this is the L_2 or Euclidean norm. The fact that we are using this specific definition of the length of a vector underlies all the math we are about to do. Ask me about my research sometime!
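A quick check of that definition in numpy (the vector values are invented): np.linalg.norm computes the Euclidean (L_2) norm by default, which matches the square root of the sum of squares.

```python
import numpy as np

# Made-up vector, just to check the definition.
x = np.array([3.0, 4.0, 12.0])

print(np.linalg.norm(x))            # 13.0
print(np.sqrt(np.sum(x ** 2)))      # 13.0, same thing
```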