1.18 General matrix fax and trix
1.18.1 Math for variances
Here is a general fact about the variance of a random vector: \[Var(\boldsymbol{y})=E(\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})'\] That looks like a vector-y version of the moment definition we saw for scalars, \(Var(y)=E[(y-\mu)^2]\), hmm?
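To see what’s inside that expectation, here’s the two-dimensional case written out (nothing new, just expanding the definition):
\[Var\begin{pmatrix}y_1\\ y_2\end{pmatrix} = E\begin{pmatrix}(y_1-\mu_1)^2 & (y_1-\mu_1)(y_2-\mu_2)\\ (y_2-\mu_2)(y_1-\mu_1) & (y_2-\mu_2)^2\end{pmatrix} = \begin{pmatrix}Var(y_1) & Cov(y_1,y_2)\\ Cov(y_2,y_1) & Var(y_2)\end{pmatrix}\]
The variances of the individual elements sit on the diagonal, and the covariances between elements sit off the diagonal.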
\(\boldsymbol{y}\boldsymbol{y}'\) is called the outer product of \(\boldsymbol{y}\) with itself. This is different from the inner product \(\boldsymbol{y}'\boldsymbol{y}\). (What is the dimension of each of those objects?)
The outer product is: \[ \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} \left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) = \begin{pmatrix}y_1^2&y_1y_2&\ldots&y_1y_n\\ y_2y_1&y_2^2&\ldots&y_2y_n\\ \vdots&\vdots&\ddots&\vdots\\ y_ny_1&y_ny_2&\ldots&y_n^2 \end{pmatrix} \]
The inner product is:
\[ \left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n\\ \end{pmatrix} = \sum_{i=1}^{n} y_i^2 \]
Hey, that second one looks familiar! The inner product is a sum of squares; the outer product is a matrix of all products of \(y_i\) and \(y_j\). Double-check which objects are row or column vectors to be sure you’re doing the right one.
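If you want to poke at this numerically, here’s a quick sketch in Python with NumPy (the vector values and the random draws are made up purely for illustration):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

inner = y @ y           # inner product y'y: one number, the sum of squares (14.0)
outer = np.outer(y, y)  # outer product yy': a 3x3 matrix of every product y_i * y_j

# The variance formula is an expected outer product of (y - mu) with itself.
# Averaging those outer products over a bunch of simulated draws gives the
# usual sample covariance matrix:
rng = np.random.default_rng(1)
draws = rng.normal(size=(1000, 3))        # 1000 made-up draws of a length-3 vector
dev = draws - draws.mean(axis=0)
var_hat = dev.T @ dev / (len(draws) - 1)  # matches np.cov(draws, rowvar=False)
```

The `dev.T @ dev` line is literally a sum of outer products, one per draw, divided by the sample size minus one.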
Look back at the regression equation: the vectors there are \(\boldsymbol{y}\), \(\boldsymbol{X\beta}\), and \(\boldsymbol{\varepsilon}\). Each is of length \(n\). We may want to use these vector ideas on those objects….
1.18.2 Handy linear algebra concepts
What about \(\boldsymbol{X}\) itself? That’s a matrix, so let’s look at some relevant definitions:
Column space of a matrix: The column space \(Col(\boldsymbol{X})\) of \(\boldsymbol{X}\) is the span of its column vectors:
\[Col(\boldsymbol{X}) = \{\boldsymbol{Xc} : \boldsymbol{c} \in \mathbb{R}^{k+1}\}\] It’s the space of “all vectors you could possibly get by multiplying \(\boldsymbol{X}\) and some vector.”
I like to use \(k\) to refer to the number of predictors in a regression model. Other sources prefer \(p\). There’s no real difference, though.
Check yourself: if you have \(k\) predictors in your model, why are there \(k+1\) columns in the \(\boldsymbol{X}\) matrix?
The column space lives in \(\mathbb{R}^{n}\) (each column of \(\boldsymbol{X}\) has \(n\) entries), but its dimension can be at most \(k+1\), the number of columns in your matrix.
We refer to it as \(k+1\) because that matches up with the idea of having \(k\) predictor columns plus an intercept column.
Important note! The column space doesn’t necessarily have dimension \(k+1\). For example, what if the whole last column of \(\boldsymbol{X}\) is 0s? Then it doesn’t actually matter what the last element of \(\boldsymbol{c}\) is; it’s always going to be multiplied by 0. So you’re operating in one fewer dimension than you thought!
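Here’s a tiny made-up example of that in Python (the numbers themselves mean nothing; the point is the all-zero last column):

```python
import numpy as np

# Toy X: an intercept column, one predictor, and an all-zero last column
X = np.array([[1.0, 2.0, 0.0],
              [1.0, 5.0, 0.0],
              [1.0, 7.0, 0.0],
              [1.0, 9.0, 0.0]])

c1 = np.array([1.0, 2.0,  3.0])
c2 = np.array([1.0, 2.0, -8.0])  # same as c1 except the last element

# The last element of c never matters, so both land on the exact same vector:
print(np.allclose(X @ c1, X @ c2))  # True
```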
Rank of a matrix: The rank of any matrix \(\boldsymbol{A}\) is the dimension of the row space of \(\boldsymbol{A}\) (the space spanned by the rows of \(\boldsymbol{A}\)), which equals the dimension of the column space of \(\boldsymbol{A}\) (the space spanned by the columns of \(\boldsymbol{A}\)).
The rank is the number of linearly independent columns of a matrix. (What does “linearly independent columns” mean in terms of a predictor matrix?)
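To check a rank numerically, here’s a sketch with another made-up design matrix where one “predictor” column is exactly twice another, so the columns are not linearly independent:

```python
import numpy as np

# Intercept, x1, and x2 = 2 * x1: the third column is a linear combination of the second
X = np.array([[1.0, 2.0,  4.0],
              [1.0, 5.0, 10.0],
              [1.0, 7.0, 14.0],
              [1.0, 9.0, 18.0]])

print(np.linalg.matrix_rank(X))  # 2, even though X has 3 columns
```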
Norm of a vector: For a vector \(\boldsymbol{x}\in\mathbb{R}^n\), the Euclidean norm is the length of the vector:
\[\|\boldsymbol{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}\]
This is what people probably mean if they just say “the norm,” but technically this is the \(L_2\) or Euclidean norm. The fact that we are using this specific definition of the length of a vector underlies all the math we are about to do. Ask me about my research sometime!
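And one last quick sketch (vector values made up) just to tie the formula to code:

```python
import numpy as np

x = np.array([3.0, 4.0, 12.0])

by_hand  = np.sqrt(np.sum(x**2))  # square root of the sum of squared entries
built_in = np.linalg.norm(x)      # NumPy's default vector norm is exactly this L2 norm

print(by_hand, built_in)  # 13.0 13.0
```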