## 2.6 General matrix fax and trix

### 2.6.1 Math for variances

Here is a general fact about the variance of a random vector: $Var(\boldsymbol{y})=E[(\boldsymbol{y}-\boldsymbol{\mu})(\boldsymbol{y}-\boldsymbol{\mu})']$. That looks like a vector-y version of the moment definition we saw for scalars, $$Var(y)=E[(y-\mu)^2]$$, hmm?

$$\boldsymbol{y}\boldsymbol{y}'$$ is called an outer product of $$\boldsymbol{y}$$ and $$\boldsymbol{y}$$. This is different from the inner product $$\boldsymbol{y}'\boldsymbol{y}$$. (What is the dimension of each of those objects?)

The outer product is: $\begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix} \left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) = \begin{pmatrix}y_1^2&y_1y_2&\ldots&y_1y_n\\ y_2y_1&y_2^2&\ldots&y_2y_n\\ \vdots&\vdots&\ddots&\vdots\\ y_ny_1&y_ny_2&\ldots&y_n^2 \end{pmatrix}$

The inner product is:

$\left(\begin{array}{cccc} y_1&y_2&\ldots&y_n \end{array}\right) \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n\\ \end{pmatrix} = \sum_{i=1}^n y_i^2$

Hey, that second one looks familiar! The inner product is a sum of squares; the outer product is a matrix of all products of $$y_i$$ and $$y_j$$. Double-check which objects are row or column vectors to be sure you’re doing the right one.
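If you'd like to see the difference concretely, here's a quick NumPy sketch (the vector is made up for illustration):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

# Inner product y'y: a scalar (1 x n times n x 1) -- the sum of squares.
inner = y @ y            # 1^2 + 2^2 + 3^2 = 14.0

# Outer product yy': an n x n matrix of all the products y_i * y_j.
outer = np.outer(y, y)

print(inner)             # 14.0
print(outer.shape)       # (3, 3)
```

Note that `np.outer` doesn't care whether you hand it rows or columns, but when you write the math by hand, the order of the transpose is what decides which product you get.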

Look back at the regression equation: the vectors there are $$\boldsymbol{y}$$, $$\boldsymbol{X\beta}$$, and $$\boldsymbol{\varepsilon}$$. Each is of length $$n$$. We may want to use these vector ideas on those objects….
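To connect the two ideas: the variance matrix above is the expected outer product of the centered vector, so we can estimate it by averaging outer products across a sample. Here's a hedged NumPy sketch (the covariance matrix `Sigma` and sample size are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Draw n observations of a length-3 random vector with known covariance.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
Y = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=n)

# Estimate Var(y) = E[(y - mu)(y - mu)'] by averaging the outer
# products of the centered observations.
dev = Y - Y.mean(axis=0)
S = sum(np.outer(d, d) for d in dev) / (n - 1)

# np.cov expects variables in rows, so transpose; the two should agree.
print(np.allclose(S, np.cov(Y.T)))   # True
```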

### 2.6.2 Handy linear algebra concepts

What about $$\boldsymbol{X}$$ itself? That’s a matrix, so let’s look at some relevant definitions:

Column space of a matrix: The column space $$Col(\boldsymbol{X})$$ of $$\boldsymbol{X}$$ is the span of its column vectors:

$Col(\boldsymbol{X}) = \{\boldsymbol{Xc} : \boldsymbol{c} \in \mathbb{R}^{k+1}\}$. It's the space of "all vectors you could possibly get by multiplying $$\boldsymbol{X}$$ and some vector."

I like to use $$k$$ to refer to the number of predictors in a regression model. Other sources prefer $$p$$. There’s no real difference, though.

Check yourself: if you have $$k$$ predictors in your model, why are there $$k+1$$ columns in the $$\boldsymbol{X}$$ matrix?

This is a subspace of $$\mathbb{R}^n$$ (each column of $$\boldsymbol{X}$$ has length $$n$$), and its dimension is at most $$k+1$$, the number of columns in your matrix.

We refer to it as $$k+1$$ because that matches up with the idea of having $$k$$ predictor columns plus an intercept column.

Important note! The column space doesn't necessarily have dimension $$k+1$$. For example, what if the whole last column of $$\boldsymbol{X}$$ is 0s? Then it doesn't actually matter what the last element of $$\boldsymbol{c}$$ is – it's always going to be multiplied by 0. So you're operating in one fewer dimension than you thought!
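Here's a tiny NumPy illustration of that zero-column situation (the numbers are made up):

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.0],
              [1.0, 3.0, 0.0],
              [1.0, 4.0, 0.0]])   # the whole last column is 0s

c1 = np.array([2.0, -1.0,  5.0])
c2 = np.array([2.0, -1.0, -3.0])  # differs only in the last element

# X @ c is a linear combination of X's columns, so the last element
# of c never matters here: both products land on the same vector.
print(np.allclose(X @ c1, X @ c2))   # True
```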

Rank of a matrix: The rank of any matrix $$\boldsymbol{A}$$ is the dimension of the row space of $$\boldsymbol{A}$$ (the space spanned by the rows of $$\boldsymbol{A}$$), which equals the dimension of the column space of $$\boldsymbol{A}$$ (the space spanned by the columns of $$\boldsymbol{A}$$).

The rank is the number of linearly independent columns of a matrix. (What does “linearly independent columns” mean in terms of a predictor matrix?)
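A quick NumPy check of this idea (the predictor matrix is made up; `np.linalg.matrix_rank` does the counting for us):

```python
import numpy as np

# A predictor matrix with an intercept column and k = 2 predictors.
X = np.array([[1.0, 2.0, 5.0],
              [1.0, 3.0, 1.0],
              [1.0, 4.0, 2.0],
              [1.0, 5.0, 7.0]])
print(np.linalg.matrix_rank(X))      # 3: all columns linearly independent

# Make the last column a multiple of the second: now one column is
# redundant, so the rank (and the dimension of the column space) drops.
X_bad = X.copy()
X_bad[:, 2] = 2 * X_bad[:, 1]
print(np.linalg.matrix_rank(X_bad))  # 2
```

In regression terms, `X_bad` is what you get when one predictor is a perfect linear function of another.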

Norm of a vector: For a vector $$\boldsymbol{x}\in\mathbb{R}^n$$, the Euclidean norm is the length of the vector:
$\|\boldsymbol{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$

This is what people probably mean if they just say “the norm,” but technically this is the $$L_2$$ or Euclidean norm. The fact that we are using this specific definition of the length of a vector underlies all the math we are about to do. Ask me about my research sometime!
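If you want to double-check the formula numerically, here's a small NumPy sketch (the vector is made up):

```python
import numpy as np

x = np.array([3.0, 4.0])

# The Euclidean (L2) norm: the square root of the sum of squares...
manual = np.sqrt(np.sum(x**2))
print(manual)                 # 5.0

# ...which is exactly what np.linalg.norm computes by default.
print(np.linalg.norm(x))      # 5.0

# Note the connection to the inner product: ||x||^2 = x'x.
print(x @ x)                  # 25.0
```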