2.4 Matrix form of the regression equation

Let’s start by considering something familiar: the simple linear regression equation for a single data point. We’ll call it point $i$: the $i$th data point in our dataset.

$y_i = \beta_0 + \beta_1 x_{i} + \varepsilon_i$ (Check yourself: is this the true regression equation or the estimated/fitted one?)

Using just one predictor can get pretty boring, so we can extend this to the multiple linear regression equation! Again, this is for a single data point $i$; now we have $k$ different predictors ($x$ variables) instead of just one.

$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i$

So, what if we want to talk about all the other data points? We could write down this equation over and over again, with the new values for $y_i$, $\varepsilon_i$, and all the $x_{ij}$’s each time. But we can save a lot of time and space (and general grief) using matrices!

It’s real hard to write in boldface on the chalkboard. I’ll often use an underline to indicate that something is a vector or matrix, and I’ll usually use Uppercase for Matrices and lowercase for vectors. Feel free to ask at any time if something is unclear; keeping track of the dimensions of everything is very important.

Here’s how we write the response vector. This is a single column vector containing the $y$ values for every point in the dataset: $\mathbf{y} = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n\\ \end{pmatrix}$

The vector of coefficients: $\boldsymbol{\beta} = \begin{pmatrix} \beta_0\\ \beta_1\\ \vdots\\ \beta_k\\ \end{pmatrix}$

The matrix of predictor values, sometimes called the predictor matrix or (somewhat confusingly) the model matrix: $\mathbf X = \begin{pmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1k}\\ 1 & x_{21} & x_{22} & \ldots & x_{2k}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n1} & x_{n2} & \ldots & x_{nk} \\ \end{pmatrix}$
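As a rough illustration, here is one way to assemble a matrix of this shape by hand in plain Python. The data values and the helper name `model_matrix` are made up for this sketch; in practice your statistical software usually builds this matrix for you.

```python
def model_matrix(predictors):
    """Prepend a 1 to each row of predictor values.

    `predictors` is a list of n rows, each holding the k predictor
    values x_i1, ..., x_ik for one data point.
    """
    return [[1.0] + [float(x) for x in row] for row in predictors]


# Hypothetical data: n = 4 points, k = 2 predictors.
predictors = [
    [2.0, 5.0],
    [3.0, 1.0],
    [4.0, 0.0],
    [6.0, 2.0],
]

X = model_matrix(predictors)
n, p = len(X), len(X[0])  # p = k + 1 columns: the 1s plus k predictors

for row in X:
    print(row)
print("dimensions:", n, "x", p)
```

Each row of `X` belongs to one data point, and the matrix has one more column than there are predictors.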

The error vector: $\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\\ \end{pmatrix}$

And all put together we have: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

where

$\boldsymbol{\varepsilon} \sim N(\mathbf{0},\sigma^2 \mathbf{I}).$
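Since dimensions matter so much here, it may help to annotate each piece with its size. As above, $n$ is the number of data points and $k$ is the number of predictors; $\mathbf{0}$ is then a vector of $n$ zeros and $\mathbf{I}$ is the $n \times n$ identity matrix.

$\underset{n \times 1}{\mathbf{y}} = \underset{n \times (k+1)}{\mathbf{X}} \; \underset{(k+1) \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}$

Reading off row $i$ of this equation recovers the single-point equation from before:

$y_i = \begin{pmatrix} 1 & x_{i1} & \ldots & x_{ik} \end{pmatrix} \boldsymbol{\beta} + \varepsilon_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i$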

Response moment: Take a moment to go back and forth between the matrix form and a written-out version for a particular data point $i$. Why is it $\mathbf{X}\boldsymbol{\beta}$ and not $\boldsymbol{\beta}\mathbf{X}$? What does each column and row correspond to?
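One way to do this back-and-forth check numerically is sketched below in plain Python. All the numbers (the coefficients, the predictor values, and $\sigma$) are invented for illustration, and `mat_vec` is a hypothetical helper written out longhand so that each row-times-column product is visible.

```python
import random


def mat_vec(X, beta):
    """Multiply an n x (k+1) matrix by a length-(k+1) vector."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


random.seed(1)  # fixed seed so the simulated errors are reproducible

# Hypothetical model matrix (n = 3, k = 2) and coefficients.
X = [
    [1.0, 2.0, 5.0],
    [1.0, 3.0, 1.0],
    [1.0, 4.0, 0.0],
]
beta = [10.0, 2.0, -1.0]  # beta_0, beta_1, beta_2
sigma = 0.5               # assumed error standard deviation

# Independent N(0, sigma^2) errors, one per data point.
eps = [random.gauss(0.0, sigma) for _ in X]

# Matrix form: y = X beta + eps.
y = [xb_i + e_i for xb_i, e_i in zip(mat_vec(X, beta), eps)]

# Written-out form for one data point (index 1 is the second point).
i = 1
y_i = beta[0] + beta[1] * X[i][1] + beta[2] * X[i][2] + eps[i]

# The two versions should agree up to floating-point rounding.
print(abs(y[i] - y_i) < 1e-12)
```

Try changing `i`, or swapping the order of the arguments to `mat_vec`, and see what happens to the lengths of the rows and the vector being paired up.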

The next step will be to work out what this condition on $\boldsymbol{\varepsilon}$ really means. That is: how do we think about vectors and matrices of random quantities?