2.4 Matrix form of the regression equation

Let’s start by considering something familiar: the simple linear regression equation for a single data point (we’ll call it point i: the ith data point in our dataset).

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

(Check yourself: is this the true regression equation or the estimated/fitted one?)

Using just one predictor can get pretty boring, so we can extend this to the multiple linear regression equation! Again, this is for a single data point i; now we have k different predictors (x variables) instead of just one.

y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i

So, what if we want to talk about all the other data points? We could write down this equation over and over again, with the new values for y_i, \varepsilon_i, and all the x_{ij}’s each time. But we can save a lot of time and space (and general grief) using matrices!

It’s real hard to write in boldface on the chalkboard. I’ll often use an underline to indicate that something is a vector or matrix, and I’ll usually use Uppercase for Matrices and lowercase for vectors. Feel free to ask at any time if something is unclear; keeping track of the dimensions of everything is very important.

Here’s how we write the response vector. This is a single column vector containing the y values for every point in the dataset: \mathbf{y} = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n\\ \end{pmatrix}

The vector of coefficients: \boldsymbol{\beta} = \begin{pmatrix} \beta_0\\ \beta_1\\ \vdots\\ \beta_k\\ \end{pmatrix}

The matrix of predictor values, sometimes called the predictor matrix or (somewhat confusingly) the model matrix: \mathbf X = \begin{pmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1k}\\ 1 & x_{21} & x_{22} & \ldots & x_{2k}\\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 1 & x_{n1} & x_{n2} & \ldots & x_{nk} \\ \end{pmatrix}
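To make the layout of \mathbf{X} concrete, here is a minimal sketch in plain Python that builds the matrix from raw predictor values by putting a 1 at the front of each row. The function name `model_matrix` and the numbers are made up for illustration; they are not from any particular dataset or library.

```python
def model_matrix(predictor_rows):
    """Prepend a 1 to each row of predictor values.

    Each input row holds the k predictor values for one data point,
    so the result has n rows and k + 1 columns.
    """
    return [[1.0] + [float(v) for v in row] for row in predictor_rows]


# Hypothetical data: n = 4 data points, k = 2 predictors.
predictors = [
    [2.0, 5.0],
    [3.0, 1.0],
    [4.0, 0.0],
    [6.0, 2.0],
]

X = model_matrix(predictors)
n = len(X)          # number of rows = number of data points
k = len(X[0]) - 1   # number of columns, not counting the column of 1s

for row in X:
    print(row)
print("n =", n, "k =", k)
```

Each row of the result belongs to one data point, and the leading 1 is what lets \beta_0 appear in every equation.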

The error vector: \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\\ \end{pmatrix}

And all put together we have: \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}

where

\boldsymbol{\varepsilon} \sim N(\mathbf{0},\sigma^2 \mathbf{I}).
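If it helps to see the pieces fit together numerically, here is a small sketch in plain Python. All of the numbers (the matrix, the coefficients, and sigma) are invented for illustration. It assumes that drawing each error independently from a normal distribution with mean 0 and standard deviation sigma is one way to produce an error vector with the distribution above, then checks that one entry of \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} agrees with the written-out equation for that data point.

```python
import random


def mat_vec(X, beta):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical model matrix: n = 4 data points, k = 2 predictors.
X = [
    [1.0, 2.0, 5.0],
    [1.0, 3.0, 1.0],
    [1.0, 4.0, 0.0],
    [1.0, 6.0, 2.0],
]
beta = [1.0, 0.5, -2.0]   # made-up values for beta_0, beta_1, beta_2
sigma = 0.3               # made-up error standard deviation

# One independent N(0, sigma^2) draw per data point.
epsilon = [random.gauss(0.0, sigma) for _ in X]

# Matrix form: y = X beta + epsilon
y = [mean_i + e_i for mean_i, e_i in zip(mat_vec(X, beta), epsilon)]

# Written-out form for a single data point (Python counts rows from 0).
i = 2
y_i = beta[0] + beta[1] * X[i][1] + beta[2] * X[i][2] + epsilon[i]

matches = abs(y[i] - y_i) < 1e-12
print(matches)
```

The matrix equation is just all n of the single-point equations stacked on top of each other, one per row.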

Response moment: Take a moment to go back and forth between the matrix form and a written-out version for a particular data point i. Why is it \mathbf{X}\boldsymbol{\beta} and not \boldsymbol{\beta}\mathbf{X}? What does each row and each column of \mathbf{X} correspond to?
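Once you have thought about the order question yourself, one way to check your reasoning is to track the dimensions explicitly. The sketch below is plain Python with made-up numbers; `shape` and `conformable` are hypothetical helper names, not functions from a library. It tests whether the inner dimensions match for each order of multiplication.

```python
def shape(M):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    return (len(M), len(M[0]))


def conformable(A, B):
    """The product A B is defined only if columns of A == rows of B."""
    return shape(A)[1] == shape(B)[0]


# Hypothetical model matrix: n = 4 rows, k + 1 = 3 columns.
X = [
    [1.0, 2.0, 5.0],
    [1.0, 3.0, 1.0],
    [1.0, 4.0, 0.0],
    [1.0, 6.0, 2.0],
]

# Coefficients stored as a column vector: k + 1 = 3 rows, 1 column.
beta = [[1.0], [0.5], [-2.0]]

print(shape(X), shape(beta))   # (4, 3) (3, 1)
print(conformable(X, beta))    # True: 3 columns meet 3 rows
print(conformable(beta, X))    # False: 1 column meets 4 rows
```

Writing the dimensions next to each object, as the `shape` calls do here, is the same bookkeeping you would do by hand on the chalkboard.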

The next step will be to work out what this condition on \boldsymbol{\varepsilon} really means. That is: how do we think about vectors and matrices of random quantities?