1.16 Matrix form of the regression equation
Let’s start by considering something familiar: the simple linear regression equation for a single data point, which we’ll call point \(i\) (the \(i\)th data point in our dataset).
\[y_i = \beta_0 + \beta_1 x_{i} + \varepsilon_i\] (Check yourself: is this the true regression equation or the estimated/fitted one?)
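For instance, with totally made-up numbers: if \(\beta_0 = 2\), \(\beta_1 = 0.5\), \(x_i = 4\), and this particular point’s error happens to be \(\varepsilon_i = -0.3\), then \(y_i = 2 + 0.5(4) - 0.3 = 3.7\).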
Using just one predictor can get pretty boring, so we can extend this to the multiple linear regression equation! Again, this is for a single data point \(i\); now we have \(k\) different predictors (\(x\) variables) instead of just one.
\[y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i\]
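To make the double subscripts concrete (again, purely for illustration): with \(k = 3\) predictors, the equation for data point \(i = 2\) would read \[y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \beta_3 x_{23} + \varepsilon_2.\] The first subscript on each \(x\) says which data point we’re looking at, and the second says which predictor.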
So, what if we want to talk about all \(n\) data points at once? We could write this equation down over and over again, plugging in new values for \(y_i\), \(\varepsilon_i\), and all the \(x_{ij}\)’s each time. But we can save a lot of time and space (and general grief) using matrices!
It’s really hard to write in boldface on the chalkboard, so I’ll often use an underline to indicate that something is a vector or matrix. I’ll also usually use uppercase letters for matrices and lowercase letters for vectors. Feel free to ask at any time if something is unclear; keeping track of the dimensions of everything is very important.
Here’s how we write the response vector. This is a single column vector (dimension \(n \times 1\)) containing the \(y\) values for every point in the dataset: \[\mathbf{y} = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}\]
The vector of coefficients, which has \(k+1\) entries: one for the intercept \(\beta_0\), plus one for each of the \(k\) predictors: \[\boldsymbol{\beta} = \begin{pmatrix} \beta_0\\ \beta_1\\ \vdots\\ \beta_k \end{pmatrix}\]
The matrix of predictor values, sometimes called the predictor matrix or (somewhat confusingly) the model matrix. It has \(n\) rows, one per data point, and \(k+1\) columns; the leading column of ones is what multiplies the intercept \(\beta_0\): \[\mathbf X = \begin{pmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1k}\\ 1 & x_{21} & x_{22} & \ldots & x_{2k}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n1} & x_{n2} & \ldots & x_{nk} \end{pmatrix}\]
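If you like to see this in code, here’s a minimal sketch in Python with NumPy (the predictor values are made up, and this isn’t necessarily the software we’ll use in class) showing how the model matrix is built by sticking a column of ones in front of the raw predictor columns:

```python
import numpy as np

# Made-up raw predictor columns: n = 4 data points, k = 2 predictors
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.5, 1.5, 2.5, 3.5])

# The model matrix: a leading column of ones (which will multiply the
# intercept beta_0), then one column per predictor.
# Its shape is n x (k + 1) = 4 x 3.
X = np.column_stack([np.ones_like(x1), x1, x2])
print(X.shape)  # (4, 3)
```

(If you’ve met R before: its `model.matrix()` function builds exactly this kind of matrix for you.)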
The error vector, also \(n \times 1\): \[\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n \end{pmatrix}\]
And all put together we have: \[\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\]
where
\[\boldsymbol{\varepsilon} \sim N(\mathbf{0},\sigma^2 \mathbf{I}).\]
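As a quick sanity check on the matrix form (with made-up values of \(\boldsymbol{\beta}\) and \(\sigma\), continuing the Python/NumPy sketch from above), we can simulate a response vector by drawing \(\boldsymbol{\varepsilon}\) from \(N(\mathbf{0}, \sigma^2 \mathbf{I})\) and computing \(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\):

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded so the results are reproducible

# Made-up setup: n = 4 data points, k = 2 predictors
n, k = 4, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, k))])  # n x (k+1)
beta = np.array([2.0, 0.5, -1.0])  # beta_0, beta_1, beta_2 (made up)
sigma = 0.3

# epsilon ~ N(0, sigma^2 I) means n independent normal draws,
# each with mean 0 and standard deviation sigma
eps = rng.normal(0.0, sigma, size=n)

# The matrix form of the regression equation: X @ beta is a vector of
# length n, matching eps, so y is an n-vector
y = X @ beta + eps
print(y)
```

Note that the covariance matrix \(\sigma^2 \mathbf{I}\) being diagonal is what justifies drawing each \(\varepsilon_i\) independently here.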
Response moment: Take a moment to go back and forth between the matrix form and the written-out version for a particular data point \(i\). Why is it \(\mathbf{X}\boldsymbol{\beta}\) and not \(\boldsymbol{\beta}\mathbf{X}\)? What does each row and column of \(\mathbf{X}\) correspond to?
The next step will be to work out what this condition on \(\boldsymbol{\varepsilon}\) really means. That is: how do we think about vectors and matrices of random quantities?