Vector-matrix form of a linear model

Any linear model with \(p\) explanatory variables can be written as a set of \(n\) equations, one per observation: \[ y_{i}=\alpha+\beta_{1}x_{i, 1}+\beta_{2}x_{i, 2}+\ldots+\beta_{p}x_{i,\, p}+\epsilon_{i}, \quad \quad i=1,\ldots, n.\] Equivalently, taking the errors \(\epsilon_{i}\) to have mean zero, the model can be written in expectation form as

\[ E(y_{i})=\alpha+\beta_{1}x_{i, 1}+\beta_{2}x_{i, 2}+\ldots+\beta_{p}x_{i, \, p},\quad \quad i=1,\ldots, n.\]

The same \(n\) equations can be written more compactly in vector-matrix form as

\[\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon},\]

or, in expectation form, as

\[E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta},\]

where:

  • \(\mathbf{Y}\) is the \((n \times 1)\) vector of observations.

  • \(\boldsymbol{\beta}\) is the \(([p+1] \times 1)\) vector of parameters: the intercept \(\alpha\) followed by the \(p\) coefficients \(\beta_{1},\ldots,\beta_{p}\).

  • \(\mathbf{X}\) is the \((n \times [p+1])\) design matrix.

  • \(\boldsymbol{\epsilon}\) is the \((n \times 1)\) vector of random errors, which will be discussed later.

In full,

\[ \mbox{response } \mathbf{Y} =\left( \begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{array} \right), \quad \quad \mbox{parameters } \boldsymbol{\beta} =\left( \begin{array}{c} \alpha \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{array} \right), \] \[ \mbox{design matrix } \mathbf{X}=\left( \begin{array}{cccc} 1 & x_{1, 1} & \ldots & x_{1, \, p} \\ 1 & x_{2, 1} & \ldots & x_{2, \, p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n, 1} & \ldots & x_{n, \, p} \end{array} \right), \quad \mbox{ and errors } \boldsymbol{\epsilon} = \left( \begin{array}{c} \epsilon_{1} \\ \epsilon_{2} \\ \vdots \\ \epsilon_{n} \end{array} \right). \]
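
As a small illustration (the sizes \(n=3\) and \(p=2\) are chosen here only for the example), the vector-matrix form reads

\[ \left( \begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \end{array} \right) = \left( \begin{array}{ccc} 1 & x_{1, 1} & x_{1, 2} \\ 1 & x_{2, 1} & x_{2, 2} \\ 1 & x_{3, 1} & x_{3, 2} \end{array} \right) \left( \begin{array}{c} \alpha \\ \beta_{1} \\ \beta_{2} \end{array} \right) + \left( \begin{array}{c} \epsilon_{1} \\ \epsilon_{2} \\ \epsilon_{3} \end{array} \right). \]

Multiplying out row \(i\) gives \(y_{i}=\alpha+\beta_{1}x_{i, 1}+\beta_{2}x_{i, 2}+\epsilon_{i}\), which is the scalar form of the model with \(p=2\). The column of ones in the matrix is what carries the intercept \(\alpha\).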

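The elements of the design matrix \(\mathbf{X}\) are known constants. For the matrix product and sum to be defined, the dimensions must conform, which gives a useful check: \(\mathbf{X}\) is \((n \times [p+1])\) and \(\boldsymbol{\beta}\) is \(([p+1] \times 1)\), so \(\mathbf{X}\boldsymbol{\beta}\) is \((n \times 1)\), matching the dimensions of both \(\mathbf{Y}\) and \(\boldsymbol{\epsilon}\).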
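
The dimension check can also be carried out numerically. The following is a minimal sketch in Python with NumPy, not part of the original notes: the sizes \(n=5\) and \(p=2\), the parameter values, and the simulated data are all made up for illustration, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

n, p = 5, 2                      # illustrative sizes, not from the notes

# Made-up explanatory variables: n observations on p variables.
x = rng.normal(size=(n, p))

# Design matrix: a column of ones (for the intercept) followed by the p variables.
X = np.column_stack([np.ones(n), x])

# Parameter vector (alpha, beta_1, beta_2) as a column; values are arbitrary.
beta = np.array([[1.0], [2.0], [-0.5]])

# Made-up random errors as an (n x 1) column.
eps = rng.normal(scale=0.1, size=(n, 1))

# The model in vector-matrix form: Y = X beta + eps.
Y = X @ beta + eps

# Dimension checks matching the definitions in the text.
assert X.shape == (n, p + 1)
assert beta.shape == (p + 1, 1)
assert eps.shape == (n, 1)
assert Y.shape == (n, 1)

# Row i of X beta equals alpha + beta_1 * x_{i,1} + beta_2 * x_{i,2}.
scalar_form = beta[0, 0] + beta[1, 0] * x[:, 0] + beta[2, 0] * x[:, 1]
assert np.allclose((X @ beta).ravel(), scalar_form)
```

If the dimensions did not conform, for example if the column of ones were left out of `X`, the product `X @ beta` would raise an error, which is the same check described above.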