Least Squares Estimates for the Linear Regression Model

The least squares estimate for a linear regression model of the form \[\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}\] is given by \[\boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}.\]
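
As a quick numerical sketch of this formula (assuming NumPy and a small invented data set; none of the values below come from the notes):

```python
import numpy as np

# Small illustrative example: n = 5 observations, p = 1 predictor plus intercept.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=5)

# Design matrix X with a column of ones for the intercept (n x (p+1)).
X = np.column_stack([np.ones_like(x), x])

# Closed-form least squares estimate: beta_hat = (X^T X)^{-1} X^T Y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta_hat)   # approximately [2.0, 0.5]
```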

In order for \(\boldsymbol{\hat{\beta}}\) to exist, we need to make the following assumption.

Assumption: The \((n \times [p+1])\) matrix \(\mathbf{X}\) is of full rank \(p+1\).

This means that the matrix \(\mathbf{X}^T\mathbf{X}\) is also of full rank \(p+1\). As a consequence of this:

1. \(\mathbf{X}^T\mathbf{X}\) is a positive definite matrix.

2. \((\mathbf{X}^T\mathbf{X})^{-1}\) exists and is unique.
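
A minimal numerical sketch of checking this assumption and its consequences (NumPy, with an invented design matrix):

```python
import numpy as np

# Illustrative design matrix: intercept column plus two predictors (n = 6, p = 2).
X = np.array([
    [1.0, 0.5, 2.1],
    [1.0, 1.3, 1.9],
    [1.0, 2.2, 0.7],
    [1.0, 3.1, 1.5],
    [1.0, 4.0, 0.2],
    [1.0, 5.2, 1.1],
])
n, p_plus_1 = X.shape

# Assumption: X has full column rank p+1.
print(np.linalg.matrix_rank(X) == p_plus_1)   # True

# Consequence: X^T X is positive definite (all eigenvalues strictly positive),
# and is therefore invertible.
eigenvalues = np.linalg.eigvalsh(X.T @ X)
print(np.all(eigenvalues > 0))                # True
```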

First, we recall the steps of the calculus-based approach to obtaining the least squares estimates (a small symbolic sketch of these steps follows the list):

  • Construct the sum-of-squares function as:

    \[ S(\boldsymbol \beta) = \sum_{i=1}^n(y_i-E(y_i))^2 \]

  • Differentiate with respect to each of the parameters \[\frac{\partial S(\boldsymbol{\beta})}{\partial \alpha}, \frac{\partial S(\boldsymbol{\beta})}{\partial\beta_1}, \ldots, \frac{\partial S(\boldsymbol{\beta})}{\partial\beta_{(p)}}; \] which can be compactly written as \[\frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \left( \begin{array}{c}\frac{\partial S(\boldsymbol{\beta})}{\partial \alpha}\\ \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_1}\\ \vdots \\ \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_{(p)}} \end{array}\right) \]

  • Set the partial derivatives equal to \(0\) and solve for each of the parameters, i.e. \[\frac{\partial S(\boldsymbol{\beta})}{\partial \alpha} = 0, \quad \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_j} = 0, \quad j=1,\ldots, p; \quad \mbox{ or } \quad \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \mathbf{0}; \] where \(\mathbf{0}\) is a vector of zeros of length \(p+1\).

  • Check that you have found a minimum by computing second derivatives.
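
To make the recipe concrete, here is a minimal symbolic sketch of these four steps (using SymPy, on a small invented data set, for a simple linear regression with intercept \(\alpha\) and slope \(\beta_1\)); the data and variable names are purely illustrative:

```python
import sympy as sp

# Tiny invented data set for illustration only.
xs = [1, 2, 3, 4]
ys = [2.1, 2.9, 4.2, 4.8]

alpha, beta1 = sp.symbols('alpha beta1')

# Step 1: sum-of-squares function, with E(y_i) = alpha + beta1 * x_i.
S = sum((y - (alpha + beta1 * x))**2 for x, y in zip(xs, ys))

# Step 2: differentiate with respect to each parameter.
dS_dalpha = sp.diff(S, alpha)
dS_dbeta1 = sp.diff(S, beta1)

# Step 3: set the partial derivatives to zero and solve.
solution = sp.solve([dS_dalpha, dS_dbeta1], [alpha, beta1])
print(solution)   # least squares estimates of alpha and beta1

# Step 4: check the second derivative (Hessian) matrix is positive definite.
H = sp.Matrix([[sp.diff(S, a, b) for b in (alpha, beta1)] for a in (alpha, beta1)])
print(H.is_positive_definite)   # True
```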

Let us first expand \(S(\boldsymbol{\beta})\): \[\begin{aligned} S(\boldsymbol{\beta})&=(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})\\ &=(\mathbf{Y}^T-\boldsymbol{\beta}^T\mathbf{X}^T)(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})\\ &=\mathbf{Y}^T\mathbf{Y}-\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\\ &=\mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}, & \mbox{as } \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y} \mbox{ is a scalar, it equals its transpose } \mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}. \end{aligned} \]
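
The expansion can be sanity-checked numerically; the sketch below (NumPy, invented data and an arbitrary \(\boldsymbol{\beta}\)) evaluates \(S(\boldsymbol{\beta})\) both as a sum of squared residuals and via the expanded form:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])   # n = 6, p = 2
Y = rng.normal(size=6)
beta = np.array([0.3, -1.2, 0.8])                            # arbitrary beta

# Original form: S(beta) = (Y - X beta)^T (Y - X beta).
resid = Y - X @ beta
S_direct = resid @ resid

# Expanded form: Y^T Y - 2 Y^T X beta + beta^T X^T X beta.
S_expanded = Y @ Y - 2 * (Y @ X @ beta) + beta @ (X.T @ X) @ beta

print(np.isclose(S_direct, S_expanded))   # True
```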

Now, we compute the vector derivative \(\frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\).

\[\begin{aligned} \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} &= \frac{\partial} {\partial \boldsymbol{\beta}} ( \mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}) \\ &=0-2\mathbf{Y}^T\mathbf{X}+2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}, \end{aligned} \] where the derivative is written here as a row vector (the transpose of the column vector of partial derivatives defined above). Equating the above to \(\mathbf{0}\) gives

\[ \begin{array}{rrll} &-2\mathbf{Y}^T\mathbf{X}+2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X} &=\mathbf{0}\\ \implies& \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}&=\mathbf{Y}^T\mathbf{X}\\ \implies& \mathbf{X}^T\mathbf{X} \boldsymbol{\beta} &=\mathbf{X}^T\mathbf{Y}& \mbox{taking the transpose of both sides}\\ \implies & \boldsymbol{\beta} &= (\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{Y})& \mbox{pre-multiplying both sides by } (\mathbf{X}^T\mathbf{X})^{-1}\\ \end{array}\] The last pre-multiplication is valid due to consequence 2 above (i.e. \((\mathbf{X}^T\mathbf{X})^{-1}\) exists and is unique). We now have the solution \[\quad \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \mathbf{0} \implies \boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{Y}). \] We should additionally check that the second derivative matrix is positive definite to confirm that this solution attains the minimum of \(S(\boldsymbol{\beta})\):
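
In practice \(\boldsymbol{\hat{\beta}}\) is usually obtained by solving the normal equations \(\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}=\mathbf{X}^T\mathbf{Y}\) rather than forming the inverse explicitly. The sketch below (NumPy, invented data) does this, checks that the gradient vanishes at the solution, and compares against NumPy's built-in least squares routine:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])   # n = 20, p = 2
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=20)

# Solve X^T X beta = X^T Y (numerically preferable to computing the inverse).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The gradient -2 X^T Y + 2 X^T X beta_hat (as a column vector) should be zero.
gradient = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat
print(np.allclose(gradient, 0))                                # True

# Agreement with NumPy's built-in least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))                       # True
```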

\[ \frac{\partial^2 S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta} ~~\partial \boldsymbol{\beta}^T}= 2 (\mathbf{X}^T\mathbf{X}), \] which is positive definite by consequence 1 above.
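
One way to verify positive definiteness numerically is via a Cholesky factorization, which exists only for positive definite matrices (a sketch with NumPy and an invented full-rank design matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(10), rng.normal(size=(10, 2))])   # full-rank design

# The second derivative matrix 2 X^T X admits a Cholesky factorization
# exactly when it is positive definite; otherwise LinAlgError is raised.
try:
    np.linalg.cholesky(2 * X.T @ X)
    print("2 X^T X is positive definite")
except np.linalg.LinAlgError:
    print("2 X^T X is not positive definite")
```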

Residual Sum Of Squares

This choice of \(\boldsymbol{\beta}\) is the least squares estimator \(\boldsymbol{\hat{\beta}}\). The resulting minimum value of \(S(\boldsymbol{\beta})\) is called the residual sum of squares, denoted by \(RSS\). Thus,

\[\begin{aligned} RSS=S(\boldsymbol{\hat{\beta}}) &= \mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\hat{\beta}}+\boldsymbol{\hat{\beta}}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\hat{\beta}}\\ &= \mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}+\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\\ &= \mathbf{Y}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}, \end{aligned}\]

i.e.

\[RSS=\mathbf{Y}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}\boldsymbol{\hat{\beta}}.\]
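
As a short numerical check (NumPy, invented data), the two matrix expressions for the \(RSS\) agree with the direct sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(15), rng.normal(size=(15, 2))])
Y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.2, size=15)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# RSS as the minimized sum of squared residuals ...
residuals = Y - X @ beta_hat
rss_direct = residuals @ residuals

# ... and via the two closed-form expressions above.
rss_projection = Y @ Y - Y @ X @ np.linalg.inv(X.T @ X) @ X.T @ Y
rss_beta_hat = Y @ Y - Y @ X @ beta_hat

print(np.allclose([rss_projection, rss_beta_hat], rss_direct))   # True
```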