Least Squares Estimates For The Linear Regression Model

The least squares estimate for a linear regression of the form \mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon} is given by \boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}.
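As a concrete illustration, the estimate can be computed directly from this formula and compared against a standard least squares solver. The sketch below is a minimal NumPy example with simulated data; the sample size, design, and variable names are assumptions for illustration only.

```python
import numpy as np

# Simulated data (assumed for illustration): n observations, p predictors plus an intercept
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # (n x (p+1)) design matrix
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Closed-form least squares estimate: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's built-in least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))   # True
```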

In order for \boldsymbol{\hat{\beta}} to exist, we need to make the following assumption.

Assumption: The (n \times [p+1]) matrix \mathbf{X} is of full rank p+1.

This means that the matrix \mathbf{X}^T\mathbf{X} is also of full rank p+1. As a consequence:

1. \mathbf{X}^T\mathbf{X} is a positive definite matrix.

2. (\mathbf{X}^T\mathbf{X})^{-1} exists and is unique.
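These conditions can be verified numerically for a given design matrix. The following is a minimal NumPy sketch; the simulated \mathbf{X} and variable names are assumptions for illustration.

```python
import numpy as np

# Simulated full-rank design (assumed for illustration)
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

XtX = X.T @ X

# Full-rank assumption: rank of X should equal p + 1
print(np.linalg.matrix_rank(X) == p + 1)            # True

# Consequence 1: X^T X is positive definite (all eigenvalues > 0)
print(np.all(np.linalg.eigvalsh(XtX) > 0))          # True

# Consequence 2: (X^T X)^{-1} exists and behaves as an inverse
XtX_inv = np.linalg.inv(XtX)
print(np.allclose(XtX @ XtX_inv, np.eye(p + 1)))    # True
```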

First, we recall the steps of the calculus-based approach to obtaining the least squares estimates (a small symbolic illustration follows this list):

  • Construct the sum-of-squares function as:

    S(\boldsymbol \beta) = \sum_{i=1}^n(y_i-E(y_i))^2 = (\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}), \quad \mbox{where } E(y_i)=\mathbf{x}_i^T\boldsymbol{\beta} \mbox{ is the } i\mbox{th element of } \mathbf{X}\boldsymbol{\beta}.

  • Differentiate with respect to each of the parameters \frac{\partial S(\boldsymbol{\beta})}{\partial \alpha}, \frac{\partial S(\boldsymbol{\beta})}{\partial\beta_1}, \ldots, \frac{\partial S(\boldsymbol{\beta})}{\partial\beta_p}; which can be compactly written as \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \left( \begin{array}{c}\frac{\partial S(\boldsymbol{\beta})}{\partial \alpha}\\ \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_1}\\ \vdots \\ \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_p} \end{array}\right)

  • Set the partial derivatives equal to zero and solve for the parameters, i.e. \frac{\partial S(\boldsymbol{\beta})}{\partial \alpha} = 0, \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_j} = 0, \quad j=1,\ldots, p; \quad \mbox{ or } \quad \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \mathbf{0}; where \mathbf{0} is a vector of zeros of length p+1.

  • Check that you have found a minimum by computing second derivatives.
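To make these steps concrete, here is a small symbolic sketch using SymPy for a single predictor plus intercept. The toy data and variable names are assumptions for illustration, not part of the notes.

```python
import sympy as sp

a, b = sp.symbols('alpha beta1')          # parameters: intercept and one slope
x = [1, 2, 3, 4]                          # toy predictor values (assumed)
y = [2.1, 3.9, 6.2, 8.1]                  # toy responses (assumed)

# Step 1: construct the sum-of-squares function S(alpha, beta1)
S = sum((yi - (a + b * xi))**2 for xi, yi in zip(x, y))

# Steps 2 and 3: differentiate and solve dS/dalpha = 0, dS/dbeta1 = 0
sol = sp.solve([sp.diff(S, a), sp.diff(S, b)], [a, b])
print(sol)                                # least squares estimates of alpha and beta1

# Step 4: second-derivative (Hessian) check -- should be positive definite
H = sp.hessian(S, (a, b))
print(H.is_positive_definite)             # True
```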

Let us first expand S(\boldsymbol{\beta}): \begin{aligned} S(\boldsymbol{\beta})&=(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})\\ &=(\mathbf{Y}^T-\boldsymbol{\beta}^T\mathbf{X}^T)(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})\\ &=\mathbf{Y}^T\mathbf{Y}-\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\\ &=\mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} & \mbox{since } \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y} \mbox{ is a scalar, it equals its transpose } \mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}. \end{aligned}
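The algebraic identity above can be spot-checked numerically. A minimal NumPy sketch with simulated data and an arbitrary \boldsymbol{\beta} (all names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)
beta = rng.normal(size=p + 1)        # an arbitrary coefficient vector

# Left-hand side: S(beta) = (Y - X beta)^T (Y - X beta)
lhs = (Y - X @ beta) @ (Y - X @ beta)

# Right-hand side: the expanded form Y^T Y - 2 Y^T X beta + beta^T X^T X beta
rhs = Y @ Y - 2 * Y @ X @ beta + beta @ X.T @ X @ beta

print(np.isclose(lhs, rhs))   # True
```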

Now we compute the vector derivative \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}.

\begin{aligned} \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} &= \frac{\partial} {\partial \boldsymbol{\beta}} ( \mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}) \\ &=0-2\mathbf{Y}^T\mathbf{X}+2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X} \end{aligned} (Here the derivative is written as a row vector; its transpose gives the column vector of partial derivatives defined above.) Equating the above to \mathbf{0} gives
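As a quick sanity check, the column form of this gradient, -2\mathbf{X}^T\mathbf{Y}+2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}, can be compared against a finite-difference approximation. The NumPy sketch below uses simulated data; the design and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)
beta = rng.normal(size=p + 1)

def S(b):
    r = Y - X @ b
    return r @ r                      # sum-of-squares function S(beta)

# Analytic gradient (column form): -2 X^T Y + 2 X^T X beta
grad_analytic = -2 * X.T @ Y + 2 * X.T @ X @ beta

# Central finite-difference approximation of the gradient
h = 1e-5
grad_fd = np.array([
    (S(beta + h * e) - S(beta - h * e)) / (2 * h)
    for e in np.eye(p + 1)
])

print(np.allclose(grad_analytic, grad_fd))   # True
```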

\begin{array}{rrll} &-2\mathbf{Y}^T\mathbf{X}+2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X} &=\mathbf{0}\\ \implies& \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}&=\mathbf{Y}^T\mathbf{X}\\ \implies& \mathbf{X}^T\mathbf{X} \boldsymbol{\beta} &=\mathbf{X}^T\mathbf{Y}& \mbox{taking the transpose of both sides}\\ \implies & \boldsymbol{\beta} &= (\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{Y})& \mbox{pre-multiplying both sides by } (\mathbf{X}^T\mathbf{X})^{-1}\\ \end{array} The last pre-multiplication is valid due to consequence 2 of the assumption (i.e. (\mathbf{X}^T\mathbf{X})^{-1} exists and is unique). The solution of \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \mathbf{0} is therefore \boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}. We should additionally check that the second derivative matrix is positive definite to confirm that this solution attains the minimum of S(\boldsymbol{\beta}):

\frac{\partial^2 S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta} ~\partial \boldsymbol{\beta}^T}= 2 (\mathbf{X}^T\mathbf{X}), which is positive definite by consequence 1 of the full-rank assumption, so the solution is indeed a minimum.
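The same conclusion can be checked numerically. The NumPy sketch below (simulated data; variable names are assumptions) solves the normal equations \mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{Y} with a linear solver rather than an explicit inverse, verifies that the gradient vanishes at the solution, and confirms that 2\mathbf{X}^T\mathbf{X} is positive definite via a Cholesky factorisation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

# Solve the normal equations X^T X beta = X^T Y directly
# (numerically preferable to forming the explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# At the solution, the gradient -2 X^T (Y - X beta_hat) should vanish
print(np.allclose(X.T @ (Y - X @ beta_hat), 0))          # True

# Second derivative matrix 2 X^T X: a successful Cholesky factorisation
# confirms it is positive definite, so beta_hat is a minimum
np.linalg.cholesky(2 * X.T @ X)                           # raises LinAlgError if not PD
```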

Residual Sum Of Squares

This minimising value, \boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}, is the least squares estimator. The resulting minimum value of S(\boldsymbol{\beta}) is called the residual sum of squares, denoted by RSS. Thus,

RSS=S(\boldsymbol{\hat{\beta}}) = \mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\hat{\beta}}+\boldsymbol{\hat{\beta}}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\hat{\beta}} = \mathbf{Y}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} \quad (\mbox{using } \mathbf{X}^T\mathbf{X}\boldsymbol{\hat{\beta}}=\mathbf{X}^T\mathbf{Y}),

i.e.

RSS=\mathbf{Y}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}\boldsymbol{\hat{\beta}}.
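A short numerical check that the two closed-form expressions for the RSS agree with the directly computed minimum S(\boldsymbol{\hat{\beta}}). As before, this is a NumPy sketch with simulated data; the names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# RSS as the minimised sum of squares, S(beta_hat)
rss_direct = (Y - X @ beta_hat) @ (Y - X @ beta_hat)

# RSS via the two closed-form expressions above
rss_form1 = Y @ Y - Y @ X @ np.linalg.inv(X.T @ X) @ X.T @ Y
rss_form2 = Y @ Y - Y @ X @ beta_hat

print(np.allclose([rss_form1, rss_form2], rss_direct))   # True
```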