Least Squares Estimates For The Linear Regression Model

The least squares estimate for a linear regression of the form \mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon} is given by \boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}.
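As a concrete illustration, the estimate can be computed directly from this formula and compared against a standard least squares solver. The sketch below is a minimal NumPy example with simulated data; the sample size, design, and variable names are assumptions for illustration only.

```python
import numpy as np

# Simulated data (assumed for illustration): n observations, p predictors plus an intercept
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # (n x (p+1)) design matrix
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Closed-form least squares estimate: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's built-in least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))   # True
```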

In order for \boldsymbol{\hat{\beta}} to exist, we need to make the following assumption.

Assumption: The (n \times [p+1]) matrix \mathbf{X} is of full rank p+1.

This means that the matrix \mathbf{X}^T\mathbf{X} is also of full rank p+1. As a consequence:

1. \mathbf{X}^T\mathbf{X} is a positive definite matrix.

2. (\mathbf{X}^T\mathbf{X})^{-1} exists and is unique.
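These conditions can be verified numerically for a given design matrix. The following is a minimal NumPy sketch; the simulated \mathbf{X} and variable names are assumptions for illustration.

```python
import numpy as np

# Simulated full-rank design (assumed for illustration)
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

XtX = X.T @ X

# Full-rank assumption: rank of X should equal p + 1
print(np.linalg.matrix_rank(X) == p + 1)            # True

# Consequence 1: X^T X is positive definite (all eigenvalues > 0)
print(np.all(np.linalg.eigvalsh(XtX) > 0))          # True

# Consequence 2: (X^T X)^{-1} exists and behaves as an inverse
XtX_inv = np.linalg.inv(XtX)
print(np.allclose(XtX @ XtX_inv, np.eye(p + 1)))    # True
```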

First, we recall the steps of the calculus-based approach to obtaining the least squares estimates (a small symbolic illustration follows this list):

  • Construct the sum-of-squares function as:

    S(\boldsymbol \beta) = \sum_{i=1}^n(y_i-E(y_i))^2 = (\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}), \quad \mbox{where } E(y_i)=\mathbf{x}_i^T\boldsymbol{\beta} \mbox{ is the } i\mbox{th element of } \mathbf{X}\boldsymbol{\beta}.

  • Differentiate with respect to each of the parameters \frac{\partial S(\boldsymbol{\beta})}{\partial \alpha}, \frac{\partial S(\boldsymbol{\beta})}{\partial\beta_1}, \ldots, \frac{\partial S(\boldsymbol{\beta})}{\partial\beta_p}; which can be compactly written as \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \left( \begin{array}{c}\frac{\partial S(\boldsymbol{\beta})}{\partial \alpha}\\ \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_1}\\ \vdots \\ \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_p} \end{array}\right)

  • Set the partial derivatives equal to zero and solve for the parameters, i.e. \frac{\partial S(\boldsymbol{\beta})}{\partial \alpha} = 0, \frac{\partial S(\boldsymbol{\beta})}{\partial \beta_j} = 0, \quad j=1,\ldots, p; \quad \mbox{ or } \quad \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \mathbf{0}; where \mathbf{0} is a vector of zeros of length p+1.

  • Check that you have found a minimum by computing second derivatives.
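To make these steps concrete, here is a small symbolic sketch using SymPy for a single predictor plus intercept. The toy data and variable names are assumptions for illustration, not part of the notes.

```python
import sympy as sp

a, b = sp.symbols('alpha beta1')          # parameters: intercept and one slope
x = [1, 2, 3, 4]                          # toy predictor values (assumed)
y = [2.1, 3.9, 6.2, 8.1]                  # toy responses (assumed)

# Step 1: construct the sum-of-squares function S(alpha, beta1)
S = sum((yi - (a + b * xi))**2 for xi, yi in zip(x, y))

# Steps 2 and 3: differentiate and solve dS/dalpha = 0, dS/dbeta1 = 0
sol = sp.solve([sp.diff(S, a), sp.diff(S, b)], [a, b])
print(sol)                                # least squares estimates of alpha and beta1

# Step 4: second-derivative (Hessian) check -- should be positive definite
H = sp.hessian(S, (a, b))
print(H.is_positive_definite)             # True
```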

Let us first expand S(\boldsymbol{\beta}): \begin{aligned} S(\boldsymbol{\beta})&=(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})\\ &=(\mathbf{Y}^T-\boldsymbol{\beta}^T\mathbf{X}^T)(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})\\ &=\mathbf{Y}^T\mathbf{Y}-\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\\ &=\mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} & \mbox{since } \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y} \mbox{ is a scalar, it equals its transpose } \mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}. \end{aligned}
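The algebraic identity above can be spot-checked numerically. A minimal NumPy sketch with simulated data and an arbitrary \boldsymbol{\beta} (all names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)
beta = rng.normal(size=p + 1)        # an arbitrary coefficient vector

# Left-hand side: S(beta) = (Y - X beta)^T (Y - X beta)
lhs = (Y - X @ beta) @ (Y - X @ beta)

# Right-hand side: the expanded form Y^T Y - 2 Y^T X beta + beta^T X^T X beta
rhs = Y @ Y - 2 * Y @ X @ beta + beta @ X.T @ X @ beta

print(np.isclose(lhs, rhs))   # True
```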

Now we compute the vector derivative \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}.

\begin{aligned} \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} &= \frac{\partial} {\partial \boldsymbol{\beta}} ( \mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}) \\ &=0-2\mathbf{Y}^T\mathbf{X}+2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X} \end{aligned} (Here the derivative is written as a row vector; its transpose gives the column vector of partial derivatives defined above.) Equating the above to \mathbf{0} gives
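As a quick sanity check, the column form of this gradient, -2\mathbf{X}^T\mathbf{Y}+2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}, can be compared against a finite-difference approximation. The NumPy sketch below uses simulated data; the design and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)
beta = rng.normal(size=p + 1)

def S(b):
    r = Y - X @ b
    return r @ r                      # sum-of-squares function S(beta)

# Analytic gradient (column form): -2 X^T Y + 2 X^T X beta
grad_analytic = -2 * X.T @ Y + 2 * X.T @ X @ beta

# Central finite-difference approximation of the gradient
h = 1e-5
grad_fd = np.array([
    (S(beta + h * e) - S(beta - h * e)) / (2 * h)
    for e in np.eye(p + 1)
])

print(np.allclose(grad_analytic, grad_fd))   # True
```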

\begin{array}{rrll} &-2\mathbf{Y}^T\mathbf{X}+2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X} &=\mathbf{0}\\ \implies& \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}&=\mathbf{Y}^T\mathbf{X}\\ \implies& \mathbf{X}^T\mathbf{X} \boldsymbol{\beta} &=\mathbf{X}^T\mathbf{Y}& \mbox{taking the transpose of both sides}\\ \implies & \boldsymbol{\beta} &= (\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{Y})& \mbox{pre-multiplying both sides by } (\mathbf{X}^T\mathbf{X})^{-1}\\ \end{array} The last pre-multiplication is valid due to consequence 2 of the assumption (i.e. (\mathbf{X}^T\mathbf{X})^{-1} exists and is unique). The solution of \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}= \mathbf{0} is therefore \boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}. We should additionally check that the second derivative matrix is positive definite to confirm that this solution attains the minimum of S(\boldsymbol{\beta}):

\frac{\partial^2 S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta} ~\partial \boldsymbol{\beta}^T}= 2 (\mathbf{X}^T\mathbf{X}), which is positive definite by consequence 1 of the full-rank assumption, so the solution is indeed a minimum.
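The same conclusion can be checked numerically. The NumPy sketch below (simulated data; variable names are assumptions) solves the normal equations \mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{Y} with a linear solver rather than an explicit inverse, verifies that the gradient vanishes at the solution, and confirms that 2\mathbf{X}^T\mathbf{X} is positive definite via a Cholesky factorisation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

# Solve the normal equations X^T X beta = X^T Y directly
# (numerically preferable to forming the explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# At the solution, the gradient -2 X^T (Y - X beta_hat) should vanish
print(np.allclose(X.T @ (Y - X @ beta_hat), 0))          # True

# Second derivative matrix 2 X^T X: a successful Cholesky factorisation
# confirms it is positive definite, so beta_hat is a minimum
np.linalg.cholesky(2 * X.T @ X)                           # raises LinAlgError if not PD
```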

Residual Sum Of Squares

This minimising value, \boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}, is the least squares estimator. The resulting minimum value of S(\boldsymbol{\beta}) is called the residual sum of squares, denoted by RSS. Thus,

RSS=S(\boldsymbol{\hat{\beta}}) = \mathbf{Y}^T\mathbf{Y}-2\mathbf{Y}^T\mathbf{X}\boldsymbol{\hat{\beta}}+\boldsymbol{\hat{\beta}}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\hat{\beta}} = \mathbf{Y}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} \quad (\mbox{using } \mathbf{X}^T\mathbf{X}\boldsymbol{\hat{\beta}}=\mathbf{X}^T\mathbf{Y}),

i.e.

RSS=\mathbf{Y}^T\mathbf{Y}-\mathbf{Y}^T\mathbf{X}\boldsymbol{\hat{\beta}}.
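A short numerical check that the two closed-form expressions for the RSS agree with the directly computed minimum S(\boldsymbol{\hat{\beta}}). As before, this is a NumPy sketch with simulated data; the names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# RSS as the minimised sum of squares, S(beta_hat)
rss_direct = (Y - X @ beta_hat) @ (Y - X @ beta_hat)

# RSS via the two closed-form expressions above
rss_form1 = Y @ Y - Y @ X @ np.linalg.inv(X.T @ X) @ X.T @ Y
rss_form2 = Y @ Y - Y @ X @ beta_hat

print(np.allclose([rss_form1, rss_form2], rss_direct))   # True
```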