The least-squares estimate of linear functions of the parameters in a multiple linear model

Data: \((y_i, x_{1i}, x_{2i},\dots,x_{ki}); \quad i=1,\dots,n\) \(\newline\) Model: \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}\), where \(\boldsymbol{\epsilon}=(\epsilon_1, \ldots, \epsilon_n)^T\) and the \(\epsilon_i \sim N(0, \sigma^2)\) independently.

\(\newline\) Suppose we want the least-squares estimate of a linear function of the parameters:

  • Say \(\mathbf{b}_1^T\boldsymbol{\beta}\) for some given vector \(\mathbf{b}_1\). For instance in the previous example \(\mathbf{b}_1^T= (1 \hspace{0.35cm} 5)\), or

  • Possibly for a set of \(s\) linearly independent linear combinations \(\mathbf{b}_1^T\boldsymbol{\beta}, \dots,\mathbf{b}_s^T\boldsymbol{\beta}\), \(s\leq p\), where the \(\mathbf{b}_i\)’s are given vectors (similar to the previous example). Here, \(p\) is the number of regression coefficients, so \(\boldsymbol{\beta}\) is a vector of length \(p\).

It is always possible to construct a non-singular transformation \(\boldsymbol{\beta} \leftrightarrow \boldsymbol{\phi}\), where

\[\boldsymbol{\phi} = \left( \begin{array}{c} \mathbf{b}_1^T \\ \mathbf{b}_2^T \\ \vdots \\ \mathbf{b}_s^T \\ \end{array} \right)\boldsymbol{\beta} = \mathbf{B}\boldsymbol{\beta},\]

where \(\mathbf{B}\) is \(s \times p\) with linearly independent rows. If \(s < p\), \(\mathbf{B}\) can be augmented with further linearly independent rows, so we may take \(s = p\), in which case \(\mathbf{B}\) is square, nonsingular and hence invertible. So

\[\boldsymbol{\beta} = \mathbf{B}^{-1}\boldsymbol{\phi}\]

where \(\mathbf{B}^{-1}\) is of dimension \(p \times s\). It is now possible to rewrite our model in terms of \(\boldsymbol{\phi}\), where \({\phi}_1, \dots, {\phi}_s\) are the parameters of interest.
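As a concrete illustration of the transformation (a minimal numpy sketch; only \(\mathbf{b}_1^T = (1 \;\; 5)\) comes from the earlier example, while the augmenting row and numerical values below are illustrative choices), we can build \(\mathbf{B}\) and recover \(\boldsymbol{\beta}\) from \(\boldsymbol{\phi}\):

```python
import numpy as np

# Hypothetical p = 2 case with one linear combination of interest, b1^T = (1, 5),
# augmented with a second, linearly independent row (here (0, 1)) so that B is
# square and nonsingular.
b1 = np.array([1.0, 5.0])
b2 = np.array([0.0, 1.0])            # illustrative augmenting row
B = np.vstack([b1, b2])              # 2 x 2, nonsingular

beta = np.array([2.0, -1.0])         # illustrative parameter vector
phi = B @ beta                       # phi = B beta

# Transforming back, beta = B^{-1} phi recovers the original parameters.
beta_back = np.linalg.solve(B, phi)
print(np.allclose(beta_back, beta))  # True
```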

\(\newline\) Data: \((y_i, x_{1i}, x_{2i},\dots,x_{ki}); \quad i=1,\dots,n\)

\(\newline\) Model: \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon} = (\mathbf{XB}^{-1})\boldsymbol{\phi}+\boldsymbol{\epsilon}\)

\(\newline\) \((\mathbf{XB}^{-1})\) is an \(n \times s\) matrix which is known (it is just a transformed design matrix), and \(\boldsymbol{\phi}\) is an \(s\)-vector of unknown parameters.

The form of the model is mathematically equivalent to our original form, substituting \((\mathbf{XB}^{-1})\) for the design matrix and \(\boldsymbol{\phi}\) for the parameter vector.

\(\newline\) Hence, we can write down the solution for the parameter estimates, based on least-squares, from our earlier results.

\[\begin{aligned} \boldsymbol{\hat{\phi}} &= \{(\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1})\}^{-1}(\mathbf{XB}^{-1})^T\mathbf{Y}\\ &=\{(\mathbf{B}^{-1})^T(\mathbf{X}^T\mathbf{X})\mathbf{B}^{-1}\}^{-1}(\mathbf{B}^{-1})^T\mathbf{X}^T\mathbf{Y}\\ &=\mathbf{B}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{B}^T(\mathbf{B}^T)^{-1}\mathbf{X}^T\mathbf{Y}\\ &=\mathbf{B}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\\ &=\mathbf{B}\boldsymbol{\hat{\beta}}.\end{aligned}\]

Hence the least-squares estimate of a set of linear functions of the parameters is just the same set of linear functions of the least-squares estimates.
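A minimal numerical sketch of this result (numpy, synthetic data; the sample size, design, noise level and the second row of \(\mathbf{B}\) are illustrative assumptions, not part of the original example): fitting by least squares in the \(\boldsymbol{\phi}\) parameterisation gives exactly \(\mathbf{B}\boldsymbol{\hat{\beta}}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for a model with p = 2 coefficients (intercept and slope).
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])        # n x p design matrix
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# An illustrative nonsingular B; its first row is b1^T = (1, 5) from the earlier example.
B = np.array([[1.0, 5.0],
              [0.0, 1.0]])

# Least squares in the original parameterisation: beta_hat = (X^T X)^{-1} X^T y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Least squares in the phi parameterisation, using the transformed design X B^{-1}.
phi_hat, *_ = np.linalg.lstsq(X @ np.linalg.inv(B), y, rcond=None)

print(np.allclose(phi_hat, B @ beta_hat))   # True: phi_hat = B beta_hat
```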

Application of a linear transformation of parameters: two-parameter case

A useful application of this result may sometimes simplify the calculation of least-squares estimates. The basic idea is that it may be possible to rewrite a model in terms of parameters whose estimates are “easier” to calculate and then transform back to the original parameters. This approach is often referred to as ‘centering’. Centering is often useful because it produces orthogonal columns, which in turn give us diagonal matrices to invert.
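A very small numerical sketch of the orthogonality claim (numpy; the \(x\) values are chosen purely for illustration): subtracting \(\bar{x}\) makes the covariate column orthogonal to the column of ones, so the cross-product matrix is diagonal.

```python
import numpy as np

x = np.array([1.0, 4.0, 6.0, 9.0])   # illustrative covariate values
ones = np.ones_like(x)
xc = x - x.mean()                    # centered column

print(ones @ xc)                     # 0 (up to rounding): the columns are orthogonal
Xc = np.column_stack([ones, xc])
print(Xc.T @ Xc)                     # diagonal: [[n, 0], [0, sum((x - xbar)^2)]]
```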

Example

\(\newline\) Data: \((y_i, x_i); \quad i=1,\dots,n\)

\(\newline\) Model: \(y_i=\alpha+\beta x_i+\epsilon_i\), or, in vector-matrix notation, \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}\), which we will rewrite as \((\mathbf{XB}^{-1})\boldsymbol{\phi}+\boldsymbol{\epsilon}\).

\(\newline\) Suppose we transform

\(y_i=\alpha+\beta x_i+\epsilon_i\) (Model 1) to

\(y_i=\alpha'+\beta'(x_i-\bar{x})+\epsilon_i\) (Model 2)

The claim is that if we know the parameter estimates \(\hat{\alpha}\) and \(\hat{\beta}\), then we also know \(\hat{\alpha}'\) and \(\hat{\beta}'\) (and vice versa).

\(\newline\) Firstly, consider Model 2

\[\begin{eqnarray*} y_i &=& \alpha'+\beta'(x_i-\bar{x})+\epsilon_i \\ &=& \alpha'+\beta' x_i- \beta' \bar{x}+\epsilon_i \\ &=& \alpha'- \beta' \bar{x}+\beta' x_i+\epsilon_i \\ \end{eqnarray*}\]

which implies \(\alpha=\alpha'- \beta' \bar{x}\) and \(\beta=\beta'\)

\[\boldsymbol{\beta} = \left( \begin{array}{c} \alpha \\ \beta \\ \end{array} \right)\leftrightarrow\boldsymbol{\phi} = \left( \begin{array}{c} \alpha' \\ \beta' \\ \end{array} \right) = \left( \begin{array}{c} \alpha+\beta\bar{x} \\ \beta \\ \end{array} \right), \quad \bar{x} = (\sum_{i=1}^n x_i)/n.\]

Writing Model 2 in vector matrix notation,

\[E(\mathbf{Y}) =\left( \begin{array}{cc} 1 & (x_1-\bar{x}) \\ \vdots & \vdots \\ 1 & (x_n-\bar{x}) \\ \end{array} \right)\left( \begin{array}{c} \alpha' \\ \beta \\ \end{array} \right) = \mathbf{XB}^{-1}\boldsymbol{\phi}\]

where \(\boldsymbol{\phi} = \mathbf{B}\boldsymbol{\beta}\) and \(\mathbf{B} = \left( \begin{array}{cc} 1 & \bar{x} \\ 0 & 1 \\ \end{array} \right)\)
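As a quick check (a numpy sketch with illustrative \(x\) values, not part of the derivation), \(\mathbf{B}^{-1} = \left( \begin{array}{cc} 1 & -\bar{x} \\ 0 & 1 \\ \end{array} \right)\), and \(\mathbf{XB}^{-1}\) is exactly the centered design matrix written above.

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 10.0])   # illustrative covariate values
n = len(x)
X = np.column_stack([np.ones(n), x])  # original design matrix with rows (1, x_i)
xbar = x.mean()

B = np.array([[1.0, xbar],
              [0.0, 1.0]])
Binv = np.linalg.inv(B)               # equals [[1, -xbar], [0, 1]]

# X B^{-1} has rows (1, x_i - xbar), i.e. it is the centered design matrix.
print(np.allclose(X @ Binv, np.column_stack([np.ones(n), x - xbar])))   # True
```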

\(\newline\) \[\begin{aligned} \boldsymbol{\hat{\phi}} &= \{(\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1})\}^{-1}(\mathbf{XB}^{-1})^T\mathbf{Y}\\ (\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1}) &= \left( \begin{array}{cc} n & \sum_{i=1}^n(x_i-\bar{x}) \\ \sum_{i=1}^n(x_i-\bar{x}) & \sum_{i=1}^n(x_i-\bar{x})^2 \\ \end{array} \right)\\ &=\left( \begin{array}{cc} n & 0 \\ 0 & \sum_{i=1}^n(x_i-\bar{x})^2 \\ \end{array} \right)\end{aligned}\]

i.e. \((\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1})\) is diagonal.

\[\begin{aligned} (\mathbf{XB}^{-1})^T\mathbf{Y} &= \left( \begin{array}{c} \sum_{i=1}^n y_i\\ \sum_{i=1}^n y_i(x_i-\bar{x}) \\ \end{array} \right) = \left( \begin{array}{c} \sum_{i=1}^n y_i\\ \sum_{i=1}^n (y_i-\bar{y})(x_i-\bar{x}) \\ \end{array} \right)\\ \{(\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1})\}^{-1}&=\left( \begin{array}{cc} \frac{1}{n} & 0 \\ 0 & \frac{1}{\sum_{i=1}^n(x_i-\bar{x})^2} \\ \end{array} \right)\end{aligned}\]

i.e.

\[\boldsymbol{\hat{\phi}} = \left( \begin{array}{c} \sum_{i=1}^n y_i/n \\ \frac{ \sum_{i=1}^n (y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^n(x_i-\bar{x})^2} \\ \end{array} \right) = \left( \begin{array}{c} \hat{\alpha}' \\ \hat{\beta} \\ \end{array} \right)\]

Because of our choice of \(\boldsymbol{\phi}\), \((\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1})\) is easier to invert and hence the calculations are simpler.

From the nature of the transformation, clearly

\[\hat{\alpha} = \hat{\alpha}'-\hat{\beta}\bar{x}\]
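Putting the two-parameter example together (a numpy sketch on synthetic data; the sample size, true parameters and noise level are illustrative assumptions): the centered parameterisation gives \(\hat{\alpha}' = \bar{y}\) and \(\hat{\beta}\) directly, and the back-transformation recovers the same \(\hat{\alpha}\) as a direct least-squares fit of Model 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from Model 1: y_i = alpha + beta * x_i + eps_i.
n = 40
alpha, beta = 2.0, 0.7                 # illustrative true values
x = rng.uniform(0, 10, size=n)
y = alpha + beta * x + rng.normal(scale=0.5, size=n)

xbar, ybar = x.mean(), y.mean()

# Estimates in the centered (Model 2) parameterisation: no matrix inversion needed.
alpha_dash_hat = ybar
beta_hat = np.sum((y - ybar) * (x - xbar)) / np.sum((x - xbar) ** 2)

# Back-transform to the Model 1 intercept.
alpha_hat = alpha_dash_hat - beta_hat * xbar

# Compare with a direct least-squares fit of Model 1.
X = np.column_stack([np.ones(n), x])
direct, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose([alpha_hat, beta_hat], direct))   # True
```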

Application of a linear transformation of parameters: three-parameter case

\(\newline\) Data: \((y_i, x_{1i}, x_{2i}), \quad i=1,\dots,n\)

\(\newline\) Model: \(E(Y_i) = \alpha+\beta x_{1i}+\gamma x_{2i}\)

\(\newline\) Reparameterise to

\(\newline\) Model: \(E(Y_i) = \alpha'+\beta(x_{1i}-\bar{x}_{1.})+\gamma(x_{2i}-\bar{x}_{2.})\)

\(\newline\) \[\boldsymbol{\beta} = \left( \begin{array}{c} \alpha \\ \beta \\ \gamma \\ \end{array} \right)\leftrightarrow\boldsymbol{\phi} = \left( \begin{array}{c} \alpha' \\ \beta \\ \gamma \\ \end{array} \right) = \left( \begin{array}{c} \alpha+\beta\bar{x}_{1.}+\gamma \bar{x}_{2.} \\ \beta \\ \gamma \\ \end{array} \right),\]

where \(\bar{x}_{1.} = \sum_{i=1}^nx_{1i}/n, \quad \bar{x}_{2.} = \sum_{i=1}^nx_{2i}/n\). i.e.

\[E(\mathbf{Y}) = \left( \begin{array}{ccc} 1 & (x_{11}-\bar{x}_{1.}) & (x_{21}-\bar{x}_{2.}) \\ \vdots & \vdots & \vdots \\ 1 & (x_{1n}-\bar{x}_{1.}) & (x_{2n}-\bar{x}_{2.}) \\ \end{array} \right)\left( \begin{array}{c} \alpha' \\ \beta \\ \gamma \\ \end{array} \right) = \mathbf{XB}^{-1}\boldsymbol{\phi}\]

\[\boldsymbol{\hat{\phi}} = ((\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1}))^{-1}(\mathbf{XB}^{-1})^T\mathbf{Y}\]

\[\begin{aligned} ((\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1})) &= \left( \begin{array}{ccc} n & 0 & 0 \\ 0 & \sum_{i=1}^n(x_{1i}-\bar{x}_{1.})^2 & \sum_{i=1}^n(x_{1i}-\bar{x}_{1.})(x_{2i}-\bar{x}_{2.}) \\ 0 & \sum_{i=1}^n(x_{1i}-\bar{x}_{1.})(x_{2i}-\bar{x}_{2.}) & \sum_{i=1}^n(x_{2i}-\bar{x}_{2.})^2 \\ \end{array} \right)\\[2ex] &=\left( \begin{array}{cc} n & \mathbf{0} \\ \mathbf{0}^T & \boldsymbol{\Psi}\\ \end{array} \right)\\[2ex] ((\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1}))^{-1} &= \left( \begin{array}{cc} \frac{1}{n} & \mathbf{0} \\ \mathbf{0}^T & \boldsymbol{\Psi}^{-1}\\ \end{array} \right)\\ \end{aligned}\]

Hence inversion of \(((\mathbf{XB}^{-1})^T(\mathbf{XB}^{-1}))\) is reduced to inversion of a \((2 \times 2)\) matrix, a great saving in calculation. In general, a similar transformation will reduce the inversion of a \((p \times p)\) matrix to the inversion of a \(((p - 1) \times (p - 1))\) matrix.
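A numpy sketch of this saving (the \(x_1\), \(x_2\) values are synthetic and purely illustrative): after centering, the cross-product matrix is block diagonal, so its inverse can be assembled from \(1/n\) and the inverse of the \(2 \times 2\) block \(\boldsymbol{\Psi}\) alone.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative covariates for the three-parameter model.
n = 30
x1 = rng.uniform(0, 5, size=n)
x2 = rng.uniform(0, 5, size=n)

# Centered design matrix X B^{-1}: columns 1, x1 - x1bar, x2 - x2bar.
Xc = np.column_stack([np.ones(n), x1 - x1.mean(), x2 - x2.mean()])
C = Xc.T @ Xc                                # the (XB^{-1})^T (XB^{-1}) matrix

print(np.allclose(C[0, 1:], 0))              # True: block-diagonal structure

# Invert only the 2 x 2 block Psi and assemble the full inverse.
Psi = C[1:, 1:]
C_inv = np.zeros((3, 3))
C_inv[0, 0] = 1.0 / n
C_inv[1:, 1:] = np.linalg.inv(Psi)

print(np.allclose(C_inv, np.linalg.inv(C)))  # True: only a 2 x 2 inversion was needed
```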

After calculation of \(\boldsymbol{\hat{\phi}}\), \(\hat{\alpha}\) can be obtained from

\[\hat{\alpha} = \hat{\alpha}'-\hat{\beta}\bar{x}_{1.}-\hat{\gamma}\bar{x}_{2.}\]