Section 7 Classical Linear Regression

Analytical Properties

A regression function is a conditional expectation. The expectation is calculated for one variable \(Y\), the response, conditional on the values of the other “explanatory” variables \(Z_1, Z_2, \ldots, Z_r\). It need not be linear, but among all functions of the explanatory variables it predicts \(Y\) with the smallest mean squared error (see Rao (1973)). The regression function is written \(E[Y \mid Z_1, Z_2, \ldots, Z_r]\).

We can motivate the form of the classical linear regression model with reference to the Multivariate Normal Distribution (see Johnson, Wichern, and others (2014), page 404, eqn. 7.51):

Example 7.1 (Regression Function for Multivariate Random Variables) Let \(Y\) and \(Z_1, Z_2, \ldots, Z_r\) be generated by a Multivariate Normal Distribution, i.e.:

\[
\begin{bmatrix} Y \\ Z_1 \\ \vdots \\ Z_r \end{bmatrix} \sim N_{r+1}(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\]

We can then infer the conditional distribution of \(Y\) given \(Z_1, Z_2, \ldots, Z_r\). Using the notation of definition 5.3:

\[
Y \mid Z_1, Z_2, \ldots, Z_r \sim N_1\!\left(\mu_Y + \boldsymbol{\sigma}_{ZY}'\boldsymbol{\Sigma}_{ZZ}^{-1}(\mathbf{Z} - \boldsymbol{\mu}_Z),\; \sigma_{YY} - \boldsymbol{\sigma}_{ZY}'\boldsymbol{\Sigma}_{ZZ}^{-1}\boldsymbol{\sigma}_{ZY}\right)
\]

We can then identify the regression function as a linear function of the vector of explanatory variables Z plus a constant term:

\[
E[Y \mid Z_1, Z_2, \ldots, Z_r] = \mu_Y + \boldsymbol{\sigma}_{ZY}'\boldsymbol{\Sigma}_{ZZ}^{-1}(\mathbf{Z} - \boldsymbol{\mu}_Z)
\]

By making the following definitions:

\[
\mathbf{b}_Z := \boldsymbol{\Sigma}_{ZZ}^{-1}\boldsymbol{\sigma}_{ZY}, \qquad b_0 := \mu_Y - \mathbf{b}_Z'\boldsymbol{\mu}_Z
\]

We can write the regression function in the following form:

\[
E[Y \mid Z_1, Z_2, \ldots, Z_r] = b_0 + \mathbf{b}_Z'\mathbf{Z}
\]
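
As a concrete illustration of Example 7.1, the following minimal numpy sketch computes \(\mathbf{b}_Z\), \(b_0\), and the conditional mean and variance from a partitioned mean vector and covariance matrix. The numerical values are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Joint distribution of (Y, Z1, Z2): the first entry is Y, the rest are Z.
# These mu and Sigma values are purely illustrative.
mu = np.array([2.0, 1.0, -1.0])
Sigma = np.array([[4.0, 1.5, 0.5],
                  [1.5, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

mu_Y, mu_Z = mu[0], mu[1:]
sigma_YY = Sigma[0, 0]        # Var(Y)
sigma_ZY = Sigma[1:, 0]       # Cov(Z, Y)
Sigma_ZZ = Sigma[1:, 1:]      # Cov(Z)

# Regression coefficients: b_Z = Sigma_ZZ^{-1} sigma_ZY,  b_0 = mu_Y - b_Z' mu_Z.
b_Z = np.linalg.solve(Sigma_ZZ, sigma_ZY)
b_0 = mu_Y - b_Z @ mu_Z

# Conditional mean E[Y | Z = z] and conditional variance for a particular z.
z = np.array([0.5, -0.2])
cond_mean = b_0 + b_Z @ z
cond_var = sigma_YY - sigma_ZY @ np.linalg.solve(Sigma_ZZ, sigma_ZY)
print(cond_mean, cond_var)
```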

In the classical linear regression model, we call the linear predictor the mean effect. The response \(Y\) is generated from the mean effect by adding an error term (see Johnson, Wichern, and others (2014), page 362, eqn. 7.3).

Definition 7.1 (Classical Linear Regression Model) Let \(Y\) be a univariate response variable whose relationship with \(r\) predictor variables \(Z_1, Z_2, \ldots, Z_r\) is under investigation. The Classical Linear Regression Model is:

\[
\begin{aligned}
Y &= \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_r Z_r + \epsilon \\
[\text{Response}] &= [\text{Mean}] + [\text{Error}]
\end{aligned}
\]

The following additional assumptions are made. For a Random Sample of size n, we assume:

The error term has zero expectation: \(E[\boldsymbol{\epsilon}_{(n \times 1)}] = \mathbf{0}_{(n \times 1)}\)

The individual components of the error vector have equal variance and are mutually uncorrelated:

\[
\text{Cov}(\boldsymbol{\epsilon})_{(n \times n)} = \sigma^2 \mathbf{I}_{(n \times n)}
\]

The error vector is generated by a multivariate normal distribution:

\[
\boldsymbol{\epsilon}_{(n \times 1)} \sim N_n(\mathbf{0}, \sigma^2 \mathbf{I})
\]

When fitting classical linear models, it is important to remember that the explanatory variables \(Z\) are considered fixed, while the parameter values \([\beta_0, \beta_1, \ldots, \beta_r]\) need to be determined from the data.
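
A minimal simulation sketch of Definition 7.1 is given below, assuming fixed explanatory variables and illustrative parameter values chosen only for demonstration: the response is generated as the mean effect plus equal-variance, uncorrelated normal errors.

```python
import numpy as np

rng = np.random.default_rng(0)

n, r = 50, 2                                   # sample size and number of predictors
sigma = 0.5                                    # error standard deviation (illustrative)
beta = np.array([1.0, 2.0, -0.5])              # (beta_0, beta_1, beta_2), illustrative

# Fixed design: a column of ones followed by the r explanatory variables.
Z = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, r))])

# Response = mean effect + error, with epsilon ~ N_n(0, sigma^2 I).
epsilon = rng.normal(0.0, sigma, size=n)
y = Z @ beta + epsilon
```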

The geometrical picture of a Classical Linear Regression model is as follows (see Johnson, Wichern, and others (2014), page 367):

Definition 7.2 (Geometrical Picture of Linear Regression) The expected value of the response vector is:

\[
E[\mathbf{Y}] = \mathbf{Z}\boldsymbol{\beta} =
\begin{bmatrix}
1 & Z_{11} & Z_{12} & \cdots & Z_{1r} \\
1 & Z_{21} & Z_{22} & \cdots & Z_{2r} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & Z_{n1} & Z_{n2} & \cdots & Z_{nr}
\end{bmatrix}
\begin{bmatrix}
\beta_0 \\ \beta_1 \\ \vdots \\ \beta_r
\end{bmatrix}
\]

This can be rewritten as a weighted sum of the columns of \(\mathbf{Z}\):

\[
E[\mathbf{Y}] = \beta_0 \mathbf{1}_{(n \times 1)} + \beta_1 \mathbf{Z}_{1\,(n \times 1)} + \cdots + \beta_r \mathbf{Z}_{r\,(n \times 1)}
\]

Thus the linear regression model states that the mean vector lies in a hyperplane spanned by the \(r+1\) measurement vectors. These vectors are fixed by the measurement data, and model fitting corresponds to finding the parameter values \([\beta_0, \beta_1, \ldots, \beta_r]\) which minimise the Mean Square Error of the sample data.

The hyperplane is known as the model plane.
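
The span statement in Definition 7.2 can be checked numerically. The small sketch below uses an illustrative \(3 \times 3\) design matrix (one intercept column plus two predictors) and confirms that \(\mathbf{Z}\boldsymbol{\beta}\) equals the corresponding weighted sum of the measurement vectors.

```python
import numpy as np

# Illustrative design matrix: columns are 1, Z_1, Z_2 (n = 3 observations).
Z = np.array([[1.0,  0.2,  1.1],
              [1.0, -0.4,  0.7],
              [1.0,  0.9, -0.3]])
beta = np.array([1.0, 2.0, -0.5])

# The mean vector Z beta is the point beta_0*1 + beta_1*Z_1 + beta_2*Z_2 in the
# model plane spanned by the measurement vectors.
mean_as_product = Z @ beta
mean_as_columns = beta[0] * Z[:, 0] + beta[1] * Z[:, 1] + beta[2] * Z[:, 2]
assert np.allclose(mean_as_product, mean_as_columns)
```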

Model Fitting

Estimating the best linear predictor (the mean effect) for a Classical Linear Regression model can be achieved with ordinary least squares (OLS). Geometrically, the OLS estimation procedure finds sample estimates for the parameter values \([\beta_0, \beta_1, \ldots, \beta_r]\) which place the response vector \(\mathbf{y}\) and a vector in the model plane as close together as possible. This corresponds to decomposing the response vector \(\mathbf{y}\) into a projection onto the model plane (the prediction vector) and a vector orthogonal to the model plane (the residual vector).

Figure 7.1: A representation of OLS model fitting with three observations (\(n=3\)) of one explanatory variable (\(\text{col}_2(\mathbf{Z})\)). Reproduced from Johnson, Wichern, and others (2014), Figure 7.1.

The following analytical result for OLS estimates can be derived (see Johnson, Wichern, and others (2014), page 364, eqn. 7.1):

Proposition 7.1 (OLS Estimates of Linear Regression Model) The \((r+1)\)-dimensional vector of sample estimates for the parameter values \(\boldsymbol{\beta} = [\beta_0, \beta_1, \ldots, \beta_r]\) is denoted \(\hat{\boldsymbol{\beta}} = [b_0, b_1, \ldots, b_r]\). The OLS estimate \(\hat{\boldsymbol{\beta}}\) is:

\[
\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}
\]

The \(n\)-dimensional projection of the response vector onto the model plane is known as the prediction vector and denoted \(\hat{\mathbf{y}}\). The OLS estimate is:

\[
\hat{\mathbf{y}} = \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}
\]

The \(n\)-dimensional vector of differences between the responses \(\mathbf{y}\) and the predictions \(\hat{\mathbf{y}}\) is known as the residual vector and denoted \(\hat{\boldsymbol{\epsilon}}\). The OLS estimate is:

\[
\hat{\boldsymbol{\epsilon}} = (\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}')\mathbf{y}
\]
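
The formulas of Proposition 7.1 translate directly into numpy. The sketch below fits simulated data (illustrative values only) and checks that the residual vector is orthogonal to the model plane; in practice a routine such as numpy's lstsq is preferable to forming \((\mathbf{Z}'\mathbf{Z})^{-1}\) explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 50, 2
Z = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, r))])
y = Z @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 0.5, size=n)

# beta_hat = (Z'Z)^{-1} Z' y, solved as a linear system rather than via an explicit inverse.
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Prediction vector: the projection of y onto the model plane.
y_hat = Z @ beta_hat

# Residual vector: the component of y orthogonal to the model plane.
eps_hat = y - y_hat

# Orthogonality checks: residuals are orthogonal to every column of Z, hence to y_hat.
print(np.allclose(Z.T @ eps_hat, 0.0), np.allclose(eps_hat @ y_hat, 0.0))
```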

The ability of the best linear predictor to explain variations in the response vector is estimated using the Coefficient of Determination (\(= \rho^2_{Y(Z)}\), see def. 5.3). (For a proof see Johnson, Wichern, and others (2014), page 367, eqn. 7.9.)

Proposition 7.2 (Measuring the Performance of Best Estimator) The orthogonality of \(\hat{\boldsymbol{\epsilon}}\) and \(\hat{\mathbf{y}}\) under OLS fitting allows the decomposition:

\[
\begin{aligned}
\sum_{i=1}^{n}(y_i - \bar{y})^2 &= \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \\
[\text{Total Sum of Squares}] &= [\text{Regression Sum of Squares}] + [\text{Residual Sum of Squares}]
\end{aligned}
\]

The coefficient of determination \(R^2\) is calculated as follows:

\[
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = [\text{Regression Sum of Squares}] \, / \, [\text{Total Sum of Squares}]
\]
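
The decomposition and both expressions for \(R^2\) in Proposition 7.2 can be verified numerically; the sketch below again uses simulated data for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
Z = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
y = Z @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 0.5, size=n)
y_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)     # OLS prediction vector

y_bar = y.mean()
ss_total = np.sum((y - y_bar) ** 2)
ss_regression = np.sum((y_hat - y_bar) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)

# The orthogonality of the residuals to the model plane makes the decomposition exact.
assert np.isclose(ss_total, ss_regression + ss_residual)

# Both expressions for R^2 agree.
r_squared = 1.0 - ss_residual / ss_total
assert np.isclose(r_squared, ss_regression / ss_total)
print(r_squared)
```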

If the model assumptions in definition 7.1 are valid, then the sample estimates of the model parameters have the following distributional properties (see Johnson, Wichern, and others (2014), page 370, section 7.4):

Proposition 7.3 (Sampling Properties of OLS Estimates) The parameter estimates \(\hat{\boldsymbol{\beta}}\) have the following properties:

\[
\begin{aligned}
E[\hat{\boldsymbol{\beta}}] &= \boldsymbol{\beta} \\
\text{Cov}[\hat{\boldsymbol{\beta}}] &= \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1} \\
\hat{\boldsymbol{\beta}} &\sim N_{r+1}\!\left(\boldsymbol{\beta},\, \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}\right)
\end{aligned}
\]

The sample estimate \(s^2\) of the error variance has the following properties:

\[
\begin{aligned}
s^2 &:= \frac{\hat{\boldsymbol{\epsilon}}'\hat{\boldsymbol{\epsilon}}}{n - (r+1)} \\
E[s^2] &= \sigma^2 \\
\frac{\hat{\boldsymbol{\epsilon}}'\hat{\boldsymbol{\epsilon}}}{\sigma^2} &\sim \chi^2_{n-r-1}
\end{aligned}
\]
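
As a final sketch, again on simulated data with illustrative parameter values, the quantities in Proposition 7.3 can be computed directly: \(s^2\) and the estimated covariance matrix \(s^2(\mathbf{Z}'\mathbf{Z})^{-1}\) of \(\hat{\boldsymbol{\beta}}\), whose diagonal square roots are the usual coefficient standard errors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 50, 2
Z = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, r))])
y = Z @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 0.5, size=n)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
eps_hat = y - Z @ beta_hat

# s^2 = eps_hat' eps_hat / (n - (r + 1)): an unbiased estimate of sigma^2.
s2 = (eps_hat @ eps_hat) / (n - (r + 1))

# Estimated Cov(beta_hat) = s^2 (Z'Z)^{-1}; its diagonal gives squared standard errors.
cov_beta_hat = s2 * np.linalg.inv(Z.T @ Z)
std_errors = np.sqrt(np.diag(cov_beta_hat))
print(s2, std_errors)
```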

References

Johnson, Richard Arnold, Dean W. Wichern, and others. 2014. Applied Multivariate Statistical Analysis. Vol. 4. New Jersey: Prentice-Hall.

Rao, Calyampudi Radhakrishna. 1973. Linear Statistical Inference and Its Applications. Vol. 2. New York: Wiley.