Residuals

Defining Residuals

A sensible strategy is to focus attention on the differences between the observed data $y_1, \ldots, y_n$ and the fitted values $\hat{y}_1, \ldots, \hat{y}_n$ from the fitted regression model. This is easiest to see in the case of simple linear regression, with one explanatory variable,

$$y_i = \alpha + \beta x_i + \epsilon_i.$$

Here we define the fitted values as

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$$

and the residuals as

$$\hat{\epsilon}_i = y_i - \hat{y}_i.$$

In the case of $k$ explanatory variables, the fitted values are

$$\hat{y}_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_k x_{ki},$$

and the residuals are

$$\hat{\epsilon}_i = y_i - \hat{y}_i.$$
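To make these definitions concrete, here is a minimal sketch in Python using numpy. The data are synthetic and the names (`x`, `y`, `alpha_hat`, `beta_hat`) are illustrative, not from the source; it simply fits a simple linear regression by least squares and computes fitted values and residuals.

```python
import numpy as np

# Hypothetical data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)

# Least-squares fit of y_i = alpha + beta * x_i + eps_i
X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept column
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

y_hat = alpha_hat + beta_hat * x               # fitted values y_hat_i
resid = y - y_hat                              # residuals eps_hat_i = y_i - y_hat_i
```

With more explanatory variables, the same code applies once the extra columns are added to the design matrix `X`.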

The residuals are not the true errors; they are only estimates based on the observed data. For instance, if we observed different data from the same population, then our estimated residuals would be different because of random variation. The estimated residuals $\hat{\epsilon}_i$ should have properties similar to the true error terms $\epsilon_i$. The residuals, unlike the true errors, do not all have the same variance. To adjust for the fact that different $\hat{\epsilon}_i$ can have different variances, we can use standardized residuals, defined as

$$r_i = \frac{\hat{\epsilon}_i}{\sqrt{\operatorname{Var}(\hat{\epsilon}_i)}},$$

where $\operatorname{Var}(\hat{\epsilon}_i)$ is the variance of the $i$-th estimated residual; this variance is expressed in terms of the error variance, commonly denoted $\sigma^2$.
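The sketch below (Python/numpy, same synthetic setup as above) computes standardized residuals. It assumes the common convention $\operatorname{Var}(\hat{\epsilon}_i) = \sigma^2(1 - h_{ii})$, where $h_{ii}$ is the leverage of observation $i$; the leverage adjustment is an assumption going beyond the text, and dividing simply by $\hat{\sigma}$ gives a cruder version. The estimate of $\sigma^2$ uses RSS/(n - p), as described in the next subsection.

```python
import numpy as np

# Same hypothetical data and simple linear fit as in the earlier sketch
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef                           # eps_hat_i

n, p = X.shape                                 # p = 2 (intercept plus slope)
sigma2_hat = np.sum(resid**2) / (n - p)        # estimate of sigma^2 (see next subsection)

# Leverages h_ii = diag(X (X'X)^{-1} X'); assumption: Var(eps_hat_i) = sigma^2 * (1 - h_ii),
# which is why the residual variances differ across observations.
h = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)
r = resid / np.sqrt(sigma2_hat * (1 - h))      # standardized residuals r_i
```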

Estimating the error variance

The error variance $\sigma^2$ is estimated as

$$\hat{\sigma}^2 = \frac{\text{RSS}}{n - p},$$

where $\text{RSS} = \sum_{i=1}^{n} \hat{\epsilon}_i^2$ is the residual sum of squares and $p$ is the number of regression coefficients estimated in the model. Note that if an intercept term is included in the regression model, then $p$ is the number of explanatory variables plus one. For example, in the simple linear regression model above, $p = 2$.
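As a check on the arithmetic, here is a short Python/numpy sketch (same synthetic setup as in the earlier blocks) that computes $\hat{\sigma}^2 = \text{RSS}/(n - p)$ with $p = 2$ for the simple linear model with an intercept.

```python
import numpy as np

# Hypothetical data and simple linear fit, as in the earlier sketches
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)

X = np.column_stack([np.ones_like(x), x])      # intercept + one explanatory variable
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef                           # residuals eps_hat_i

n, p = X.shape                                 # here p = 2 (intercept plus slope)
rss = np.sum(resid**2)                         # residual sum of squares
sigma2_hat = rss / (n - p)                     # estimate of the error variance sigma^2
```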