4.1 Linear Regression Model

The population regression model \(E(Y) = X \beta\) summarizes the trend between the predictors and the mean response. Individual responses vary about the population regression line, \(y_i = X_i \beta + \epsilon_i\), with assumed distribution \(y_i \sim N(\mu_i, \sigma^2)\), where \(\mu_i = X_i \beta\) and the variance \(\sigma^2\) is constant. Equivalently, the model presumes a linear relationship between \(y\) and \(X\) with errors \(\epsilon\) that are independent normal random variables with mean zero and constant variance \(\sigma^2\). The fitted model estimates the population coefficients with \(\hat{\beta}\), so that \(\hat{y} = X \hat{\beta}\), and the population variance with \(\hat{\sigma}^2\). The most common method of estimating \(\beta\) and \(\sigma^2\) is ordinary least squares (OLS), which minimizes the sum of squared residuals from a random sample. The residuals are the differences between the observed and fitted values, \(e_i = y_i - \hat{y}_i\), where \(\hat{y}_i = X_i \hat{\beta}\).
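For concreteness, here is a minimal sketch of OLS estimation via the normal equations, using NumPy on simulated data; the sample size, predictors, and "true" coefficients below are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, p = 100, 2                       # n observations, p predictors (assumed)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column
beta = np.array([1.0, 2.0, -0.5])   # hypothetical population coefficients
y = X @ beta + rng.normal(scale=1.0, size=n)   # y_i = X_i beta + epsilon_i

# beta_hat solves the least-squares problem (X'X)^{-1} X'y;
# lstsq is the numerically stable way to compute it
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                # fitted values y_hat = X beta_hat
e = y - y_hat                       # residuals e_i = y_i - y_hat_i

# Unbiased estimate of sigma^2 divides by the residual degrees of freedom
sigma2_hat = e @ e / (n - X.shape[1])
print(beta_hat, sigma2_hat)
```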

The OLS estimator is the best linear unbiased estimator (BLUE) if the residuals are independent random variables with mean zero and constant variance \(\sigma^2\); when the residuals are also normally distributed, the usual t and F inference is exact. Recall these conditions with the LINE mnemonic: Linear, Independent, Normal, and Equal variance. Standard diagnostic checks for these conditions are sketched after the list.

Linearity. The mean response is a linear function of the explanatory variables, so the errors have conditional mean zero: \(E(\epsilon | X) = 0\).

Independence. The residuals are unrelated to each other. Independence is commonly violated when repeated measurements are taken on the same subjects, or when the data have a temporal component.

Normality. The residuals are normally distributed: \(\epsilon|X \sim N(0, \sigma^2I)\).

Equal Variances. The variance of the residuals is constant (homoscedasticity): \(E(\epsilon \epsilon' | X) = \sigma^2 I\).
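As referenced above, here is one sketch of routine diagnostics for the LINE conditions, using statsmodels and SciPy on simulated data. The particular tests shown (Durbin-Watson, Shapiro-Wilk, Breusch-Pagan) are common choices, not prescriptions from the text.

```python
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(seed=1)
n = 100
X = sm.add_constant(rng.normal(size=(n, 2)))        # illustrative predictors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)
fit = sm.OLS(y, X).fit()
resid = fit.resid

# Linearity: plot resid against fit.fittedvalues and look for curvature;
# a patternless horizontal band is consistent with a linear mean structure.

# Independence: Durbin-Watson near 2 suggests no first-order autocorrelation
print("Durbin-Watson:", durbin_watson(resid))

# Normality: Shapiro-Wilk test (a Q-Q plot, e.g. sm.qqplot(resid), also helps)
print("Shapiro-Wilk:", stats.shapiro(resid))

# Equal variances: Breusch-Pagan tests residual variance against predictors
bp_stat, bp_pval, _, _ = het_breuschpagan(resid, fit.model.exog)
print("Breusch-Pagan p-value:", bp_pval)
```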

Additionally, you should make sure your model has “little” or no multicollinearity among the explanatory variables.
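One common screen for multicollinearity is the variance inflation factor, \(VIF_j = 1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. Below is a minimal sketch with statsmodels on simulated, deliberately near-collinear predictors; the rule of thumb that values above roughly 5 to 10 signal a problem is a convention, not a claim from the text.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(seed=1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)    # x2 nearly collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# Compute VIF for each predictor, skipping the intercept column
for j in range(1, X.shape[1]):
    print(f"VIF x{j}: {variance_inflation_factor(X, j):.1f}")
```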