## 4.1 Linear Regression Model

The population regression model $$E(Y) = X \beta$$ summarizes the trend between the predictors and the mean response. Individual responses vary about the population regression line, $$y_i = X_i \beta + \epsilon_i$$, so that $$y_i \sim N(\mu_i, \sigma^2)$$ with mean $$\mu_i = X_i \beta$$ and constant variance $$\sigma^2$$. Equivalently, the model presumes a linear relationship between $$y$$ and $$X$$ with residuals $$\epsilon$$ that are independent normal random variables with mean zero and constant variance $$\sigma^2$$. The fitted model estimates the population coefficients, $$\hat{y} = X \hat{\beta}$$, and the population variance, $$\hat{\sigma}^2$$. The most common method of estimating the $$\beta$$ coefficients is ordinary least squares (OLS), which minimizes the sum of squared residuals from a random sample; $$\sigma^2$$ is then estimated from those residuals. The individual fitted values vary about the observed values, with residuals $$e_i = y_i - \hat{y}_i$$, where $$\hat{y}_i = X_i \hat{\beta}$$.
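As a minimal sketch of OLS estimation, the example below simulates data from a hypothetical two-coefficient model (the data, seed, and coefficient values are all invented for illustration), solves the least-squares problem for $$\hat{\beta}$$, and estimates $$\sigma^2$$ from the residuals:

```python
import numpy as np

# Hypothetical data: n observations, design matrix X with an intercept column.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([2.0, 0.5])          # assumed "true" coefficients
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# OLS: beta_hat minimizes the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values and residuals e_i = y_i - y_hat_i.
y_hat = X @ beta_hat
e = y - y_hat

# Unbiased estimate of sigma^2 divides by the n - p degrees of freedom.
p = X.shape[1]
sigma2_hat = e @ e / (n - p)
```

A useful sanity check is that the OLS residuals are orthogonal to every column of the design matrix, so in particular they sum to (numerically) zero when an intercept is included.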

The OLS estimator is the best linear unbiased estimator (BLUE) when the residuals are independent random variables with mean zero and constant variance $$\sigma^2$$; normality is additionally required for exact inference on the coefficients. Recall these conditions with the LINE mnemonic: Linear, Independent, Normal, and Equal.

Linearity. The mean of the response is a linear function of the explanatory variables, so the residuals have zero conditional mean: $$E(\epsilon | X) = 0$$.

Independence. The residuals are unrelated to each other. Independence is violated when repeated measurements are taken, or when there is a temporal component in the model.

Normality. The residuals are normally distributed: $$\epsilon|X \sim N(0, \sigma^2I)$$.

Equal Variances. The variance of the residuals is constant (homoscedasticity): $$E(\epsilon \epsilon' | X) = \sigma^2 I$$.
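The four LINE conditions above can be screened numerically before turning to plots or formal tests. The sketch below assumes a fitted residual vector `e` and fitted values `y_hat` (hypothetical names, not from any particular library) and computes one rough summary per condition:

```python
import numpy as np

def residual_diagnostics(e, y_hat):
    """Quick numeric screens of the LINE conditions (a sketch, not formal tests)."""
    # Linearity / zero mean: residuals should average to roughly 0.
    mean_e = e.mean()
    # Independence: lag-1 autocorrelation should be near 0 for ordered data.
    lag1 = np.corrcoef(e[:-1], e[1:])[0, 1]
    # Equal variance: |residuals| should not trend with the fitted values.
    spread_corr = np.corrcoef(np.abs(e), y_hat)[0, 1]
    # Normality: sample skewness and excess kurtosis should both be near 0.
    z = (e - mean_e) / e.std()
    skew = (z ** 3).mean()
    kurt = (z ** 4).mean() - 3
    return {"mean": mean_e, "lag1_autocorr": lag1,
            "spread_vs_fit": spread_corr, "skew": skew,
            "excess_kurtosis": kurt}
```

Values far from zero flag the corresponding condition for closer inspection, e.g. with residual-versus-fitted and normal Q-Q plots.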

Additionally, you should make sure your model has “little” or no multicollinearity among the explanatory variables.
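One common screen for multicollinearity is the variance inflation factor (VIF). The sketch below computes VIFs with plain least squares by regressing each column of a hypothetical predictor matrix `X` on the others; a rule of thumb is that values above roughly 5 to 10 signal problematic multicollinearity:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns (with an intercept).
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ b
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1.0 - resid @ resid / tss
        out[j] = 1.0 / (1.0 - r2)
    return out
```

Independent predictors give VIFs near 1, while a predictor that is nearly a linear combination of the others produces a very large VIF.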