6.2 The Multiple Regression Model
The multiple regression model extends the basic concept of the simple regression model discussed in Chapters 4 and 5. A multiple regression model enables us to estimate the effect on \(Y_i\) of changing a regressor \(X_{1i}\) while the remaining regressors \(X_{2i},X_{3i},\dots,X_{ki}\) do not vary. In fact, we have already estimated the multiple regression model (6.2) using R in the previous section. The interpretation of the coefficient on the student-teacher ratio is the effect on test scores of a one unit change in the student-teacher ratio when the percentage of English learners is held constant.
Just as in the simple regression model, we assume that the true relationship between \(Y\) and \(X_{1i},X_{2i},\dots,X_{ki}\) is linear. On average, this relationship is given by the population regression function
\[ E(Y_i\vert X_{1i}=x_1, X_{2i}=x_2, X_{3i}=x_3,\dots, X_{ki}=x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k. \tag{6.3} \]
As in the simple regression model, the relation \[Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki}\] does not hold exactly since there are disturbing influences on the dependent variable \(Y\) that we cannot observe as explanatory variables. Therefore we add to (6.3) an error term \(u\), which represents deviations of the observations from the population regression line. This yields the population multiple regression model \[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i, \ i=1,\dots,n. \tag{6.4} \]
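To see the mechanics of (6.4) at work, the following sketch simulates observations from a multiple regression model with two regressors and known coefficients and then re-estimates the coefficients. All variable names and numerical values here are illustrative choices, not part of the textbook example.

# simulate n observations from Y = beta_0 + beta_1*X_1 + beta_2*X_2 + u
# (coefficient values and distributions are illustrative assumptions)
set.seed(123)
n  <- 500
X1 <- runif(n, 10, 30)
X2 <- runif(n, 0, 100)
u  <- rnorm(n, sd = 10)    # unobservable error term
Y  <- 700 - 2 * X1 - 0.5 * X2 + u

# OLS should recover estimates close to the true coefficients 700, -2 and -0.5
coef(lm(Y ~ X1 + X2))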
Key Concept 6.2 summarizes the core concepts of the multiple regression model.
Key Concept 6.2
The Multiple Regression Model
The multiple regression model is
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i \ \ , \ \ i=1,\dots,n. \]
The designations are similar to those in the simple regression model:
- \(Y_i\) is the \(i^{th}\) observation on the dependent variable. Observations on the \(k\) regressors are denoted by \(X_{1i},X_{2i},\dots,X_{ki}\), and \(u_i\) is the error term.
- The average relationship between \(Y\) and the regressors is given by the population regression line \[ E(Y_i\vert X_{1i}=x_1, X_{2i}=x_2, X_{3i}=x_3,\dots, X_{ki}=x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k. \]
- \(\beta_0\) is the intercept; it is the expected value of \(Y\) when all \(X\)s equal \(0\). \(\beta_j \ , \ j=1,\dots,k\) are the coefficients on \(X_j \ , \ j=1,\dots,k\). \(\beta_1\) measures the expected change in \(Y_i\) that results from a one unit change in \(X_{1i}\) while holding all other regressors constant.
How can we estimate the coefficients of the multiple regression model (6.4)? We will not go too much into detail on this issue as our focus is on using R. However, it should be pointed out that, similarly to the simple regression model, the coefficients of the multiple regression model can be estimated using OLS. As in the simple model, we seek to minimize the sum of squared mistakes by choosing estimates \(b_0,b_1,\dots,b_k\) for the coefficients \(\beta_0,\beta_1,\dots,\beta_k\) such that
\[\sum_{i=1}^n (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i} - \dots - b_k X_{ki})^2 \tag{6.5}\]
is minimized. Note that (6.5) is simply an extension of \(SSR\) in the case with just one regressor and a constant. The estimators that minimize (6.5) are hence denoted \(\hat\beta_0,\hat\beta_1,\dots,\hat\beta_k\) and, as in the simple regression model, we call them the ordinary least squares estimators of \(\beta_0,\beta_1,\dots,\beta_k\). For the predicted value of \(Y_i\) given the regressors and the estimates \(\hat\beta_0,\hat\beta_1,\dots,\hat\beta_k\) we have
\[ \hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \dots +\hat\beta_k X_{ki}. \] The difference between \(Y_i\) and its predicted value \(\hat{Y}_i\) is called the OLS residual of observation \(i\): \(\hat{u}_i = Y_i - \hat{Y}_i\).
For further information regarding the theory behind multiple regression, see Chapter 18.1 of the book which inter alia presents a derivation of the OLS estimator in the multiple regression model using matrix notation.
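To connect this with the matrix approach, the OLS estimates can also be computed by hand from the closed-form formula \(\widehat{\beta} = (X^\top X)^{-1} X^\top Y\). Below is a minimal sketch, assuming the model object mult.mod estimated in the previous section is still available in the workspace:

X <- model.matrix(mult.mod)                    # regressor matrix including the intercept
y <- model.response(model.frame(mult.mod))     # observations on the dependent variable

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^(-1) X'y

# the manual estimates should coincide with those computed by lm()
cbind("by hand" = drop(beta_hat), "lm()" = coef(mult.mod))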
Now let us jump back to the example of test scores and class sizes. The estimated model object is mult.mod. As for the simple regression model, we can use summary() to obtain information on estimated coefficients and model statistics.
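In case mult.mod is no longer in the workspace, it can be re-created along the following lines. This is a sketch that assumes the CASchools data from the AER package and the variable constructions used in the previous chapters:

library(AER)                                                  # provides the CASchools data set
data(CASchools)
CASchools$STR   <- CASchools$students / CASchools$teachers    # student-teacher ratio
CASchools$score <- (CASchools$read + CASchools$math) / 2      # average test score

mult.mod <- lm(score ~ STR + english, data = CASchools)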
summary(mult.mod)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 686.0322445 7.41131160 92.565565 3.871327e-280
## STR -1.1012956 0.38027827 -2.896026 3.978059e-03
## english -0.6497768 0.03934254 -16.515882 1.657448e-47
So the estimated multiple regression model is
\[ \widehat{TestScore} = 686.03 - 1.10 \times STR - 0.65 \times PctEL. \tag{6.6} \]
Unlike in the simple regression model, where the data can be represented by points in a two-dimensional coordinate system, we now have three dimensions. Hence observations can be represented by points in three-dimensional space. Therefore (6.6) is no longer a regression line but a regression plane. This idea extends to higher dimensions when we further increase the number of regressors \(k\). We then say that the regression model can be represented by a hyperplane in \(k+1\) dimensional space. It is already hard to imagine such a space for \(k=3\), so it is best to stick with the general idea that, in the multiple regression model, the dependent variable is explained by a linear combination of the regressors. However, in the present case we are able to visualize the situation. The following figure is an interactive 3D visualization of the data and the estimated regression plane (6.6).
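The estimated regression plane can also be evaluated numerically: predict() returns the value of (6.6) for any chosen combination of the regressors. The particular values below are made up for illustration; note that the share of English learners is stored in the variable english, which appears as PctEL in (6.6).

# predicted test score for a hypothetical district with STR = 20 and 15% English learners
predict(mult.mod, newdata = data.frame(STR = 20, english = 15))
# equivalently, by (6.6): 686.03 - 1.10 * 20 - 0.65 * 15 = 654.28 (up to rounding)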
We observe that the estimated regression plane fits the data reasonably well, at least with regard to the shape and spatial position of the points. The color of the markers indicates the absolute deviation from the estimated regression plane: observations colored reddish lie close to the plane, and the color shifts towards blue as the distance grows. An anomaly that can be seen from the plot is a possible sign of heteroskedasticity: the dispersion of the regression errors, i.e., the distance of the observations from the regression plane, tends to decrease as the share of English learners increases.
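One quick way to inspect this pattern without the 3D plot is to plot the residuals against the share of English learners; the dispersion of the residuals should then visibly shrink as the share increases. A minimal sketch, assuming mult.mod and the CASchools data are available:

# residuals of (6.6) plotted against the percentage of English learners
plot(CASchools$english, residuals(mult.mod),
     xlab = "Percentage of English learners",
     ylab = "Residual",
     main = "Residuals of the multiple regression model")
abline(h = 0, lty = 2)    # reference line at zero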