
6.2 The Multiple Regression Model

The multiple regression model extends the basic concept of the simple regression model discussed in Chapters 4 and 5. A multiple regression model enables us to estimate the effect on $Y_i$ of changing a regressor $X_{1i}$ if the remaining regressors $X_{2i}, X_{3i}, \dots, X_{ki}$ do not vary. In fact, we have already estimated the multiple regression model (6.2) using R in the previous section. The interpretation of the coefficient on the student-teacher ratio is the effect on test scores of a one-unit change in the student-teacher ratio if the percentage of English learners is kept constant.

Just like in the simple regression model, we assume that the true relationship between $Y$ and $X_{1i}, X_{2i}, \dots, X_{ki}$ is linear. On average, this relation is given by the population regression function

$$E(Y_i \vert X_{1i}=x_1, X_{2i}=x_2, X_{3i}=x_3, \dots, X_{ki}=x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k. \tag{6.3}$$

As in the simple regression model, the relation $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki}$ does not hold exactly since there are disturbing influences on the dependent variable $Y$ that we cannot observe as explanatory variables. Therefore we add an error term $u$, which represents deviations of the observations from the population regression line, to (6.3). This yields the population multiple regression model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i, \quad i=1,\dots,n. \tag{6.4}$$
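To make (6.4) concrete, consider a minimal simulation sketch: we draw data from a hypothetical population model with two regressors and known coefficients (all variable names and parameter values below are made up for illustration) and check that lm() recovers the coefficients.

set.seed(1)
n  <- 1000
X1 <- runif(n, 10, 30)                # hypothetical regressor
X2 <- runif(n, 0, 100)                # hypothetical regressor
u  <- rnorm(n, sd = 5)                # error term
Y  <- 700 - 1.5 * X1 - 0.5 * X2 + u   # population model: beta0 = 700, beta1 = -1.5, beta2 = -0.5
coef(lm(Y ~ X1 + X2))                 # estimates should be close to 700, -1.5, -0.5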

Key Concept 6.2 summarizes the core concepts of the multiple regression model.

Key Concept 6.2

The Multiple Regression Model

The multiple regression model is

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i \ , \ i=1,\dots,n.$$

The designations are similar to those in the simple regression model:

  • $Y_i$ is the $i$th observation on the dependent variable. Observations on the $k$ regressors are denoted by $X_{1i}, X_{2i}, \dots, X_{ki}$, and $u_i$ is the error term.
  • The average relationship between $Y$ and the regressors is given by the population regression line $E(Y_i \vert X_{1i}=x_1, X_{2i}=x_2, X_{3i}=x_3, \dots, X_{ki}=x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k$.
  • $\beta_0$ is the intercept; it is the expected value of $Y$ when all $X$s equal $0$. $\beta_j$, $j=1,\dots,k$, are the coefficients on $X_j$, $j=1,\dots,k$. $\beta_1$ measures the expected change in $Y_i$ that results from a one-unit change in $X_{1i}$ while holding all other regressors constant.

How can we estimate the coefficients of the multiple regression model (6.4)? We will not go into too much detail on this issue as our focus is on using R. However, it should be pointed out that, similarly to the simple regression model, the coefficients of the multiple regression model can be estimated using OLS. As in the simple model, we seek to minimize the sum of squared mistakes by choosing estimates $b_0, b_1, \dots, b_k$ for the coefficients $\beta_0, \beta_1, \dots, \beta_k$ such that

$$\sum_{i=1}^n (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i} - \dots - b_k X_{ki})^2 \tag{6.5}$$

is minimized. Note that (6.5) is simply an extension of the SSR in the case with just one regressor and a constant. The estimators that minimize (6.5) are hence denoted $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$ and, as in the simple regression model, we call them the ordinary least squares estimators of $\beta_0, \beta_1, \dots, \beta_k$. For the predicted value of $Y_i$ given the regressors and the estimates $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$ we have

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}.$$ The difference between $Y_i$ and its predicted value $\hat{Y}_i$ is called the OLS residual of observation $i$: $\hat{u}_i = Y_i - \hat{Y}_i$.
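As an illustration, one can minimize (6.5) numerically and compare the result to the solution computed by lm(). The following sketch, assuming the CASchools data as set up in the previous section, uses R's general-purpose optimizer optim():

# sum of squared residuals as a function of candidate coefficients b = (b0, b1, b2)
ssr <- function(b, y, x1, x2) {
  sum((y - b[1] - b[2] * x1 - b[3] * x2)^2)
}
# numerically minimize (6.5); the starting values are arbitrary
fit <- optim(par = c(0, 0, 0), fn = ssr,
             y = CASchools$score, x1 = CASchools$STR, x2 = CASchools$english,
             method = "BFGS")
fit$par   # should be close to coef(mult.mod)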

For further information regarding the theory behind multiple regression, see Chapter 18.1 of the book, which, inter alia, presents a derivation of the OLS estimator in the multiple regression model using matrix notation.
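As a preview of the matrix approach: the OLS estimator has the closed form $\hat{\beta} = (X'X)^{-1}X'Y$, where $X$ is the matrix of regressors including a column of ones for the intercept. A quick sketch in R (again assuming the CASchools setup from the previous section):

X <- cbind(1, CASchools$STR, CASchools$english)  # design matrix; first column for the intercept
y <- CASchools$score
solve(t(X) %*% X, t(X) %*% y)                    # coincides with coef(mult.mod)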

Now let us jump back to the example of test scores and class sizes. The estimated model object is mult.mod. As for simple regression models, we can use summary() to obtain information on estimated coefficients and model statistics.
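In case mult.mod is not in the workspace, it can be recreated along these lines (a sketch of the setup from the previous section, assuming the CASchools data from the AER package):

library(AER)
data(CASchools)
CASchools$STR   <- CASchools$students / CASchools$teachers   # student-teacher ratio
CASchools$score <- (CASchools$read + CASchools$math) / 2     # average test score
mult.mod <- lm(score ~ STR + english, data = CASchools)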

summary(mult.mod)$coef
##                Estimate Std. Error    t value      Pr(>|t|)
## (Intercept) 686.0322445 7.41131160  92.565565 3.871327e-280
## STR          -1.1012956 0.38027827  -2.896026  3.978059e-03
## english      -0.6497768 0.03934254 -16.515882  1.657448e-47

So the estimated multiple regression model is

$$\widehat{TestScore} = 686.03 - 1.10 \times STR - 0.65 \times PctEL. \tag{6.6}$$
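Using the rounded coefficients in (6.6), we can, for example, predict the test score of a hypothetical district with a student-teacher ratio of 20 and a share of 30% English learners: $686.03 - 1.10 \times 20 - 0.65 \times 30 \approx 644.5$. The same result is obtained, up to rounding, with predict():

predict(mult.mod, newdata = data.frame(STR = 20, english = 30))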

Unlike in the simple regression model, where the data can be represented by points in a two-dimensional coordinate system, we now have three dimensions. Hence observations can be represented by points in three-dimensional space. Therefore (6.6) is no longer a regression line but a regression plane. This idea extends to higher dimensions when we further expand the number of regressors $k$. We then say that the regression model can be represented by a hyperplane in $(k+1)$-dimensional space. It is already hard to imagine such a space if $k=3$ and we best stick with the general idea that, in the multiple regression model, the dependent variable is explained by a linear combination of the regressors. However, in the present case we are able to visualize the situation. The following figure is an interactive 3D visualization of the data and the estimated regression plane (6.6).
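A static approximation of such a figure can be drawn with the scatterplot3d package (a sketch; this is not the code used for the book's interactive figure):

library(scatterplot3d)
s3d <- scatterplot3d(CASchools$STR, CASchools$english, CASchools$score,
                     xlab = "STR", ylab = "PctEL", zlab = "Test Score")
s3d$plane3d(mult.mod)   # add the estimated regression plane (6.6)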

We observe that the estimated regression plane fits the data reasonably well, at least with regard to the shape and spatial position of the points. The color of the markers is an indicator of the absolute deviation from the estimated regression plane. Observations that are colored more reddish lie close to the regression plane, while the color shifts to blue with growing distance. An anomaly that can be seen from the plot is that there might be heteroskedasticity: the dispersion of the regression errors, i.e., the distance of the observations to the regression plane, tends to decrease as the share of English learning students increases.
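This impression can be checked informally by plotting the absolute OLS residuals against the share of English learners (a quick sketch using base R graphics):

plot(CASchools$english, abs(residuals(mult.mod)),
     xlab = "PctEL", ylab = "Absolute residual")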