3.2 Model formulation and estimation by least squares
The multiple linear model extends the simple linear model by describing the relation between the random variables $X_1, \dots, X_p$ and $Y$. For example, in the last model for the wine dataset, we had the $p = 4$ predictors $X_1 =$ WinterRain, $X_2 =$ AGST, $X_3 =$ HarvestRain and $X_4 =$ Age, and the response $Y =$ Price. Therefore, as in Section 2.3, the multiple linear model is constructed by assuming that the linear relation

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon \tag{3.1}$$

holds between the predictors $X_1, \dots, X_p$ and the response $Y$. In (3.1), $\beta_0$ is the intercept and $\beta_1, \dots, \beta_p$ are the slopes, respectively. $\varepsilon$ is a random variable with mean zero and independent from $X_1, \dots, X_p$. Another way of looking at (3.1) is

$$\mathbb{E}[Y \mid X_1 = x_1, \dots, X_p = x_p] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p, \tag{3.2}$$

since $\mathbb{E}[\varepsilon \mid X_1 = x_1, \dots, X_p = x_p] = 0$.
The LHS of (3.2) is the conditional expectation of $Y$ given $X_1, \dots, X_p$. It represents how the mean of the random variable $Y$ changes according to particular values, denoted by $x_1, \dots, x_p$, of the random variables $X_1, \dots, X_p$. With the RHS, what we are saying is that the mean of $Y$ changes in a linear fashion with respect to the values of $x_1, \dots, x_p$. Hence the interpretation of the coefficients (illustrated with a short code sketch after the list):
- $\beta_0$: is the mean of $Y$ when $X_1 = \dots = X_p = 0$.
- $\beta_j$, $1 \leq j \leq p$: is the increment in the mean of $Y$ for an increment of one unit in $X_j$, provided that the remaining variables $X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_p$ do not change.
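For concreteness, the following minimal sketch revisits the wine example. It assumes that a wine data frame with the variables Price, WinterRain, AGST, HarvestRain, and Age is loaded, as in the previous chapter; the object name modWine is arbitrary.

# Sketch: interpreting the coefficients of the wine model
# (assumes a wine data frame with Price, WinterRain, AGST, HarvestRain, and Age)
modWine <- lm(Price ~ WinterRain + AGST + HarvestRain + Age, data = wine)
modWine$coefficients
# The coefficient of AGST is the expected increment in Price for a one-unit
# increase in AGST when WinterRain, HarvestRain, and Age are held fixed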
Figure 3.5 illustrates the geometrical interpretation of a multiple linear model: a plane in the $(p + 1)$-dimensional space. If $p = 1$, the plane is the regression line for simple linear regression. If $p = 2$, then the plane can be visualized in a three-dimensional plot.
Figure 3.5: The least squares regression plane and its dependence on the kind of squared distance considered.
The estimation of $\beta_0, \beta_1, \dots, \beta_p$ is done as in simple linear regression, by minimizing the Residual Sum of Squares (RSS). First we need to introduce some helpful matrix notation. In the following, bold faces are used to distinguish vectors and matrices from scalars:
- A sample of $(X_1, \dots, X_p, Y)$ is $(X_{11}, \dots, X_{1p}, Y_1), \dots, (X_{n1}, \dots, X_{np}, Y_n)$, where $X_{ij}$ denotes the $i$-th observation of the $j$-th predictor $X_j$. We denote by $\mathbf{X}_i = (X_{i1}, \dots, X_{ip})$ the $i$-th observation of $(X_1, \dots, X_p)$, so the sample simplifies to $(\mathbf{X}_1, Y_1), \dots, (\mathbf{X}_n, Y_n)$.
- The design matrix contains all the information of the predictors, plus a column of ones:

$$\mathbf{X} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}_{n \times (p + 1)}.$$

- The vector of responses $\mathbf{Y}$, the vector of coefficients $\boldsymbol{\beta}$ and the vector of errors $\boldsymbol{\varepsilon}$ are, respectively,

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}_{n \times 1}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}_{(p + 1) \times 1}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}_{n \times 1}.$$

Thanks to the matrix notation, we can turn the sample version of the multiple linear model, namely

$$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1, \dots, n,$$

into something as compact as

$$\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}. \tag{3.3}$$
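As a quick illustration of the design matrix, R's model.matrix() builds $\mathbf{X}$ from a formula by prepending the column of ones to the observed predictors. A minimal sketch with a toy data frame (the values below are arbitrary, chosen only for the example):

# Toy data frame with n = 3 observations of p = 2 predictors (arbitrary values)
df <- data.frame(x1 = c(1.5, -0.3, 2.1), x2 = c(0.2, 1.1, -0.7), y = c(2, 4, 5))
# model.matrix() returns the design matrix of the formula: a column of ones
# followed by the observations of each predictor
model.matrix(y ~ x1 + x2, data = df)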
The RSS for the multiple linear regression is

$$\text{RSS}(\boldsymbol{\beta}) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_p X_{ip})^2 = (\mathbf{Y} - \mathbf{X} \boldsymbol{\beta})^T (\mathbf{Y} - \mathbf{X} \boldsymbol{\beta}).$$

The RSS aggregates the squared vertical distances from the data to a regression plane given by $\boldsymbol{\beta}$. Remember that the vertical distances are considered because we want to minimize the error in the prediction of $Y$. The least squares estimators are the minimizers of the RSS:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^{p + 1}} \text{RSS}(\boldsymbol{\beta}).$$

Luckily, thanks to the matrix form of (3.3), it is simple to compute a closed-form expression for the least squares estimates:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}. \tag{3.4}$$
The data of the illustration has been generated with the following code:
# Generates 50 points from a N(0, 1): predictors and error
set.seed(34567) # Fixes the seed for the random generator
x1 <- rnorm(50)
x2 <- rnorm(50)
x3 <- x1 + rnorm(50, sd = 0.05) # Make variables dependent
eps <- rnorm(50)

# Responses
yLin <- -0.5 + 0.5 * x1 + 0.5 * x2 + eps
yQua <- -0.5 + x1^2 + 0.5 * x2 + eps
yExp <- -0.5 + 0.5 * exp(x2) + x3 + eps

# Data
leastSquares3D <- data.frame(x1 = x1, x2 = x2, x3 = x3, yLin = yLin,
                             yQua = yQua, yExp = yExp)
Let’s check that indeed the coefficients given by lm are the ones given by equation (3.4) for the regression yLin ~ x1 + x2.
# Matrix X
X <- cbind(1, x1, x2)

# Vector Y
Y <- yLin

# Coefficients
beta <- solve(t(X) %*% X) %*% t(X) %*% Y
# %*% multiplies matrices
# solve() computes the inverse of a matrix
# t() transposes a matrix
beta
##          [,1]
##    -0.5702694
## x1  0.4832624
## x2  0.3214894

# Output from lm
mod <- lm(yLin ~ x1 + x2, data = leastSquares3D)
mod$coefficients
## (Intercept)          x1          x2
##  -0.5702694   0.4832624   0.3214894
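As a complementary sanity check, the RSS can also be minimized numerically and compared against the closed form in (3.4). A minimal sketch, assuming the X and Y objects defined above (the starting values in par are arbitrary):

# Numerical minimization of the RSS, as an alternative route to (3.4)
rss <- function(beta) sum((Y - X %*% beta)^2)
optim(par = c(0, 0, 0), fn = rss, method = "BFGS")$par
# The result should be very close to the beta computed above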
Compute $\hat{\boldsymbol{\beta}}$ for the regressions yLin ~ x1 + x2, yQua ~ x1 + x2 and yExp ~ x2 + x3 using:
- equation (3.4), and
- the function lm.
Once we have the least squares estimates $\hat{\boldsymbol{\beta}}$, we can define the next two concepts:

The fitted values $\hat{Y}_1, \dots, \hat{Y}_n$, where

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \dots + \hat{\beta}_p X_{ip}.$$

They are the vertical projections of $Y_1, \dots, Y_n$ into the fitted plane (see Figure 3.5). In matrix form, inputting (3.3),

$$\hat{\mathbf{Y}} = \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y} = \mathbf{H} \mathbf{Y},$$

where $\mathbf{H} = \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T$ is called the hat matrix because it “puts the hat into $\mathbf{Y}$.” What it does is to project $\mathbf{Y}$ into the regression plane (see Figure 3.5).

The residuals (or estimated errors) $\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_n$, where

$$\hat{\varepsilon}_i = Y_i - \hat{Y}_i.$$

They are the vertical distances between the actual data and the fitted data.
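The following sketch computes these quantities explicitly for the yLin ~ x1 + x2 example, reusing the X, Y, and mod objects defined above, and compares them with the output of lm.

# Hat matrix and fitted values for the yLin ~ x1 + x2 example
H <- X %*% solve(t(X) %*% X) %*% t(X)
Yhat <- H %*% Y
head(cbind(Yhat, mod$fitted.values)) # Both columns should coincide

# Residuals as vertical distances between the data and the fitted values
head(cbind(Y - Yhat, mod$residuals)) # Both columns should coincide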
We conclude with an insight on the relation of multiple and simple linear regressions. It is illustrated in Figure 3.6.
Consider the multiple linear model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ and its associated simple linear models $Y = \alpha_0 + \alpha_1 X_1 + \varepsilon$ and $Y = \gamma_0 + \gamma_1 X_2 + \varepsilon$. Assume that we have a sample $(X_{11}, X_{12}, Y_1), \dots, (X_{n1}, X_{n2}, Y_n)$. Then, in general, $\hat{\alpha}_0 \neq \hat{\beta}_0$, $\hat{\alpha}_1 \neq \hat{\beta}_1$, $\hat{\gamma}_0 \neq \hat{\beta}_0$ and $\hat{\gamma}_1 \neq \hat{\beta}_2$. That is, in general, the inclusion of a new predictor changes the coefficient estimates.
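For instance, this can be seen directly in the wine example. A minimal sketch, again assuming the wine data frame is loaded (outputs not shown): the estimated slope of AGST in the simple regression will, in general, differ from its estimated slope once HarvestRain enters the model.

# The estimated slope of AGST changes when another predictor is included
# (assumes the wine data frame from the previous chapter)
coef(lm(Price ~ AGST, data = wine))
coef(lm(Price ~ AGST + HarvestRain, data = wine))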

Figure 3.6: The regression plane (blue) and its relation with the simple linear regressions (green lines). The red points represent the sample for $(X_1, X_2, Y)$ and the black points the subsamples for $(X_1, X_2)$ (bottom), $(X_1, Y)$ (left) and $(X_2, Y)$ (right).
The data employed in Figure 3.6 is:
set.seed(212542)
n <- 100
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n, mean = x1, sd = 3)
y <- 1 + 2 * x1 - x2 + rnorm(n, sd = 1)
data <- data.frame(x1 = x1, x2 = x2, y = y)
With the above data, check how the fitted coefficients change for y ~ x1, y ~ x2 and y ~ x1 + x2.