
8.2 Nonlinear Functions of a Single Independent Variable

Polynomials

The approach used to obtain a quadratic model can be generalized to polynomial models of arbitrary degree $r$,
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i.$$

A cubic model, for instance, can be estimated in the same way as the quadratic model; we just have to use a polynomial of degree $r = 3$ in income. This is conveniently done using the function poly().

# estimate a cubic model
cubic_model <- lm(score ~ poly(income, degree = 3, raw = TRUE), data = CASchools)

By default, poly() generates orthogonal polynomials, which are orthogonal to the constant. Here we set raw = TRUE so that raw polynomials are evaluated, see ?poly.
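To see what raw = TRUE changes, we can compare the two parameterizations directly (a small sketch; the object names ortho_model and raw_model are chosen here for illustration). The coefficients differ, but both specifications span the same columns, so the fitted values coincide.

# orthogonal polynomials (default) vs. raw powers of income
ortho_model <- lm(score ~ poly(income, degree = 3), data = CASchools)
raw_model   <- lm(score ~ poly(income, degree = 3, raw = TRUE), data = CASchools)

# the coefficients differ ...
rbind(coef(ortho_model), coef(raw_model))

# ... but the fitted values are (numerically) identical
all.equal(fitted(ortho_model), fitted(raw_model))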

In practice, the question arises which polynomial order should be chosen. First, similarly to the case $r = 2$, we can test the null hypothesis that the true relationship is linear against the alternative hypothesis that the relationship is a polynomial of degree $r$:

$$H_0: \beta_2 = 0, \ \beta_3 = 0, \dots, \beta_r = 0 \quad \text{vs.} \quad H_1: \text{at least one } \beta_j \neq 0, \ j = 2, \dots, r.$$

This is a joint null hypothesis with $r - 1$ restrictions, so it can be tested using the $F$-test presented in Chapter 7. linearHypothesis() can be used to conduct such tests. For example, we may test the null of a linear model against the alternative of a polynomial of maximal degree $r = 3$ as follows.

# test the hypothesis of a linear model against quadratic or polynomial
# alternatives

# set up hypothesis matrix
R <- rbind(c(0, 0, 1, 0),
            c(0, 0, 0, 1))

# do the test
linearHypothesis(cubic_model,
                 hypothesis.matrix = R,
                 white.adj = "hc1")
## Linear hypothesis test
## 
## Hypothesis:
## poly(income, degree = 3, raw = TRUE)2 = 0
## poly(income, degree = 3, raw = TRUE)3 = 0
## 
## Model 1: restricted model
## Model 2: score ~ poly(income, degree = 3, raw = TRUE)
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1    418                        
## 2    416  2 37.691 9.043e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We provide a hypothesis matrix as the argument hypothesis.matrix. This is useful when the coefficients have long names, as is the case here due to using poly(), or when the restrictions include multiple coefficients. How the hypothesis matrix $R$ is interpreted by linearHypothesis() is best seen using matrix algebra:

For the two linear constraints above, we have
$$R\beta = s \quad \Leftrightarrow \quad \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \Leftrightarrow \quad \begin{pmatrix} \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

linearHypothesis() uses the zero vector for s by default, see ?linearHypothesis.
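As a quick sanity check (not part of the book's code), multiplying R with the estimated coefficient vector returns exactly the two coefficients that are restricted to zero under the null:

# R %*% beta_hat picks out the coefficients on income^2 and income^3
R %*% coef(cubic_model)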

The p-value of the test is very small, so we reject the null hypothesis. However, this does not tell us which $r$ to choose. In practice, one approach to determine the degree of the polynomial is to use sequential testing:

  1. Estimate a polynomial model for some maximum value $r$.
  2. Use a $t$-test to test $\beta_r = 0$. Rejection of the null means that $X^r$ belongs in the regression equation.
  3. Acceptance of the null in step 2 means that $X^r$ can be eliminated from the model. Continue by repeating step 1 with order $r - 1$ and test whether $\beta_{r-1} = 0$. If the test rejects, use a polynomial model of order $r - 1$.
  4. If the test from step 3 does not reject, continue this procedure until the coefficient on the highest power is statistically significant.

There is no unambiguous guideline for how to choose $r$ in step 1. However, as pointed out in Stock & Watson (2015), economic data are often smooth, so it is appropriate to choose small orders like 2, 3, or 4. A compact R sketch of this sequential procedure is shown below.
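The procedure above can be wrapped in a short helper function. The following is a minimal sketch, not code from the book: the name select_degree, the maximum degree of 4, and the 5% significance level are choices made here for illustration; robust standard errors are used as in the rest of the chapter.

library(lmtest)     # provides coeftest()
library(sandwich)   # provides vcovHC()

# sequentially test the coefficient on the highest power, starting at degree r_max
select_degree <- function(r_max = 4, data = CASchools) {
  for (r in r_max:1) {
    mod <- lm(score ~ poly(income, degree = r, raw = TRUE), data = data)
    # robust p-value of the coefficient on the highest power (last row of the table)
    p_highest <- coeftest(mod, vcov. = vcovHC, type = "HC1")[r + 1, 4]
    if (p_highest < 0.05) return(r)   # keep degree r if its top coefficient is significant
  }
  1                                   # otherwise fall back to the linear model
}

select_degree()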

We will now demonstrate how to apply sequential testing using the example of the cubic model.

summary(cubic_model)
## 
## Call:
## lm(formula = score ~ poly(income, degree = 3, raw = TRUE), data = CASchools)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -44.28  -9.21   0.20   8.32  31.16 
## 
## Coefficients:
##                                         Estimate Std. Error t value
## (Intercept)                            6.001e+02  5.830e+00 102.937
## poly(income, degree = 3, raw = TRUE)1  5.019e+00  8.595e-01   5.839
## poly(income, degree = 3, raw = TRUE)2 -9.581e-02  3.736e-02  -2.564
## poly(income, degree = 3, raw = TRUE)3  6.855e-04  4.720e-04   1.452
##                                       Pr(>|t|)    
## (Intercept)                            < 2e-16 ***
## poly(income, degree = 3, raw = TRUE)1 1.06e-08 ***
## poly(income, degree = 3, raw = TRUE)2   0.0107 *  
## poly(income, degree = 3, raw = TRUE)3   0.1471    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.71 on 416 degrees of freedom
## Multiple R-squared:  0.5584, Adjusted R-squared:  0.5552 
## F-statistic: 175.4 on 3 and 416 DF,  p-value: < 2.2e-16

The estimated cubic model stored in cubic_model is

$$\widehat{TestScore}_i = \underset{(5.83)}{600.1} + \underset{(0.86)}{5.02} \times income_i - \underset{(0.037)}{0.096} \times income_i^2 + \underset{(0.00047)}{0.00069} \times income_i^3.$$

The $t$-statistic on income^3 is about 1.45, so the null hypothesis that the relationship is quadratic cannot be rejected, even at the 10% level. This is contrary to the result presented in the book, which reports robust standard errors throughout; we therefore also use robust variance-covariance estimation to reproduce these results.

# test the hypothesis using robust standard errors
coeftest(cubic_model, vcov. = vcovHC, type = "HC1")
## 
## t test of coefficients:
## 
##                                          Estimate  Std. Error  t value
## (Intercept)                            6.0008e+02  5.1021e+00 117.6150
## poly(income, degree = 3, raw = TRUE)1  5.0187e+00  7.0735e-01   7.0950
## poly(income, degree = 3, raw = TRUE)2 -9.5805e-02  2.8954e-02  -3.3089
## poly(income, degree = 3, raw = TRUE)3  6.8549e-04  3.4706e-04   1.9751
##                                        Pr(>|t|)    
## (Intercept)                           < 2.2e-16 ***
## poly(income, degree = 3, raw = TRUE)1 5.606e-12 ***
## poly(income, degree = 3, raw = TRUE)2  0.001018 ** 
## poly(income, degree = 3, raw = TRUE)3  0.048918 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The reported standard errors have changed, and the coefficient on income^3 is now significant at the 5% level. This means we reject the hypothesis that the regression function is quadratic against the alternative that it is cubic. We can also test whether the coefficients on income^2 and income^3 are jointly significant using a robust version of the F-test.

# perform robust F-test 
linearHypothesis(cubic_model, 
                 hypothesis.matrix = R,
                 vcov. = vcovHC, type = "HC1")
## Linear hypothesis test
## 
## Hypothesis:
## poly(income, degree = 3, raw = TRUE)2 = 0
## poly(income, degree = 3, raw = TRUE)3 = 0
## 
## Model 1: restricted model
## Model 2: score ~ poly(income, degree = 3, raw = TRUE)
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1    418                        
## 2    416  2 29.678 8.945e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a p-value of 8.945e-13, i.e., much less than 0.05, the null hypothesis of linearity is rejected in favor of the alternative that the relationship is quadratic or cubic.

Interpretation of Coefficients in Nonlinear Regression Models

The coefficients in polynomial regression do not have a simple interpretation. Why? Think of a quadratic model: it is not helpful to think of the coefficient on X as the expected change in Y associated with a change in X holding the other regressors constant, because $X^2$ changes as X varies. This is also the case for other deviations from linearity, for example in models where regressors and/or the dependent variable are log-transformed. A way to approach this is to calculate the estimated effect on Y associated with a change in X for one or more values of X. This idea is summarized in Key Concept 8.1.

Key Concept 8.1

The Expected Effect on Y of a Change in X1 in a Nonlinear Regression Model

Consider the nonlinear population regression model

$$Y_i = f(X_{1i}, X_{2i}, \dots, X_{ki}) + u_i, \quad i = 1, \dots, n,$$

where $f(X_{1i}, X_{2i}, \dots, X_{ki})$ is the population regression function and $u_i$ is the error term.

Denote by $\Delta Y$ the expected change in $Y$ associated with $\Delta X_1$, the change in $X_1$, while holding $X_2, \dots, X_k$ constant. That is, the expected change in $Y$ is the difference

$$\Delta Y = f(X_1 + \Delta X_1, X_2, \dots, X_k) - f(X_1, X_2, \dots, X_k).$$

The estimator of this unknown population difference is the difference between the predicted values for these two cases. Let $\hat{f}(X_1, X_2, \dots, X_k)$ be the predicted value of $Y$ based on the estimator $\hat{f}$ of the population regression function. Then the predicted change in $Y$ is

$$\Delta \hat{Y} = \hat{f}(X_1 + \Delta X_1, X_2, \dots, X_k) - \hat{f}(X_1, X_2, \dots, X_k).$$

For example, we may ask the following: what is the predicted change in test scores associated with a one unit change (i.e., $1000) in income, based on the estimated quadratic regression function

$$\widehat{TestScore} = 607.3 + 3.85 \times income - 0.0423 \times income^2 \ ?$$

Because the regression function is quadratic, this effect depends on the initial district income. We therefore consider two cases:

  1. An increase in district income from 10 to 11 (from $10000 per capita to $11000).

  2. An increase in district income from 40 to 41 (that is from $40000 to $41000).

In order to obtain the $\Delta \hat{Y}$ associated with a change in income from 10 to 11, we use the following formula:

$$\Delta \hat{Y} = (\hat{\beta}_0 + \hat{\beta}_1 \times 11 + \hat{\beta}_2 \times 11^2) - (\hat{\beta}_0 + \hat{\beta}_1 \times 10 + \hat{\beta}_2 \times 10^2).$$
To compute $\hat{Y}$ using R, we may use predict().

# compute and assign the quadratic model
quadratic_model <- lm(score ~ income + I(income^2), data = CASchools)

# set up data for prediction
new_data <- data.frame(income = c(10, 11))

# do the prediction
Y_hat <- predict(quadratic_model, newdata = new_data)

# compute the difference
diff(Y_hat)
##        2 
## 2.962517
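The same difference can be computed directly from the estimated coefficients, following the formula above (a cross-check using coef(); the name b is chosen here for brevity):

# cross-check: plug the income values into the estimated quadratic function
b <- coef(quadratic_model)
(b[1] + b[2] * 11 + b[3] * 11^2) - (b[1] + b[2] * 10 + b[3] * 10^2)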

Analogously we can compute the effect of a change in district income from 40 to 41:

# set up data for prediction
new_data <- data.frame(income = c(40, 41))

# do the prediction
Y_hat <- predict(quadratic_model, newdata = new_data)

# compute the difference
diff(Y_hat)
##         2 
## 0.4240097

So for the quadratic model, the expected change in TestScore induced by an increase in income from 10 to 11 is about 2.96 points but an increase in income from 40 to 41 increases the predicted score by only 0.42. Hence, the slope of the estimated quadratic regression function is steeper at low levels of income than at higher levels.
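The same pattern is visible in the derivative of the estimated quadratic function, $\hat{\beta}_1 + 2\hat{\beta}_2 \times income$ (a small sketch evaluating the estimated marginal effect at the two income levels considered above):

# estimated marginal effect of income at income = 10 and income = 40
b <- coef(quadratic_model)
b[2] + 2 * b[3] * c(10, 40)

The marginal effects (roughly 3.0 and 0.47) are close to, though not identical with, the discrete one-unit changes computed above, since the latter average the slope over a one-unit interval.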

Logarithms

Another way to specify a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percentage changes. This is convenient as many relationships are naturally expressed in terms of percentages.

There are three different cases in which logarithms might be used.

  1. Transform X with its logarithm, but not Y.

  2. Analogously, we could transform Y to its logarithm but leave X in levels.

  3. Both Y and X are transformed to their logarithms.

The interpretation of the regression coefficients is different in each case.

Case I: X is in Logarithm, Y is not.

The regression model then is
$$Y_i = \beta_0 + \beta_1 \times \ln(X_i) + u_i, \quad i = 1, \dots, n.$$
As for polynomial regressions, we do not have to create a new variable before using lm(). We can simply adjust the formula argument of lm() to tell R that the log-transformation of a variable should be used.

# estimate a level-log model
LinearLog_model <- lm(score ~ log(income), data = CASchools)

# compute robust summary
coeftest(LinearLog_model, 
         vcov = vcovHC, type = "HC1")
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 557.8323     3.8399 145.271 < 2.2e-16 ***
## log(income)  36.4197     1.3969  26.071 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hence, the estimated regression function is

$$\widehat{TestScore} = 557.8 + 36.42 \times \ln(income).$$

Let us draw a plot of this function.

# draw a scatterplot
plot(score ~ income, 
     col = "steelblue",
     pch = 20,
     data = CASchools,
     main = "Linear-Log Regression Line")

# add the linear-log regression line
order_id  <- order(CASchools$income)

lines(CASchools$income[order_id],
      fitted(LinearLog_model)[order_id], 
      col = "red", 
      lwd = 2)

We can interpret $\hat{\beta}_1$ as follows: a 1% increase in income is associated with an increase in test scores of $0.01 \times 36.42 = 0.36$ points. In order to get the estimated effect of a one unit change in income (that is, a change in the original units, thousands of dollars) on test scores, the method presented in Key Concept 8.1 can be used.

# set up new data
new_data <- data.frame(income = c(10, 11, 40, 41))

# predict the outcomes 
Y_hat <- predict(LinearLog_model, newdata = new_data)

# compute the expected difference
Y_hat_matrix <- matrix(Y_hat, nrow = 2, byrow = TRUE)
Y_hat_matrix[, 2] - Y_hat_matrix[, 1]
## [1] 3.471166 0.899297

By setting nrow = 2 and byrow = TRUE in matrix() we ensure that Y_hat_matrix is a 2×2 matrix filled row-wise with the entries of Y_hat.
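An equivalent way to obtain the two differences without reshaping is to take the first and third consecutive differences of Y_hat (just an alternative shown for completeness):

# equivalent: first and third consecutive differences of the four predictions
diff(Y_hat)[c(1, 3)]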

The estimated model states that for an income increase from $10000 to $11000, test scores increase by an expected amount of 3.47 points. When income increases from $40000 to $41000, the expected increase in test scores is only about 0.90 points.

Case II: Y is in Logarithm, X is not

There are cases where it is useful to regress ln(Y).

The corresponding regression model then is

$$\ln(Y_i) = \beta_0 + \beta_1 \times X_i + u_i, \quad i = 1, \dots, n.$$

# estimate a log-linear model 
LogLinear_model <- lm(log(score) ~ income, data = CASchools)

# obtain a robust coefficient summary
coeftest(LogLinear_model, 
         vcov = vcovHC, type = "HC1")
## 
## t test of coefficients:
## 
##               Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept) 6.43936234 0.00289382 2225.210 < 2.2e-16 ***
## income      0.00284407 0.00017509   16.244 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated regression function is
$$\widehat{\ln(TestScore)} = 6.439 + 0.00284 \times income.$$
An increase in district income by $1000 is expected to increase test scores by $100 \times 0.00284\% = 0.284\%$.
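This interpretation can be illustrated with predict() (a sketch using the LogLinear_model object from above; since the model predicts log test scores, multiplying the difference by 100 gives an approximate percentage change):

# approximate percentage change in test scores for a $1000 increase in income
100 * diff(predict(LogLinear_model, newdata = data.frame(income = c(10, 11))))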

When the dependent variable is in logarithms, one cannot simply use $e^{\ln(\cdot)}$ to transform predictions back to the original scale; see the book for details.

Case III: X and Y are in Logarithms

The log-log regression model is
$$\ln(Y_i) = \beta_0 + \beta_1 \times \ln(X_i) + u_i, \quad i = 1, \dots, n.$$

# estimate the log-log model
LogLog_model <- lm(log(score) ~ log(income), data = CASchools)

# print robust coefficient summary to the console
coeftest(LogLog_model, 
         vcov = vcovHC, type = "HC1")
## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept) 6.3363494  0.0059246 1069.501 < 2.2e-16 ***
## log(income) 0.0554190  0.0021446   25.841 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated regression function hence is
$$\widehat{\ln(TestScore)} = 6.336 + 0.0554 \times \ln(income).$$
In a log-log model, a 1% change in $X$ is associated with an estimated $\hat{\beta}_1\%$ change in $Y$.
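The elasticity interpretation can be checked in the same way (a sketch based on the LogLog_model object; the income values 20 and 20 * 1.01, i.e., a 1% increase, are arbitrary choices):

# approximate percentage change in test scores for a 1% increase in income
100 * diff(predict(LogLog_model, newdata = data.frame(income = c(20, 20 * 1.01))))

The result is approximately 0.055, i.e., close to $\hat{\beta}_1$, as expected.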

We now reproduce Figure 8.5 of the book.

# generate a scatterplot
plot(log(score) ~ income, 
     col = "steelblue", 
     pch = 20, 
     data = CASchools,
     main = "Log-Linear Regression Function")

# add the log-linear regression line
order_id  <- order(CASchools$income)

lines(CASchools$income[order_id], 
      fitted(LogLinear_model)[order_id], 
      col = "red", 
      lwd = 2)

# add the log-log regression line
lines(sort(CASchools$income), 
      fitted(LogLog_model)[order(CASchools$income)], 
      col = "green", 
      lwd = 2)

# add a legend
legend("bottomright",
       legend = c("log-linear model", "log-log model"),
       lwd = 2,
       col = c("red", "green"))Hide Source
Hide Plot

Key Concept 8.2 summarizes the three logarithmic regression models.

Key Concept 8.2

Logarithms in Regression: Three Cases

Logarithms can be used to transform the dependent variable Y or the independent variable X, or both (the variable being transformed must be positive). The following table summarizes these three cases and the interpretation of the regression coefficient $\beta_1$. In each case, $\beta_1$ can be estimated by applying OLS after taking the logarithm(s) of the dependent and/or the independent variable.

Case   Model Specification   Interpretation of $\beta_1$
(I)   $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$   A 1% change in X is associated with a change in Y of $0.01 \times \beta_1$.
(II)   $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$   A change in X by one unit ($\Delta X = 1$) is associated with a $100 \times \beta_1\%$ change in Y.
(III)   $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$   A 1% change in X is associated with a $\beta_1\%$ change in Y, so $\beta_1$ is the elasticity of Y with respect to X.

Of course we can also estimate a polylog model like

$$TestScore_i = \beta_0 + \beta_1 \times \ln(income_i) + \beta_2 \times \ln(income_i)^2 + \beta_3 \times \ln(income_i)^3 + u_i,$$

which models the dependent variable TestScore by a third-degree polynomial of the log-transformed regressor income.

# estimate the polylog model
polyLog_model <- lm(score ~ log(income) + I(log(income)^2) + I(log(income)^3), 
                    data = CASchools)

# print robust summary to the console
coeftest(polyLog_model, 
         vcov = vcovHC, type = "HC1")
## 
## t test of coefficients:
## 
##                  Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)      486.1341    79.3825  6.1239 2.115e-09 ***
## log(income)      113.3820    87.8837  1.2901    0.1977    
## I(log(income)^2) -26.9111    31.7457 -0.8477    0.3971    
## I(log(income)^3)   3.0632     3.7369  0.8197    0.4128    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing these models by $\bar{R}^2$, we find that, leaving out the log-linear model, all models have a similar adjusted fit. In the class of polynomial models, the cubic specification has the highest $\bar{R}^2$, whereas the linear-log specification is the best of the log models.

# compute the adj. R^2 for the nonlinear models
adj_R2 <- rbind("quadratic" = summary(quadratic_model)$adj.r.squared,
               "cubic" = summary(cubic_model)$adj.r.squared,
               "LinearLog" = summary(LinearLog_model)$adj.r.squared,
               "LogLinear" = summary(LogLinear_model)$adj.r.squared,
               "LogLog" = summary(LogLog_model)$adj.r.squared,
               "polyLog" = summary(polyLog_model)$adj.r.squared)

# assign column names
colnames(adj_R2) <- "adj_R2"

adj_R2
##              adj_R2
## quadratic 0.5540444
## cubic     0.5552279
## LinearLog 0.5614605
## LogLinear 0.4970106
## LogLog    0.5567251
## polyLog   0.5599944

Let us now compare the cubic and the linear-log model by plotting the corresponding estimated regression functions.

# generate a scatterplot
plot(score ~ income, 
     data = CASchools,
     col = "steelblue", 
     pch = 20,
     main = "Linear-Log and Cubic Regression Functions")

# add the linear-log regression line
order_id  <- order(CASchools$income)

lines(CASchools$income[order_id],
      fitted(LinearLog_model)[order_id], 
      col = "darkgreen", 
      lwd = 2)

# add the cubic regression line
lines(x = CASchools$income[order_id], 
      y = fitted(cubic_model)[order_id],
      col = "darkred", 
      lwd = 2)

Both regression lines look nearly identical. Altogether the linear-log model may be preferable since it is more parsimonious in terms of regressors: it does not include higher-degree polynomials.

References

Stock, J., & Watson, M. (2015). Introduction to Econometrics, Third Update, Global Edition. Pearson Education Limited.