8.2 Nonlinear Functions of a Single Independent Variable
Polynomials
The approach used to obtain a quadratic model can be generalized to polynomial models of arbitrary degree r,

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i.$$
A cubic model, for instance, can be estimated in the same way as the quadratic model; we just have to use a polynomial of degree r = 3 in income. This is conveniently done using the function poly().
# estimate a cubic model
cubic_model <- lm(score ~ poly(income, degree = 3, raw = TRUE), data = CASchools)
poly() generates orthogonal polynomials, which are orthogonal to the constant, by default. Here, we set raw = TRUE so that raw polynomials are evaluated; see ?poly.
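Note that the choice between raw and orthogonal polynomials does not affect the fit. As a quick check (this is not part of the book's code; the object name ortho_model is ours), we can compare the fitted values of both parameterizations:
# quick check: orthogonal and raw polynomials yield different coefficients
# but span the same columns, so the fitted values coincide
ortho_model <- lm(score ~ poly(income, degree = 3), data = CASchools)
all.equal(fitted(ortho_model), fitted(cubic_model))  # should be TRUE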
In practice, the question arises which polynomial order should be chosen. First, similarly as for r = 2, we can test the null hypothesis that the true relation is linear against the alternative that the relationship is a polynomial of degree r:
$$H_0: \beta_2 = 0, \ \beta_3 = 0, \dots, \beta_r = 0 \quad \text{vs.} \quad H_1: \text{at least one } \beta_j \neq 0, \ j = 2, \dots, r.$$
This is a joint null hypothesis with r − 1 restrictions, so it can be tested using the F-test presented in Chapter 7. linearHypothesis() can be used to conduct such tests. For example, we may test the null of a linear model against the alternative of a polynomial of maximal degree r = 3 as follows.
# test the hypothesis of a linear model against quadratic or polynomial
# alternatives
# set up hypothesis matrix
R <- rbind(c(0, 0, 1, 0),
           c(0, 0, 0, 1))
# do the test
linearHypothesis(cubic_model,
                 hypothesis.matrix = R,
                 white.adj = "hc1")
## Linear hypothesis test
##
## Hypothesis:
## poly(income, degree = 3, raw = TRUE)2 = 0
## poly(income, degree = 3, raw = TRUE)3 = 0
##
## Model 1: restricted model
## Model 2: score ~ poly(income, degree = 3, raw = TRUE)
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 418
## 2 416 2 37.691 9.043e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We provide a hypothesis matrix as the argument hypothesis.matrix. This is useful when the coefficients have long names, as is the case here due to using poly(), or when the restrictions include multiple coefficients. How the hypothesis matrix RR is interpreted by linearHypothesis() is best seen using matrix algebra:
For the two linear constraints above, we have

$$R\beta = s \quad \Leftrightarrow \quad \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \Leftrightarrow \quad \begin{pmatrix} \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

linearHypothesis() uses the zero vector for s by default; see ?linearHypothesis.
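To see which coefficients the two rows of R pick out, we can multiply R by the vector of estimated coefficients. This small check is ours and not part of the book's code:
# R %*% beta_hat returns the estimates of beta_2 and beta_3,
# the coefficients restricted to zero under the null
R %*% coef(cubic_model)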
The p-value of the test is very small, so we reject the null hypothesis. However, this does not tell us which r to choose. In practice, one approach to determine the degree of the polynomial is to use sequential testing:
1. Estimate a polynomial model for some maximum value r.
2. Use a t-test to test β_r = 0. Rejection of the null means that X^r belongs in the regression equation.
3. Acceptance of the null in step 2 means that X^r can be eliminated from the model. Continue by repeating step 1 with order r − 1 and test whether β_{r−1} = 0. If the test rejects, use a polynomial model of order r − 1.
4. If the test from step 3 does not reject, continue this procedure until the coefficient on the highest power is statistically significant.
There is no unambiguous guideline for choosing r in step 1. However, as pointed out in Stock and Watson (2015), economic data are often smooth, so it is usually appropriate to choose small orders like 2, 3, or 4.
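The procedure also lends itself to automation. The following loop is a minimal sketch of our own (not taken from the book); it assumes a maximum order of 4, HC1-robust t-tests, and a 5% significance level:
# sketch: sequential testing for the polynomial degree
# (assumptions: maximum order 4, HC1-robust t-tests, 5% level)
library(lmtest)     # coeftest()
library(sandwich)   # vcovHC()
r <- 4
repeat {
  mod <- lm(score ~ poly(income, degree = r, raw = TRUE), data = CASchools)
  # robust p-value of the coefficient on the highest power (row r + 1)
  p_highest <- coeftest(mod, vcov. = vcovHC, type = "HC1")[r + 1, 4]
  if (p_highest < 0.05 || r == 1) break  # keep this order if significant
  r <- r - 1                             # otherwise drop the highest power
}
r  # selected polynomial degree
The selected degree of course depends on these assumptions, in particular on the chosen maximum order and significance level.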
We will demonstrate how to apply sequential testing using the cubic model as an example.
summary(cubic_model)
##
## Call:
## lm(formula = score ~ poly(income, degree = 3, raw = TRUE), data = CASchools)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.28 -9.21 0.20 8.32 31.16
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 6.001e+02 5.830e+00 102.937
## poly(income, degree = 3, raw = TRUE)1 5.019e+00 8.595e-01 5.839
## poly(income, degree = 3, raw = TRUE)2 -9.581e-02 3.736e-02 -2.564
## poly(income, degree = 3, raw = TRUE)3 6.855e-04 4.720e-04 1.452
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## poly(income, degree = 3, raw = TRUE)1 1.06e-08 ***
## poly(income, degree = 3, raw = TRUE)2 0.0107 *
## poly(income, degree = 3, raw = TRUE)3 0.1471
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.71 on 416 degrees of freedom
## Multiple R-squared: 0.5584, Adjusted R-squared: 0.5552
## F-statistic: 175.4 on 3 and 416 DF, p-value: < 2.2e-16
The estimated cubic model stored in cubic_model is
$$\widehat{TestScore}_i = \underset{(5.83)}{600.1} + \underset{(0.86)}{5.02} \times income - \underset{(0.04)}{0.096} \times income^2 + \underset{(0.00047)}{0.00069} \times income^3.$$
The t-statistic on income^3 is 1.45, so the null that the relationship is quadratic cannot be rejected, even at the 10% level. This is contrary to the result presented in the book, which reports heteroskedasticity-robust standard errors throughout. We will therefore also use robust variance-covariance estimation to reproduce these results.
# test the hypothesis using robust standard errors
coeftest(cubic_model, vcov. = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value
## (Intercept) 6.0008e+02 5.1021e+00 117.6150
## poly(income, degree = 3, raw = TRUE)1 5.0187e+00 7.0735e-01 7.0950
## poly(income, degree = 3, raw = TRUE)2 -9.5805e-02 2.8954e-02 -3.3089
## poly(income, degree = 3, raw = TRUE)3 6.8549e-04 3.4706e-04 1.9751
## Pr(>|t|)
## (Intercept) < 2.2e-16 ***
## poly(income, degree = 3, raw = TRUE)1 5.606e-12 ***
## poly(income, degree = 3, raw = TRUE)2 0.001018 **
## poly(income, degree = 3, raw = TRUE)3 0.048918 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The reported standard errors have changed, and the coefficient on income^3 is now significant at the 5% level. This means we reject the hypothesis that the regression function is quadratic against the alternative that it is cubic. We can also test whether the coefficients on income^2 and income^3 are jointly significant using a robust version of the F-test.
# perform robust F-test
linearHypothesis(cubic_model,
                 hypothesis.matrix = R,
                 vcov. = vcovHC, type = "HC1")
## Linear hypothesis test
##
## Hypothesis:
## poly(income, degree = 3, raw = TRUE)2 = 0
## poly(income, degree = 3, raw = TRUE)3 = 0
##
## Model 1: restricted model
## Model 2: score ~ poly(income, degree = 3, raw = TRUE)
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 418
## 2 416 2 29.678 8.945e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value of 8.945e−13, i.e., much less than 0.05, the null hypothesis of linearity is rejected in favor of the alternative that the relationship is quadratic or cubic.
Interpretation of Coefficients in Nonlinear Regression Models
The coefficients in polynomial regressions do not have a simple interpretation. Why? Think of a quadratic model: it is not helpful to think of the coefficient on X as the expected change in Y associated with a change in X holding the other regressors constant, because X^2 changes as X varies. This is also the case for other deviations from linearity, for example in models where regressors and/or the dependent variable are log-transformed. A way to approach this is to calculate the estimated effect on Y associated with a change in X for one or more values of X. This idea is summarized in Key Concept 8.1.
Key Concept 8.1
The Expected Effect on Y of a Change in X1 in a Nonlinear Regression Model
Consider the nonlinear population regression model
Yi=f(X1i,X2i,…,Xki)+ui , i=1,…,n,
where f(X1i,X2i,…,Xki) is the population regression function and ui is the error term.
Denote by ΔY the expected change in Y associated with ΔX1, the change in X1 while holding X2,⋯,Xk constant. That is, the expected change in Y is the difference
ΔY=f(X1+ΔX1,X2,⋯,Xk)−f(X1,X2,⋯,Xk).
The estimator of this unknown population difference is the difference between the predicted values for these two cases. Let $\hat{f}(X_1, X_2, \dots, X_k)$ be the predicted value of Y based on the estimator $\hat{f}$ of the population regression function. Then the predicted change in Y is

$$\Delta \widehat{Y} = \hat{f}(X_1 + \Delta X_1, X_2, \dots, X_k) - \hat{f}(X_1, X_2, \dots, X_k).$$

For example, we may ask the following: what is the predicted change in test scores associated with a one-unit change (i.e., $1000) in income, based on the estimated quadratic regression function
$$\widehat{TestScore} = 607.3 + 3.85 \times income - 0.0423 \times income^2 \ ?$$
Because the regression function is quadratic, this effect depends on the initial district income. We therefore consider two cases:
1. An increase in district income from 10 to 11 (from $10000 per capita to $11000).
2. An increase in district income from 40 to 41 (that is, from $40000 to $41000).
In order to obtain the $\Delta \widehat{Y}$ associated with a change in income from 10 to 11, we use the following formula:

$$\Delta \widehat{Y} = \left(\hat{\beta}_0 + \hat{\beta}_1 \times 11 + \hat{\beta}_2 \times 11^2\right) - \left(\hat{\beta}_0 + \hat{\beta}_1 \times 10 + \hat{\beta}_2 \times 10^2\right).$$

To compute $\widehat{Y}$ using R we may use predict().
# compute and assign the quadratic model
quadratic_model <- lm(score ~ income + I(income^2), data = CASchools)
# set up data for prediction
new_data <- data.frame(income = c(10, 11))
# do the prediction
Y_hat <- predict(quadratic_model, newdata = new_data)
# compute the difference
diff(Y_hat)
## 2
## 2.962517
Analogously we can compute the effect of a change in district income from 40 to 41:
# set up data for prediction
new_data <- data.frame(income = c(40, 41))
# do the prediction
Y_hat <- predict(quadratic_model, newdata = new_data)
# compute the difference
diff(Y_hat)
## 2
## 0.4240097
So for the quadratic model, the expected change in TestScore induced by an increase in income from 10 to 11 is about 2.96 points but an increase in income from 40 to 41 increases the predicted score by only 0.42. Hence, the slope of the estimated quadratic regression function is steeper at low levels of income than at higher levels.
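These numbers can be cross-checked using the estimated coefficients directly, since the predicted change equals the change in the fitted quadratic. This small verification is ours and not part of the book's code:
# cross-check using the estimated coefficients of the quadratic model
cfs <- coef(quadratic_model)
cfs[2] * (11 - 10) + cfs[3] * (11^2 - 10^2)  # change from 10 to 11
cfs[2] * (41 - 40) + cfs[3] * (41^2 - 40^2)  # change from 40 to 41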
Logarithms
Another way to specify a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percentage changes. This is convenient as many relationships are naturally expressed in terms of percentages.
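The reason is that, for small changes, the change in the logarithm of a variable approximates its relative (percentage) change. A quick numerical illustration (not part of the book's code):
# changes in logs approximate relative changes when the changes are small
log(10.1) - log(10)  # approx. 0.00995, close to the exact 1% change
log(11) - log(10)    # approx. 0.0953; the exact relative change is 10%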
There are three different cases in which logarithms might be used:
1. Transform X to its logarithm, but not Y.
2. Transform Y to its logarithm, but not X.
3. Transform both Y and X to their logarithms.
The interpretation of the regression coefficients is different in each case.
Case I: X is in Logarithm, Y is not
The regression model then is Yi = β0 + β1 × ln(Xi) + ui, i = 1,...,n. As for polynomial regressions, we do not have to create a new variable before using lm(): we can simply adjust the formula argument of lm() to tell R that the log-transformation of a variable should be used.
# estimate a level-log model
LinearLog_model <- lm(score ~ log(income), data = CASchools)
# compute robust summary
coeftest(LinearLog_model,
         vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 557.8323 3.8399 145.271 < 2.2e-16 ***
## log(income) 36.4197 1.3969 26.071 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hence, the estimated regression function is
$$\widehat{TestScore} = 557.8 + 36.42 \times \ln(income).$$
Let us draw a plot of this function.
# draw a scatterplot
plot(score ~ income,
     col = "steelblue",
     pch = 20,
     data = CASchools,
     main = "Linear-Log Regression Line")
# add the linear-log regression line
order_id <- order(CASchools$income)
lines(CASchools$income[order_id],
      fitted(LinearLog_model)[order_id],
      col = "red",
      lwd = 2)
We can interpret $\hat{\beta}_1$ as follows: a 1% increase in income is associated with an increase in test scores of 0.01 × 36.42 = 0.36 points. In order to get the estimated effect of a one-unit change in income (that is, a change in the original units, thousands of dollars) on test scores, the method presented in Key Concept 8.1 can be used.
# set up new data
new_data <- data.frame(income = c(10, 11, 40, 41))
# predict the outcomes
Y_hat <- predict(LinearLog_model, newdata = new_data)
# compute the expected difference
Y_hat_matrix <- matrix(Y_hat, nrow = 2, byrow = TRUE)
Y_hat_matrix[, 2] - Y_hat_matrix[, 1]
## [1] 3.471166 0.899297
By setting nrow = 2 and byrow = TRUE in matrix() we ensure that Y_hat_matrix is a 2×2 matrix filled row-wise with the entries of Y_hat.
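Equivalently, the same differences can be obtained without reshaping by subtracting the appropriate entries of Y_hat directly; this alternative is ours and not used in the book:
# equivalent computation without the matrix detour
Y_hat[c(2, 4)] - Y_hat[c(1, 3)]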
The estimated model states that for an income increase from $10000 to $11000, test scores increase by an expected amount of 3.47 points. When income increases from $40000 to $41000, the expected increase in test scores is only about 0.90 points.
Case II: Y is in Logarithm, X is not
There are cases where it is useful to regress ln(Y).
The corresponding regression model then is
ln(Yi)=β0+β1×Xi+ui, i=1,...,n.
# estimate a log-linear model
LogLinear_model <- lm(log(score) ~ income, data = CASchools)
# obtain a robust coefficient summary
coeftest(LogLinear_model,
         vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.43936234 0.00289382 2225.210 < 2.2e-16 ***
## income 0.00284407 0.00017509 16.244 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated regression function is $\widehat{\ln(TestScore)} = 6.439 + 0.00284 \times income$. An increase in district income by one unit ($1000) is expected to increase test scores by 100 × 0.00284% = 0.284%.
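As a plausibility check of our own (not part of the book's code), the approximate percentage change can be recovered from the predicted log test scores:
# predicted log test scores for incomes of 10 and 11 (thousand dollars)
log_preds <- predict(LogLinear_model, newdata = data.frame(income = c(10, 11)))
# the difference in logs times 100 gives the approximate percentage change
100 * diff(log_preds)  # approx. 0.284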
When the dependent variable is in logarithms, one cannot simply use $e^{\ln(\cdot)}$ to transform predictions back to the original scale; see the corresponding discussion in the book.
Case III: X and Y are in Logarithms
The log-log regression model is ln(Yi)=β0+β1×ln(Xi)+ui, i=1,...,n.
# estimate the log-log model
LogLog_model <- lm(log(score) ~ log(income), data = CASchools)
# print robust coefficient summary to the console
coeftest(LogLog_model,
         vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.3363494 0.0059246 1069.501 < 2.2e-16 ***
## log(income) 0.0554190 0.0021446 25.841 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated regression function hence is $\widehat{\ln(TestScore)} = 6.336 + 0.0554 \times \ln(income)$. In a log-log model, a 1% change in X is associated with an estimated $\hat{\beta}_1$% change in Y.
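Again as a plausibility check of our own (not from the book), consider a 1% increase in income starting from an arbitrary level, say 20 (i.e., $20000):
# predicted change in log test scores for a 1% increase in income
log_preds <- predict(LogLog_model, newdata = data.frame(income = c(20, 20 * 1.01)))
100 * diff(log_preds)  # approx. 0.055, i.e. roughly the estimated coefficient on log(income)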
We now reproduce Figure 8.5 of the book.
# generate a scatterplot
plot(log(score) ~ income,
     col = "steelblue",
     pch = 20,
     data = CASchools,
     main = "Log-Linear Regression Function")
# add the log-linear regression line
order_id <- order(CASchools$income)
lines(CASchools$income[order_id],
      fitted(LogLinear_model)[order_id],
      col = "red",
      lwd = 2)
# add the log-log regression line
lines(sort(CASchools$income),
      fitted(LogLog_model)[order(CASchools$income)],
      col = "green",
      lwd = 2)
# add a legend
legend("bottomright",
legend = c("log-linear model", "log-log model"),
lwd = 2,
col = c("red", "green"))
Key Concept 8.2 summarizes the three logarithmic regression models.
Key Concept 8.2
Logarithms in Regression: Three Cases
Logarithms can be used to transform the dependent variable Y or the independent variable X, or both (the variable being transformed must be positive). The following table summarizes these three cases and the interpretation of the regression coefficient β1. In each case, β1 can be estimated by applying OLS after taking the logarithm(s) of the dependent and/or the independent variable.
Case | Model Specification | Interpretation of β1 |
---|---|---|
(I) | Yi=β0+β1ln(Xi)+ui | A 1% change in X is associated with a change in Y of 0.01×β1. |
(II) | ln(Yi)=β0+β1Xi+ui | A change in X by one unit (ΔX=1) is associated with a 100×β1% change in Y. |
(III) | ln(Yi)=β0+β1ln(Xi)+ui | A 1% change in X is associated with a β1% change in Y, so β1 is the elasticity of Y with respect to X. |
Of course we can also estimate a polylog model like
$$TestScore_i = \beta_0 + \beta_1 \times \ln(income_i) + \beta_2 \times \ln(income_i)^2 + \beta_3 \times \ln(income_i)^3 + u_i,$$
which models the dependent variable TestScore by a third-degree polynomial of the log-transformed regressor income.
# estimate the polylog model
polyLog_model <- lm(score ~ log(income) + I(log(income)^2) + I(log(income)^3),
                    data = CASchools)
# print robust summary to the console
coeftest(polyLog_model,
         vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 486.1341 79.3825 6.1239 2.115e-09 ***
## log(income) 113.3820 87.8837 1.2901 0.1977
## I(log(income)^2) -26.9111 31.7457 -0.8477 0.3971
## I(log(income)^3) 3.0632 3.7369 0.8197 0.4128
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparing the models by their adjusted R², we find that, leaving out the log-linear model, all models have a similar adjusted fit. In the class of polynomial models, the cubic specification has the highest adjusted R², whereas the linear-log specification is the best among the log models.
# compute the adj. R^2 for the nonlinear models
adj_R2 <- rbind("quadratic" = summary(quadratic_model)$adj.r.squared,
                "cubic" = summary(cubic_model)$adj.r.squared,
                "LinearLog" = summary(LinearLog_model)$adj.r.squared,
                "LogLinear" = summary(LogLinear_model)$adj.r.squared,
                "LogLog" = summary(LogLog_model)$adj.r.squared,
                "polyLog" = summary(polyLog_model)$adj.r.squared)
# assign column names
colnames(adj_R2) <- "adj_R2"
adj_R2
## adj_R2
## quadratic 0.5540444
## cubic 0.5552279
## LinearLog 0.5614605
## LogLinear 0.4970106
## LogLog 0.5567251
## polyLog 0.5599944
Let us now compare the cubic and the linear-log model by plotting the corresponding estimated regression functions.
# generate a scatterplot
plot(score ~ income,
     data = CASchools,
     col = "steelblue",
     pch = 20,
     main = "Linear-Log and Cubic Regression Functions")
# add the linear-log regression line
order_id <- order(CASchools$income)
lines(CASchools$income[order_id],
      fitted(LinearLog_model)[order_id],
      col = "darkgreen",
      lwd = 2)
# add the cubic regression line
lines(x = CASchools$income[order_id],
      y = fitted(cubic_model)[order_id],
      col = "darkred",
      lwd = 2)
Both regression lines look nearly identical. Altogether the linear-log model may be preferable since it is more parsimonious in terms of regressors: it does not include higher-degree polynomials.
References
Stock, J., & Watson, M. (2015). Introduction to Econometrics, Third Update, Global Edition. Pearson Education Limited.