14.1 Using Regression Models for Forecasting
What is the difference between estimating models for assessment of causal effects and forecasting? Consider again the simple example of estimating the casual effect of the student-teacher ratio on test scores introduced in Chapter 4.
library(AER)
data(CASchools)
CASchools$STR <- CASchools$students/CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math)/2
mod <- lm(score ~ STR, data = CASchools)
mod
##
## Call:
## lm(formula = score ~ STR, data = CASchools)
##
## Coefficients:
## (Intercept) STR
## 698.93 -2.28
As has been stressed in Chapter 6, the estimate of the coefficient on the student-teacher ratio does not have a causal interpretation due to omitted variable bias. However, in terms of deciding which school to send her child to, it might nevertheless be appealing for a parent to use mod for forecasting test scores in schooling districts where no public data about on scores are available.
As an example, assume that the average class in a district has \(25\) students. This is not a perfect forecast but the following one-liner might be helpful for the parent to decide.
predict(mod, newdata = data.frame("STR" = 25))
## 1
## 641.9377
In a time series context, the parent could use data on present and past years test scores to forecast next year’s test scores — a typical application for an autoregressive model.