14.3 Autoregressions

Autoregressive models are heavily used in economic forecasting. An autoregressive model relates a time series variable to its past values. This section discusses the basic ideas of autoregressive models, shows how they are estimated, and discusses an application to forecasting GDP growth using R.

The First-Order Autoregressive Model

It is intuitive that the immediate past of a variable should have power to predict its near future. The simplest autoregressive model uses only the most recent observed outcome of the time series to predict future values. For a time series $Y_t$, such a model is called a first-order autoregressive model, often abbreviated AR(1), where the 1 indicates that the order of autoregression is one:

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$$

is the AR(1) population model of a time series $Y_t$.
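To make the AR(1) recursion concrete, the following is a minimal sketch that simulates such a process and regresses the series on its first lag. The parameter values, the sample size and all object names are illustrative assumptions and are unrelated to the GDP growth data used below.

```r
# Simulate Y_t = beta0 + beta1 * Y_{t-1} + u_t
# (beta0, beta1 and n are assumed values chosen for illustration only)
set.seed(1)
n <- 200
beta0 <- 2
beta1 <- 0.3

u <- rnorm(n)                 # i.i.d. standard normal errors
Y <- numeric(n)
Y[1] <- beta0 / (1 - beta1)   # start the series at its unconditional mean

for (t in 2:n) {
  Y[t] <- beta0 + beta1 * Y[t - 1] + u[t]
}

# OLS regression of Y_t on Y_{t-1}
ar1_sim <- lm(Y[-1] ~ Y[-n])
coef(ar1_sim)
```

With a sample of this size, the OLS estimates should lie reasonably close to the assumed values of the intercept and the slope, although they will not match them exactly.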

For the GDP growth series, an autoregressive model of order one uses only the information on GDP growth observed in the last quarter to predict a future growth rate. The first-order autoregression model of GDP growth can be estimated by computing OLS estimates in the regression of $GDPGR_t$ on $GDPGR_{t-1}$,

$$\widehat{GDPGR}_t = \hat\beta_0 + \hat\beta_1 GDPGR_{t-1}. \tag{14.1}$$

Following the book we use data from 1962 to 2012 to estimate (14.1). This is easily done with the function ar.ols() from the package stats.

# subset data
GDPGRSub <- GDPGrowth["1962::2012"]

# estimate the model
ar.ols(GDPGRSub, 
       order.max = 1, 
       demean = F, 
       intercept = T)
## 
## Call:
## ar.ols(x = GDPGRSub, order.max = 1, demean = F, intercept = T)
## 
## Coefficients:
##      1  
## 0.3384  
## 
## Intercept: 1.995 (0.2993) 
## 
## Order selected 1  sigma^2 estimated as  9.886

We can check that the computations done by ar.ols() are the same as done by lm().

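# length of data set
N <- length(GDPGRSub)

# dependent variable: drop the first observation
GDPGR_level <- as.numeric(GDPGRSub[-1])
# regressor (first lag): drop the last observation
GDPGR_lags <- as.numeric(GDPGRSub[-N])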

# estimate the model
armod <- lm(GDPGR_level ~ GDPGR_lags)
armod
## 
## Call:
## lm(formula = GDPGR_level ~ GDPGR_lags)
## 
## Coefficients:
## (Intercept)   GDPGR_lags  
##      1.9950       0.3384

As usual, we may use coeftest() from the lmtest package together with vcovHC() from the sandwich package to obtain a robust summary of the estimated regression coefficients.

# robust summary
coeftest(armod, vcov. = vcovHC, type = "HC1")
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 1.994986   0.351274  5.6793 4.691e-08 ***
## GDPGR_lags  0.338436   0.076188  4.4421 1.470e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Thus the estimated model is

$$\widehat{GDPGR}_t = \underset{(0.351)}{1.995} + \underset{(0.076)}{0.338} \, GDPGR_{t-1}. \tag{14.2}$$

We omit the first observation, $GDPGR_{1962:Q1}$, from the vector of the dependent variable since its lag, $GDPGR_{1962:Q1-1} = GDPGR_{1961:Q4}$, is not included in the sample. Similarly, the last observation, $GDPGR_{2012:Q4}$, is excluded from the predictor vector since the data do not include $GDPGR_{2012:Q4+1} = GDPGR_{2013:Q1}$. Put differently, when estimating the model, one observation is lost because of the time series structure of the data.
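The alignment of a series with its first lag can be illustrated on a toy vector. This is only a sketch: the six values and the object names are made up for illustration. It also shows that the base R function embed() produces the same layout in one step.

```r
# toy series with 6 observations (values are made up for illustration)
y <- c(1.2, 0.8, 2.5, 3.1, 1.9, 2.2)
n_obs <- length(y)

y_level <- y[-1]       # Y_2, ..., Y_6: drops the first observation
y_lag   <- y[-n_obs]   # Y_1, ..., Y_5: drops the last observation

# each row pairs Y_t with Y_{t-1}; only n_obs - 1 rows remain
cbind(y_level, y_lag)

# embed() builds the same two columns (Y_t, Y_{t-1}) directly
embed(y, 2)
```

Both matrices have five rows, one fewer than the number of observations in the toy series.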

Forecasts and Forecast Errors

Suppose $Y_t$ follows an AR(1) model with an intercept and that you have an OLS estimate of the model on the basis of observations for $T$ periods. Then you may use the AR(1) model to obtain $\hat{Y}_{T+1|T}$, a forecast for $Y_{T+1}$ using data up to period $T$, where

$$\hat{Y}_{T+1|T} = \hat\beta_0 + \hat\beta_1 Y_T.$$

The forecast error is

$$\text{Forecast error} = Y_{T+1} - \hat{Y}_{T+1|T}.$$

Forecasts and Predicted Values

Forecasted values of $Y_t$ are not what we refer to as OLS predicted values of $Y_t$. Also, the forecast error is not an OLS residual. Forecasts and forecast errors are obtained using out-of-sample values while predicted values and residuals are computed for in-sample values that were actually observed and used in estimating the model.
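The distinction can be sketched in R with simulated data. Everything in this example is an assumption made for illustration (the simulated AR(1) series, the holdout of a single observation and all object names). It is not the GDP growth application.

```r
# simulated AR(1) data; all parameter values are illustrative assumptions
set.seed(42)
n_total <- 101
y_sim <- as.numeric(arima.sim(model = list(ar = 0.5), n = n_total)) + 2

# use the first 100 observations for estimation, hold out the last one
T_est <- n_total - 1
dat <- data.frame(y = y_sim[2:T_est], y_lag = y_sim[1:(T_est - 1)])
fit <- lm(y ~ y_lag, data = dat)

# in-sample: predicted values and residuals for periods used in estimation
head(fitted(fit))
head(residuals(fit))

# out-of-sample: one-step-ahead forecast of Y_{T+1} given Y_T
y_fc <- unname(coef(fit)[1] + coef(fit)[2] * y_sim[T_est])

# forecast error: realized value minus forecast
fc_error <- y_sim[n_total] - y_fc
fc_error
```

The residuals are available for every period in the estimation sample, whereas the forecast error can only be computed once the held-out value of the series has been observed.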

The root mean squared forecast error (RMSFE) measures the typical size of the forecast error and is defined as

$$RMSFE = \sqrt{E\left[\left(Y_{T+1} - \hat{Y}_{T+1|T}\right)^2\right]}.$$

The RMSFE is composed of the future errors $u_t$ and the error made when estimating the coefficients. When the sample size is large, the former may be much larger than the latter so that $RMSFE \approx \sqrt{Var(u_t)}$, which can be estimated by the standard error of the regression.
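The two components can be made explicit for the AR(1) case. The following is a short derivation sketch. It assumes that the AR(1) population model holds in period $T+1$ and that $u_{T+1}$ has conditional mean zero given the data up to period $T$, so that it is uncorrelated with the estimation error.

```latex
\begin{align*}
Y_{T+1} - \hat{Y}_{T+1|T}
  &= (\beta_0 + \beta_1 Y_T + u_{T+1}) - (\hat\beta_0 + \hat\beta_1 Y_T) \\
  &= u_{T+1} - \left[ (\hat\beta_0 - \beta_0) + (\hat\beta_1 - \beta_1) Y_T \right], \\
E\left[ \left( Y_{T+1} - \hat{Y}_{T+1|T} \right)^2 \right]
  &= Var(u_{T+1}) + E\left[ \left( (\hat\beta_0 - \beta_0) + (\hat\beta_1 - \beta_1) Y_T \right)^2 \right].
\end{align*}
```

The second term reflects estimation uncertainty and becomes small as the sample size grows, which leaves the variance of the error term as the dominant component.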

Application to GDP Growth

Using (14.2), the estimated AR(1) model of GDP growth, we forecast GDP growth for 2013:Q1 (remember that the model was estimated using data for periods 1962:Q1 - 2012:Q4, so 2013:Q1 is an out-of-sample period). Plugging $GDPGR_{2012:Q4} \approx 0.15$ into (14.2),

$$\widehat{GDPGR}_{2013:Q1} = 1.995 + 0.338 \cdot 0.15 \approx 2.046.$$

The function forecast() from the forecast package has some useful features for forecasting time series data.

library(forecast)

# assign GDP growth rate in 2012:Q4
new <- data.frame("GDPGR_lags" = GDPGR_level[N-1])

# forecast GDP growth rate in 2013:Q1
forecast(armod, newdata = new)
##   Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 1       2.044155 -2.036225 6.124534 -4.213414 8.301723

Using forecast() produces the same point forecast of about 2.0, along with 80% and 95% forecast intervals (see Section 14.5). We conclude that our AR(1) model forecasts GDP growth to be about 2% in 2013:Q1.

How accurate is this forecast? First, the forecast error is quite large: $GDPGR_{2013:Q1} \approx 1.1\%$ while our forecast is 2%. Second, calling summary(armod) shows that the model explains only a small share of the variation in the growth rate of GDP and that the SER is about 3.16. Leaving aside forecast uncertainty due to estimation of the model coefficients $\beta_0$ and $\beta_1$, the RMSFE must be at least 3.16%, the estimate of the standard deviation of the errors. We conclude that this forecast is pretty inaccurate.

# forecast minus realized value
# (the forecast error as defined above has the opposite sign)
forecast(armod, newdata = new)$mean - GDPGrowth["2013"][1]
##                 x
## 2013 Q1 0.9049532
# R^2
summary(armod)$r.squared
## [1] 0.1149576
# SER
summary(armod)$sigma
## [1] 3.15979
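Since the SER serves as the estimate of the standard deviation of the errors, it may help to see how it is computed from the residuals. The sketch below uses simulated data and hypothetical object names. It only checks that the manual formula agrees with the value reported by summary().

```r
# simulated AR(1) data (all values are illustrative assumptions)
set.seed(7)
z <- as.numeric(arima.sim(model = list(ar = 0.4), n = 120))
n_z <- length(z)

fit_ser <- lm(z[-1] ~ z[-n_z])

# SER: square root of the sum of squared residuals divided by the
# residual degrees of freedom (observations minus estimated coefficients)
ser_manual <- sqrt(sum(residuals(fit_ser)^2) / df.residual(fit_ser))

# compare with the value reported by summary()
all.equal(ser_manual, summary(fit_ser)$sigma)
```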

Autoregressive Models of Order p

For forecasting GDP growth, the AR(1) model (14.2) disregards any information in the past of the series that is more distant than one period. An AR(p) model incorporates the information of p lags of the series. The idea is explained in Key Concept 14.3.

Key Concept 14.3

Autoregressions

An AR($p$) model assumes that a time series $Y_t$ can be modeled by a linear function of the first $p$ of its lagged values.

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + u_t$$

is an autoregressive model of order $p$ where $E(u_t \vert Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}) = 0$.
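As a rough illustration of how the $p$ lagged regressors can be constructed, the sketch below builds them with the base R function embed() and estimates the model by OLS. The helper estimate_ar() is a hypothetical name introduced here, and the data are simulated under assumed coefficients.

```r
# hypothetical helper: estimate an AR(p) model with intercept by OLS
estimate_ar <- function(y, p) {
  # embed() returns a matrix with columns Y_t, Y_{t-1}, ..., Y_{t-p}
  lagged <- embed(y, p + 1)
  dat <- as.data.frame(lagged)
  names(dat) <- c("y", paste0("lag", seq_len(p)))
  lm(y ~ ., data = dat)
}

# simulated AR(2) data (the coefficients 0.3 and 0.2 are assumptions)
set.seed(123)
y_sim <- as.numeric(arima.sim(model = list(ar = c(0.3, 0.2)), n = 300))

ar2_sim <- estimate_ar(y_sim, p = 2)
coef(ar2_sim)

# p observations are lost when the lags are constructed
nobs(ar2_sim)
```

With 300 simulated observations and $p = 2$, the regression uses 298 observations, which generalizes the loss of one observation in the AR(1) case.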

Following the book, we estimate an AR(2) model of the GDP growth series from 1962:Q1 to 2012:Q4.

# estimate the AR(2) model
GDPGR_AR2 <- dynlm(ts(GDPGR_level) ~ L(ts(GDPGR_level)) + L(ts(GDPGR_level), 2))

coeftest(GDPGR_AR2, vcov. = sandwich)
## 
## t test of coefficients:
## 
##                       Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)           1.631747   0.402023  4.0588 7.096e-05 ***
## L(ts(GDPGR_level))    0.277787   0.079250  3.5052 0.0005643 ***
## L(ts(GDPGR_level), 2) 0.179269   0.079951  2.2422 0.0260560 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimation yields

$$\widehat{GDPGR}_t = \underset{(0.40)}{1.63} + \underset{(0.08)}{0.28} \, GDPGR_{t-1} + \underset{(0.08)}{0.18} \, GDPGR_{t-2}.$$

We see that the coefficient on the second lag is significantly different from zero at the 5% level. The fit improves slightly: $R^2$ grows from 0.11 for the AR(1) model to about 0.14 and the SER falls to 3.13.

# R^2
summary(GDPGR_AR2)$r.squared
## [1] 0.1425484
# SER
summary(GDPGR_AR2)$sigma
## [1] 3.132122

We may use the AR(2) model to obtain a forecast for GDP growth in 2013:Q1 in the same manner as for the AR(1) model.

# AR(2) forecast of GDP growth in 2013:Q1 
# (stored as forecast_AR2 to avoid masking the function forecast())
forecast_AR2 <- c("2013:Q1" = coef(GDPGR_AR2) %*% c(1, GDPGR_level[N-1], GDPGR_level[N-2]))

This leads to a forecast error of roughly $-1\%$.

# compute AR(2) forecast error (realized value minus forecast)
GDPGrowth["2013"][1] - forecast_AR2
##                 x
## 2013 Q1 -1.025358