Introduction to Econometrics with R


14.3 Autoregressions

Autoregressive models are heavily used in economic forecasting. An autoregressive model relates a time series variable to its past values. This section discusses the basic ideas of autoregressive models, shows how they are estimated, and discusses an application to forecasting GDP growth using R.

The First-Order Autoregressive Model

It is intuitive that the immediate past of a variable should have power to predict its near future. The simplest autoregressive model uses only the most recent outcome of the time series observed to predict future values. For a time series $Y_t$ such a model is called a first-order autoregressive model, often abbreviated AR(1), where the 1 indicates that the order of autoregression is one:

$\begin{align*} Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t \end{align*}$

is the AR(1) population model of a time series $Y_t$.
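To make the population model concrete, the following minimal sketch simulates an AR(1) series and recovers its coefficients by OLS. The values of `beta0`, `beta1`, `n` and the seed are illustrative assumptions and are unrelated to the GDP growth data used in this section.

```r
# Simulate Y_t = beta0 + beta1 * Y_{t-1} + u_t (all parameter values are assumed)
set.seed(1)
n     <- 500
beta0 <- 2
beta1 <- 0.3

u <- rnorm(n)                  # i.i.d. standard normal errors
Y <- numeric(n)
Y[1] <- beta0 / (1 - beta1)    # start the series at its unconditional mean
for (t in 2:n) {
  Y[t] <- beta0 + beta1 * Y[t - 1] + u[t]
}

# OLS regression of Y_t on Y_{t-1}
Y_level <- Y[-1]               # Y_2, ..., Y_n
Y_lag   <- Y[-n]               # Y_1, ..., Y_{n-1}
ar1_fit <- lm(Y_level ~ Y_lag)
coef(ar1_fit)
```

With a sample of this size the estimated slope should lie close to the assumed value of `beta1`, although the exact numbers depend on the seed.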

For the GDP growth series, an autoregressive model of order one uses only the information on GDP growth observed in the last quarter to predict a future growth rate. The first-order autoregression model of GDP growth can be estimated by computing OLS estimates in the regression of $GDPGR_t$ on $GDPGR_{t-1}$,

$\begin{align} \widehat{GDPGR}_t = \hat\beta_0 + \hat\beta_1 GDPGR_{t-1}. \tag{14.1} \end{align}$

Following the book we use data from 1962 to 2012 to estimate (14.1). This is easily done with the function ar.ols() from the package stats.

# subset data
GDPGRSub <- GDPGrowth["1962::2012"]

# estimate the model
ar.ols(GDPGRSub, 
       order.max = 1, 
       demean = F, 
       intercept = T)

## 
## Call:
## ar.ols(x = GDPGRSub, order.max = 1, demean = F, intercept = T)
## 
## Coefficients:
##      1  
## 0.3384  
## 
## Intercept: 1.995 (0.2993) 
## 
## Order selected 1  sigma^2 estimated as  9.886

We can check that ar.ols() yields the same coefficient estimates as lm().

# length of data set
N <- length(GDPGRSub)

GDPGR_level <- as.numeric(GDPGRSub[-1])
GDPGR_lags <- as.numeric(GDPGRSub[-N])

# estimate the model
armod <- lm(GDPGR_level ~ GDPGR_lags)
armod

## 
## Call:
## lm(formula = GDPGR_level ~ GDPGR_lags)
## 
## Coefficients:
## (Intercept)   GDPGR_lags  
##      1.9950       0.3384

As usual, we may use coeftest() from the package lmtest together with vcovHC() from the package sandwich to obtain a robust summary of the estimated regression coefficients.

# robust summary
coeftest(armod, vcov. = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 1.994986   0.351274  5.6793 4.691e-08 ***
## GDPGR_lags  0.338436   0.076188  4.4421 1.470e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Thus the estimated model is

$\begin{align} \widehat{GDPGR}_t = \underset{(0.351)}{1.995} + \underset{(0.076)}{0.338} GDPGR_{t-1}. \tag{14.2} \end{align}$

We omit the first observation, $GDPGR_{1962 \ Q1}$, from the vector of the dependent variable since its lag, $GDPGR_{1962 \ Q1 - 1} = GDPGR_{1961 \ Q4}$, is not included in the sample. Similarly, the last observation, $GDPGR_{2012 \ Q4}$, is excluded from the predictor vector since the data do not include $GDPGR_{2012 \ Q4 + 1} = GDPGR_{2013 \ Q1}$. Put differently, when estimating the model, one observation is lost because of the time series structure of the data.
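The alignment of the dependent variable and its lag can be seen on a toy series. This is only an illustration: the vector `y` is made up and the object names are chosen to mirror the code above.

```r
# toy series with five observations (made-up values)
y <- c(10, 20, 30, 40, 50)
N <- length(y)

y_level <- y[-1]   # drop the first observation: y_2, ..., y_N
y_lag   <- y[-N]   # drop the last observation:  y_1, ..., y_{N-1}

# each row pairs y_t with its predecessor y_{t-1}
cbind(y_level, y_lag)
```

The resulting matrix has $N-1$ rows, which is the number of observations that actually enter the regression.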

Forecasts and Forecast Errors

Suppose $Y_t$ follows an AR(1) model with an intercept and that you have an OLS estimate of the model on the basis of observations for $T$ periods. Then you may use the AR(1) model to obtain $\widehat{Y}_{T+1\vert T}$, a forecast for $Y_{T+1}$ using data up to period $T$, where

$\begin{align*} \widehat{Y}_{T+1\vert T} = \hat{\beta}_0 + \hat{\beta}_1 Y_T. \end{align*}$

The forecast error is

$\begin{align*} \text{Forecast error} = Y_{T+1} - \widehat{Y}_{T+1\vert T}. \end{align*}$

Forecasts and Predicted Values

Forecasted values of $Y_t$ are not what we refer to as OLS predicted values of $Y_t$. Also, the forecast error is not an OLS residual. Forecasts and forecast errors are obtained using out-of-sample values while predicted values and residuals are computed for in-sample values that were actually observed and used in estimating the model.
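The distinction can be illustrated with simulated data. The sketch below holds out the last observation of an artificial AR(1) series, estimates the model on the remaining ones and then computes both the in-sample residuals and the single out-of-sample forecast error. The series, its parameters and all object names are assumptions made for this illustration.

```r
# simulate an AR(1) series with mean 2 (parameter values are assumed)
set.seed(42)
n <- 201
Y <- as.numeric(arima.sim(model = list(ar = 0.4), n = n)) + 2

# estimation sample: periods 1, ..., T_est; period n is held out
T_est <- n - 1
Y_est <- Y[1:T_est]
est   <- data.frame(level = Y_est[-1], lag = Y_est[-T_est])
fit   <- lm(level ~ lag, data = est)

# in-sample: OLS residuals for observations used in the estimation
in_sample_resid <- residuals(fit)

# out-of-sample: forecast of Y_{T+1} from Y_T and the resulting forecast error
Y_hat          <- predict(fit, newdata = data.frame(lag = Y[T_est]))
forecast_error <- Y[n] - Y_hat
forecast_error
```

The residuals average to zero by construction of OLS, whereas nothing forces the forecast error to be small, because $Y_{T+1}$ played no role in the estimation.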

The root mean squared forecast error (RMSFE) measures the typical size of the forecast error and is defined as

$\begin{align*} RMSFE = \sqrt{E\left[\left(Y_{T+1} - \widehat{Y}_{T+1\vert T}\right)^2\right]}. \end{align*}$

The $RMSFE$ reflects two sources of error: the future error $u_{T+1}$ and the error made when estimating the coefficients. When the sample size is large, the former is typically much larger than the latter, so that $RMSFE \approx \sqrt{Var(u_t)}$, which can be estimated by the standard error of the regression ($SER$).
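To see where the two components come from, substitute the AR(1) population model for $Y_{T+1}$ and the forecast formula for $\widehat{Y}_{T+1\vert T}$. The following sketch of the decomposition assumes that $u_{T+1}$ has mean zero and is uncorrelated with the estimation error, which depends only on data up to period $T$:

$\begin{align*} Y_{T+1} - \widehat{Y}_{T+1\vert T} =& \, u_{T+1} - \left[ (\hat\beta_0 - \beta_0) + (\hat\beta_1 - \beta_1) Y_T \right], \\ E\left[\left(Y_{T+1} - \widehat{Y}_{T+1\vert T}\right)^2\right] =& \, Var(u_{T+1}) + E\left[\left( (\hat\beta_0 - \beta_0) + (\hat\beta_1 - \beta_1) Y_T \right)^2\right]. \end{align*}$

The second term shrinks as the estimation sample grows, while $Var(u_{T+1})$ does not.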

Application to GDP Growth

Using (14.2), the estimated AR(1) model of GDP growth, we perform the forecast for GDP growth for 2013:Q1 (remember that the model was estimated using data for periods 1962:Q1 - 2012:Q4, so 2013:Q1 is an out-of-sample period). Plugging $GDPGR_{2012:Q4} \approx 0.15$ into (14.2),

$\begin{align*} \widehat{GDPGR}_{2013:Q1} = 1.995 + 0.338 \cdot 0.15 \approx 2.046. \end{align*}$

The function forecast() from the forecast package has some useful features for forecasting time series data.

library(forecast)

# assign GDP growth rate in 2012:Q4
new <- data.frame("GDPGR_lags" = GDPGR_level[N-1])

# forecast GDP growth rate in 2013:Q1
forecast(armod, newdata = new)

##   Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 1       2.044155 -2.036225 6.124534 -4.213414 8.301723

Using forecast() produces the same point forecast of about 2.0, along with $80\%$ and $95\%$ forecast intervals; see Section 14.5. We conclude that our AR(1) model forecasts GDP growth to be about $2\%$ in 2013:Q1.

How accurate is this forecast? First, the forecast error is quite large: $GDPGR_{2013:Q1} \approx 1.1\%$ while our forecast is $2\%$. Second, calling summary(armod) shows that the model explains only a small share of the variation in the growth rate of GDP and that the $SER$ is about $3.16$. Leaving aside forecast uncertainty due to estimation of the model coefficients $\beta_0$ and $\beta_1$, the $RMSFE$ must be at least $3.16\%$, the estimate of the standard deviation of the errors. We conclude that this forecast is pretty inaccurate.

# compute forecast minus realized value (the negative of the forecast error as defined above)
forecast(armod, newdata = new)$mean - GDPGrowth["2013"][1]

##                 x
## 2013 Q1 0.9049532

# R^2
summary(armod)$r.squared

## [1] 0.1149576

# SER
summary(armod)$sigma

## [1] 3.15979

Autoregressive Models of Order $p$

For forecasting GDP growth, the AR( $1$ ) model (14.2) disregards any information in the past of the series that is more distant than one period. An AR( $p$ ) model incorporates the information of $p$ lags of the series. The idea is explained in Key Concept 14.3.

Key Concept 14.3

Autoregressions

An AR( $p$ ) model assumes that a time series $Y_t$ can be modeled by a linear function of the first $p$ of its lagged values. $\begin{align*} Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + u_t \end{align*}$ is an autoregressive model of order $p$ where $E(u_t\vert Y_{t-1}, Y_{t-2}, \dots,Y_{t-p})=0$.
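As a hedged illustration of the key concept, the regressors of an AR( $p$ ) model can be built with the base R function embed(), which places $Y_t$ in the first column and its lags in the following columns. The simulated series, the choice $p = 2$ and all object names below are assumptions for this sketch and are not part of the GDP growth application.

```r
# simulate a series with two autoregressive lags (parameter values are assumed)
set.seed(123)
p <- 2
Y <- as.numeric(arima.sim(model = list(ar = c(0.3, 0.2)), n = 400)) + 1

# embed() returns rows (Y_t, Y_{t-1}, ..., Y_{t-p}); the first p observations are lost
X       <- embed(Y, p + 1)
ar_data <- data.frame(X)
names(ar_data) <- c("Y_t", paste0("Y_lag", 1:p))

# OLS estimation of the AR(p) model with an intercept
arp_fit <- lm(Y_t ~ ., data = ar_data)
coef(arp_fit)
```

Changing `p` is enough to estimate a model of a different order, at the cost of losing `p` observations at the start of the sample.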

Following the book, we estimate an AR( $2$ ) model of the GDP growth series from 1962:Q1 to 2012:Q4.

# estimate the AR(2) model
GDPGR_AR2 <- dynlm(ts(GDPGR_level) ~ L(ts(GDPGR_level)) + L(ts(GDPGR_level), 2))

coeftest(GDPGR_AR2, vcov. = sandwich)

## 
## t test of coefficients:
## 
##                       Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)           1.631747   0.402023  4.0588 7.096e-05 ***
## L(ts(GDPGR_level))    0.277787   0.079250  3.5052 0.0005643 ***
## L(ts(GDPGR_level), 2) 0.179269   0.079951  2.2422 0.0260560 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimation yields

$\begin{align} \widehat{GDPGR}_t = \underset{(0.40)}{1.63} + \underset{(0.08)}{0.28} GDPGR_{t-1} + \underset{(0.08)}{0.18} GDPGR_{t-2}. \tag{14.3} \end{align}$

We see that the coefficient on the second lag is significantly different from zero at the $5\%$ level. The fit improves slightly: $R^2$ grows from $0.11$ for the AR( $1$ ) model to about $0.14$ and the $SER$ falls to $3.13$.

# R^2
summary(GDPGR_AR2)$r.squared

## [1] 0.1425484

# SER
summary(GDPGR_AR2)$sigma

## [1] 3.132122

We may use the AR( $2$ ) model to obtain a forecast for GDP growth in 2013:Q1 in the same manner as for the AR(1) model.

# AR(2) forecast of GDP growth in 2013:Q1 
forecast <- c("2013:Q1" = coef(GDPGR_AR2) %*% c(1, GDPGR_level[N-1], GDPGR_level[N-2]))

This leads to a forecast error of roughly $-1\%$ .

# compute AR(2) forecast error 
GDPGrowth["2013"][1] - forecast

##                 x
## 2013 Q1 -1.025358