7.3 Time-series analysis in RStudio

Time-series analysis in RStudio requires installation of additional packages, except for stats package, which is included in the base installation

TABLE 7.1: RStudio packages and commands for univariate and multivariate time-series analysis
Package	Command	Description
`tseries`	`ts()`	converting data into a time-series object
`tseries`	`ts.plot()`	plotting a single or multiple time-series
`stats` (base R)	`lag()`	generating lagged values
`stats` (base R)	`diff()`	generating differenced values
`stats` (base R)	`decompose()`	decomposing the time-series
`stats` (base R)	`acf()`	autocorrelation function ACF
`stats` (base R)	`Box.test()`	Ljung-Box test for autocorrleation
`stats` (base R)	`arima()`	fitting an autoregression and ARIMA(p,d,q)
`forecast`	`seasonaldummy()`	generating seasonal dummy variables
`forecast`	`tslm()`	fitting a linear model with time-series
`forecast`	`forecast()`	generating forecasts based on a fitted model
`forecast`	`accuracy()`	evaluating forecast accuracy
`dynlm`	`dynlm()`	fitting dynamic linear model
`urca`	`ur.df()`	Augmented Dickey-Fuller test
`urca`	`ca.jo()`	Johansen cointegration test
`vars`	`VARselect()`	optimal lag selection for VAR(p) model
`vars`	`VAR()`	fitting a VAR(p) model
`vars`	`causality()`	Granger causality test
`vars`	`irf()`	impulse response function IRF

Exercise 33. Import the text file economic_indicators.txt into RStudio using the URL link address. Name the data frame object indicators. The imported data presents quarterly observations from

$2000Q1$ to

$2014Q3$ (

$T=59$ ) with respect to selected economic indicators of China: exchange (CNY/USD), fdi (millions of CNY), gdp (current prices, billions of CNY), growth (in

$\%$ ), production (volume index) and m2 (money supply in billions of CNY). Before analysis, the quarterly observations should be formatted as a regular time-series with a frequency of

$4$ . Plot a time-series of GDP in current prices! Decompose the time-series of GDP using a multiplicative approach (object named components). Plot all the components. Extract the seasonal factors as seas.factors and interpret them. Compute the seasonally adjusted GDP as seas.adjusted and compare it with actual GDP on the same plot.

Solution

Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. In this example, GDP of China exhibits upward trending (almost exponential) with increased seasonality over time (the peaks in the fourth quarters and the troughs in the first quarters are at a higher level each year). Seasonal factors for the same quarters are the same numbers (either smaller, larger, or equal to one). For example, first quarter seasonal factor is the lowest and less than

$1$ indicating that GDP in the first quarter is always lower for

$11.60\%$ due to seasonal influences. Opposite to that, GDP of China in the last quarter is always higher for

$20.97\%$ due to seasonal influences. Seasonally adjusted GDP is free from seasonal variations caused by quarterly patterns. If seasonally adjusted data are not available from the public sources as pre-calculated, these patterns need to be removed from the time-series. Otherwise, incorrect results in further econometric analysis are inevitable!

# Loading the data from the text file using URL link within read.table() command
# Argument header=TRUE specifies that the first row of the file contains the column names
indicators=read.table(file="https://www.efzg.hr/userdocsimages/sta/jarneric/economic_indicators.txt",header=TRUE)

# Dating quarterly observations with frequency 4 as a regular time-series
library(tseries)
indicators=ts(indicators,frequency=4,start=c(2000,1))

# Plotting GDP in China by ts.plot() command
ts.plot(indicators[,"gdp"],ylab="GDP in China",col="red")

# Decomposing a time-series of GDP
components=decompose(indicators[,"gdp"],type="multiplicative")

# Plotting all the components
plot(components)

# Extracting seasonal factors and printing them in the console window
seas.factors=components$seasonal 
print(seas.factors)
(0.8839734-1)*100
# Computing seasonally adjusted GDP and comparing it with actual GDP on the same plot
seas.adjusted=indicators[,"gdp"]/seas.factors 
ts.plot(indicators[,"gdp"],seas.adjusted,gpars=list(col=c("red","blue")),main="Seasonaly adjusted and actual values of GDP")

$~~~$

Exercise 34. Generate some new variables and add them as new columns to the existing data frame indicators. First, generate a variable time starting from zero (

$t=0,1,2,\dots,58$ ). Second, generate the squares of the variable time as time2 (

$t^2=0,1,4,\dots,3364$ ). Lastly, generate seasonal dummy variables for the first, second and third quarter as the object seas.dummies using the command seasonaldummy() from the package forecast (the dummy variable for the last quarter is omitted by default). Combine the newly generated variables with the existing object indicators and rename the columns for easier usage later. Check the first few rows of extended object indicators. Using OLS, estimate a model (name it gdp.model) with a constant term (intercept), quadratic trend, and seasonal dummy variables to describe the behavior of GDP in China (current prices). Present the results in a table using modelsummary() command and explain the meaning of the coefficients with respect to the dummy variables! Are the dummy variables statistically significant? Does gdp.model fit the data well? Compare the actual values with the fitted values in-the-sample on the same plot. Estimate the same model using tslm() command from the forecast package (name it gdp.model2). Compare the results of both models and comment their differences. Make a forecast of GDP out-of-sample up to

$2017Q4$ .

Solution

Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. In this example, two approaches are used to estimate a model with an intercept, quadratic trend, and seasonal dummy variables for the purpose of forecasting GDP out-of-sample. The first approach requires manually generating new variables (time, time2, seas1,seas2 and seas3), and then estimating the model using the standard lm() command. The second approach uses the tslm() command, which already includes arguments for incorporating the trend and seasonal components. The second approach is easier and faster, but it differs from the first approach because the values of trend (time) start at

$1$ instead of

$0$ , and the last quarter is omitted by default, not the first quarter. In this context, caution is needed when interpreting the estimated parameters (when the first quarter is omitted, all coefficients of the seasonal dummy variables are positive, and conversely, when the last quarter is omitted, all coefficients of the seasonal dummy variables are negative), although both approaches (gdp.modeland gdp.model2) yield the same goodness-of-fit and the same forecasts. Keep in mind that forecasting horizon h is the number of periods from the last observation.

# Installing the package "forecsat" (for the first time) and loading it from the library
install.packages("forecast")
library(forecast)

time=ts(0:58,frequency=4,start=c(2000,1)) # gerating variable time
time2=time^2 # generating quadratic term of variable time
seas.dummies=seasonaldummy(ts(1:59,frequency=4,start=c(2000,1))) # generating seasonal dummy variavles

# Combining new data with existing data frame and renaming the columns
indicators=cbind(indicators,time,time2,seas.dummies)
colnames(indicators)=c("growth","production","gdp","fdi","exchange","m2","time","time2","seas1","seas2","seas3")

# Checking the first few rows
head(indicators)

# Estimating "gdp.model" using lm() command
gdp.model=lm(gdp~time+time2+seas1+seas2+seas3,indicators)

# Presenting results of estimated model in a table
library(modelsummary)
modelsummary(gdp.model, stars=TRUE, fmt=4)

# Comparing the actual values with fitted values and adding a legend to the plot
ts.plot(cbind(indicators[,"gdp"], fitted(gdp.model)),gpars = list(col=c("red", "blue")),ylab="Billions of CNY")
legend(2002, 15000, legend = c("Actual GDP", "Fitted GDP"), col = c("red", "blue"), lty = 1)

# Estimating the same model using tslm() command from "forecast" package
gdp.model2=tslm(gdp~trend+I(trend^2)+season,data=indicators)

# Presenting results of both models in a single table
modelsummary(list(gdp.model,gdp.model2), stars=TRUE, fmt=4)

# Forecasting GDP out-of-sample (up to the last quarter 2017)
# by embedding forecast() command inside the plot() command
plot(forecast(gdp.model2,h=13)) # parameter "h" is the forecasting horizon

$~~~$

Exercise 35. Separate the GDP growth rate from the data frame indicators as a single time-series object. Split the growth series into two sets: training and testing, using window() command. From a total of

$59$ observations, consider that first

$50$ in the training set are used for estimating a model in-the-sample, while the remaining

$9$ observations in the testing set are used for evaluating forecasting accuracy out-of-sample. Using in-the-sample data estimate three autoregressions

$AR(1)$ ,

$AR(2)$ and

$AR(3)$ , respectively. Summarize the results of these models in a single table by modelsummary() command. Which model fits the data better in-the-sample? Using accuracy() command provide forecasting accuracy measures out-of-sample. Which model has better predictive ability? According to the “best” model, display the forecasted values (along with confidence intervals) in the console, and compare forecasted values out-of-sample with actual values on the same plot.

Solution

Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Best practices for evaluating forecasting accuracy include a combination of in-the-sample and out-of-sample measures, such as MAE (Mean Absoulte Error), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), MPE (Mean Percentage Error) or MAPE (Mean Absolute Percentage Error). Typically, forecasting models suffer from the overfitting problem, resulting in almost perfect fit on the training data, but poor forecasts on the testing data. Despite this, model

$AR(1)$ emerged as the most suitable for forecasting GDP growth due to the lowest errors.

# Separating "growth" variable from a data frame as a single time-series
growth=ts(indicators[,"growth"],frequency=4,start=c(2000,1)) 

# Splitting a time-series into training and testing observations
training=window(growth,end=c(2012,2)) # includes first 50 observations
testing=window(growth,start=c(2012,3)) # includes remaining 9 observations

# Estimating first, second and third order autoregression in-the-sample
ar1=arima(training,order=c(1,0,0))  
ar2=arima(training,order=c(2,0,0))  
ar3=arima(training,order=c(3,0,0))

# Presenting results of all autoregressions in a single table
modelsummary(list("AR(1)"=ar1,"AR(2)"=ar2,"AR(3)"=ar3),stars=TRUE, fmt=4)

# Providing forecasting accuracy measures out-of-sample for each autoregression
accuracy(forecast(ar1,length(testing)),testing) 
accuracy(forecast(ar2,length(testing)),testing)
accuracy(forecast(ar3,length(testing)),testing)

# Displaying forecasted values out-of-sample along with confidence intervals
forecast(ar1,h=length(testing))

# Comparing forecasted values with actual values
plot(forecast(ar1,h=length(testing)))
lines(growth,col="red") # adding actual values to the previous plot

$~~~$

Exercise 36. Using the same time-series data from the object indicators estimate different models:
1) Regress unadjusted gdp on money supply m2 (name it model.a)
2) Regress seasonally adjusted gdp seas.adjusted on money supply m2 (name it model.b)
3) Take the logs of seasonally adjusted gdp seas.adjusted and money supply m2 and regress them on each other including a variable time as additional RHS variable (name it model.c).
What can you conclude when comparing results of estimated models?

Solution

Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. In this example, external RHS variable m2 is considered in explaining and forecasting the dependent variable (gdp). To check if m2 truly affects gdp, despite both variables sharing a common trend, time was included as an additional variable on the RHS. By regressing the logs of a time-series on variable time an exponential trend is taken into account.

gdp=indicators[,"gdp"] # separating "gdp" variable from a data frame
m2=indicators[,"m2"] # separating "m2" variable from a data frame
model.a=lm(gdp~m2) # estimating first model
model.b=lm(seas.adjusted~m2) # estimating second model
model.c=lm(log(seas.adjusted)~log(m2)+time) # estimating third model
modelsummary(list(model.a, model.b, model.c), stars=TRUE, fmt=4) # comparing three models