7.3 Time-series analysis in RStudio
- Time-series analysis in RStudio requires installation of additional packages, except for
stats
package, which is included in the base installation
Package | Command | Description |
---|---|---|
tseries |
ts() |
converting data into a time-series object |
tseries |
ts.plot() |
plotting a single or multiple time-series |
stats (base R) |
lag() |
generating lagged values |
stats (base R) |
diff() |
generating differenced values |
stats (base R) |
decompose() |
decomposing the time-series |
stats (base R) |
acf() |
autocorrelation function ACF |
stats (base R) |
Box.test() |
Ljung-Box test for autocorrleation |
stats (base R) |
arima() |
fitting an autoregression and ARIMA(p,d,q) |
forecast |
seasonaldummy() |
generating seasonal dummy variables |
forecast |
tslm() |
fitting a linear model with time-series |
forecast |
forecast() |
generating forecasts based on a fitted model |
forecast |
accuracy() |
evaluating forecast accuracy |
dynlm |
dynlm() |
fitting dynamic linear model |
urca |
ur.df() |
Augmented Dickey-Fuller test |
urca |
ca.jo() |
Johansen cointegration test |
vars |
VARselect() |
optimal lag selection for VAR(p) model |
vars |
VAR() |
fitting a VAR(p) model |
vars |
causality() |
Granger causality test |
vars |
irf() |
impulse response function IRF |
Exercise 33. Import the text file
economic_indicators.txt
into RStudio using the URL link address. Name the data frame object indicators
. The imported data presents quarterly observations from \(2000Q1\) to \(2014Q3\) (\(T=59\)) with respect to selected economic indicators of China: exchange
(CNY/USD), fdi
(millions of CNY), gdp
(current prices, billions of CNY), growth
(in \(\%\)), production
(volume index) and m2
(money supply in billions of CNY). Before analysis, the quarterly observations should be formatted as a regular time-series with a frequency of \(4\). Plot a time-series of GDP in current prices! Decompose the time-series of GDP using a multiplicative approach (object named components
). Plot all the components. Extract the seasonal factors as seas.factors
and interpret them. Compute the seasonally adjusted GDP as seas.adjusted
and compare it with actual GDP on the same plot.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. In this example, GDP of China exhibits upward trending (almost exponential) with increased seasonality over time (the peaks in the fourth quarters and the troughs in the first quarters are at a higher level each year). Seasonal factors for the same quarters are the same numbers (either smaller, larger, or equal to one). For example, first quarter seasonal factor is the lowest and less than \(1\) indicating that GDP in the first quarter is always lower for \(11.60\%\) due to seasonal influences. Opposite to that, GDP of China in the last quarter is always higher for \(20.97\%\) due to seasonal influences. Seasonally adjusted GDP is free from seasonal variations caused by quarterly patterns. If seasonally adjusted data are not available from the public sources as pre-calculated, these patterns need to be removed from the time-series. Otherwise, incorrect results in further econometric analysis are inevitable!# Loading the data from the text file using URL link within read.table() command
# Argument header=TRUE specifies that the first row of the file contains the column names
=read.table(file="https://www.efzg.hr/userdocsimages/sta/jarneric/economic_indicators.txt",header=TRUE)
indicators
# Dating quarterly observations with frequency 4 as a regular time-series
library(tseries)
=ts(indicators,frequency=4,start=c(2000,1))
indicators
# Plotting GDP in China by ts.plot() command
ts.plot(indicators[,"gdp"],ylab="GDP in China",col="red")
# Decomposing a time-series of GDP
=decompose(indicators[,"gdp"],type="multiplicative")
components
# Plotting all the components
plot(components)
# Extracting seasonal factors and printing them in the console window
=components$seasonal
seas.factorsprint(seas.factors)
0.8839734-1)*100
(# Computing seasonally adjusted GDP and comparing it with actual GDP on the same plot
=indicators[,"gdp"]/seas.factors
seas.adjustedts.plot(indicators[,"gdp"],seas.adjusted,gpars=list(col=c("red","blue")),main="Seasonaly adjusted and actual values of GDP")
\(~~~\)
Exercise 34. Generate some new variables and add them as new columns to the existing data frame
indicators
. First, generate a variable time
starting from zero (\(t=0,1,2,\dots,58\)). Second, generate the squares of the variable time as time2
( \(t^2=0,1,4,\dots,3364\)). Lastly, generate seasonal dummy variables for the first, second and third quarter as the object seas.dummies
using the command seasonaldummy()
from the package forecast
(the dummy variable for the last quarter is omitted by default). Combine the newly generated variables with the existing object indicators
and rename the columns for easier usage later. Check the first few rows of extended object indicators
. Using OLS, estimate a model (name it gdp.model
) with a constant term (intercept), quadratic trend, and seasonal dummy variables to describe the behavior of GDP in China (current prices). Present the results in a table using modelsummary()
command and explain the meaning of the coefficients with respect to the dummy variables! Are the dummy variables statistically significant? Does gdp.model
fit the data well? Compare the actual values with the fitted values in-the-sample on the same plot. Estimate the same model using tslm()
command from the forecast
package (name it gdp.model2
). Compare the results of both models and comment their differences. Make a forecast of GDP out-of-sample up to \(2017Q4\).
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. In this example, two approaches are used to estimate a model with an intercept, quadratic trend, and seasonal dummy variables for the purpose of forecasting GDP out-of-sample. The first approach requires manually generating new variables (time
, time2
, seas1
,seas2
and seas3
), and then estimating the model using the standard lm()
command. The second approach uses the tslm()
command, which already includes arguments for incorporating the trend and seasonal components. The second approach is easier and faster, but it differs from the first because the values of trend
(time) start at \(1\) instead of \(0\), and the last quarter is omitted by default, not the first quarter. In this context, caution is needed when interpreting the estimated parameters, although both approaches (gdp.model
and gdp.model2
) yield the same forecasts. Keep in mind that forecasting horizon h
is the number of periods from the last observation.
# Installing the package "forecsat" (for the first time) and loading it from the library
install.packages("forecast")
library(forecast)
=ts(0:58,frequency=4,start=c(2000,1)) # gerating variable time
time=time^2 # generating quadratic term of variable time
time2=seasonaldummy(ts(1:59,frequency=4,start=c(2000,1))) # generating seasonal dummy variavles
seas.dummies
# Combining new data with existing data frame and renaming the columns
=cbind(indicators,time,time2,seas.dummies)
indicatorscolnames(indicators)=c("growth","production","gdp","fdi","exchange","m2","time","time2","seas1","seas2","seas3")
# Checking the first few rows
head(indicators)
# Estimating "gdp.model" using lm() command
=lm(gdp~time+time2+seas1+seas2+seas3,indicators)
gdp.model
# Presenting results of estimated model in a table
library(modelsummary)
modelsummary(gdp.model, stars=TRUE, fmt=4)
# Comparing the actual values with fitted values and adding a legend to the plot
ts.plot(cbind(indicators[,"gdp"], fitted(gdp.model)),gpars = list(col=c("red", "blue")),ylab="Billions of CNY")
legend(2002, 15000, legend = c("Actual GDP", "Fitted GDP"), col = c("red", "blue"), lty = 1)
# Estimating the same model using tslm() command from "forecast" package
=tslm(gdp~trend+I(trend^2)+season,data=indicators)
gdp.model2
# Presenting results of both models in a single table
modelsummary(list(gdp.model,gdp.model2), stars=TRUE, fmt=4)
# Forecasting GDP out-of-sample (up to the last quarter 2017)
# by embedding forecast() command inside the plot() command
plot(forecast(gdp.model2,h=13)) # parameter "h" is the forecasting horizon
Exercise 35. Separate the GDP
growth
rate from the data frame indicators
as a single time-series object. Split the growth
series into two sets: training
and testing
, using window()
command. From a total of \(59\) observations, consider that first \(50\) in the training
set are used for estimating a model in-the-sample, while the remaining \(9\) observations in the testing
set are used for evaluating forecasting accuracy out-of-sample. Using in-the-sample data estimate three autoregressions \(AR(1)\), \(AR(2)\) and \(AR(3)\), respectively. Summarize the results of these models in a single table by modelsummary()
command. Which model fits the data better in-the-sample? Using accuracy()
command provide forecasting accuracy measures out-of-sample. Which model has better predictive ability? According to the “best” model, display the forecasted values (along with confidence intervals) in the console, and compare forecasted values out-of-sample with actual values on the same plot.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Best practices for evaluating forecasting accuracy include a combination of in-the-sample and out-of-sample measures, such as MAE (Mean Absoulte Error), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), MPE (Mean Percentage Error) or MAPE (Mean Absolute Percentage Error). Typically, forecasting models suffer from the overfitting problem, resulting in almost perfect fit on the training data, but poor forecasts on the testing data. Despite this, model \(AR(1)\) emerged as the most suitable for forecasting GDP growth due to the lowest errors.# Separating "growth" variable from a data frame as a single time-series
=ts(indicators[,"growth"],frequency=4,start=c(2000,1))
growth
# Splitting a time-series into training and testing observations
=window(growth,end=c(2012,2)) # includes first 50 observations
training=window(growth,start=c(2012,3)) # includes remaining 9 observations
testing
# Estimating first, second and third order autoregression in-the-sample
=arima(training,order=c(1,0,0))
ar1=arima(training,order=c(2,0,0))
ar2=arima(training,order=c(3,0,0))
ar3
# Presenting results of all autoregressions in a single table
modelsummary(list("AR(1)"=ar1,"AR(2)"=ar2,"AR(3)"=ar3),stars=TRUE, fmt=4)
# Providing forecasting accuracy measures out-of-sample for each autoregression
accuracy(forecast(ar1,length(testing)),testing)
accuracy(forecast(ar2,length(testing)),testing)
accuracy(forecast(ar3,length(testing)),testing)
# Displaying forecasted values out-of-sample along with confidence intervals
forecast(ar1,h=length(testing))
# Comparing forecasted values with actual values
plot(forecast(ar1,h=length(testing)))
lines(growth,col="red") # adding actual values to the previous plot
\(~~~\)
Exercise 36. Using the same tim-series data from the object
1) Regress unadjusted
2) Regress seasonally adjusted gdp
3) Take the logs of seasonally adjusted gdp
What can you conclude when comparing results of estimated models?
indicators
estimate different models:1) Regress unadjusted
gdp
on money supply m2
(name it model.a
)2) Regress seasonally adjusted gdp
seas.adjusted
on money supply m2
(name it model.b
)3) Take the logs of seasonally adjusted gdp
seas.adjusted
and money supply m2
and regress them on each other including a variable time
as additional RHS variable (name it model.c
).What can you conclude when comparing results of estimated models?
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. In this example, external RHS variablem2
is considered in explaining and forecasting the dependent variable (gdp
). To check if m2
truly affects gdp
, despite both variables sharing a common trend, time
was included as an additional variable on the RHS. By regressing the logs of a time-series on variable time
an exponential trend is taken into account.
=indicators[,"gdp"] # separating "gdp" variable from a data frame
gdp=indicators[,"m2"] # separating "m2" variable from a data frame
m2=lm(gdp~m2) # estimating first model
model.a=lm(seas.adjusted~m2) # estimating second model
model.b=lm(log(seas.adjusted)~log(m2)+time) # estimating third model
model.cmodelsummary(list(model.a, model.b, model.c), stars=TRUE, fmt=4) # comparing three models