11.3 Time Series Data
Time series data consists of observations on the same variable(s) recorded over multiple time periods for a single entity (or aggregated entity). These data points are typically collected at consistent intervals—hourly, daily, monthly, quarterly, or annually—allowing for the analysis of trends, patterns, and forecasting.
Examples
- Stock Market: Daily closing prices of a company’s stock over five years.
- Economics: Monthly unemployment rates in a country over a decade.
- Macroeconomics: Annual GDP of a country from 1960 to 2020.
Key Characteristics
- The primary goal is to analyze trends, seasonality, cyclic patterns, and forecast future values.
- Time series data requires specialized statistical methods (see the R sketch below), such as:
  - Autoregressive Integrated Moving Average (ARIMA)
  - Seasonal ARIMA (SARIMA)
  - Exponential Smoothing
  - Vector Autoregression (VAR)
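As a quick, hedged illustration of how these estimators are typically invoked in R, the sketch below uses the base stats functions arima() and HoltWinters() on the built-in AirPassengers series; a VAR would require an add-on package such as vars. All model orders here are illustrative, not recommendations.

```r
# Minimal sketch: invoking common time series estimators in base R.
# AirPassengers is a built-in monthly series used purely for illustration.
data(AirPassengers)

# ARIMA(1,1,1) on the log scale
fit_arima <- arima(log(AirPassengers), order = c(1, 1, 1))

# Seasonal ARIMA: ARIMA(0,1,1)(0,1,1)[12]
fit_sarima <- arima(log(AirPassengers), order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))

# Exponential smoothing (Holt-Winters)
fit_hw <- HoltWinters(log(AirPassengers))

# 12-month-ahead forecasts from the seasonal model
predict(fit_sarima, n.ahead = 12)$pred
```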
Advantages
- Captures temporal patterns such as trends, seasonal fluctuations, and economic cycles.
- Essential for forecasting and policy-making, such as setting interest rates based on economic indicators.
Challenges
- Autocorrelation: Observations close in time are often correlated.
- Structural Breaks: Sudden changes due to policy shifts or economic crises can distort analysis.
- Seasonality: Must be accounted for to avoid misleading conclusions.
A time series typically consists of four key components:
- Trend: Long-term directional movement in the data over time.
- Seasonality: Regular, periodic fluctuations (e.g., increased retail sales in December).
- Cyclical Patterns: Long-term economic cycles that are irregular but recurrent.
- Irregular (Random) Component: Unpredictable variations not explained by trend, seasonality, or cycles.
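These components can be made concrete with a decomposition. The sketch below uses base R's stl() (a loess-based seasonal-trend decomposition) on the built-in AirPassengers series; the choice of series and the log transform are illustrative assumptions.

```r
# Decompose a monthly series into trend, seasonal, and remainder
# (irregular) components using base R's stl().
data(AirPassengers)
decomp <- stl(log(AirPassengers), s.window = "periodic")

# Panels from top to bottom: data, seasonal, trend, remainder
plot(decomp)
```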
A general linear time series model can be expressed as:
$$y_t = \beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + \dots + x_{t(k-1)}\beta_{k-1} + \epsilon_t$$
Some Common Model Types
- Static Model
A simple time series regression:
$$y_t = \beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + x_{t3}\beta_3 + \epsilon_t$$
- Finite Distributed Lag Model
Captures the effect of past values of an explanatory variable:
$$y_t = \beta_0 + pe_t\delta_0 + pe_{t-1}\delta_1 + pe_{t-2}\delta_2 + \epsilon_t$$
Long-Run Propensity (LRP): measures the cumulative effect of the explanatory variable over time (estimated in the sketch below):
$$LRP = \delta_0 + \delta_1 + \delta_2$$
- Dynamic Model
A model incorporating lagged dependent variables:
$$\text{GDP}_t = \beta_0 + \beta_1\text{GDP}_{t-1} + \epsilon_t$$
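The static, FDL, and dynamic forms differ only in which regressors enter the design matrix. As a hedged illustration, the sketch below simulates data for the FDL equation above (with made-up coefficients; the variable name pe is just a stand-in) and recovers the long-run propensity.

```r
# Sketch: estimating a finite distributed lag model and its
# long-run propensity (LRP) on simulated data.
set.seed(1)
T  <- 200
pe <- rnorm(T)

# True lag coefficients: delta0 = 0.5, delta1 = 0.3, delta2 = 0.2 (LRP = 1)
y <- 1 + 0.5 * pe +
     0.3 * c(0, pe[-T]) +
     0.2 * c(0, 0, pe[-((T - 1):T)]) +
     rnorm(T)

# Align current and lagged regressors (drop the first two periods)
d <- data.frame(y   = y[3:T],
                pe0 = pe[3:T],
                pe1 = pe[2:(T - 1)],
                pe2 = pe[1:(T - 2)])
fdl <- lm(y ~ pe0 + pe1 + pe2, data = d)

# LRP = delta0 + delta1 + delta2
sum(coef(fdl)[c("pe0", "pe1", "pe2")])
```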
11.3.1 Statistical Properties of Time Series Models
For time series regression, standard OLS assumptions must be carefully examined. The following conditions affect estimation:
Finite Sample Properties
- Under A1-A3: OLS remains unbiased.
- Under A1-A4: standard errors are consistent, and the Gauss-Markov Theorem holds (OLS is BLUE).
- Under A1-A6: finite-sample Wald tests (e.g., t-tests and F-tests) remain valid.
However, in time series settings, A3 often fails due to:
- Spurious Time Trends (fixable by including a time trend)
- Strict vs. Contemporaneous Exogeneity (sometimes unavoidable)
11.3.2 Common Time Series Processes
Several key models describe different time series behaviors:
Autoregressive Model (AR(p)): A process where current values depend on past values.
Moving Average Model (MA(q)): A process where past error terms influence current values.
Autoregressive Moving Average (ARMA(p, q)): A combination of AR and MA processes.
Autoregressive Conditional Heteroskedasticity (ARCH(p)): Models time-varying volatility.
Generalized ARCH (GARCH(p, q)): Extends ARCH by including past conditional variances.
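To build intuition for these processes, the sketch below simulates small examples using only base R: arima.sim() covers AR, MA, and ARMA, and a short loop generates an ARCH(1) series (base R has no ARCH simulator; in practice, estimation of ARCH/GARCH models is usually done with packages such as rugarch or fGarch). All parameter values are arbitrary.

```r
# Simulated examples of common time series processes.
set.seed(7)
n <- 500

ar1    <- arima.sim(n = n, list(ar = 0.7))            # AR(1)
ma1    <- arima.sim(n = n, list(ma = 0.5))            # MA(1)
arma11 <- arima.sim(n = n, list(ar = 0.7, ma = 0.5))  # ARMA(1,1)

# ARCH(1): e_t = sqrt(h_t) * z_t with h_t = a0 + a1 * e_{t-1}^2
a0 <- 0.2; a1 <- 0.6
e <- h <- numeric(n)
h[1] <- a0 / (1 - a1)            # start at the unconditional variance
e[1] <- sqrt(h[1]) * rnorm(1)
for (t in 2:n) {
  h[t] <- a0 + a1 * e[t - 1]^2   # time-varying conditional variance
  e[t] <- sqrt(h[t]) * rnorm(1)
}

# The ARCH series is serially uncorrelated, but its squares are not
op <- par(mfrow = c(1, 2))
acf(e,   main = "ARCH(1): levels")
acf(e^2, main = "ARCH(1): squares")
par(op)
```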
11.3.3 Deterministic Time Trends
When both the dependent and independent variables exhibit trending behavior, a regression may produce spurious results.
Spurious Regression Example
Suppose two series are each generated by a deterministic time trend:
$$y_t = \alpha_0 + t\alpha_1 + v_t$$
$$x_t = \lambda_0 + t\lambda_1 + u_t$$
where
- $\alpha_1 \neq 0$ and $\lambda_1 \neq 0$,
- $v_t$ and $u_t$ are independent of each other.
Despite there being no true relationship between $x_t$ and $y_t$, estimating:
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t$$
results in:
- Inconsistency: $\text{plim}(\hat{\beta}_1) = \alpha_1/\lambda_1 \neq 0$.
- Invalid inference: $|t| \xrightarrow{d} \infty$ for $H_0: \beta_1 = 0$, so the null is rejected with probability approaching one as $n \to \infty$.
- Misleading fit: $\text{plim}(R^2) = 1$, falsely implying perfect predictive power.
We can also rewrite the system as:
$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t, \qquad \epsilon_t = \alpha_1 t + v_t$$
where $\beta_0 = \alpha_0$ and $\beta_1 = 0$. Since $x_t$ is a deterministic function of time, $\epsilon_t$ is correlated with $x_t$, producing the usual omitted variable bias.
Solutions to Spurious Trends
- Include a time trend ($t$) as a control variable
  - Provides consistent parameter estimates and valid inference.
- Detrend the variables
  - Regress both $y_t$ and $x_t$ on time, then use the residuals in a second regression.
  - Equivalent to applying the Frisch-Waugh-Lovell Theorem.
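Both remedies are easy to see in a simulation. The sketch below generates two independently trending series (so the true slope is zero), shows the spurious estimate, and then applies each fix; all numbers are illustrative.

```r
# Spurious regression with deterministic trends, and two fixes.
set.seed(99)
T <- 200
t <- 1:T
y <- 1 + 0.05 * t + rnorm(T)   # alpha1 != 0
x <- 2 + 0.03 * t + rnorm(T)   # lambda1 != 0, independent of y

# Naive regression: a large, "significant" slope despite no true relation
coef(summary(lm(y ~ x)))

# Fix 1: include the time trend as a control
coef(summary(lm(y ~ x + t)))

# Fix 2: detrend both series, then regress the residuals (FWL)
y_dt <- resid(lm(y ~ t))
x_dt <- resid(lm(x ~ t))
coef(summary(lm(y_dt ~ x_dt)))
```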
11.3.4 Violations of Exogeneity in Time Series Models
The exogeneity assumption (A3) plays a crucial role in ensuring unbiased and consistent estimation in time series models. However, in many cases, the assumption is violated due to the inherent nature of time-dependent processes.
In a standard regression framework, we assume:
$$E(\epsilon_t | x_1, x_2, \dots, x_T) = 0$$
which requires that the error term is uncorrelated with all past, present, and future values of the independent variables.
Common Violations of Exogeneity
- Feedback: the error term $\epsilon_t$ influences future values of the independent variables. A classic example occurs in economic models where past shocks affect future decisions.
- Dynamic specification: the model includes a lagged dependent variable ($y_{t-1}$) as a regressor; because each $y_t$ is built in part from its own error, errors and regressors are correlated across time.
- Omitted lags: in finite distributed lag (FDL) models, failing to include the correct number of lags leads to omitted variable bias and correlation between regressors and errors.
11.3.4.1 Feedback Effect
In a simple regression model:
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t$$
the standard exogeneity assumption (A3) requires:
$$E(\epsilon_t | x_1, x_2, \dots, x_t, x_{t+1}, \dots, x_T) = 0$$
However, in the presence of feedback, past errors affect future values of $x_t$, so that:
$$E(\epsilon_t | x_{t+1}, \dots, x_T) \neq 0$$
- This occurs when current shocks (e.g., economic downturns) influence future decisions (e.g., government spending, firm investments).
- Strict exogeneity is violated, as we now have dependence across time.
Implication:
- Standard OLS estimators become biased and inconsistent.
- One common solution is to use Instrumental Variables to isolate exogenous variation in $x_t$.
11.3.4.2 Dynamic Specification
A dynamically specified model includes lagged dependent variables:
$$y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t$$
Exogeneity (A3) would require:
$$E(\epsilon_t | y_1, y_2, \dots, y_t, y_{t+1}, \dots, y_T) = 0$$
However, since $y_t$ is generated in part by $\epsilon_t$, the error is necessarily correlated with the regressor of the following period:
$$\text{Cov}(y_t, \epsilon_t) = \text{Var}(\epsilon_t) \neq 0$$
Implication:
- Strict exogeneity (A3) fails, as the error sequence cannot be independent of a regressor built from lagged $y$.
- OLS estimates are biased in finite samples (see the Monte Carlo sketch below), though they can remain consistent under contemporaneous exogeneity.
- Autoregressive (AR) models therefore rely on large-sample arguments or alternative estimation techniques such as the Generalized Method of Moments or Maximum Likelihood Estimation.
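A small Monte Carlo makes the finite-sample problem visible. The sketch below (all settings arbitrary) estimates an AR(1) coefficient by OLS at several sample sizes; the bias is typically noticeable in short samples and fades as $T$ grows, in line with OLS being biased but still consistent here.

```r
# Monte Carlo sketch: finite-sample bias of OLS in an AR(1) model.
set.seed(2024)
ar1_bias <- function(T, beta1 = 0.8, reps = 2000) {
  est <- replicate(reps, {
    y <- as.numeric(arima.sim(n = T, list(ar = beta1)))
    coef(lm(y[-1] ~ y[-T]))[2]   # regress y_t on y_{t-1}
  })
  mean(est) - beta1              # average bias
}
sapply(c(25, 100, 400), ar1_bias)
# Typically clearly negative at T = 25, close to zero by T = 400
```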
11.3.4.3 Dynamic Completeness and Omitted Lags
A finite distributed lag (FDL) model:
$$y_t = \beta_0 + x_t\delta_0 + x_{t-1}\delta_1 + \epsilon_t$$
assumes that the included lags fully capture the relationship between $y_t$ and past values of $x_t$. However, if we omit relevant lags, the exogeneity assumption (A3):
$$E(\epsilon_t | x_1, x_2, \dots, x_t, x_{t+1}, \dots, x_T) = 0$$
fails: an omitted term such as $x_{t-2}$ ends up in the error, creating correlation between the regressors and $\epsilon_t$.
Implication:
- The regression suffers from omitted variable bias, making OLS estimates unreliable.
- Solution:
  - Include additional lags of $x_t$.
  - Use lag selection criteria (e.g., AIC, BIC) to determine the appropriate lag structure, as in the sketch below.
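The sketch below shows one way to carry this out in R: simulate an FDL process with two true lags, fit models with zero to four lags on a common estimation sample, and compare AIC()/BIC(). Everything here is simulated and illustrative.

```r
# Sketch: choosing the FDL lag length with information criteria.
set.seed(5)
T <- 300
x <- rnorm(T)
y <- 1 + 0.6 * x + 0.3 * c(0, x[-T]) +
     0.15 * c(0, 0, x[-((T - 1):T)]) + rnorm(T)   # two true lags

max_lag <- 4
# Fit all candidate models on the same sample (t = max_lag + 1, ..., T)
fits <- lapply(0:max_lag, function(p) {
  X <- sapply(0:p, function(h) x[(max_lag + 1 - h):(T - h)])
  lm(y[(max_lag + 1):T] ~ X)
})
data.frame(lags = 0:max_lag,
           AIC  = sapply(fits, AIC),
           BIC  = sapply(fits, BIC))
# Pick the lag length minimizing AIC or BIC (BIC penalizes extra lags more)
```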
11.3.5 Consequences of Exogeneity Violations
If strict exogeneity (A3) fails, standard OLS assumptions no longer hold:
- OLS is biased.
- Gauss-Markov Theorem no longer applies.
- Finite Sample Properties (such as unbiasedness) are invalid.
To address these issues, we can:
- Rely on Large Sample Properties: Under certain conditions, consistency may still hold.
- Use Weaker Forms of Exogeneity: Shift from strict exogeneity (A3) to contemporaneous exogeneity (A3a).
If strict exogeneity does not hold, we can instead assume A3a (Contemporaneous Exogeneity):
$$E(x_t' \epsilon_t) = 0$$
This weaker assumption only requires that $x_t$ be uncorrelated with the error term in the same time period.
Key Differences from Strict Exogeneity
| Exogeneity Type | Requirement | Allows Dynamic Models? |
|---|---|---|
| Strict Exogeneity | $E(\epsilon_t \mid x_1, x_2, \dots, x_T) = 0$ | No |
| Contemporaneous Exogeneity | $E(x_t' \epsilon_t) = 0$ | Yes |
With contemporaneous exogeneity, $\epsilon_t$ can be correlated with past and future values of $x_t$.
This allows for dynamic specifications such as:
$$y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t$$
while still maintaining consistency under certain assumptions.
Deriving Large Sample Properties for Time Series
To establish consistency and asymptotic normality, we need a law of large numbers and a central limit theorem to apply. However, the standard Weak Law of Large Numbers and Central Limit Theorem used for OLS depend on A5 (Random Sampling), which does not hold in time series settings.
Since time series data exhibit dependence over time, we replace A5 (Random Sampling) with a weaker assumption:
- A5a: Weak Dependence (Stationarity)
Asymptotic Variance and Serial Correlation
The derivation of asymptotic variance depends on A4 (Homoskedasticity).
However, in time series settings, we often encounter serial correlation:
$$\text{Cov}(\epsilon_t, \epsilon_s) \neq 0 \quad \text{for } |t - s| > 0$$
To ensure valid inference, standard errors must be corrected using methods such as Newey-West HAC estimators.
11.3.6 Highly Persistent Data
In time series analysis, a key assumption for OLS consistency is weak dependence of the data-generating process (A5a), i.e., observations are not too strongly correlated over time. However, when $y_t$ and $x_t$ are highly persistent, standard OLS assumptions break down.
If a time series is not weakly dependent, it means:
- $y_t$ and $y_{t-h}$ remain strongly correlated even at large lags ($h \to \infty$).
- A5a (Weak Dependence) fails, leading to:
  - OLS inconsistency.
  - No valid limiting distribution (asymptotic normality does not hold).
Example: A classic example of a highly persistent process is a random walk:
$$y_t = y_{t-1} + u_t$$
or, with drift:
$$y_t = \alpha + y_{t-1} + u_t$$
where $u_t$ is a white noise error term.
- $y_t$ does not revert to a mean; its variance grows without bound as $t \to \infty$.
- Shocks accumulate rather than die out, making standard regression analysis unreliable.
11.3.6.1 Solution: First Differencing
A common way to transform non-stationary series into stationary ones is through first differencing:
$$\Delta y_t = y_t - y_{t-1} = u_t$$
- If $u_t$ is a weakly dependent (i.e., $I(0)$, stationary) process, then $y_t$ is said to be difference-stationary, or integrated of order one, $I(1)$.
- If both $y_t$ and $x_t$ follow a random walk ($I(1)$), we estimate the model in first differences:
$$\Delta y_t = \Delta x_t \beta + (\epsilon_t - \epsilon_{t-1}) = \Delta x_t \beta + \Delta \epsilon_t$$
Because the differenced series are weakly dependent, OLS estimation remains valid.
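The effect of differencing is easy to verify by simulation. The sketch below builds a random walk with cumsum() and compares autocorrelations before and after applying diff():

```r
# A random walk is highly persistent; its first difference is white noise.
set.seed(11)
y <- cumsum(rnorm(300))   # y_t = y_{t-1} + u_t

op <- par(mfrow = c(1, 2))
acf(y,       main = "Random walk: ACF decays very slowly")
acf(diff(y), main = "First difference: no persistence")
par(op)
```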
11.3.7 Unit Root Testing
To formally determine whether a time series contains a unit root (i.e., is non-stationary), we test:
$$y_t = \alpha + \rho y_{t-1} + u_t$$
Hypothesis Testing
- $H_0: \rho = 1$ (unit root, non-stationary)
  - Standard OLS asymptotics fail: the estimator does not have the usual limiting distribution.
- $H_a: \rho < 1$ (stationary process)
  - OLS is consistent and asymptotically normal.
Key Issues
- The usual t-test is not valid because, under $H_0$, the OLS estimator does not have a standard limiting distribution.
- Instead, specialized tests such as the Dickey-Fuller and Augmented Dickey-Fuller tests are required.
11.3.7.1 Dickey-Fuller Test for Unit Roots
The Dickey-Fuller test subtracts $y_{t-1}$ from both sides of the original equation, $y_t = \alpha + \rho y_{t-1} + u_t$, giving:
$$\Delta y_t = \alpha + \theta y_{t-1} + v_t$$
where:
$$\theta = \rho - 1$$
- Null hypothesis ($H_0: \theta = 0$): implies $\rho = 1$ (unit root, non-stationary).
- Alternative ($H_a: \theta < 0$): implies $\rho < 1$ (stationary).
Since the test statistic has a non-standard asymptotic distribution under $H_0$, Dickey and Fuller derived specialized critical values.
Decision Rule
- If the test statistic is more negative than the critical value, reject $H_0$: $y_t$ is stationary.
- Otherwise, fail to reject $H_0$: $y_t$ has a unit root (non-stationary).
The standard DF test may fail due to two key limitations:
- Simplistic dynamic relationship
  - The DF test allows only one lag in the autoregressive structure; in reality, higher-order lags of $\Delta y_t$ may be needed.
  - Solution: use the Augmented Dickey-Fuller test, which includes extra lags:
$$\Delta y_t = \alpha + \theta y_{t-1} + \gamma_1 \Delta y_{t-1} + \dots + \gamma_p \Delta y_{t-p} + v_t$$
  - With $p$ extra lags, $\Delta y_t$ follows an AR($p$) process under $H_0$, while $y_t$ follows an AR($p+1$) process under $H_a$. Including lags of $\Delta y_t$ ensures a better-specified model.
- Ignoring deterministic time trends
  - If a series exhibits a deterministic trend, failing to include it biases the unit root test. For example, if $y_t$ grows steadily over time, a test without a trend component will falsely detect a unit root.
  - Solution: include a deterministic time trend ($t$) in the regression:
$$\Delta y_t = \alpha + \theta y_{t-1} + \delta t + v_t$$
  - This allows $y_t$ to have a quadratic relationship with time under the null, and it changes the critical values, so adjusted Dickey-Fuller tables are required.
11.3.7.2 Augmented Dickey-Fuller Test
The ADF test generalizes the DF test by allowing for:
- Lags of $\Delta y_t$ (to correct for serial correlation).
- Time trends (to handle deterministic trends).
Regression Equation
$$\Delta y_t = \alpha + \theta y_{t-1} + \delta t + \gamma_1 \Delta y_{t-1} + \dots + \gamma_p \Delta y_{t-p} + v_t$$
where $\theta = \rho - 1$.
Hypotheses
- $H_0: \theta = 0$ (unit root: non-stationary)
- $H_a: \theta < 0$ (stationary)
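In R, one common implementation is tseries::adf.test() (urca::ur.df() is a popular alternative with explicit control over drift, trend, and lag selection). The sketch below assumes the tseries package is installed and applies the test to one simulated unit root series and one stationary series.

```r
# ADF test on a unit root process vs. a stationary AR(1).
library(tseries)

set.seed(3)
rw  <- cumsum(rnorm(250))                  # random walk: unit root
ar1 <- arima.sim(n = 250, list(ar = 0.5))  # stationary AR(1)

adf.test(rw)   # expect a large p-value: fail to reject the unit root
adf.test(ar1)  # expect a small p-value: reject the unit root
```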
11.3.8 Newey-West Standard Errors
Newey-West standard errors, also known as Heteroskedasticity and Autocorrelation Consistent (HAC) estimators, provide valid inference when errors exhibit both heteroskedasticity (i.e., when the A4 homoskedasticity assumption is violated) and serial correlation. These standard errors adjust for dependence in the error structure, ensuring that hypothesis tests remain valid.
Key Features
- Accounts for autocorrelation: Handles time dependence in error terms.
- Accounts for heteroskedasticity: Allows for non-constant variance across observations.
- Ensures positive semi-definiteness: Downweights longer-lagged covariances to maintain mathematical validity.
The estimator is computed as:
$$\hat{B} = \frac{1}{T}\sum_{t=1}^{T} e_t^2 x_t' x_t + \sum_{h=1}^{g}\left(1 - \frac{h}{g+1}\right)\frac{1}{T}\sum_{t=h+1}^{T} e_t e_{t-h}\left(x_t' x_{t-h} + x_{t-h}' x_t\right)$$
where:
- T is the sample size,
- g is the chosen lag truncation parameter (bandwidth),
- et are the residuals from the OLS regression,
- xt are the explanatory variables.
Choosing the Lag Length (g)
Selecting an appropriate lag truncation parameter (g) is crucial for balancing efficiency and bias. Common guidelines include:
- Yearly data: g=1 or 2 usually suffices.
- Quarterly data: g=4 or 8 accounts for seasonal dependencies.
- Monthly data: g=12 or 14 captures typical cyclical effects.
Alternatively, data-driven methods can be used:
- Newey-West rule: $g = \lfloor 4(T/100)^{2/9} \rfloor$
- Alternative heuristic: $g = \lfloor T^{1/4} \rfloor$
```r
# Load necessary libraries
library(sandwich)
library(lmtest)
# Simulate data
set.seed(42)
T <- 100 # Sample size
time <- 1:T
x <- rnorm(T)
epsilon <- arima.sim(n = T, list(ar = 0.5)) # Autocorrelated errors
y <- 2 + 3 * x + epsilon # True model
# Estimate OLS model
model <- lm(y ~ x)
# Compute Newey-West standard errors
lag_length <- floor(4 * (T / 100) ^ (2 / 9)) # Newey-West rule
nw_se <- NeweyWest(model, lag = lag_length, prewhite = FALSE)
# Display robust standard errors
coeftest(model, vcov = nw_se)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.71372 0.13189 12.993 < 2.2e-16 ***
#> x 3.15831 0.13402 23.567 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
11.3.8.1 Testing for Serial Correlation
Serial correlation (also known as autocorrelation) occurs when error terms are correlated across time:
$$E(\epsilon_t \epsilon_{t-h}) \neq 0 \quad \text{for some } h \neq 0$$
Steps for Detecting Serial Correlation
- Estimate an OLS regression: run the regression of $y_t$ on $x_t$ and obtain the residuals $e_t$.
- Test for autocorrelation in the residuals: regress $e_t$ on $x_t$ and the lagged residual $e_{t-1}$:
$$e_t = \gamma_0 + x_t'\gamma + \rho e_{t-1} + v_t$$
and test whether $\rho$ is significantly different from zero.
- Decision rule: if $\rho$ is statistically significant at the 5% level, reject the null hypothesis of no serial correlation.
Higher-Order Serial Correlation
To test for higher-order autocorrelation, extend the previous regression:
$$e_t = \gamma_0 + x_t'\gamma + \rho_1 e_{t-1} + \rho_2 e_{t-2} + \dots + \rho_p e_{t-p} + v_t$$
- Jointly test $\rho_1 = \rho_2 = \dots = \rho_p = 0$ using an F-test.
- If the null is rejected, autocorrelation of order $p$ is present.
Step 1: Estimate an OLS Regression and Obtain Residuals
```r
# Load necessary libraries
library(lmtest)
library(sandwich)
# Generate some example data
set.seed(123)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n) # True model: y = 1 + 0.5*x + e
# Estimate the OLS regression
model <- lm(y ~ x)
# Obtain residuals
residuals <- resid(model)
```
Step 2: Test for Autocorrelation in Residuals
```r
# Create lagged residuals
lagged_residuals <- c(NA, residuals[-length(residuals)])
# Regress residuals on x and lagged residuals
autocorr_test_model <- lm(residuals ~ x + lagged_residuals)
# Summary of the regression
summary(autocorr_test_model)
#>
#> Call:
#> lm(formula = residuals ~ x + lagged_residuals)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.94809 -0.72539 -0.08105 0.58503 3.12941
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.008175 0.098112 0.083 0.934
#> x -0.002841 0.107167 -0.027 0.979
#> lagged_residuals -0.127605 0.101746 -1.254 0.213
#>
#> Residual standard error: 0.9707 on 96 degrees of freedom
#> (1 observation deleted due to missingness)
#> Multiple R-squared: 0.01614, Adjusted R-squared: -0.004354
#> F-statistic: 0.7876 on 2 and 96 DF, p-value: 0.4579
# Test if the coefficient of lagged_residuals is significant
rho <- coef(autocorr_test_model)["lagged_residuals"]
rho_p_value <-
summary(autocorr_test_model)$coefficients["lagged_residuals", "Pr(>|t|)"]
# Decision Rule
if (rho_p_value < 0.05) {
cat("Reject the null hypothesis: There is evidence of serial correlation.\n")
} else {
cat("Fail to reject the null hypothesis: No evidence of serial correlation.\n")
}
#> Fail to reject the null hypothesis: No evidence of serial correlation.
```
Step 3: Testing for Higher-Order Serial Correlation
```r
# Number of lags to test
p <- 2 # Example: testing for 2nd order autocorrelation
# Create a matrix of lagged residuals
lagged_residuals_matrix <- sapply(1:p, function(i) c(rep(NA, i), residuals[1:(n - i)]))
# Regress residuals on x and lagged residuals
higher_order_autocorr_test_model <- lm(residuals ~ x + lagged_residuals_matrix)
# Summary of the regression
summary(higher_order_autocorr_test_model)
#>
#> Call:
#> lm(formula = residuals ~ x + lagged_residuals_matrix)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.9401 -0.7290 -0.1036 0.6359 3.0253
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.006263 0.099104 0.063 0.950
#> x 0.010442 0.108370 0.096 0.923
#> lagged_residuals_matrix1 -0.140426 0.103419 -1.358 0.178
#> lagged_residuals_matrix2 -0.107385 0.103922 -1.033 0.304
#>
#> Residual standard error: 0.975 on 94 degrees of freedom
#> (2 observations deleted due to missingness)
#> Multiple R-squared: 0.02667, Adjusted R-squared: -0.004391
#> F-statistic: 0.8587 on 3 and 94 DF, p-value: 0.4655
# Joint F-test for the significance of lagged residuals
f_test <- car::linearHypothesis(higher_order_autocorr_test_model,
paste0("lagged_residuals_matrix", 1:p, " = 0"))
# Print the F-test results
print(f_test)
#>
#> Linear hypothesis test:
#> lagged_residuals_matrix1 = 0
#> lagged_residuals_matrix2 = 0
#>
#> Model 1: restricted model
#> Model 2: residuals ~ x + lagged_residuals_matrix
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 96 91.816
#> 2 94 89.368 2 2.4479 1.2874 0.2808
# Decision Rule
if (f_test$`Pr(>F)`[2] < 0.05) {
cat("Reject the null hypothesis: There is evidence of higher-order serial correlation.\n")
} else {
cat("Fail to reject the null hypothesis: No evidence of higher-order serial correlation.\n")
}
#> Fail to reject the null hypothesis: No evidence of higher-order serial correlation.
```
Corrections for Serial Correlation
If serial correlation is detected, the following adjustments should be made:
| Problem | Solution |
|---|---|
| Mild serial correlation | Use Newey-West standard errors |
| Severe serial correlation | Use Generalized Least Squares or the Prais-Winsten transformation |
| Autoregressive structure in errors | Model the errors as an ARMA process |
| Higher-order serial correlation | Include lags of the dependent variable or use HAC estimators with higher lag orders |
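For the GLS route, one readily available option is nlme::gls() with an AR(1) correlation structure (nlme ships with R; the prais package offers a dedicated Prais-Winsten implementation). A minimal sketch on simulated data, with all settings illustrative:

```r
# Feasible GLS with AR(1) errors via nlme::gls().
library(nlme)

set.seed(8)
T <- 200
x <- rnorm(T)
e <- as.numeric(arima.sim(n = T, list(ar = 0.6)))  # AR(1) errors
d <- data.frame(y = 1 + 0.5 * x + e, x = x, t = 1:T)

# OLS for comparison
ols <- lm(y ~ x, data = d)

# GLS estimating an AR(1) error process indexed by time
fit_gls <- gls(y ~ x, data = d, correlation = corAR1(form = ~ t))
summary(fit_gls)$tTable   # coefficients with serial-correlation-aware SEs
```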