11.3 Time Series Data
Time series data consists of observations on the same variable(s) recorded over multiple time periods for a single entity (or aggregated entity). These data points are typically collected at consistent intervals—hourly, daily, monthly, quarterly, or annually—allowing for the analysis of trends, patterns, and forecasting.
Examples
- Stock Market: Daily closing prices of a company’s stock over five years.
- Economics: Monthly unemployment rates in a country over a decade.
- Macroeconomics: Annual GDP of a country from 1960 to 2020.
Key Characteristics
- The primary goal is to analyze trends, seasonality, cyclic patterns, and forecast future values.
- Time series data requires specialized statistical methods (see the R sketch below), such as:
  - Autoregressive Integrated Moving Average (ARIMA)
  - Seasonal ARIMA (SARIMA)
  - Exponential Smoothing
  - Vector Autoregression (VAR)
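As a quick, hedged illustration of how these estimators are typically invoked in R, the sketch below uses the base stats functions arima() and HoltWinters() on the built-in AirPassengers series; a VAR would require an add-on package such as vars. All model orders here are illustrative, not recommendations.

```r
# Minimal sketch: invoking common time series estimators in base R.
# AirPassengers is a built-in monthly series used purely for illustration.
data(AirPassengers)

# ARIMA(1,1,1) on the log scale
fit_arima <- arima(log(AirPassengers), order = c(1, 1, 1))

# Seasonal ARIMA: ARIMA(0,1,1)(0,1,1)[12]
fit_sarima <- arima(log(AirPassengers), order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))

# Exponential smoothing (Holt-Winters)
fit_hw <- HoltWinters(log(AirPassengers))

# 12-month-ahead forecasts from the seasonal model
predict(fit_sarima, n.ahead = 12)$pred
```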
Advantages
- Captures temporal patterns such as trends, seasonal fluctuations, and economic cycles.
- Essential for forecasting and policy-making, such as setting interest rates based on economic indicators.
Challenges
- Autocorrelation: Observations close in time are often correlated.
- Structural Breaks: Sudden changes due to policy shifts or economic crises can distort analysis.
- Seasonality: Must be accounted for to avoid misleading conclusions.
A time series typically consists of four key components:
- Trend: Long-term directional movement in the data over time.
- Seasonality: Regular, periodic fluctuations (e.g., increased retail sales in December).
- Cyclical Patterns: Long-term economic cycles that are irregular but recurrent.
- Irregular (Random) Component: Unpredictable variations not explained by trend, seasonality, or cycles.
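These components can be made concrete with a decomposition. The sketch below uses base R's stl() (a loess-based seasonal-trend decomposition) on the built-in AirPassengers series; the choice of series and the log transform are illustrative assumptions.

```r
# Decompose a monthly series into trend, seasonal, and remainder
# (irregular) components using base R's stl().
data(AirPassengers)
decomp <- stl(log(AirPassengers), s.window = "periodic")

# Panels from top to bottom: data, seasonal, trend, remainder
plot(decomp)
```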
A general linear time series model can be expressed as:
$$y_t = \beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + \dots + x_{t(k-1)}\beta_{k-1} + \epsilon_t$$
Some Common Model Types
- Static Model
A simple time series regression:
$$y_t = \beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + x_{t3}\beta_3 + \epsilon_t$$
- Finite Distributed Lag Model
Captures the effect of past values of an explanatory variable:
$$y_t = \beta_0 + pe_t\delta_0 + pe_{t-1}\delta_1 + pe_{t-2}\delta_2 + \epsilon_t$$
Long-Run Propensity (LRP): measures the cumulative effect of the explanatory variable over time (estimated in the sketch below):
$$LRP = \delta_0 + \delta_1 + \delta_2$$
- Dynamic Model
A model incorporating lagged dependent variables:
$$\text{GDP}_t = \beta_0 + \beta_1\text{GDP}_{t-1} + \epsilon_t$$
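The static, FDL, and dynamic forms differ only in which regressors enter the design matrix. As a hedged illustration, the sketch below simulates data for the FDL equation above (with made-up coefficients; the variable name pe is just a stand-in) and recovers the long-run propensity.

```r
# Sketch: estimating a finite distributed lag model and its
# long-run propensity (LRP) on simulated data.
set.seed(1)
T  <- 200
pe <- rnorm(T)

# True lag coefficients: delta0 = 0.5, delta1 = 0.3, delta2 = 0.2 (LRP = 1)
y <- 1 + 0.5 * pe +
     0.3 * c(0, pe[-T]) +
     0.2 * c(0, 0, pe[-((T - 1):T)]) +
     rnorm(T)

# Align current and lagged regressors (drop the first two periods)
d <- data.frame(y   = y[3:T],
                pe0 = pe[3:T],
                pe1 = pe[2:(T - 1)],
                pe2 = pe[1:(T - 2)])
fdl <- lm(y ~ pe0 + pe1 + pe2, data = d)

# LRP = delta0 + delta1 + delta2
sum(coef(fdl)[c("pe0", "pe1", "pe2")])
```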
11.3.1 Statistical Properties of Time Series Models
For time series regression, standard OLS assumptions must be carefully examined. The following conditions affect estimation:
Finite Sample Properties
- Under A1-A3: OLS remains unbiased.
- Under A1-A4: standard errors are consistent, and the Gauss-Markov Theorem holds (OLS is BLUE).
- Under A1-A6: finite-sample Wald tests (e.g., t-tests and F-tests) remain valid.
However, in time series settings, A3 often fails due to:
- Spurious Time Trends (fixable by including a time trend)
- Strict vs. Contemporaneous Exogeneity (sometimes unavoidable)
11.3.2 Common Time Series Processes
Several key models describe different time series behaviors:
Autoregressive Model (AR(p)): A process where current values depend on past values.
Moving Average Model (MA(q)): A process where past error terms influence current values.
Autoregressive Moving Average (ARMA(p, q)): A combination of AR and MA processes.
Autoregressive Conditional Heteroskedasticity (ARCH(p)): Models time-varying volatility.
Generalized ARCH (GARCH(p, q)): Extends ARCH by including past conditional variances.
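To build intuition for these processes, the sketch below simulates small examples using only base R: arima.sim() covers AR, MA, and ARMA, and a short loop generates an ARCH(1) series (base R has no ARCH simulator; in practice, estimation of ARCH/GARCH models is usually done with packages such as rugarch or fGarch). All parameter values are arbitrary.

```r
# Simulated examples of common time series processes.
set.seed(7)
n <- 500

ar1    <- arima.sim(n = n, list(ar = 0.7))            # AR(1)
ma1    <- arima.sim(n = n, list(ma = 0.5))            # MA(1)
arma11 <- arima.sim(n = n, list(ar = 0.7, ma = 0.5))  # ARMA(1,1)

# ARCH(1): e_t = sqrt(h_t) * z_t with h_t = a0 + a1 * e_{t-1}^2
a0 <- 0.2; a1 <- 0.6
e <- h <- numeric(n)
h[1] <- a0 / (1 - a1)            # start at the unconditional variance
e[1] <- sqrt(h[1]) * rnorm(1)
for (t in 2:n) {
  h[t] <- a0 + a1 * e[t - 1]^2   # time-varying conditional variance
  e[t] <- sqrt(h[t]) * rnorm(1)
}

# The ARCH series is serially uncorrelated, but its squares are not
op <- par(mfrow = c(1, 2))
acf(e,   main = "ARCH(1): levels")
acf(e^2, main = "ARCH(1): squares")
par(op)
```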
11.3.3 Deterministic Time Trends
When both the dependent and independent variables exhibit trending behavior, a regression may produce spurious results.
Spurious Regression Example
Suppose two series are each generated by a deterministic time trend:
$$y_t = \alpha_0 + t\alpha_1 + v_t$$
$$x_t = \lambda_0 + t\lambda_1 + u_t$$
where
- $\alpha_1 \neq 0$ and $\lambda_1 \neq 0$,
- $v_t$ and $u_t$ are independent of each other.
Despite there being no true relationship between $x_t$ and $y_t$, estimating:
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t$$
results in:
- Inconsistency: $\text{plim}(\hat{\beta}_1) = \alpha_1/\lambda_1 \neq 0$.
- Invalid inference: $|t| \xrightarrow{d} \infty$ for $H_0: \beta_1 = 0$, so the null is rejected with probability approaching one as $n \to \infty$.
- Misleading fit: $\text{plim}(R^2) = 1$, falsely implying perfect predictive power.
We can also rewrite the system as:
$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t, \qquad \epsilon_t = \alpha_1 t + v_t$$
where $\beta_0 = \alpha_0$ and $\beta_1 = 0$. Since $x_t$ is a deterministic function of time, $\epsilon_t$ is correlated with $x_t$, producing the usual omitted variable bias.
Solutions to Spurious Trends
- Include a time trend ($t$) as a control variable
  - Provides consistent parameter estimates and valid inference.
- Detrend the variables
  - Regress both $y_t$ and $x_t$ on time, then use the residuals in a second regression.
  - Equivalent to applying the Frisch-Waugh-Lovell Theorem.
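Both remedies are easy to see in a simulation. The sketch below generates two independently trending series (so the true slope is zero), shows the spurious estimate, and then applies each fix; all numbers are illustrative.

```r
# Spurious regression with deterministic trends, and two fixes.
set.seed(99)
T <- 200
t <- 1:T
y <- 1 + 0.05 * t + rnorm(T)   # alpha1 != 0
x <- 2 + 0.03 * t + rnorm(T)   # lambda1 != 0, independent of y

# Naive regression: a large, "significant" slope despite no true relation
coef(summary(lm(y ~ x)))

# Fix 1: include the time trend as a control
coef(summary(lm(y ~ x + t)))

# Fix 2: detrend both series, then regress the residuals (FWL)
y_dt <- resid(lm(y ~ t))
x_dt <- resid(lm(x ~ t))
coef(summary(lm(y_dt ~ x_dt)))
```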
11.3.4 Violations of Exogeneity in Time Series Models
The exogeneity assumption (A3) plays a crucial role in ensuring unbiased and consistent estimation in time series models. However, in many cases, the assumption is violated due to the inherent nature of time-dependent processes.
In a standard regression framework, we assume:
$$E(\epsilon_t | x_1, x_2, \dots, x_T) = 0$$
which requires that the error term is uncorrelated with all past, present, and future values of the independent variables.
Common Violations of Exogeneity
- Feedback: the error term $\epsilon_t$ influences future values of the independent variables. A classic example occurs in economic models where past shocks affect future decisions.
- Dynamic specification: the model includes a lagged dependent variable ($y_{t-1}$) as a regressor; because each $y_t$ is built in part from its own error, errors and regressors are correlated across time.
- Omitted lags: in finite distributed lag (FDL) models, failing to include the correct number of lags leads to omitted variable bias and correlation between regressors and errors.
11.3.4.1 Feedback Effect
In a simple regression model:
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t$$
the standard exogeneity assumption (A3) requires:
$$E(\epsilon_t | x_1, x_2, \dots, x_t, x_{t+1}, \dots, x_T) = 0$$
However, in the presence of feedback, past errors affect future values of $x_t$, so that:
$$E(\epsilon_t | x_{t+1}, \dots, x_T) \neq 0$$
- This occurs when current shocks (e.g., economic downturns) influence future decisions (e.g., government spending, firm investments).
- Strict exogeneity is violated, as we now have dependence across time.
Implication:
- Standard OLS estimators become biased and inconsistent.
- One common solution is to use Instrumental Variables to isolate exogenous variation in $x_t$.
11.3.4.2 Dynamic Specification
A dynamically specified model includes lagged dependent variables:
$$y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t$$
Exogeneity (A3) would require:
$$E(\epsilon_t | y_1, y_2, \dots, y_t, y_{t+1}, \dots, y_T) = 0$$
However, since $y_t$ is generated in part by $\epsilon_t$, the error is necessarily correlated with the regressor of the following period:
$$\text{Cov}(y_t, \epsilon_t) = \text{Var}(\epsilon_t) \neq 0$$
Implication:
- Strict exogeneity (A3) fails, as the error sequence cannot be independent of a regressor built from lagged $y$.
- OLS estimates are biased in finite samples (see the Monte Carlo sketch below), though they can remain consistent under contemporaneous exogeneity.
- Autoregressive (AR) models therefore rely on large-sample arguments or alternative estimation techniques such as the Generalized Method of Moments or Maximum Likelihood Estimation.
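A small Monte Carlo makes the finite-sample problem visible. The sketch below (all settings arbitrary) estimates an AR(1) coefficient by OLS at several sample sizes; the bias is typically noticeable in short samples and fades as $T$ grows, in line with OLS being biased but still consistent here.

```r
# Monte Carlo sketch: finite-sample bias of OLS in an AR(1) model.
set.seed(2024)
ar1_bias <- function(T, beta1 = 0.8, reps = 2000) {
  est <- replicate(reps, {
    y <- as.numeric(arima.sim(n = T, list(ar = beta1)))
    coef(lm(y[-1] ~ y[-T]))[2]   # regress y_t on y_{t-1}
  })
  mean(est) - beta1              # average bias
}
sapply(c(25, 100, 400), ar1_bias)
# Typically clearly negative at T = 25, close to zero by T = 400
```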
11.3.4.3 Dynamic Completeness and Omitted Lags
A finite distributed lag (FDL) model:
$$y_t = \beta_0 + x_t\delta_0 + x_{t-1}\delta_1 + \epsilon_t$$
assumes that the included lags fully capture the relationship between $y_t$ and past values of $x_t$. However, if we omit relevant lags, the exogeneity assumption (A3):
$$E(\epsilon_t | x_1, x_2, \dots, x_t, x_{t+1}, \dots, x_T) = 0$$
fails: an omitted term such as $x_{t-2}$ ends up in the error, creating correlation between the regressors and $\epsilon_t$.
Implication:
- The regression suffers from omitted variable bias, making OLS estimates unreliable.
- Solution:
  - Include additional lags of $x_t$.
  - Use lag selection criteria (e.g., AIC, BIC) to determine the appropriate lag structure, as in the sketch below.
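The sketch below shows one way to carry this out in R: simulate an FDL process with two true lags, fit models with zero to four lags on a common estimation sample, and compare AIC()/BIC(). Everything here is simulated and illustrative.

```r
# Sketch: choosing the FDL lag length with information criteria.
set.seed(5)
T <- 300
x <- rnorm(T)
y <- 1 + 0.6 * x + 0.3 * c(0, x[-T]) +
     0.15 * c(0, 0, x[-((T - 1):T)]) + rnorm(T)   # two true lags

max_lag <- 4
# Fit all candidate models on the same sample (t = max_lag + 1, ..., T)
fits <- lapply(0:max_lag, function(p) {
  X <- sapply(0:p, function(h) x[(max_lag + 1 - h):(T - h)])
  lm(y[(max_lag + 1):T] ~ X)
})
data.frame(lags = 0:max_lag,
           AIC  = sapply(fits, AIC),
           BIC  = sapply(fits, BIC))
# Pick the lag length minimizing AIC or BIC (BIC penalizes extra lags more)
```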
11.3.5 Consequences of Exogeneity Violations
If strict exogeneity (A3) fails, standard OLS assumptions no longer hold:
- OLS is biased.
- Gauss-Markov Theorem no longer applies.
- Finite Sample Properties (such as unbiasedness) are invalid.
To address these issues, we can:
- Rely on Large Sample Properties: Under certain conditions, consistency may still hold.
- Use Weaker Forms of Exogeneity: Shift from strict exogeneity (A3) to contemporaneous exogeneity (A3a).
If strict exogeneity does not hold, we can instead assume A3a (Contemporaneous Exogeneity):
$$E(x_t' \epsilon_t) = 0$$
This weaker assumption only requires that $x_t$ be uncorrelated with the error term in the same time period.
Key Differences from Strict Exogeneity
| Exogeneity Type | Requirement | Allows Dynamic Models? |
|---|---|---|
| Strict Exogeneity | $E(\epsilon_t \mid x_1, x_2, \dots, x_T) = 0$ | No |
| Contemporaneous Exogeneity | $E(x_t' \epsilon_t) = 0$ | Yes |
With contemporaneous exogeneity, $\epsilon_t$ can be correlated with past and future values of $x_t$.
This allows for dynamic specifications such as:
$$y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t$$
while still maintaining consistency under certain assumptions.
Deriving Large Sample Properties for Time Series
To establish consistency and asymptotic normality, we need a law of large numbers and a central limit theorem to apply. However, the standard Weak Law of Large Numbers and Central Limit Theorem used for OLS depend on A5 (Random Sampling), which does not hold in time series settings.
Since time series data exhibit dependence over time, we replace A5 (Random Sampling) with a weaker assumption:
- A5a: Weak Dependence (Stationarity)
Asymptotic Variance and Serial Correlation
The derivation of asymptotic variance depends on A4 (Homoskedasticity).
However, in time series settings, we often encounter serial correlation:
$$\text{Cov}(\epsilon_t, \epsilon_s) \neq 0 \quad \text{for } |t - s| > 0$$
To ensure valid inference, standard errors must be corrected using methods such as Newey-West HAC estimators.
11.3.6 Highly Persistent Data
In time series analysis, a key assumption for OLS consistency is weak dependence of the data-generating process (A5a), i.e., observations are not too strongly correlated over time. However, when $y_t$ and $x_t$ are highly persistent, standard OLS assumptions break down.
If a time series is not weakly dependent, it means:
- $y_t$ and $y_{t-h}$ remain strongly correlated even at large lags ($h \to \infty$).
- A5a (Weak Dependence) fails, leading to:
  - OLS inconsistency.
  - No valid limiting distribution (asymptotic normality does not hold).
Example: A classic example of a highly persistent process is a random walk:
$$y_t = y_{t-1} + u_t$$
or, with drift:
$$y_t = \alpha + y_{t-1} + u_t$$
where $u_t$ is a white noise error term.
- $y_t$ does not revert to a mean; its variance grows without bound as $t \to \infty$.
- Shocks accumulate rather than die out, making standard regression analysis unreliable.
11.3.6.1 Solution: First Differencing
A common way to transform non-stationary series into stationary ones is through first differencing:
$$\Delta y_t = y_t - y_{t-1} = u_t$$
- If $u_t$ is a weakly dependent (i.e., $I(0)$, stationary) process, then $y_t$ is said to be difference-stationary, or integrated of order one, $I(1)$.
- If both $y_t$ and $x_t$ follow a random walk ($I(1)$), we estimate the model in first differences:
$$\Delta y_t = \Delta x_t \beta + (\epsilon_t - \epsilon_{t-1}) = \Delta x_t \beta + \Delta \epsilon_t$$
Because the differenced series are weakly dependent, OLS estimation remains valid.
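The effect of differencing is easy to verify by simulation. The sketch below builds a random walk with cumsum() and compares autocorrelations before and after applying diff():

```r
# A random walk is highly persistent; its first difference is white noise.
set.seed(11)
y <- cumsum(rnorm(300))   # y_t = y_{t-1} + u_t

op <- par(mfrow = c(1, 2))
acf(y,       main = "Random walk: ACF decays very slowly")
acf(diff(y), main = "First difference: no persistence")
par(op)
```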
11.3.7 Unit Root Testing
To formally determine whether a time series contains a unit root (i.e., is non-stationary), we test:
$$y_t = \alpha + \rho y_{t-1} + u_t$$
Hypothesis Testing
- $H_0: \rho = 1$ (unit root, non-stationary)
  - Standard OLS asymptotics fail: the estimator does not have the usual limiting distribution.
- $H_a: \rho < 1$ (stationary process)
  - OLS is consistent and asymptotically normal.
Key Issues
- The usual t-test is not valid because, under $H_0$, the OLS estimator does not have a standard limiting distribution.
- Instead, specialized tests such as the Dickey-Fuller and Augmented Dickey-Fuller tests are required.
11.3.7.1 Dickey-Fuller Test for Unit Roots
The Dickey-Fuller test subtracts $y_{t-1}$ from both sides of the original equation, $y_t = \alpha + \rho y_{t-1} + u_t$, giving:
$$\Delta y_t = \alpha + \theta y_{t-1} + v_t$$
where:
$$\theta = \rho - 1$$
- Null hypothesis ($H_0: \theta = 0$): implies $\rho = 1$ (unit root, non-stationary).
- Alternative ($H_a: \theta < 0$): implies $\rho < 1$ (stationary).
Since the test statistic has a non-standard asymptotic distribution under $H_0$, Dickey and Fuller derived specialized critical values.
Decision Rule
- If the test statistic is more negative than the critical value, reject $H_0$: $y_t$ is stationary.
- Otherwise, fail to reject $H_0$: $y_t$ has a unit root (non-stationary).
The standard DF test may fail due to two key limitations:
- Simplistic dynamic relationship
  - The DF test allows only one lag in the autoregressive structure; in reality, higher-order lags of $\Delta y_t$ may be needed.
  - Solution: use the Augmented Dickey-Fuller test, which includes extra lags:
$$\Delta y_t = \alpha + \theta y_{t-1} + \gamma_1 \Delta y_{t-1} + \dots + \gamma_p \Delta y_{t-p} + v_t$$
  - With $p$ extra lags, $\Delta y_t$ follows an AR($p$) process under $H_0$, while $y_t$ follows an AR($p+1$) process under $H_a$. Including lags of $\Delta y_t$ ensures a better-specified model.
- Ignoring deterministic time trends
  - If a series exhibits a deterministic trend, failing to include it biases the unit root test. For example, if $y_t$ grows steadily over time, a test without a trend component will falsely detect a unit root.
  - Solution: include a deterministic time trend ($t$) in the regression:
$$\Delta y_t = \alpha + \theta y_{t-1} + \delta t + v_t$$
  - This allows $y_t$ to have a quadratic relationship with time under the null, and it changes the critical values, so adjusted Dickey-Fuller tables are required.
11.3.7.2 Augmented Dickey-Fuller Test
The ADF test generalizes the DF test by allowing for:
- Lags of $\Delta y_t$ (to correct for serial correlation).
- Time trends (to handle deterministic trends).
Regression Equation
$$\Delta y_t = \alpha + \theta y_{t-1} + \delta t + \gamma_1 \Delta y_{t-1} + \dots + \gamma_p \Delta y_{t-p} + v_t$$
where $\theta = \rho - 1$.
Hypotheses
- $H_0: \theta = 0$ (unit root: non-stationary)
- $H_a: \theta < 0$ (stationary)
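In R, one common implementation is tseries::adf.test() (urca::ur.df() is a popular alternative with explicit control over drift, trend, and lag selection). The sketch below assumes the tseries package is installed and applies the test to one simulated unit root series and one stationary series.

```r
# ADF test on a unit root process vs. a stationary AR(1).
library(tseries)

set.seed(3)
rw  <- cumsum(rnorm(250))                  # random walk: unit root
ar1 <- arima.sim(n = 250, list(ar = 0.5))  # stationary AR(1)

adf.test(rw)   # expect a large p-value: fail to reject the unit root
adf.test(ar1)  # expect a small p-value: reject the unit root
```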
11.3.8 Newey-West Standard Errors
Newey-West standard errors, also known as Heteroskedasticity and Autocorrelation Consistent (HAC) estimators, provide valid inference when errors exhibit both heteroskedasticity (i.e., when the A4 homoskedasticity assumption is violated) and serial correlation. These standard errors adjust for dependence in the error structure, ensuring that hypothesis tests remain valid.
Key Features
- Accounts for autocorrelation: Handles time dependence in error terms.
- Accounts for heteroskedasticity: Allows for non-constant variance across observations.
- Ensures positive semi-definiteness: Downweights longer-lagged covariances to maintain mathematical validity.
The estimator is computed as:
$$\hat{B} = \frac{1}{T}\sum_{t=1}^{T} e_t^2 x_t' x_t + \sum_{h=1}^{g}\left(1 - \frac{h}{g+1}\right)\frac{1}{T}\sum_{t=h+1}^{T} e_t e_{t-h}\left(x_t' x_{t-h} + x_{t-h}' x_t\right)$$
where:
- T is the sample size,
- g is the chosen lag truncation parameter (bandwidth),
- et are the residuals from the OLS regression,
- xt are the explanatory variables.
Choosing the Lag Length (g)
Selecting an appropriate lag truncation parameter (g) is crucial for balancing efficiency and bias. Common guidelines include:
- Yearly data: g=1 or 2 usually suffices.
- Quarterly data: g=4 or 8 accounts for seasonal dependencies.
- Monthly data: g=12 or 14 captures typical cyclical effects.
Alternatively, data-driven methods can be used:
- Newey-West rule: $g = \lfloor 4(T/100)^{2/9} \rfloor$
- Alternative heuristic: $g = \lfloor T^{1/4} \rfloor$
```r
# Load necessary libraries
library(sandwich)
library(lmtest)
# Simulate data
set.seed(42)
T <- 100 # Sample size
time <- 1:T
x <- rnorm(T)
epsilon <- arima.sim(n = T, list(ar = 0.5)) # Autocorrelated errors
y <- 2 + 3 * x + epsilon # True model
# Estimate OLS model
model <- lm(y ~ x)
# Compute Newey-West standard errors
lag_length <- floor(4 * (T / 100) ^ (2 / 9)) # Newey-West rule
nw_se <- NeweyWest(model, lag = lag_length, prewhite = FALSE)
# Display robust standard errors
coeftest(model, vcov = nw_se)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.71372 0.13189 12.993 < 2.2e-16 ***
#> x 3.15831 0.13402 23.567 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
11.3.8.1 Testing for Serial Correlation
Serial correlation (also known as autocorrelation) occurs when error terms are correlated across time:
$$E(\epsilon_t \epsilon_{t-h}) \neq 0 \quad \text{for some } h \neq 0$$
Steps for Detecting Serial Correlation
- Estimate an OLS regression: run the regression of $y_t$ on $x_t$ and obtain the residuals $e_t$.
- Test for autocorrelation in the residuals: regress $e_t$ on $x_t$ and the lagged residual $e_{t-1}$:
$$e_t = \gamma_0 + x_t'\gamma + \rho e_{t-1} + v_t$$
and test whether $\rho$ is significantly different from zero.
- Decision rule: if $\rho$ is statistically significant at the 5% level, reject the null hypothesis of no serial correlation.
Higher-Order Serial Correlation
To test for higher-order autocorrelation, extend the previous regression:
$$e_t = \gamma_0 + x_t'\gamma + \rho_1 e_{t-1} + \rho_2 e_{t-2} + \dots + \rho_p e_{t-p} + v_t$$
- Jointly test $\rho_1 = \rho_2 = \dots = \rho_p = 0$ using an F-test.
- If the null is rejected, autocorrelation of order $p$ is present.
Step 1: Estimate an OLS Regression and Obtain Residuals
```r
# Load necessary libraries
library(lmtest)
library(sandwich)
# Generate some example data
set.seed(123)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n) # True model: y = 1 + 0.5*x + e
# Estimate the OLS regression
model <- lm(y ~ x)
# Obtain residuals
residuals <- resid(model)
```
Step 2: Test for Autocorrelation in Residuals
```r
# Create lagged residuals
lagged_residuals <- c(NA, residuals[-length(residuals)])
# Regress residuals on x and lagged residuals
autocorr_test_model <- lm(residuals ~ x + lagged_residuals)
# Summary of the regression
summary(autocorr_test_model)
#>
#> Call:
#> lm(formula = residuals ~ x + lagged_residuals)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.94809 -0.72539 -0.08105 0.58503 3.12941
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.008175 0.098112 0.083 0.934
#> x -0.002841 0.107167 -0.027 0.979
#> lagged_residuals -0.127605 0.101746 -1.254 0.213
#>
#> Residual standard error: 0.9707 on 96 degrees of freedom
#> (1 observation deleted due to missingness)
#> Multiple R-squared: 0.01614, Adjusted R-squared: -0.004354
#> F-statistic: 0.7876 on 2 and 96 DF, p-value: 0.4579
# Test if the coefficient of lagged_residuals is significant
rho <- coef(autocorr_test_model)["lagged_residuals"]
rho_p_value <-
summary(autocorr_test_model)$coefficients["lagged_residuals", "Pr(>|t|)"]
# Decision Rule
if (rho_p_value < 0.05) {
cat("Reject the null hypothesis: There is evidence of serial correlation.\n")
} else {
cat("Fail to reject the null hypothesis: No evidence of serial correlation.\n")
}
#> Fail to reject the null hypothesis: No evidence of serial correlation.
```
Step 3: Testing for Higher-Order Serial Correlation
```r
# Number of lags to test
p <- 2 # Example: testing for 2nd order autocorrelation
# Create a matrix of lagged residuals
lagged_residuals_matrix <- sapply(1:p, function(i) c(rep(NA, i), residuals[1:(n - i)]))
# Regress residuals on x and lagged residuals
higher_order_autocorr_test_model <- lm(residuals ~ x + lagged_residuals_matrix)
# Summary of the regression
summary(higher_order_autocorr_test_model)
#>
#> Call:
#> lm(formula = residuals ~ x + lagged_residuals_matrix)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.9401 -0.7290 -0.1036 0.6359 3.0253
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.006263 0.099104 0.063 0.950
#> x 0.010442 0.108370 0.096 0.923
#> lagged_residuals_matrix1 -0.140426 0.103419 -1.358 0.178
#> lagged_residuals_matrix2 -0.107385 0.103922 -1.033 0.304
#>
#> Residual standard error: 0.975 on 94 degrees of freedom
#> (2 observations deleted due to missingness)
#> Multiple R-squared: 0.02667, Adjusted R-squared: -0.004391
#> F-statistic: 0.8587 on 3 and 94 DF, p-value: 0.4655
# Joint F-test for the significance of lagged residuals
f_test <- car::linearHypothesis(higher_order_autocorr_test_model,
paste0("lagged_residuals_matrix", 1:p, " = 0"))
# Print the F-test results
print(f_test)
#>
#> Linear hypothesis test:
#> lagged_residuals_matrix1 = 0
#> lagged_residuals_matrix2 = 0
#>
#> Model 1: restricted model
#> Model 2: residuals ~ x + lagged_residuals_matrix
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 96 91.816
#> 2 94 89.368 2 2.4479 1.2874 0.2808
# Decision Rule
if (f_test$`Pr(>F)`[2] < 0.05) {
cat("Reject the null hypothesis: There is evidence of higher-order serial correlation.\n")
} else {
cat("Fail to reject the null hypothesis: No evidence of higher-order serial correlation.\n")
}
#> Fail to reject the null hypothesis: No evidence of higher-order serial correlation.
```
Corrections for Serial Correlation
If serial correlation is detected, the following adjustments should be made:
| Problem | Solution |
|---|---|
| Mild serial correlation | Use Newey-West standard errors |
| Severe serial correlation | Use Generalized Least Squares or the Prais-Winsten transformation |
| Autoregressive structure in errors | Model the errors as an ARMA process |
| Higher-order serial correlation | Include lags of the dependent variable or use HAC estimators with higher lag orders |
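For the GLS route, one readily available option is nlme::gls() with an AR(1) correlation structure (nlme ships with R; the prais package offers a dedicated Prais-Winsten implementation). A minimal sketch on simulated data, with all settings illustrative:

```r
# Feasible GLS with AR(1) errors via nlme::gls().
library(nlme)

set.seed(8)
T <- 200
x <- rnorm(T)
e <- as.numeric(arima.sim(n = T, list(ar = 0.6)))  # AR(1) errors
d <- data.frame(y = 1 + 0.5 * x + e, x = x, t = 1:T)

# OLS for comparison
ols <- lm(y ~ x, data = d)

# GLS estimating an AR(1) error process indexed by time
fit_gls <- gls(y ~ x, data = d, correlation = corAR1(form = ~ t))
summary(fit_gls)$tTable   # coefficients with serial-correlation-aware SEs
```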