11.2 Time Series
$$y_t = \beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + \dots + x_{t(k-1)}\beta_{k-1} + \epsilon_t$$
Examples
Static Model
- $y_t = \beta_0 + x_1\beta_1 + x_2\beta_2 - x_3\beta_3 + \epsilon_t$
Finite Distributed Lag model
- $y_t = \beta_0 + pe_t\delta_0 + pe_{t-1}\delta_1 + pe_{t-2}\delta_2 + \epsilon_t$
- The Long Run Propensity (LRP) is $LRP = \delta_0 + \delta_1 + \delta_2$
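As a minimal sketch of the finite distributed lag model, the simulation below builds two lags of a regressor, estimates the lag coefficients by OLS, and sums them to get the LRP. The data, the true coefficients (0.5, 0.3, 0.1), and the intercept are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
pe = rng.normal(size=T)                     # hypothetical regressor pe_t
delta = np.array([0.5, 0.3, 0.1])           # assumed true delta_0, delta_1, delta_2
eps = rng.normal(scale=0.5, size=T)

# Regressors for the usable sample: constant, pe_t, pe_{t-1}, pe_{t-2}
X = np.column_stack([np.ones(T - 2), pe[2:], pe[1:-1], pe[:-2]])
y = 1.0 + X[:, 1:] @ delta + eps[2:]

coef = np.linalg.lstsq(X, y, rcond=None)[0]
lrp_hat = coef[1:].sum()                    # LRP = delta_0 + delta_1 + delta_2
print("estimated lag coefficients:", coef[1:].round(3))
print("estimated LRP:", round(lrp_hat, 3))
```

The first two observations are dropped because their lags are not available, which is the usual cost of adding lags.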
Dynamic Model
- $GDP_t = \beta_0 + \beta_1 GDP_{t-1} + \epsilon_t$
Finite Sample Properties for Time Series:
- A1-A3: OLS is unbiased
- A1-A4: usual standard errors are consistent and Gauss-Markov Theorem holds (OLS is BLUE)
- A1-A4, A6: Finite Sample Wald Tests (t-test and F-test) are valid
A3 might not hold in a time series setting
- Spurious Time Trend - solvable
- Strict vs Contemporaneous Exogeneity - not solvable
In time series data, there are many processes:
- Autoregressive model of order p: AR(p)
- Moving average model of order q: MA(q)
- Autoregressive model of order p and moving average model of order q: ARMA(p,q)
- Autoregressive conditional heteroskedasticity model of order p: ARCH(p)
- Generalized autoregressive conditional heteroskedasticity model of orders p and q: GARCH(p,q)
11.2.1 Deterministic Time trend
Both the dependent and independent variables are trending over time
Spurious Time Series Regression
$$y_t = \alpha_0 + t\alpha_1 + v_t$$
and x takes the form
$$x_t = \lambda_0 + t\lambda_1 + u_t$$
- $\alpha_1 \neq 0$ and $\lambda_1 \neq 0$
- $v_t$ and $u_t$ are independent
- there is no relationship between $y_t$ and $x_t$
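If we estimate the regression
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t$$
the true $\beta_1 = 0$, yet OLS is
- Inconsistent: $\text{plim}(\hat{\beta}_1) = \frac{\alpha_1}{\lambda_1}$
- Invalid for inference: $|t| \overset{d}{\to} \infty$ for $H_0: \beta_1 = 0$, so we will always reject the null as $n \to \infty$
- Uninformative in $R^2$: $\text{plim}(R^2) = 1$, so the regression appears to predict perfectly as $n \to \infty$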
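The three symptoms can be seen in a short simulation. This is a sketch under assumed values (trend slopes 0.5 and 0.2, unit-variance noise, n = 2000): y and x are generated independently apart from their shared dependence on t, yet the slope settles near the ratio of the trend slopes, the t-statistic is enormous, and R² is close to 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
t = np.arange(1, n + 1)
alpha1, lam1 = 0.5, 0.2                      # assumed trend slopes
y = 1.0 + alpha1 * t + rng.normal(size=n)    # y_t = alpha_0 + alpha_1 t + v_t
x = 2.0 + lam1 * t + rng.normal(size=n)      # x_t = lambda_0 + lambda_1 t + u_t

# Regress y on a constant and x, ignoring the trend
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma2 = resid @ resid / (n - 2)
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = b[1] / se
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

print("slope:", round(b[1], 3), "vs alpha1/lam1 =", alpha1 / lam1)
print("t-statistic:", round(t_stat, 1), "R^2:", round(r2, 5))
```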
We can rewrite the equation as
$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t, \qquad \epsilon_t = \alpha_1 t + v_t$$
where $\beta_0 = \alpha_0$ and $\beta_1 = 0$. Since $x_t$ contains a deterministic time trend, $\epsilon_t$ is correlated with $x_t$ and we have the usual omitted variable bias.
Even when $y_t$ and $x_t$ are related ($\beta_1 \neq 0$) but both are trending over time, we still get spurious results from the simple regression of $y_t$ on $x_t$.
Solutions to Spurious Trend
Include time trend t as an additional control
- consistent parameter estimates and valid inference
Detrend both dependent and independent variables and then regress the detrended outcome on the detrended independent variables (i.e., regress residuals $\hat{v}_t$ from the trend regression of $y_t$ on residuals $\hat{u}_t$ from the trend regression of $x_t$)
Detrending is the same as partialing out in the [Frisch-Waugh-Lovell Theorem]
- Could allow for non-linear time trends by including $t$, $t^2$, and $\exp(t)$
- Allow for seasonality by including indicators for relevant “seasons” (quarters, months, weeks).
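A small sketch of both solutions, using simulated data with assumed trend slopes and no true relationship between y and x. Adding t as a control and regressing detrended y on detrended x give the same coefficient on x, as the Frisch-Waugh-Lovell result implies, and that coefficient is close to zero. The helper `ols` is a hypothetical convenience wrapper.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 2000
t = np.arange(1, n + 1, dtype=float)
y = 1.0 + 0.5 * t + rng.normal(size=n)       # trending outcome, unrelated to x
x = 2.0 + 0.2 * t + rng.normal(size=n)       # trending regressor
const = np.ones(n)

# Solution 1: include the time trend as an additional control
b_trend = ols(np.column_stack([const, x, t]), y)[1]

# Solution 2: detrend y and x, then regress residuals on residuals
Z = np.column_stack([const, t])
y_detrended = y - Z @ ols(Z, y)
x_detrended = x - Z @ ols(Z, x)
b_fwl = ols(x_detrended[:, None], y_detrended)[0]

print("coefficient on x with trend control:", round(b_trend, 4))
print("coefficient from detrended regression:", round(b_fwl, 4))
```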
A3 does not hold under:
- Feedback Effect: $\epsilon_t$ influences next period's independent variables
- Dynamic Specification: the last period's outcome is included as an explanatory variable
- Dynamically Complete: for a finite distributed lag model, the number of lags needs to be exactly correct
11.2.2 Feedback Effect
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t$$
$$E(\epsilon_t|X) = E(\epsilon_t|x_1, x_2, \dots, x_t, x_{t+1}, \dots, x_T)$$
will not equal 0, because $y_t$ will likely influence $x_{t+1}, \dots, x_T$
- A3 is violated because we require the error to be uncorrelated with the regressors in every time period (strict exogeneity)
11.2.3 Dynamic Specification
$$y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t$$
$$E(\epsilon_t|X) = E(\epsilon_t|y_1, y_2, \dots, y_t, y_{t+1}, \dots, y_T)$$
will not equal 0, because $y_t$ and $\epsilon_t$ are inherently correlated and $y_t$ is the regressor in period $t+1$
- A3 is violated because we require the error to be uncorrelated with the regressors in every time period (strict exogeneity)
- A dynamic specification is not allowed under A3
11.2.4 Dynamically Complete
$$y_t = \beta_0 + x_t\delta_0 + x_{t-1}\delta_1 + \epsilon_t$$
$$E(\epsilon_t|X) = E(\epsilon_t|x_1, x_2, \dots, x_t, x_{t+1}, \dots, x_T)$$
will not equal 0, because if we did not include enough lags, $x_{t-2}$ and $\epsilon_t$ are correlated
- A3 is violated because we require the error to be uncorrelated with the regressors in every time period (strict exogeneity)
- Can be corrected by including more lags (but when should we stop?)
Without A3
- OLS is biased
- Gauss-Markov Theorem no longer holds
- Finite Sample Properties are invalid
then, we can
- Focus on Large Sample Properties
- Can use [A3a] instead of A3
[A3a] in time series becomes
$$A3a: E(x_t'\epsilon_t) = 0$$
only the regressors in this time period need to be uncorrelated with the error in this time period (Contemporaneous Exogeneity)
- $\epsilon_t$ can be correlated with $\dots, x_{t-2}, x_{t-1}, x_{t+1}, x_{t+2}, \dots$
- can have a dynamic specification $y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t$
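To illustrate the trade-off, the sketch below simulates an AR(1) process with an assumed coefficient of 0.5 and estimates the dynamic specification by OLS. With only contemporaneous exogeneity the estimator is biased in short samples (the average estimate at T = 20 falls below 0.5) but it is close to the true value in a long sample. The function names and sample sizes are illustrative choices.

```python
import numpy as np

def simulate_ar1(T, rho, rng):
    """y_t = rho * y_{t-1} + e_t with y_0 = 0 and standard normal e_t."""
    e = rng.normal(size=T)
    y = np.zeros(T)
    for s in range(1, T):
        y[s] = rho * y[s - 1] + e[s]
    return y

def ar1_slope(y):
    """OLS slope from regressing y_t on a constant and y_{t-1}."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    return np.linalg.lstsq(X, y[1:], rcond=None)[0][1]

rng = np.random.default_rng(3)
rho = 0.5                                    # assumed true coefficient
# Average over many short samples to expose the finite-sample bias
small = np.mean([ar1_slope(simulate_ar1(20, rho, rng)) for _ in range(2000)])
# One long sample to show the estimate settling near rho
large = ar1_slope(simulate_ar1(5000, rho, rng))

print("mean estimate over 2000 samples, T=20:", round(small, 3))
print("single estimate, T=5000:", round(large, 3))
```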
Deriving Large Sample Properties for Time Series
[Weak Law] and Central Limit Theorem depend on A5
- In time series, $x_t$ and $\epsilon_t$ are dependent over $t$, so A5 fails
- Without the [Weak Law] or the Central Limit Theorem, we cannot derive Large Sample Properties for OLS
- Instead of A5, we consider [A5a]
Derivation of the Asymptotic Variance depends on A4
- the time series setting introduces Serial Correlation: $Cov(\epsilon_t, \epsilon_s) \neq 0$
Under A1, A2, [A3a], and [A5a], the OLS estimator is consistent and asymptotically normal
11.2.5 Highly Persistent Data
If $y_t, x_t$ are not weakly dependent stationary processes, then $y_t$ and $y_{t-h}$ are not almost independent for large $h$.
[A5a] does not hold and OLS is not consistent and does not have a limiting distribution.
Examples
- Random Walk: $y_t = y_{t-1} + u_t$
- Random Walk with a drift: $y_t = \alpha + y_{t-1} + u_t$
Solution: the first difference is a stationary process
$$y_t - y_{t-1} = u_t$$
- If $u_t$ is a weakly dependent process (also called integrated of order 0), then $y_t$ is said to be a difference-stationary process (integrated of order 1)
- For regression, if $\{y_t, x_t\}$ are random walks (integrated of order 1), we can consistently estimate the first difference equation
$$y_t - y_{t-1} = (x_t - x_{t-1})\beta + \epsilon_t - \epsilon_{t-1}$$
$$\Delta y_t = \Delta x_t\beta + \Delta\epsilon_t$$
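A minimal sketch of estimating the first difference equation. The setup is an assumption made for illustration: the regressor is a random walk, the error in levels is also a random walk (so its first difference is white noise), and the true coefficient is 2.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5000
beta = 2.0                                   # assumed true coefficient
x = np.cumsum(rng.normal(size=T))            # x_t is a random walk
eps = np.cumsum(rng.normal(size=T))          # highly persistent error in levels
y = x * beta + eps                           # y_t is integrated of order 1 as well

# First differences are weakly dependent, so OLS on them behaves normally
dy, dx = np.diff(y), np.diff(x)
b_diff = (dx @ dy) / (dx @ dx)               # OLS of dy on dx without intercept

print("first-difference estimate of beta:", round(b_diff, 3))
```

The intercept is omitted because differencing removes any constant in the levels equation; a drift term would reappear as an intercept in the differenced regression.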
Unit Root Test
In the model
$$y_t = \alpha + \rho y_{t-1} + u_t$$
we test whether $\rho = 1$ (integrated of order 1)
- Under the null $H_0: \rho = 1$, OLS is not consistent or asymptotically normal.
- Under the alternative $H_a: \rho < 1$, OLS is consistent and asymptotically normal.
- The usual t-test is not valid; we need to use the transformed equation to produce a valid test.
Dickey-Fuller Test
$$\Delta y_t = \alpha + \theta y_{t-1} + v_t$$
where $\theta = \rho - 1$
- $H_0: \theta = 0$ and $H_a: \theta < 0$
- Under the null, $\Delta y_t$ is weakly dependent but $y_{t-1}$ is not.
- Dickey and Fuller derived the non-normal asymptotic distribution. If you reject the null, then $y_t$ is not a random walk.
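The test regression can be sketched directly with NumPy. The function below computes the t-statistic on $\theta$ and compares it with -2.86, which is the commonly tabulated approximate 5% asymptotic Dickey-Fuller critical value for the case with a constant and no trend; treat that number, the simulated series, and the function name as assumptions of this illustration rather than part of the text above.

```python
import numpy as np

def df_tstat(y):
    """t-statistic on theta in: diff(y)_t = alpha + theta * y_{t-1} + v_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    b = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ b
    sigma2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return b[1] / se

rng = np.random.default_rng(5)
T = 500
random_walk = np.cumsum(rng.normal(size=T))  # rho = 1
stationary = np.zeros(T)                     # rho = 0.5
shocks = rng.normal(size=T)
for s in range(1, T):
    stationary[s] = 0.5 * stationary[s - 1] + shocks[s]

CRIT_5PCT = -2.86  # approximate 5% critical value (constant, no trend); not from a normal table
for name, series in [("random walk", random_walk), ("AR(1), rho=0.5", stationary)]:
    stat = df_tstat(series)
    decision = "reject unit root" if stat < CRIT_5PCT else "fail to reject"
    print(name, round(stat, 2), decision)
```

The test is one-sided: only large negative statistics count as evidence against a unit root.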
Concerns with the standard Dickey Fuller Test
1. Only considers a fairly simplistic dynamic relationship
$$\Delta y_t = \alpha + \theta y_{t-1} + \gamma_1\Delta y_{t-1} + \dots + \gamma_p\Delta y_{t-p} + v_t$$
- with one additional lag, under the null $\Delta y_t$ is an AR(1) process and under the alternative $y_t$ is an AR(2) process.
- Solution: include lags of $\Delta y_t$ as controls.
2. Does not allow for time trend
$$\Delta y_t = \alpha + \theta y_{t-1} + \delta t + v_t$$
- allows $y_t$ to have a quadratic relationship with $t$
- Solution: include time trend (changes the critical values).
Augmented Dickey-Fuller Test
$$\Delta y_t = \alpha + \theta y_{t-1} + \delta t + \gamma_1\Delta y_{t-1} + \dots + \gamma_p\Delta y_{t-p} + v_t$$
where $\theta = \rho - 1$
- $H_0: \theta = 0$ and $H_a: \theta < 0$
- Under the null, $\Delta y_t$ is weakly dependent but $y_{t-1}$ is not
- Critical values are different with the time trend. If you reject the null, then $y_t$ is not a random walk.
11.2.5.0.1 Newey West Standard Errors
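If A4 does not hold, we can use Newey West Standard Errors (HAC - Heteroskedasticity and Autocorrelation Consistent)
$$\hat{B} = T^{-1}\sum_{t=1}^{T} e_t^2 x_t'x_t + \sum_{h=1}^{g}\left(1 - \frac{h}{g+1}\right)T^{-1}\sum_{t=h+1}^{T} e_t e_{t-h}(x_t'x_{t-h} + x_{t-h}'x_t)$$
- The second term estimates the covariances up to a distance $g$ apart
- The weight $1 - \frac{h}{g+1}$ downweights more distant covariances to ensure $\hat{B}$ is PSD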
How to choose g:
- For yearly data: g=1 or 2 is likely to account for most of the correlation
- For quarterly or monthly data: g should be larger ($g = 4$ or 8 for quarterly and $g = 12$ or 14 for monthly)
- can also take the integer part of $4(T/100)^{2/9}$ or the integer part of $T^{1/4}$
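The estimator and the lag rule can be sketched in a few lines of NumPy. The function names, the simulated AR(1) regressor and error, and the sandwich form $(X'X/T)^{-1}\hat{B}(X'X/T)^{-1}/T$ used for the coefficient covariance are assumptions of this illustration; in practice a tested library routine would usually be preferable.

```python
import numpy as np

def newey_west_cov(X, e, g):
    """HAC covariance of OLS coefficients with weights 1 - h/(g+1)."""
    T = X.shape[0]
    Xe = X * e[:, None]                      # row t holds e_t * x_t
    B = Xe.T @ Xe / T                        # h = 0 term: sum of e_t^2 x_t' x_t
    for h in range(1, g + 1):
        w = 1.0 - h / (g + 1.0)
        G = Xe[h:].T @ Xe[:-h] / T           # sum over t of e_t e_{t-h} x_t' x_{t-h}
        B += w * (G + G.T)
    A_inv = np.linalg.inv(X.T @ X / T)
    return A_inv @ B @ A_inv / T

def rule_of_thumb_lag(T):
    """Integer part of 4 * (T / 100)^(2/9)."""
    return int(4 * (T / 100) ** (2 / 9))

rng = np.random.default_rng(6)
T = 200
x, u = np.zeros(T), np.zeros(T)
for s in range(1, T):                        # serially correlated regressor and error
    x[s] = 0.7 * x[s - 1] + rng.normal()
    u[s] = 0.7 * u[s - 1] + rng.normal()
X = np.column_stack([np.ones(T), x])
y = 1.0 + 2.0 * x + u

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
g = rule_of_thumb_lag(T)
V = newey_west_cov(X, e, g)
print("g =", g, "HAC standard errors:", np.sqrt(np.diag(V)))
```

Setting `g = 0` drops the autocovariance terms and leaves the heteroskedasticity-robust (White) covariance.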
Testing for Serial Correlation
1. Run OLS regression of $y_t$ on $x_t$ and obtain residuals $e_t$
2. Run OLS regression of $e_t$ on $x_t, e_{t-1}$ and test whether the coefficient on $e_{t-1}$ is significant.
3. Reject the null of no serial correlation if the coefficient is significant at the 5% level.
- Test using heteroskedasticity-robust standard errors
- can include $e_{t-2}, e_{t-3}, \dots$ in step 2 to test for higher order serial correlation (the t-test would now be an F-test of joint significance)
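The procedure for first-order serial correlation can be sketched as follows. The simulated data (errors following an AR(1) with coefficient 0.6), the function name, the HC0 form of the robust variance, and the use of 1.96 as the 5% two-sided normal critical value are assumptions made for this illustration.

```python
import numpy as np

def serial_corr_tstat(y, X):
    """Robust t-statistic on e_{t-1} in the regression of e_t on x_t and e_{t-1}."""
    # Step 1: OLS of y on X, keep residuals
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    # Step 2: regress e_t on x_t and e_{t-1}
    Z = np.column_stack([X[1:], e[:-1]])
    c = np.linalg.lstsq(Z, e[1:], rcond=None)[0]
    r = e[1:] - Z @ c
    # Heteroskedasticity-robust (HC0) sandwich variance
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    Zr = Z * r[:, None]
    V = ZtZ_inv @ (Zr.T @ Zr) @ ZtZ_inv
    return c[-1] / np.sqrt(V[-1, -1])

rng = np.random.default_rng(7)
T = 500
x = rng.normal(size=T)
u = np.zeros(T)
for s in range(1, T):                        # serially correlated errors
    u[s] = 0.6 * u[s - 1] + rng.normal()
X = np.column_stack([np.ones(T), x])
y = 1.0 + 2.0 * x + u

t_stat = serial_corr_tstat(y, X)
decision = "reject no serial correlation" if abs(t_stat) > 1.96 else "fail to reject"
print("t-statistic on lagged residual:", round(t_stat, 2), decision)
```

Testing higher orders would add further lagged residual columns to `Z` and replace the single t-statistic with a joint test on those columns.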