5 Stationary models
5.1 Univariate Time Series
5.2 Introduction
The following two sections will discuss stationary models in their simplest form. Then we will talk about non-stationary models.
5.3 Autoregressive Models
We will start with the simplest form of time-series model, the first-order autoregressive model or AR(1).
Specification
A simple way to model dependence between observations in different time periods would be that \(Y_t\) depends linearly on the observation from the previous time period \(Y_{t-1}\).
\[\begin{equation}\label{eq1} Y_t=\delta+\theta Y_{t-1}+\varepsilon_t \end{equation}\]
Here \(\varepsilon_t\) denotes a serially uncorrelated innovation with mean zero and constant variance. The process in \eqref{eq1} is called a first-order or \(AR(1)\) process. This process tells us that the value of \(Y\) at time \(t\) depends on a constant term plus \(\theta\) times \(Y_{t-1}\) plus an unpredictable component \(\varepsilon_t\). Here we shall assume \(|\theta|<1\). Now, the unpredictable component \(\varepsilon_t\) is an important aspect of the whole model. The process underlying \(\varepsilon_t\) is called white noise. Here \(\varepsilon_t\) will always be homoskedastic and will not show any autocorrelation.
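To make the specification concrete, here is a minimal simulation sketch in Python/NumPy (the values \(\delta=2\), \(\theta=0.7\) and \(\sigma=1\) are my own illustrative choices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(42)

delta, theta, sigma = 2.0, 0.7, 1.0   # illustrative parameters; any |theta| < 1 keeps the process stationary
T = 500

# white noise innovations: mean zero, constant variance, no autocorrelation
eps = rng.normal(loc=0.0, scale=sigma, size=T)

Y = np.empty(T)
Y[0] = delta / (1 - theta)            # start the recursion at the unconditional mean
for t in range(1, T):
    Y[t] = delta + theta * Y[t - 1] + eps[t]   # AR(1) recursion: Y_t = delta + theta*Y_{t-1} + eps_t
```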
5.3.1 Expected Value of AR(1)
The expected value of \(Y_t\) can be solved from
\[ E(Y_t)=\delta+\theta E(Y_{t-1}) \] which, assuming that \(E(Y_t)\) does not depend upon \(t\), allows us to write
\[\begin{equation*} E(Y_t)=E(Y_{t-1})=\mu \end{equation*}\] Remember, this is only true if \(Y_t\) is stationary, which means its statistical properties do not depend on the particular time period; in other words, the mean is constant. Now define \(y_t\equiv Y_t-\mu\), so that \(Y_t=y_t+\mu\). Substituting into \eqref{eq1}, it follows that
\[\begin{gather*} y_t+\mu=\delta+\theta (y_{t-1}+\mu)+\varepsilon_t \\ \text{From $E(Y_t)=\delta+\theta E(Y_{t-1})$ and $E(Y_t)=E(Y_{t-1})=\mu$, it follows that} \\ \mu\equiv E(Y_t)=\frac{\delta}{1-\theta} \\ \text{which means} \\ \delta=\mu (1-\theta) \\ \text{Now putting this value into the equation above,} \\ y_t+\mu=\mu (1-\theta)+\theta y_{t-1}+\theta \mu+\varepsilon_t \\ y_t+\cancel{\mu}=\cancel{\mu}-\mu \theta+\theta y_{t-1}+\theta \mu+\varepsilon_t \\ y_t=-\cancel{\mu \theta}+\theta y_{t-1}+\cancel{\theta \mu}+\varepsilon_t \\ y_t=\theta y_{t-1}+\varepsilon_t \end{gather*}\]
Verbeek says that this version of the equation is notationally more convenient than the original one. I am not sure why he says that. It leaves out the intercept; is that the reason? I don’t know. What’s the mean of \(y_t\)?
Taking expectations, we get \[ E(y_t)=\theta E(y_{t-1})+ E(\varepsilon_t) \]
Since \(E(\varepsilon_t)=0\), we can write \[ E(y_t)=\theta E(y_{t-1}) \]
Remember, previously we said that \(Y_t\) is stationary and as a result \(E(Y_t)=E(Y_{t-1})\). Then we can write
\[\begin{gather*} E(y_t+\mu)=E(y_{t-1}+\mu) \\ E(y_t)+E(\mu)=E(y_{t-1})+E(\mu) \\ \text{Since $\mu$ is a constant, $E(\mu)=\mu$} \\ E(y_t)+\cancel{\mu}=E(y_{t-1})+\cancel{\mu} \\ E(y_t)=E(y_{t-1}) \\ \text{Now from the above, we can write} \\ E(y_t)-\theta E(y_t)=0 \quad \text{since} \quad E(y_t)=E(y_{t-1}) \\ (1-\theta)E(y_t)=0 \\ E(y_t)=0 \quad \because \quad \theta \neq 1 \end{gather*}\]
The above results show that \(y_t\) has a zero mean. To have a non-zero mean we can add an intercept. We also note here that \(V(Y_t)=V(y_t)\). The process described in \eqref{eq1} imposes certain restrictions on the stochastic process that generates \(Y_t\). When we have a group of variables, we usually describe their joint distribution by covariances. Since here the variables are lagged versions of the same variable, we call them autocovariances. The author says that the dynamic properties of \(Y_t\) can be derived if we assume that the variances and autocovariances do not depend on \(t\). There must be some proof of that, but we are not going into it. This is the so-called stationarity condition. So basically we require stationarity to derive the dynamic properties of \(Y_t\), and without deriving the dynamic properties of \(Y_t\) we cannot forecast its values.
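As a quick sanity check of \(E(Y_t)=\delta/(1-\theta)\), we can compare the sample mean of a long simulated AR(1) path with the theoretical mean (a sketch with the same illustrative parameters as before):

```python
import numpy as np

rng = np.random.default_rng(0)
delta, theta, sigma, T = 2.0, 0.7, 1.0, 200_000   # illustrative values

eps = rng.normal(0.0, sigma, size=T)
Y = np.empty(T)
Y[0] = delta / (1 - theta)
for t in range(1, T):
    Y[t] = delta + theta * Y[t - 1] + eps[t]      # AR(1) recursion

mu_theory = delta / (1 - theta)          # theoretical mean, about 6.67 here
print(Y.mean(), mu_theory)               # the sample mean should be close to the theoretical mean
```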
5.3.2 Variance of AR(1)
Previously we derived the constant mean of the distribution by imposing that the mean does not depend on \(t\). Now we will do the same thing with the variance. First let’s derive the variance:
\[\begin{equation*} \begin{split} Var(Y_t)&=Var(\delta+\theta Y_{t-1}+\varepsilon_t)\\ &=Var(\delta)+Var(\theta Y_{t-1}+\varepsilon_t) \\ &=Var(\theta Y_{t-1}+\varepsilon_t) \quad \because \quad Var(\delta)=0\\ &=\theta^2 Var(Y_{t-1})+Var(\varepsilon_t) \end{split} \end{equation*}\]
Therefore, we can write \[\begin{equation}\label{eqvar1} Var(Y_t)=\theta^2 Var(Y_{t-1})+Var(\varepsilon_t) \end{equation}\]
Here we assume that \(Cov(Y_{t-1}, \varepsilon_t)=0\). This is not too unrealistic in the sense that the error in the current period should not be correlated with the endogenous variable in the past. Now it’s time to impose one of the stationarity conditions, namely that the variance of the time series process does not depend on time:
\[\begin{equation}\label{eqst} Var(Y_t)=Var(Y_{t-1}) \end{equation}\]
Now using \eqref{eqvar1} and \eqref{eqst}, we can write
\[\begin{gather} Var(Y_t) =\theta^2 Var(Y_t)+ \sigma^2 \quad \text{where} \quad Var(\varepsilon_t)=\sigma^2\\ \text{or,} \quad Var(Y_t)(1-\theta^2) =\sigma^2\\ \text{or,} \quad Var(Y_t) =\frac{\sigma^2}{1-\theta^2} \label{eqvar2} \end{gather}\]
We see from equation \(\eqref{eqvar2}\) that \(Var(Y_t)\) is indeed constant. We also see that we can only impose \(Var(Y_t)=Var(Y_{t-1})\) if \(|\theta|<1\). This is the assumption we made previously. This is actually the essence of the unit root test, which tests whether this coefficient is less than one or not.
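Again, a minimal numerical check with illustrative parameters: the sample variance of a long simulated path should be close to \(\sigma^2/(1-\theta^2)\).

```python
import numpy as np

rng = np.random.default_rng(1)
delta, theta, sigma, T = 2.0, 0.7, 1.0, 200_000   # illustrative values

eps = rng.normal(0.0, sigma, size=T)
Y = np.empty(T)
Y[0] = delta / (1 - theta)
for t in range(1, T):
    Y[t] = delta + theta * Y[t - 1] + eps[t]      # AR(1) recursion

var_theory = sigma**2 / (1 - theta**2)   # theoretical variance, about 1.96 here
print(Y.var(), var_theory)               # sample variance should be close to the theoretical value
```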
5.3.3 Covariances of AR(1)
Now let’s find the covariance between \(Y_t\) and \(Y_{t-1}\).
\[\begin{equation} \begin{split} Cov(Y_t,Y_{t-1}) & =E[(Y_t-E(Y_t))(Y_{t-1}-E(Y_{t-1}))]\\ & =E((Y_t-\mu)(Y_{t-1}-\mu))\quad \because \quad E(Y_t)=E(Y_{t-1})=\mu\\ & =E(y_t\, y_{t-1})\quad \text{by the definition of $y_t$}\\ & =E((\theta y_{t-1}+\varepsilon_t)y_{t-1})\\ & =\theta\, E(y_{t-1}^2)+E(\varepsilon_t y_{t-1})\\ & =\theta \, E[(Y_{t-1}-\mu)^2]+E(\varepsilon_t y_{t-1})\\ & =\theta \, Var(Y_t)+E((\varepsilon_t-E(\varepsilon_t)) (Y_{t-1}-\mu))\\ & =\theta \, Var(Y_t)+Cov(\varepsilon_t,Y_{t-1})\\ & =\theta \, Var(Y_t)\quad \because \quad Cov(\varepsilon_t,Y_{t-1})=0 \end{split} \label{eqcov} \end{equation}\]
So, we have established that
\[\begin{equation} Cov(Y_t,Y_{t-1})=\theta \frac{\sigma^2}{1-\theta^2} \label{eqcov2} \end{equation}\]
Now let’s consider some higher-order lags. Let’s see what the covariance between \(Y_t\) and \(Y_{t-2}\) would be, that is, the autocovariance at lag 2.
\[\begin{equation} \begin{split} Cov(Y_t,Y_{t-2}) & =E[(Y_t-E(Y_t))(Y_{t-2}-E(Y_{t-2}))]\\ & =E((Y_t-\mu)(Y_{t-2}-\mu))\quad \because \quad E(Y_t)=E(Y_{t-2})=\mu\\ & =E(y_t\, y_{t-2})\quad \text{by the definition of $y_t$}\\ \text{Before continuing we have to rewrite $y_t$ in terms of $y_{t-2}$:}\\ y_t & =\theta y_{t-1}+\varepsilon_t\\ & =\theta (\theta y_{t-2}+\varepsilon_{t-1})+\varepsilon_t\\ & =\theta^2 y_{t-2}+\theta \varepsilon_{t-1}+\varepsilon_t\\ \text{Now, putting this back into the covariance:}\\ Cov(Y_t,Y_{t-2}) & =E((\theta^2 y_{t-2}+\theta \varepsilon_{t-1}+\varepsilon_t)y_{t-2})\\ & =\theta^2 E(y_{t-2}^2)+\theta\, E(\varepsilon_{t-1} y_{t-2})+E(\varepsilon_t y_{t-2})\\ & =\theta^2 \, E[(Y_{t-2}-\mu)^2]+\theta\, Cov(\varepsilon_{t-1},y_{t-2})+Cov(\varepsilon_t,y_{t-2})\\ & =\theta^2 \, Var(Y_t)\quad \because \quad Cov(\varepsilon_t,Y_{t-k})=0\\ & =\theta^2 \, \frac{\sigma^2}{1-\theta^2}\\ \text{Therefore we see that}\\ Cov(Y_t,Y_{t-1}) & =\theta \frac{\sigma^2}{1-\theta^2}\\ Cov(Y_t,Y_{t-2}) & =\theta^2 \frac{\sigma^2}{1-\theta^2}\\ &\;\;\vdots\\ Cov(Y_t,Y_{t-k}) & =\theta^k \frac{\sigma^2}{1-\theta^2} \end{split} \label{eqcovk} \end{equation}\]
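Here is a small numerical check of the lag-\(k\) autocovariance formula in \eqref{eqcovk} (again with my own illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
delta, theta, sigma, T = 2.0, 0.7, 1.0, 200_000   # illustrative values

eps = rng.normal(0.0, sigma, size=T)
Y = np.empty(T)
Y[0] = delta / (1 - theta)
for t in range(1, T):
    Y[t] = delta + theta * Y[t - 1] + eps[t]          # AR(1) recursion

y = Y - Y.mean()                                      # demeaned series
for k in range(1, 6):
    sample_cov = np.mean(y[k:] * y[:-k])              # sample autocovariance at lag k
    theory_cov = theta**k * sigma**2 / (1 - theta**2) # theta^k * sigma^2 / (1 - theta^2)
    print(k, round(sample_cov, 3), round(theory_cov, 3))
```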
We can make the following observation from the covariance formula \eqref{eqcovk}: the autocovariances decline geometrically with the lag \(k\) and, since \(|\theta|<1\), they die out as \(k\) grows, although they never become exactly zero.

5.4 Moving Average Models
Another very simple time series model is the moving average model of order 1, or MA(1). This process is given by:
\[\begin{equation} Y_t=\mu+\varepsilon_t+\alpha \varepsilon_{t-1} \label{eqma1} \end{equation}\]
In equation \eqref{eqma1}, \(Y_t\) is the sum of a constant mean plus a weighted average of the current and past error. Why is this called a weighted average? I am not sure at this point. Usually a weighted average has the form \(\alpha \varepsilon_t+(1-\alpha) \varepsilon_{t-1}\), but in \eqref{eqma1} it’s not like that. I have to look further into it. Basically, the values of \(Y_t\) are defined in terms of drawings from a white noise process \(\varepsilon_t\).
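A minimal simulation sketch of the \(MA(1)\) process (the values \(\mu=5\), \(\alpha=0.4\) and \(\sigma=1\) are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, alpha, sigma, T = 5.0, 0.4, 1.0, 500   # illustrative parameters

# one extra white-noise draw so that eps_{t-1} exists at t = 0
eps = rng.normal(0.0, sigma, size=T + 1)

# Y_t = mu + eps_t + alpha * eps_{t-1}
Y = mu + eps[1:] + alpha * eps[:-1]
```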
5.4.1 Mean of MA(1)
The mean of the \(MA(1)\) process is pretty simple:
\[\begin{equation} E(Y_t)=\mu \quad \because E(\varepsilon_t)=E(\varepsilon_{t-1})=0 \label{eqmma} \end{equation}\]
5.4.2 Variance of MA(1)
\[\begin{equation} \begin{split} Var(Y_t) & =E[Y_t-E(Y_t)]^2\\ & =E(\cancel{\mu}+\varepsilon_t+\alpha \varepsilon_{t-1}-\cancel{\mu})^2\\ & =E(\varepsilon_t+\alpha \varepsilon_{t-1})^2\\ & =E(\varepsilon_t^2)+2\alpha E(\varepsilon_t \varepsilon_{t-1})+\alpha^2 E(\varepsilon_{t-1}^2)\\ & =E(\varepsilon_t^2)+\alpha^2 E(\varepsilon_{t-1}^2) \quad \because \quad E(\varepsilon_t \varepsilon_{t-1})=0\\ & =\sigma^2+\alpha^2 \sigma^2\\ & =\sigma^2(1+\alpha^2) \end{split} \end{equation}\]
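A quick check of \(Var(Y_t)=\sigma^2(1+\alpha^2)\) along the same lines (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, alpha, sigma, T = 5.0, 0.4, 1.0, 200_000   # illustrative values

eps = rng.normal(0.0, sigma, size=T + 1)
Y = mu + eps[1:] + alpha * eps[:-1]            # MA(1) process

print(Y.var(), sigma**2 * (1 + alpha**2))      # sample vs theoretical variance (1.16 here)
```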
5.4.3 Covariances of MA(1)
\[\begin{equation} \begin{split} Cov(Y_t,Y_{t-1}) & = E[(Y_t-E(Y_t))(Y_{t-1}-E(Y_{t-1}))]\\ & = E[(\varepsilon_t+\alpha \varepsilon_{t-1})(\varepsilon_{t-1}+\alpha \varepsilon_{t-2})]\\ & =\alpha E(\varepsilon_{t-1}^2) \quad \because \quad Cov(\varepsilon_t,\varepsilon_{t-k})=0 \quad \forall \,t \quad \text{when} \quad k\neq 0\\ & = \alpha \sigma^2 \\ Cov(Y_t, Y_{t-2}) & = E[(Y_t-E(Y_t))(Y_{t-2}-E(Y_{t-2}))]\\ & = E[(\varepsilon_t+\alpha \varepsilon_{t-1})(\varepsilon_{t-2}+\alpha \varepsilon_{t-3})]\\ & = 0 \quad \because \quad \text{all cross covariances of the error terms are zero}\\ \text{Similarly,}\quad Cov(Y_t,Y_{t-k}) & = 0 \quad \forall \quad k\ge 2 \end{split} \label{eqmacov} \end{equation}\]
The equation above implies that the \(AR(1)\) and \(MA(1)\) processes have very different autocovariance structures: the \(AR(1)\) autocovariances die out gradually with the lag, while the \(MA(1)\) autocovariances drop to zero after one lag.
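The contrast is easy to see numerically: for a simulated \(MA(1)\) path only the lag-1 sample autocovariance is appreciably different from zero, whereas the \(AR(1)\) autocovariances computed earlier decay gradually (a sketch with illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, alpha, sigma, T = 5.0, 0.4, 1.0, 200_000   # illustrative values

eps = rng.normal(0.0, sigma, size=T + 1)
Y = mu + eps[1:] + alpha * eps[:-1]            # MA(1) process
y = Y - Y.mean()

for k in range(1, 5):
    sample_cov = np.mean(y[k:] * y[:-k])       # sample autocovariance at lag k
    theory_cov = alpha * sigma**2 if k == 1 else 0.0
    print(k, round(sample_cov, 3), theory_cov) # only the lag-1 value should be clearly non-zero
```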
5.5 Comparing AR(1) and MA(1)
We can generalize the \(AR(1)\) and \(MA(1)\) models by adding additional lag terms. In general, there is little fundamental difference between these two classes of models. We can express an \(\mathbf{AR(1)}\) process as an \(\mathbf{MA(\infty)}\) process by repeated substitution; that is, we can rewrite \eqref{eq1} as a moving average of infinite order. We can see this in the following: \[\begin{align} Y_t &=\delta+\theta Y_{t-1}+\varepsilon_t \\ &=\delta+\theta [\delta+\theta Y_{t-2}+\varepsilon_{t-1}]+\varepsilon_t\\ &=\delta+\theta \delta+\theta^2 Y_{t-2}+\theta \varepsilon_{t-1}+\varepsilon_t \label{del}\\ \text{We have also previously found that} \quad \mu &=\frac{\delta}{1-\theta}\\ \text{or,}\quad \delta&=\mu (1-\theta)\\ \text{Now putting this back into equation \eqref{del}} &=\mu (1-\theta)+\theta \mu (1-\theta)+\theta^2 Y_{t-2}+\theta \varepsilon_{t-1}+\varepsilon_t\\ &=\mu - \cancel{\mu \theta}+ \cancel{\theta \mu} -\mu \theta^2+\theta^2 Y_{t-2}+\theta \varepsilon_{t-1}+\varepsilon_t\\ &=\mu+\theta^2(Y_{t-2}-\mu)+\theta \varepsilon_{t-1}+\varepsilon_t\\ \text{Similarly, by substituting for $Y_{t-2}$, we get} &=\mu+\theta^3(Y_{t-3}-\mu)+\varepsilon_t+\theta \varepsilon_{t-1}+\theta^2 \varepsilon_{t-2}\\ &\;\;\vdots\\ &=\mu+\theta^n(Y_{t-n}-\mu)+\sum_{j=0}^{n-1}\theta^j \varepsilon_{t-j} \end{align}\]
When \(n\longrightarrow \infty\) and \(|\theta| <1\) (remember the stationarity condition), the above equation boils down to
\[\begin{equation} Y_t=\mu+\sum_{j=0}^{\infty}\theta^j \varepsilon_{t-j} \label{eqtrar} \end{equation}\] In the same manner, we can try to see whether an \(MA(1)\) process can be transformed into some kind of \(AR\) process. From \eqref{eqma1} we have \(\varepsilon_{t-1}=Y_{t-1}-\mu-\alpha \varepsilon_{t-2}\), and substituting repeatedly for the lagged error gives \[\begin{align} Y_t &=\mu+\varepsilon_t+\alpha \varepsilon_{t-1}\\ &=\mu+\varepsilon_t+\alpha (Y_{t-1}-\mu-\alpha \varepsilon_{t-2})\\ &=\mu+\varepsilon_t+\alpha (Y_{t-1}-\mu)-\alpha^2 (Y_{t-2}-\mu-\alpha \varepsilon_{t-3})\\ &=\mu+\varepsilon_t+\alpha (Y_{t-1}-\mu)-\alpha^2 (Y_{t-2}-\mu)+\alpha^3 \varepsilon_{t-3}\\ &\;\;\vdots \end{align}\] Provided \(|\alpha|<1\), the remaining error term dies out as we keep substituting, so the \(MA(1)\) process can be written as an autoregressive process of infinite order, \[\begin{equation} Y_t=\mu+\alpha (Y_{t-1}-\mu)-\alpha^2 (Y_{t-2}-\mu)+\alpha^3 (Y_{t-3}-\mu)-\dots+\varepsilon_t \end{equation}\]
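The \(MA(\infty)\) representation in \eqref{eqtrar} can also be checked numerically: a truncated sum \(\mu+\sum_{j=0}^{n-1}\theta^j\varepsilon_{t-j}\) should essentially reproduce the recursively simulated value of \(Y_t\) once \(n\) is moderately large (a sketch with illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(6)
delta, theta, sigma, T = 2.0, 0.7, 1.0, 1_000   # illustrative values
mu = delta / (1 - theta)

eps = rng.normal(0.0, sigma, size=T)
Y = np.empty(T)
Y[0] = mu
for t in range(1, T):
    Y[t] = delta + theta * Y[t - 1] + eps[t]     # AR(1) recursion

# truncated MA(infinity) representation: mu + sum_{j<n} theta^j * eps_{t-j}
t, n = T - 1, 200
Y_ma = mu + sum(theta**j * eps[t - j] for j in range(n))
print(Y[t], Y_ma)    # the truncated sum should be very close to the recursive value
```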