4.1 Stochastic Processes
A discrete-time stochastic process or time series process $\{\ldots, Y_1, Y_2, \ldots, Y_t, Y_{t+1}, \ldots\} = \{Y_t\}_{t=-\infty}^{\infty}$ is a sequence of random variables indexed by time $t$.[17] In most applications, the time index is a regularly spaced index representing calendar time (e.g., days, months, years) but it can also be irregularly spaced, representing event time (e.g., intra-day transaction times). In modeling time series data, the ordering imposed by the time index is important because we often would like to capture the temporal relationships, if any, between the random variables in the stochastic process. In random sampling from a population, the ordering of the random variables representing the sample does not matter because they are independent.
A realization of a stochastic process with $T$ observations is the sequence of observed data $\{Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T\} = \{y_t\}_{t=1}^{T}$. The goal of time series modeling is to describe the probabilistic behavior of the underlying stochastic process that is believed to have generated the observed data in a concise way. In addition, we want to be able to use the observed sample to estimate important characteristics of a time series model such as measures of time dependence. In order to do this, we need to make a number of assumptions regarding the joint behavior of the random variables in the stochastic process such that we may treat the stochastic process in much the same way as we treat a random sample from a given population.
4.1.1 Stationary stochastic processes
We often describe random sampling from a population as a sequence of independent and identically distributed (iid) random variables $X_1, X_2, \ldots$ such that each $X_i$ is described by the same probability distribution $F_X$, and we write $X_i \sim F_X$. With a time series process, we would like to preserve the identical distribution assumption, but we do not want to impose the restriction that each random variable in the sequence is independent of all of the other variables. In many contexts, we would expect some dependence between random variables close together in time (e.g., $X_1$ and $X_2$) but little or no dependence between random variables far apart in time (e.g., $X_1$ and $X_{100}$). We can allow for this type of behavior using the concepts of stationarity and ergodicity.
We start with the definition of strict stationarity.
A stochastic process $\{Y_t\}$ is strictly stationary if, for any given finite integer $r$ and for any set of subscripts $t_1, t_2, \ldots, t_r$, the joint distribution of $(Y_t, Y_{t_1}, Y_{t_2}, \ldots, Y_{t_r})$ depends only on $t_1 - t, t_2 - t, \ldots, t_r - t$ but not on $t$. In other words, the joint distribution of $(Y_{t_1}, Y_{t_2}, \ldots, Y_{t_r})$ is the same as the joint distribution of $(Y_{t_1 - t}, Y_{t_2 - t}, \ldots, Y_{t_r - t})$ for any value of $t$.
In simple terms, the joint distribution of random variables in a strictly stationary stochastic process is time invariant. For example, the joint distribution of $(Y_1, Y_5, Y_7)$ is the same as the distribution of $(Y_{12}, Y_{16}, Y_{18})$. Just like in an iid sample, in a strictly stationary process all of the individual random variables $Y_t$ $(t = -\infty, \ldots, \infty)$ have the same marginal distribution $F_Y$. This means they all have the same mean, variance, etc., assuming these quantities exist. However, assuming strict stationarity does not make any assumption about the correlations between $Y_t, Y_{t_1}, \ldots, Y_{t_r}$ other than that the correlation between $Y_t$ and $Y_{t_r}$ depends only on $t - t_r$ (the time between $Y_t$ and $Y_{t_r}$) and not on $t$. That is, strict stationarity allows for general temporal dependence between the random variables in the stochastic process.
A useful property of strict stationarity is that it is preserved under general transformations, as summarized in the following proposition.
Proposition 4.1 Let $\{Y_t\}$ be strictly stationary and let $g(\cdot)$ be any function of the elements in $\{Y_t\}$. Then $\{g(Y_t)\}$ is also strictly stationary.
For example, if $\{Y_t\}$ is strictly stationary then $\{Y_t^2\}$ and $\{Y_t Y_{t-1}\}$ are also strictly stationary.
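As a minimal R sketch (the iid normal series here is just a stand-in for any strictly stationary series), Proposition 4.1 says that both transformed series below are also strictly stationary:

set.seed(123)
y = rnorm(250)              # stand-in for a strictly stationary series {Y_t}
y.sq = y^2                  # {Y_t^2}: strictly stationary by Proposition 4.1
y.lagprod = y[-1]*y[-250]   # {Y_t * Y_(t-1)}: also strictly stationary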
The following are some simple examples of strictly stationary processes.
If {Yt} is an iid sequence, then it is strictly stationary.
◼
Let $\{Y_t\}$ be an iid sequence and let $X \sim N(0,1)$ be independent of $\{Y_t\}$. Define $Z_t = Y_t + X$. The sequence $\{Z_t\}$ is not an independent sequence (because of the common $X$) but it is an identically distributed sequence and is strictly stationary.
◼
Strict stationarity places very strong restrictions on the behavior of a time series. A related concept that imposes weaker restrictions and is convenient for time series model building is covariance stationarity (sometimes called weak stationarity).
A stochastic process {Yt} is covariance stationary if
- $E[Y_t] = \mu < \infty$ does not depend on $t$,
- $\mathrm{var}(Y_t) = \sigma^2 < \infty$ does not depend on $t$,
- $\mathrm{cov}(Y_t, Y_{t-j}) = \gamma_j < \infty$ depends only on $j$ but not on $t$, for $j = 0, 1, 2, \ldots$
The term $\gamma_j$ is called the $j$-th order autocovariance. The $j$-th order autocorrelation is defined as
$$\rho_j = \frac{\mathrm{cov}(Y_t, Y_{t-j})}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_{t-j})}} = \frac{\gamma_j}{\sigma^2}.$$
The autocovariances, $\gamma_j$, measure the direction of linear dependence between $Y_t$ and $Y_{t-j}$. The autocorrelations, $\rho_j$, measure both the direction and strength of linear dependence between $Y_t$ and $Y_{t-j}$. With covariance stationarity, instead of assuming the entire joint distribution of a collection of random variables is time invariant, we make the weaker assumption that only the mean, variance, and autocovariances of the random variables are time invariant. A strictly stationary stochastic process $\{Y_t\}$ such that $\mu$, $\sigma^2$, and $\gamma_j$ exist is a covariance stationary stochastic process. However, a covariance stationary process need not be strictly stationary.
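In practice, the autocovariances and autocorrelations are unknown and must be estimated from observed data. A minimal sketch using base R (the simulated series y is just a placeholder for an observed covariance stationary series):

set.seed(123)
y = rnorm(250)                           # placeholder for observed data
acf(y, lag.max=10, type="covariance")    # sample autocovariances
acf(y, lag.max=10, type="correlation")   # sample autocorrelations (sample ACF)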
The autocovariances and autocorrelations are measures of the linear temporal dependence in a covariance stationary stochastic process. A graphical summary of this temporal dependence is given by the plot of $\rho_j$ against $j$, and is called the autocorrelation function (ACF). Figure 4.1 illustrates an ACF for a hypothetical covariance stationary time series with $\rho_j = (0.9)^j$ for $j = 1, 2, \ldots, 10$ created with:
rho = 0.9
rhoVec = rho^(1:10)
ts.plot(rhoVec, type="h", lwd=2, col="blue", xlab="Lag j",
        ylab=expression(rho[j]))

Figure 4.1: ACF for time series with $\rho_j = (0.9)^j$
For this process the strength of linear time dependence decays toward zero geometrically fast as j increases.
The definition of covariance stationarity requires that $E[Y_t] < \infty$ and $\mathrm{var}(Y_t) < \infty$. That is, $E[Y_t]$ and $\mathrm{var}(Y_t)$ must exist and be finite numbers. This is true if $Y_t$ is normally distributed. However, it is not true if, for example, $Y_t$ has a Student's t distribution with one degree of freedom.[18] Hence, a strictly stationary stochastic process $\{Y_t\}$ where the (marginal) pdf of $Y_t$ (for all $t$) has very fat tails may not be covariance stationary.
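A short simulation hints at why fat tails cause trouble; this is only an illustrative sketch. Running means of normal draws settle near zero, while running means of Cauchy draws (Student's t with one degree of freedom) keep wandering because the mean does not exist:

set.seed(123)
n = c(100, 1000, 10000)
y.norm = rnorm(10000)
y.cauchy = rt(10000, df=1)    # Cauchy draws: no finite mean or variance
cumsum(y.norm)[n]/n           # running means settle near 0
cumsum(y.cauchy)[n]/n         # running means do not settle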
Let $Y_t \sim \mathrm{iid}\ N(0, \sigma^2)$. Then $\{Y_t\}$ is called a Gaussian white noise (GWN) process and is denoted $Y_t \sim \mathrm{GWN}(0, \sigma^2)$. Notice that
$$E[Y_t] = 0 \text{ independent of } t, \quad \mathrm{var}(Y_t) = \sigma^2 \text{ independent of } t, \quad \mathrm{cov}(Y_t, Y_{t-j}) = 0 \ (\text{for } j > 0) \text{ independent of } t \text{ for all } j,$$
so that $\{Y_t\}$ satisfies the properties of a covariance stationary process. The defining characteristic of a GWN process is the lack of any predictable pattern over time in the realized values of the process. The term white noise comes from the electrical engineering literature and represents the absence of any signal.[19]
Simulating observations from a GWN process in R is easy: just simulate iid observations from a normal distribution. For example, to simulate T=250 observations from the GWN(0,1) process use:
set.seed(123)
y = rnorm(250)
The simulated iid N(0,1) values are generated using the rnorm() function. The command set.seed(123) initializes R's internal random number generator using the seed value 123. Every time the random number generator seed is set to a particular value, the random number generator produces the same set of random numbers. This allows different people to create the same set of random numbers so that results are reproducible. The simulated data is illustrated in Figure 4.2 created using:
ts.plot(y,main="Gaussian White Noise Process",xlab="time",
ylab="y(t)", col="blue", lwd=2)
abline(h=0)

Figure 4.2: Realization of a GWN(0,1) process.
The function ts.plot() creates a time series line plot with a dummy time index. An equivalent plot can be created using the generic plot() function with optional argument type="l".

The data in Figure 4.2 fluctuate randomly about the mean value zero and exhibit a constant volatility of one (typical magnitude of a fluctuation about zero). There is no visual evidence of any predictable pattern in the data.
◼
Let $r_t$ denote the continuously compounded monthly return on Microsoft stock and assume that $r_t \sim \mathrm{iid}\ N(0.01, (0.05)^2)$. We can represent this distribution in terms of a GWN process as follows:
$$r_t = 0.01 + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{GWN}(0, (0.05)^2).$$
Here, $r_t$ is constructed as a $\mathrm{GWN}(0, \sigma^2)$ process plus a constant. Hence, $\{r_t\}$ is a GWN process with a non-zero mean: $r_t \sim \mathrm{GWN}(0.01, (0.05)^2)$. $T = 60$ simulated values of $\{r_t\}$ are computed using:
set.seed(123)
y = rnorm(60, mean=0.01, sd=0.05)
ts.plot(y, main="GWN Process for Monthly Continuously Compounded Returns",
        xlab="time", ylab="r(t)", col="blue", lwd=2)
abline(h=c(0, 0.01, -0.04, 0.06), lwd=2,
       lty=c("solid", "solid", "dotted", "dotted"),
       col=c("black", "red", "red", "red"))

Figure 4.3: Simulated returns from $\mathrm{GWN}(0.01, (0.05)^2)$.
and are illustrated in Figure 4.3. Notice that the returns fluctuate around the mean value of 0.01 (solid red line) and the size of a typical deviation from the mean is about 0.05. The red dotted lines show the values $0.01 \pm 0.05$.
An implication of the GWN assumption for monthly returns is that non-overlapping multi-period returns are also GWN. For example, consider the two-month return $r_t(2) = r_t + r_{t-1}$. The non-overlapping process $\{r_t(2)\} = \{\ldots, r_{t-2}(2), r_t(2), r_{t+2}(2), \ldots\}$ is GWN with mean $E[r_t(2)] = 2\mu = 0.02$, variance $\mathrm{var}(r_t(2)) = 2\sigma^2 = 0.005$, and standard deviation $\mathrm{sd}(r_t(2)) = \sqrt{2}\sigma = 0.071$.
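These values can be checked with a short simulation. The sketch below (the sample size is an arbitrary choice) forms non-overlapping two-month returns from simulated monthly GWN returns and compares the sample moments to the theoretical values:

set.seed(123)
r = rnorm(100000, mean=0.01, sd=0.05)                    # simulated monthly returns
r2 = r[seq(1, 100000, by=2)] + r[seq(2, 100000, by=2)]   # non-overlapping two-month returns
mean(r2)   # approximately 0.02
sd(r2)     # approximately 0.071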
◼
Let $Y_t \sim \mathrm{iid}\ (0, \sigma^2)$. Then $\{Y_t\}$ is called an independent white noise (IWN) process and is denoted $Y_t \sim \mathrm{IWN}(0, \sigma^2)$. The difference between GWN and IWN is that with IWN we don't specify that all random variables are normally distributed. The random variables can have any distribution with mean zero and variance $\sigma^2$. To illustrate, suppose $Y_t = \frac{1}{\sqrt{3}} \times t_3$ where $t_3$ denotes a Student's t distribution with 3 degrees of freedom. This process has $E[Y_t] = 0$ and $\mathrm{var}(Y_t) = 1$. Figure 4.4 shows simulated observations from this process created using the R commands:
set.seed(123)
y = (1/sqrt(3))*rt(250, df=3)
ts.plot(y, main="Independent White Noise Process", xlab="time", ylab="y(t)",
        col="blue", lwd=2)
abline(h=0)

Figure 4.4: Simulation of IWN(0,1) process: $Y_t \sim \frac{1}{\sqrt{3}} \times t_3$
The simulated IWN process resembles the GWN process in Figure 4.2 but has more extreme observations.
◼
Let $\{Y_t\}$ be a sequence of uncorrelated random variables each with mean zero and variance $\sigma^2$. Then $\{Y_t\}$ is called a weak white noise (WWN) process and is denoted $Y_t \sim \mathrm{WWN}(0, \sigma^2)$. With a WWN process, the random variables are not independent, only uncorrelated. This allows for potential non-linear temporal dependence between the random variables in the process.
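A concrete illustration (a sketch, not from the text): if $Z_t \sim \mathrm{iid}\ N(0,1)$ then $Y_t = Z_t Z_{t-1}$ has mean zero, variance one, and zero autocorrelations at all lags, so it is WWN, but $Y_t$ and $Y_{t-1}$ share the common factor $Z_{t-1}$ and so are not independent:

set.seed(123)
z = rnorm(251)
y = z[-1]*z[-251]      # Y_t = Z_t * Z_(t-1): uncorrelated but not independent
acf(y, lag.max=10)     # sample ACF of y is near zero at all lags
acf(y^2, lag.max=10)   # sample ACF of y^2 shows dependence at lag 1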
◼
4.1.2 Non-Stationary processes
In a covariance stationary stochastic process it is assumed that the means, variances and autocovariances are independent of time. In a non-stationary process, one or more of these assumptions is not true. The following examples illustrate some typical non-stationary time series processes.
Suppose $\{Y_t\}$ is generated according to the deterministically trending process:
$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2), \quad t = 0, 1, 2, \ldots$$
Then $\{Y_t\}$ is nonstationary because the mean of $Y_t$ depends on $t$: $E[Y_t] = \beta_0 + \beta_1 t$. Figure 4.5 shows a realization of this process with $\beta_0 = 0$, $\beta_1 = 0.1$, and $\sigma_\varepsilon^2 = 1$ created using the R commands:
set.seed(123)
e = rnorm(250)
y.dt = 0.1*seq(1,250) + e
ts.plot(y.dt, lwd=2, col="blue", main="Deterministic Trend + Noise")
abline(a=0, b=0.1)

Figure 4.5: Deterministically trending nonstationary process $Y_t = 0.1 \times t + \varepsilon_t$, $\varepsilon_t \sim \mathrm{GWN}(0,1)$
Here the non-stationarity is created by the deterministic trend $\beta_0 + \beta_1 t$ in the data. The non-stationary process $\{Y_t\}$ can be transformed into a stationary process by simply subtracting off the trend: $X_t = Y_t - \beta_0 - \beta_1 t = \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2)$. The detrended process $X_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2)$ is obviously covariance stationary.
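Continuing the simulation above, the detrending transformation can be carried out directly (the trend is subtracted using the true coefficients for illustration):

x.dt = y.dt - 0.1*seq(1,250)   # subtract the deterministic trend, leaving x.dt = e
ts.plot(x.dt, lwd=2, col="blue", main="Detrended Series")
abline(h=0)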
◼
A random walk (RW) process $\{Y_t\}$ is defined as
$$Y_t = Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2), \quad Y_0 \text{ is fixed (non-random)}.$$
By recursive substitution starting at $t = 1$, we have
$$Y_1 = Y_0 + \varepsilon_1, \quad Y_2 = Y_1 + \varepsilon_2 = Y_0 + \varepsilon_1 + \varepsilon_2, \quad \ldots, \quad Y_t = Y_{t-1} + \varepsilon_t = Y_0 + \varepsilon_1 + \cdots + \varepsilon_t = Y_0 + \sum_{j=1}^{t}\varepsilon_j.$$
Now, $E[Y_t] = Y_0$, which is independent of $t$. However,
$$\mathrm{var}(Y_t) = \mathrm{var}\left(\sum_{j=1}^{t}\varepsilon_j\right) = \sum_{j=1}^{t}\sigma_\varepsilon^2 = \sigma_\varepsilon^2 \times t,$$
which depends on $t$, and so $\{Y_t\}$ is not stationary.
Figure 4.6 shows a realization of the RW process with $Y_0 = 0$ and $\sigma_\varepsilon^2 = 1$ created using the R commands:
set.seed(321)
e = rnorm(250)
y.rw = cumsum(e)
ts.plot(y.rw, lwd=2, col="blue", main="Random Walk")
abline(h=0)

Figure 4.6: Random walk process: $Y_t = Y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim \mathrm{GWN}(0,1)$.
The RW process looks much different from the GWN process in Figure 4.2. As the variance of the process increases linearly with time, the uncertainty about where the process will be at a given point in time increases with time.
Although $\{Y_t\}$ is non-stationary, a simple first-differencing transformation yields a covariance stationary process: $X_t = Y_t - Y_{t-1} = \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2)$.
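Using the simulated random walk above, first differencing recovers the underlying GWN increments:

x.rw = diff(y.rw)   # X_t = Y_t - Y_(t-1) = e_t
ts.plot(x.rw, lwd=2, col="blue", main="First Difference of Random Walk")
abline(h=0)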
◼
Let $r_t$ denote the continuously compounded monthly return on Microsoft stock and assume that $r_t \sim \mathrm{GWN}(\mu, \sigma^2)$. Since $r_t = \ln(P_t / P_{t-1})$ it follows that $\ln P_t = \ln P_{t-1} + r_t$. Now, re-express $r_t$ as $r_t = \mu + \varepsilon_t$ where $\varepsilon_t \sim \mathrm{GWN}(0, \sigma^2)$. Then $\ln P_t = \ln P_{t-1} + \mu + \varepsilon_t$. By recursive substitution we have $\ln P_t = \ln P_0 + \mu t + \sum_{j=1}^{t}\varepsilon_j$, and so $\ln P_t$ follows a random walk process with drift value $\mu$. Here, $E[\ln P_t] = \ln P_0 + \mu t$ and $\mathrm{var}(\ln P_t) = \sigma^2 t$, so $\ln P_t$ is non-stationary because both the mean and variance depend on $t$. In this model, prices, however, do not follow a random walk since $P_t = e^{\ln P_t} = P_{t-1}e^{r_t}$.
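A short simulation sketch of this model (the drift and volatility are the monthly values used above; the initial price of 100 is an arbitrary illustrative choice):

set.seed(123)
mu = 0.01; sigma = 0.05
lnP0 = log(100)                          # illustrative starting price
eps = rnorm(60, mean=0, sd=sigma)
lnP = lnP0 + mu*(1:60) + cumsum(eps)     # log price: random walk with drift
P = exp(lnP)                             # price level itself
ts.plot(lnP, lwd=2, col="blue", main="Log Price: Random Walk with Drift")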
◼
4.1.3 Ergodicity
In a strictly stationary or covariance stationary stochastic process no assumption is made about the strength of dependence between random variables in the sequence. For example, in a covariance stationary stochastic process it is possible that $\rho_1 = \mathrm{cor}(Y_t, Y_{t-1}) = \rho_{100} = \mathrm{cor}(Y_t, Y_{t-100}) = 0.5$, say. However, in many contexts it is reasonable to assume that the strength of dependence between random variables in a stochastic process diminishes the farther apart they become. That is, $\rho_1 > \rho_2 > \cdots$ and eventually $\rho_j = 0$ for $j$ large enough. This diminishing dependence assumption is captured by the concept of ergodicity.
Intuitively, a stochastic process {Yt} is ergodic if any two collections of random variables partitioned far apart in the sequence are essentially independent.
The formal definition of ergodicity is highly technical and requires advanced concepts in probability theory. However, the intuitive definition captures the essence of the concept: the stochastic process $\{Y_t\}$ is ergodic if $Y_t$ and $Y_{t-j}$ are essentially independent for $j$ large enough.
If a stochastic process {Yt} is covariance stationary and ergodic then strong restrictions are placed on the joint behavior of the elements in the sequence and on the type of temporal dependence allowed.
If {Yt} is GWN or IWN then it is both covariance stationary and ergodic.
◼
Let $Y_t \sim \mathrm{GWN}(0,1)$ and let $X \sim N(0,1)$ be independent of $\{Y_t\}$. Define $Z_t = Y_t + X$. Then $\{Z_t\}$ is covariance stationary but not ergodic. To see why $\{Z_t\}$ is not ergodic, note that for all $j > 0$:
$$\mathrm{var}(Z_t) = \mathrm{var}(Y_t + X) = 1 + 1 = 2,$$
$$\gamma_j = \mathrm{cov}(Y_t + X, Y_{t-j} + X) = \mathrm{cov}(Y_t, Y_{t-j}) + \mathrm{cov}(Y_t, X) + \mathrm{cov}(Y_{t-j}, X) + \mathrm{cov}(X, X) = \mathrm{cov}(X, X) = \mathrm{var}(X) = 1,$$
$$\rho_j = \frac{1}{2} \text{ for all } j.$$
Hence, the correlation between random variables separated far apart does not eventually go to zero and so $\{Z_t\}$ cannot be ergodic.
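One way to see the practical consequence of non-ergodicity in a simulation (a sketch, not from the text): across many independent realizations the lag-$j$ correlation is about one half, while within any single realization the time average of $Z_t$ converges to the realized value of $X$ rather than to $E[Z_t] = 0$:

set.seed(123)
nsim = 1000
y = matrix(rnorm(nsim*250), nsim, 250)   # each row is one GWN realization {Y_t}
x = rnorm(nsim)                          # one common X per realization
z = y + x                                # Z_t = Y_t + X; row i uses x[i]
cor(z[,1], z[,101])                      # ensemble correlation at lag 100: about 0.5
mean(z[1,])                              # time average from one realization: close to x[1], not 0
x[1]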
◼
[17] To conserve on notation, we will represent the stochastic process $\{Y_t\}_{t=-\infty}^{\infty}$ simply as $\{Y_t\}$.
[18] This is also called a Cauchy distribution. For this distribution, $E[Y_t]$, $\mathrm{var}(Y_t)$, and $\mathrm{cov}(Y_t, Y_{t-j})$ are not finite.
[19] As an example of white noise, think of tuning an AM radio. In between stations there is no signal and all you hear is static. This is white noise.