Chapter 1 Time Series Fundamentals

1.1 Introduction

Definition (non-mathematical)

A time series is a single set of data whose observations are ordered in time. The distinguishing feature of time series data is that the observations relate to a single quantity measured at a number of points in time. Therefore observations that are close together in time are likely to be correlated rather than independent. As a result, the majority of statistical models you have met so far are not appropriate for modelling time series data, because they assume the observations are independent.

Definition (mathematical)

A time series process is a stochastic process \(\{X_{t}~| ~t\in T\}\), which is a collection of random variables that are ordered in time. Here \(T\) is called the index set, and determines the set of times at which the process is defined and observations are made. In this course we restrict attention to

  • random variables \(X_{t}\) that are continuous, i.e. their set of possible outcomes is a continuous range.

  • index sets \(T\) that are discrete and equally spaced in time, so that observations are collected hourly, daily, monthly, yearly, etc.

In this course we adopt the following notation:

  • Random variables are denoted by capital letters, \(X_{t}\), and are random quantities that have a distribution. Random variables can be defined for infinitely many time points \(t\in T\).

  • Observations are denoted by lower case letters, \(x_{t}\), and are realisations of the random variables (numbers). Such realisations are only available at a finite number of time points (i.e. since records began), meaning that only \(n\) observations \(\{x_{1},\ldots,x_{n}\}\) are available.

1.2 Examples

Time series data are found in a wide variety of application areas, examples of which include:

  • Environmental: Yearly average temperature levels, daily CO\(_2\) levels in the atmosphere.

  • Economic: Daily value of the FTSE share index, the UK’s yearly gross domestic product (GDP), monthly levels of unemployment.

  • Medical: Daily number of deaths in Glasgow due to heart attack, size of the monthly transplant waiting list.

  • Educational: Number of students obtaining degrees from the University of Glasgow per year, weekly attendance at lectures.

  • Business: Monthly sales figures for a leading supermarket, number of chocolate bars made per week by Cadburys.

  • Leisure: Number of goals scored in the Premier League each week of the season, number of people going to the cinema per week.

Definition

The most important descriptive tool with which to analyse a time series is the time plot, which is a plot of the data (on the vertical axis) against time (on the horizontal axis). The time plot gives you a visual description of the time series and allows you to pick out any prominent features.

We show some examples together with the R code to produce them.

Example - Air pollution in Glasgow

Figure 1.1 shows the daily average air pollution concentrations in Glasgow Anderston for the last three months of 2007. The pollutant measured is called particulate matter, which comprises small particles of liquid and solids that are suspended in the air.

Figure 1.1: Average daily air pollution concentrations in Glasgow Anderston in 2007.

These data are available on the Moodle course website, and the above plot can be produced using the following R code (you will need to ensure the file directory is correct).
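
A minimal sketch of such code is given below; the file name pollution.csv and the column name pm10 are assumptions, and should be replaced by the names used in the file on Moodle.

```r
# Read the data (assumed file and column names) and draw the time plot
pollution <- read.csv("pollution.csv")

pm10 <- ts(pollution$pm10)
plot(pm10, xlab = "Day", ylab = "PM10 concentration",
     main = "Air pollution in Glasgow Anderston")
```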

Example - Air traffic into Great Britain

Figure 1.2 displays the number of foreign passengers entering Great Britain by air per quarter from 2000 to 2007.

Figure 1.2: Number of air travellers into Britain per quarter.
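
A corresponding sketch for this quarterly series, again with an assumed file name (airtraffic.csv) and column name (passengers), is:

```r
# Read the data (assumed file and column names)
air <- read.csv("airtraffic.csv")

# Quarterly observations starting in the first quarter of 2000
passengers <- ts(air$passengers, start = c(2000, 1), frequency = 4)
plot(passengers, xlab = "Year", ylab = "Number of passengers",
     main = "Air travellers into Great Britain")
```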

There are a large number of models designed specifically for time series data, and we describe a number throughout this course. For now here are two of the simplest.

Definition

A purely random process (also known as a white noise process) is a time series process \(\{X_{t}~|~t\in T\}\) defined by

\[\begin{eqnarray} \mathbb{E}[X_{t}]&=&\mu\nonumber\\ \mathrm{Var}[X_{t}]&=&\sigma^{2}\nonumber \end{eqnarray}\]

where each \(X_{t}\) is independent.

Note that all we have specified about \(X_{t}\) is its mean, variance and correlation structure; we have said nothing about its distributional form. Each \(X_{t}\) could be Gaussian, exponential, gamma, etc. Also note that this is the most uninteresting (and unrealistic) time series model, as it assumes the observations are all independent.

Definition

A random walk process is a time series process \(\{X_{t}~|~t\in T\}\) defined by

\[X_{t}=X_{t-1}+Z_{t}\]

where \(Z_{t}\) is a purely random process with mean \(\mu\) and variance \(\sigma^{2}\). The process is started at \(X_{0}=0\), so that \(X_{1}=Z_{1}\), \(X_{2}=X_{1}+Z_{2}\), etc.

This is a very appealing model for time series correlation, because the value of the current observation depends on the previous observation and random error. We will return to this model later in the course. Note again that we have not made any assumptions about the distributional form of the purely random process.

An interactive example displaying both the purely random process and random walk process can be found at the following link.

Examples

The R code below generates realisations from a purely random process and a random walk process, where the distribution of the series is Gaussian (e.g. \(N(0, 1)\)) or exponential (with mean 1).
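
A minimal sketch of such a simulation (the series length of 200 and the random seed are arbitrary choices) is:

```r
set.seed(1)
n <- 200

# Purely random (white noise) processes
wn.gauss <- rnorm(n, mean = 0, sd = 1)   # Gaussian N(0, 1)
wn.exp   <- rexp(n, rate = 1)            # exponential with mean 1

# Random walks: cumulative sums of the white noise, started at X_0 = 0
rw.gauss <- cumsum(wn.gauss)
rw.exp   <- cumsum(wn.exp)

# Plot the four realisations in a 2 x 2 grid
par(mfrow = c(2, 2))
plot.ts(wn.gauss, ylab = expression(x[t]), main = "Gaussian white noise")
plot.ts(wn.exp,   ylab = expression(x[t]), main = "Exponential white noise")
plot.ts(rw.gauss, ylab = expression(x[t]), main = "Gaussian random walk")
plot.ts(rw.exp,   ylab = expression(x[t]), main = "Exponential random walk")
par(mfrow = c(1, 1))
```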

Figure 1.3: Simulated time series from Gaussian and exponential distributions from white noise and random walk processes.

Which of these do you think are sensible models for time series data and why? We shall see later.

1.3 Objectives of a time series analysis

Given a set of time series data, you the analyst will generally be asked to answer one or more questions of interest about it. The main types of questions that arise for time series data depend on the context of the data and why it was collected. Some of the main reasons for collecting and analysing time series data are described below.

  1. Description: Describe the main features of the time series, such as: is the series increasing or decreasing; are there any seasonal patterns (e.g. higher in summer and lower in winter); and how does a second explanatory variable affect the value of the time series?

  2. Monitoring: Detect when changes in the behaviour of the time series have occurred, e.g. sudden drops in sales.

  3. Forecasting: Predict future values of the time series from the current values, and quantify the uncertainty in these predictions.

Example

Figure 1.4 shows the daily number of hospital admissions due to respiratory disease in Glasgow between 2000 and 2007.

Figure 1.4: Daily number of hospital admissions due to respiratory disease in Glasgow between 2000 and 2007.

Possible questions of interest here are

  1. What are the seasonal patterns in admissions, so we know (roughly) how many beds will be required next year?

  2. How are the numbers of admissions affected by external factors such as air pollution concentrations?

Example

Figure 1.5 shows the share price of a well known retailer between 2001 and 2009.

Figure 1.5: Share price of a retailer between 2001 and 2009. The retailer went into administration in November 2008.

A possible question of interest here is

  1. When did the share price start to drop and what caused this drop?

Example

Figure 1.6 shows the average global temperature over the last 150 years.

Figure 1.6: Annual and five-year average temperature on Earth for the past 150 years.

A possible question of interest here is

  1. What will the temperature be over the next 10 to 50 years?

Example

Figure 1.7 shows the numbers of cardiovascular events in Munich.

Figure 1.7: Number of cardiovascular events in Munich in 2003, 2005 and 2006.

A possible question of interest here is

  1. Why do the peaks occur? Hint: A big sporting event was held in Germany.

Example

Figure 1.8 shows the weekly GP consultation rates for flu.

Figure 1.8: Weekly GP consultation rates for influenza-like illness.

A possible question of interest here is

  1. Why do the peaks occur?

1.4 Time series modelling

Time series data are often decomposed into the following three components.

  • Trend - A trend is a long-term change in the mean of the process over time. If a trend exists its shape will often be of interest, although it may not be linear. The particulate matter data do not have a trend, whereas the air traffic data have a linearly increasing trend.

  • Seasonal effect - A seasonal effect is a pattern in the time series that repeats itself at regular intervals. Strictly speaking a seasonal effect is only one that repeats itself every year, but in this course we use the term more broadly to mean any regularly repeating pattern. The particulate matter data do not have a seasonal effect, whereas the air traffic data do.

  • Unexplained variation - Unexplained variation is the remaining variation in a time series once any trend and seasonal variation have been removed. This unexplained variation may be independent or exhibit short-term correlation, and the latter is the case of most interest in this course.

Therefore two simple schematic models for time series data are given by

  • Additive - \(X_{t}=m_{t}+s_{t}+e_{t}\)

  • Multiplicative - \(X_{t}=m_{t}s_{t}e_{t}\)

where

  • \(m_{t}\) represents the trend;

  • \(s_{t}\) the seasonal variation; and

  • \(e_{t}\) the unexplained variation.

Thus the series is partitioned into three components, trend, seasonal variation and unexplained error, and separate models can be specified for each component.

An additive model is appropriate when the trend and seasonal variation act independently, while a multiplicative model is required if the size of the seasonal effect depends on the size of the trend. These differences are displayed in Figure 1.9.

Figure 1.9: Examples of data for additive and multiplicative models.

There are a number of reasons why representing a time series as an additive decomposition of trend, seasonal variation and error is preferable to a multiplicative one.

  1. The independent effects of trend and seasonality are typically of interest, so that the average effect of being in a particular season can be assessed.

  2. Multiplicative seasonal effects and trends are harder to estimate than additive ones.

  3. Data with a constant level of variation are easier to model than data with non-constant variance.

So if we have time series data that has a multiplicative structure, how do we model it?

1.4.1 Transformations

Data that appear to have a multiplicative structure can be transformed into an additive structure by modelling the data on the natural log scale. Indeed, if you take natural logarithms of the multiplicative model on both sides you end up with an additive model on the log scale, that is

\[\log(X_{t})=\log(m_{t}s_{t}e_{t})=\log(m_{t}) + \log(s_{t}) + \log(e_{t}).\]

Natural log is just one of a number of possible transformations you can make to time series data. Transformations can be used to:

  1. Stabilise the variance - If the variation in the time series increases with the trend, then a transformation may make the variance constant.

  2. Make the seasonal effects additive - Multiplicative trends and seasonal variation can be changed to additive effects by transformation.

  3. Make the data normally distributed - A number of time series models assume the data are normally distributed, so a transformation may improve normality.

Two of the most common transformations in time series are natural log and square root, but a more general class of transformations is called the Box-Cox transformation, named after two very famous statisticians, George Box and Sir David Cox.

Definition

Given an observed time series \(\{x_{t}\}\), the Box-Cox transformation is given by

\[y_{t}=\left\{\begin{array}{cc}(x_{t}^{\lambda}-1)/\lambda& \lambda\neq0\\ \ln(x_{t})& \lambda=0\\\end{array}\right.\]

where the transformation parameter \(\lambda\) is chosen by the time series analyst. Here \(\lambda=0\) corresponds to the natural log, while \(\lambda=0.5\) corresponds to (a rescaled version of) the square root.
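
As an illustration, the transformation can be implemented directly as a small R function; the function name and the use of the built-in AirPassengers series below are purely illustrative.

```r
# Box-Cox transformation of a positive series x for a chosen lambda
boxcox.transform <- function(x, lambda) {
  if (lambda == 0) {
    log(x)                        # lambda = 0: natural log
  } else {
    (x^lambda - 1) / lambda       # lambda != 0: power transformation
  }
}

# lambda = 0 gives the natural log, lambda = 0.5 a rescaled square root
y.log  <- boxcox.transform(AirPassengers, lambda = 0)
y.sqrt <- boxcox.transform(AirPassengers, lambda = 0.5)
```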

Example

The top panel of the graph below shows time series data where the variance increases with the trend. This non-constant variance can be removed by taking natural logarithms as shown in the bottom panel.

Figure 1.10: Time series data and its log-transform stabilising the variance.
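
As a sketch of how such data arise, the series below is simulated from a multiplicative structure (all numerical choices are arbitrary) and then log-transformed.

```r
set.seed(2)
t <- 1:120
trend  <- exp(0.02 * t)                          # increasing trend
season <- 1 + 0.3 * sin(2 * pi * t / 12)         # repeating seasonal pattern
error  <- exp(rnorm(120, mean = 0, sd = 0.1))    # multiplicative error
x <- trend * season * error

par(mfrow = c(2, 1))
plot.ts(x, main = "Original series: variation increases with the trend")
plot.ts(log(x), main = "Log scale: roughly constant variation")
par(mfrow = c(1, 1))
```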

1.5 Time series properties

Given a time series process \(\{X_{t}~| ~t\in T\}\) and corresponding observations \(x_{1},\ldots,x_{n}\), the following properties largely define its characteristics.

1.5.1 Mean function

The mean function of a time series process is defined for all \(t\in T\) as

\[\mu_{t}=\mathbb{E}[X_{t}],\]

that is the average value of the process. For real data if we assume the mean is constant, i.e. \(\mu_{t}=\mu\), then the obvious estimate is

\[\hat{\mu}=\frac{1}{n}\sum_{t=1}^{n}x_{t}.\]

If the mean of the data is not constant, for example due to the presence of trends or seasonal variation, then the trend and seasonal components can be estimated using the methods described in Chapter 2.

1.5.2 Variance function

The variance function of a time series process is defined for all \(t\in T\) as

\[\sigma^{2}_{t}=\mathrm{Var}[X_{t}]=\mathbb{E}[X_{t}^{2}]-\mathbb{E}[X_{t}]^{2}\]

while the standard deviation function is given by \(\sigma_{t}=\sqrt{\sigma^{2}_{t}}\). For real data if we assume the variance is constant, i.e. \(\sigma^{2}_{t}=\sigma^{2}\), then the obvious estimate is

\[\hat{\sigma}^{2}=\frac{1}{n-1}\sum_{t=1}^{n}(x_{t}-\hat{\mu})^{2}.\]

There are time series models that allow for non-constant variance.

1.5.3 Autocovariance and autocorrelation functions

Recall that for any random variables \(X\) and \(Y\), the covariance and correlation measure the level of dependence between the variables. They are given by

  • Covariance - \(\mathrm{Cov}[X,Y]=\mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] = \mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y]\),

  • Correlation - \(\mathrm{Corr}[X,Y] = \mathrm{Cov}[X,Y] / \sqrt{\mathrm{Var}[X]\mathrm{Var}[Y]}\).

The correlation is a scaled version of the covariance that lies between -1 and 1, where 1 represents strong positive correlation, 0 represents no (linear) correlation and -1 represents strong negative correlation.

For a time series process the random variables \((X_{t}, X_{s})\) relate to the same quantity measured at different points in time. Therefore the dependence between them is described by the autocovariance and autocorrelation functions, with the ‘auto’ prefix being added to denote the fact that both random variables measure the same quantity (albeit at different time points).

Definition

The autocovariance function (ACVF) is defined for all \(s,t\in T\) as

\[\gamma_{s,t}=\mathrm{Cov}[X_{s}, X_{t}]=\mathbb{E}[X_{s}X_{t}]-\mathbb{E}[X_{t}]\mathbb{E}[X_{s}],\]

where \(\gamma_{t,t}=\mathrm{Cov}[X_{t},X_{t}]=\mathrm{Var}[X_{t}]=\sigma^{2}_{t}\).

The autocorrelation function (ACF) is given by

\[\rho_{s,t}=\mathrm{Corr}[X_{s}, X_{t}]=\frac{\mathrm{Cov}[X_{s},X_{t}]}{\sqrt{\mathrm{Var}[X_{s}]\mathrm{Var}[X_{t}]}}=\frac{\gamma_{s,t}}{\sigma_{s}\sigma_{t}},\]

where \(\rho_{t,t}=\mathrm{Corr}[X_{t},X_{t}]=1\).

To calculate the autocovariance and autocorrelation functions for real data we assume that the dependence structure in the data does not change over time. That is we assume that

\[\gamma_{s,t}=\mathrm{Cov}[X_{s}, X_{t}]=\mathrm{Cov}[X_{s+r}, X_{t+r}]=\gamma_{s+r,t+r}\]

for any time points \((s,t)\) and time shift \(r\). Under this assumption, the only factor that affects the covariance is the distance \(\tau=|s-t|\) between the observations, which is called the lag. Therefore the only autocovariances that need to be calculated are the set

\[\gamma_{\tau}=\mathrm{Cov}[X_{t}, X_{t+\tau}]\hspace{0.5cm}\tau=0,1,2,\ldots\]

Notes

  • The covariances do not depend on the starting location \(t\), and only on the lag \(\tau\).

  • When \(\tau=0\) we have \(\gamma_{0}=\mathrm{Cov}[X_{t}, X_{t}]=\mathrm{Var}[X_{t}]=\sigma^{2}\) such that the variance is constant over time and does not depend on \(t\).

  • Under this simplification the autocorrelation function becomes

\[\rho_{\tau}=\mathrm{Corr}[X_{t},X_{t+\tau}]=\frac{\mathrm{Cov}[X_{t},X_{t+\tau}]}{\sqrt{\mathrm{Var}[X_{t}]\mathrm{Var}[X_{t+\tau}]}}=\frac{\gamma_{\tau}}{\gamma_{0}}\]

so that the autocorrelation function also does not depend on the original time point \(t\).

Properties of the autocorrelation function

Consider a weakly stationary time series process \(\{X_{t}~|~t\in T\}\) (weak stationarity is defined formally in Section 1.6) with constant mean \(\mu\), variance \(\sigma^{2}\) and autocovariance / autocorrelation functions \((\gamma_{\tau},~\rho_{\tau})\). Then the autocorrelation function has the following three properties.

Property 1

The autocorrelation function is an even function so that \(\rho_{\tau}=\rho_{-\tau}\).

Proof

This follows because the process is second-order stationary, so the autocovariance depends only on the separation between the time points. Therefore

\[\begin{eqnarray} \rho_{\tau}&=&\gamma_{\tau}/\sigma^{2}\nonumber\\ &=&\mathrm{Cov}[X_{t}, X_{t+\tau}]/\sigma^{2}\nonumber\\ &=&\mathrm{Cov}[X_{t-\tau}, X_{t}]/\sigma^{2}\nonumber\\ &=&\gamma_{-\tau}/\sigma^{2}\nonumber\\ &=&\rho_{-\tau}\nonumber \end{eqnarray}\]

Property 2

The autocorrelation function satisfies \(|\rho_{\tau}|\leq1\).

Proof

Note that for any constants \(\lambda_{1},~\lambda_{2}\) we have that

\[\mathrm{Var}[\lambda_{1}X_{t}+\lambda_{2}X_{t+\tau}]=(\lambda_{1}^{2}+\lambda_{2}^{2})\sigma^{2}+2\lambda_{1}\lambda_{2}\gamma_{\tau}\geq0\]

due to the fact that the variance is always non-negative. Therefore setting \(\lambda_{1}=\lambda_{2}=1\) we find that \(\gamma_{\tau} \geq -\sigma^{2}\) and hence \(\rho_{\tau} \geq -1\). Conversely if \(\lambda_{1}=1\) and \(\lambda_{2}=-1\), then we have that \(\sigma^{2} \geq \gamma_{\tau}\) and hence \(\rho_{\tau}\leq 1\). Thus we have \(|\rho_{\tau}|\leq1\) as required.

Property 3

Although a given stochastic process has a unique autocorrelation function, the converse is not true. That is, more than one stochastic process can have the same autocorrelation function. For there to be a one to one relationship between a stochastic process and its autocorrelation function we require a property called invertibility, which we discuss later in the course.

Estimating the autocorrelation function

Recall that for any two sets of data \((y_{1},\ldots,y_{n})\) and \((z_{1},\ldots,z_{n})\), the sample covariance and correlation functions are given by

\[\begin{eqnarray} \hat{\gamma}_{y,z}&=&\frac{1}{n-1}\sum_{t=1}^{n}(y_{t}-\bar{y})(z_{t}-\bar{z})\nonumber\\ \hat{\rho}_{y,z}&=&\frac{\sum_{t=1}^{n}(y_{t}-\bar{y})(z_{t}-\bar{z})}{\sqrt{\sum_{t=1}^{n}(y_{t}-\bar{y})^{2}\sum_{t=1}^{n}(z_{t}-\bar{z})^{2}}}\nonumber \end{eqnarray}\]

However, for time series data the autocovariance and autocorrelation functions measure the covariance / correlation between the single time series \((x_{1},\ldots,x_{n})\) and itself at different lags.

Lag 0

The sample autocovariance function at lag 0, denoted by \(\hat{\gamma}_{0}\), is the covariance of \((x_{1},\ldots,x_{n})\) with \((x_{1},\ldots,x_{n})\). Applying the formula above gives

\[\hat{\gamma}_{0}=\frac{1}{n-1}\sum_{t=1}^{n}(x_{t}-\bar{x})(x_{t}-\bar{x})=\frac{1}{n-1}\sum_{t=1}^{n}(x_{t}-\bar{x})^{2}=\hat{\sigma}^{2}.\]

Therefore the sample autocovariance function at lag 0 is the sample variance. Similarly, the autocorrelation function measures the correlation between the series and itself and is given by

\[\hat{\rho}_{0}=\frac{\sum_{t=1}^{n}(x_{t}-\bar{x})(x_{t}-\bar{x})}{\sqrt{\sum_{t=1}^{n}(x_{t}-\bar{x})^{2}\sum_{t=1}^{n}(x_{t}-\bar{x})^{2}}} =1.\]

Lag 1

The sample autocovariance function at lag 1 is the covariance of \((x_{1},\ldots,x_{n-1})\) with \((x_{2},\ldots,x_{n})\). It is the covariance of the series with itself shifted by one time point. Applying the covariance and correlation formulae gives

\[\hat{\gamma}_{1}=\frac{1}{n-2}\sum_{t=1}^{n-1}(x_{t}-\bar{x}_{1})(x_{t+1}-\bar{x}_{2})\]

\[\hat{\rho}_{1}=\frac{\sum_{t=1}^{n-1}(x_{t}-\bar{x}_{1})(x_{t+1}-\bar{x}_{2})} {\sqrt{\sum_{t=1}^{n-1}(x_{t}-\bar{x}_{1})^{2}\sum_{t=1}^{n-1}(x_{t+1}-\bar{x}_{2})^{2}}}\]

where

  • \(\bar{x}_{1}=\sum_{t=1}^{n-1}x_{t}/(n-1)\) - the mean of the first \(n-1\) observations,

  • \(\bar{x}_{2}=\sum_{t=2}^{n}x_{t}/(n-1)\) - the mean of the last \(n-1\) observations.

In practice, including in R, the following alternative expressions are used for simplicity.

\[\hat{\gamma}_{1}=\frac{1}{n}\sum_{t=1}^{n-1}(x_{t}-\bar{x})(x_{t+1}-\bar{x})\]

\[\hat{\rho}_{1}=\frac{\sum_{t=1}^{n-1}(x_{t}-\bar{x})(x_{t+1}-\bar{x})}{\sum_{t=1}^{n}(x_{t}-\bar{x})^{2}} = \frac{\hat{\gamma}_{1}}{\hat{\gamma}_{0}}\]

The above expressions have been simplified by assuming that the mean and variance of the first \(n-1\) observations equal those for the last \(n-1\) observations. In addition, for the covariance formula the divisor \(n\) is used rather than the unbiased \(n-2\). Clearly, when \(n\) is large changing the divisor makes little practical difference to the calculations.
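
As a check on these conventions, the lag 1 autocovariance computed directly from the simplified formula agrees with R's acf() function when type = "covariance" is specified; the simulated series below is purely illustrative.

```r
set.seed(3)
x <- rnorm(100)
n <- length(x)
xbar <- mean(x)

# Lag 1 sample autocovariance using the overall mean and divisor n
gamma1.manual <- sum((x[1:(n - 1)] - xbar) * (x[2:n] - xbar)) / n

# acf() with type = "covariance" uses the same convention
gamma1.acf <- acf(x, lag.max = 1, type = "covariance", plot = FALSE)$acf[2]

c(gamma1.manual, gamma1.acf)   # the two values agree
```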

These definitions extend naturally to other lags as follows.

Definition

The sample autocovariance function (ACVF) for a time series \((x_{1},\ldots,x_{n})\) is given by

\[\hat{\gamma}_{\tau}=\frac{1}{n}\sum_{t=1}^{n-\tau}(x_{t}-\bar{x})(x_{t+\tau}-\bar{x})\hspace{1cm}\tau=0,1,\ldots\]

The sample autocorrelation function (ACF) is therefore given by

\[\hat{\rho}_{\tau}=\frac{\sum_{t=1}^{n-\tau}(x_{t}-\bar{x})(x_{t+\tau}-\bar{x})}{\sum_{t=1}^{n}(x_{t}-\bar{x})^{2}} = \frac{\hat{\gamma}_{\tau}}{\hat{\gamma}_{0}}\hspace{1cm}\tau=0,1,\ldots\]

The sample autocorrelation function can be calculated by hand or automatically in R using the function
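
```r
acf(x, lag.max)
```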

where x is the time series and lag.max is the maximum lag you wish to calculate the autocorrelation function for.

An interactive example to help understand the autocovariance and autocorrelation functions can be found at the following link.

Definition

The second most important time series plot is the correlogram, which is a plot of the autocorrelation function on the vertical axis against lag \(\tau\) on the horizontal axis.

Some examples of the correlogram and how to interpret it are given below.

1.5.4 Interpreting the correlogram

The correlogram will tell a time series analyst a lot about a time series, including the presence of trends, seasonal variation and short-term correlation.

Example - purely random data

Consider a realisation of a time series generated from a purely random process, \(X_{t}\sim\mbox{N}(0, 1)\), which has no trend, seasonality or short-term correlation. An example of such data is displayed in Figure 1.11.

Figure 1.11: Time series simulated from a purely random process and its corresponding correlogram.

The data and graphs can be generated using the following R code.
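
A minimal sketch of such code (the seed and the series length of 100 are arbitrary) is:

```r
set.seed(4)
x <- rnorm(100)   # purely random N(0, 1) series

par(mfrow = c(1, 2))
plot.ts(x, ylab = expression(x[t]), main = "Purely random series")
acf(x, lag.max = 20, main = "Correlogram")
par(mfrow = c(1, 1))
```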

Notes

  1. The correlogram will always equal one when the lag is zero, because it is the correlation of the series with itself. This value can therefore be ignored.

  2. The dashed (blue) lines are approximate \(95\%\) confidence intervals for the autocorrelation function assuming that the true correlation is zero. They are equal to

\[\pm1.96/\sqrt{n}.\]

Therefore lags where the correlogram is inside the blue lines do not have correlation significantly different from zero. Note that these are only 95\(\%\) confidence intervals, so on average you would expect one out of twenty to be outside the blue lines by chance.

  3. For a purely random series that has no correlation, you would expect the correlogram to equal one at lag zero, but show no further evidence of correlation at other lags.

Example - short-term correlation

Time series data with no trend or seasonality but short-term correlation is shown in Figure 1.12 and has positive significant autocorrelations at the first few lags, followed by values close to zero at larger lags.

Figure 1.12: Time series with short-term correlation and its corresponding correlogram.

Example - alternating data

Time series data that has no trend or seasonality but alternates between large and small values is displayed in Figure 1.13 and has negative autocorrelations at odd lags and positive autocorrelations at even lags. As the lag increases the autocorrelations get closer to zero.

Figure 1.13: Time series which alternates between large and small values and its corresponding correlogram.

Example - data with a trend

Time series data that has a trend is shown in Figure 1.14 and has positive autocorrelations at a large number of lags. Note, the same correlogram would be observed if the trend was decreasing over time.

Figure 1.14: Time series data with a trend and its corresponding correlogram.

Example - data with a seasonal effect

Time series data that has a seasonal effect is shown in Figure 1.15 and has a regular seasonal pattern in the correlogram.

Figure 1.15: Time series data with a seasonal effect and its corresponding correlogram.

Example - data with a trend and a seasonal effect

Time series data that has a trend and a seasonal effect is displayed in Figure 1.16 and has a regular seasonal pattern in the correlogram, although due to the trend, the correlogram will generally have positive values.

Figure 1.16: Time series data with a trend and seasonal effect and its corresponding correlogram.

Note

If the correlogram exhibits a trend and seasonal variation, then the presence or absence of short-term correlation is hidden. Therefore to assess the presence of short-term correlation in a time series, the trend and seasonal variation must first be removed, using the methods described in Chapter 2.

1.5.5 Testing for independence

The presence or absence of temporal correlation can be assessed by looking at the correlogram. However, some statisticians like conducting a formal statistical test for the presence of correlation. Personally, I don’t feel you need to, as looking at the correlogram will give you all the information you need. However, if you want to then the most popular of such tests are called Portmanteau tests.

Portmanteau tests conduct a single significance test on the set of autocorrelations \(\hat{\rho}_{1},\hat{\rho}_{2},\ldots,\hat{\rho}_{h}\), where the maximum lag \(h\) is chosen by you. Given a time series \((x_{1},\ldots,x_{n})\), the hypotheses for this test are:

  • \(H_{0}\) - The time series is independent.
  • \(H_{1}\) - The time series is not independent.

Two competing test statistics have been developed for this test, namely

  • Box–Pierce statistic - \(Q_{BP} = n \sum_{\tau=1}^h \hat{\rho}^2_{\tau}\)
  • Ljung–Box statistic - \(Q_{LB} = n(n+2) \sum_{\tau=1}^h \hat{\rho}^2_{\tau}/(n-\tau)\)

Under \(H_{0}\) both these statistics have approximately a \(\chi^{2}_{h}\) distribution, so we reject \(H_{0}\) at the 5\(\%\) level if the test statistic is greater than \(\chi^{2}_{h,0.95}\), the 95th percentile of the \(\chi^{2}_{h}\) distribution.

The Ljung–Box statistic is typically preferred because, in finite samples, it rejects / fails to reject the null hypothesis correctly more often than the Box–Pierce statistic. Its weights \(n(n+2)/(n-\tau)\) also give slightly more importance to autocorrelations at higher lags than at lower ones, although the difference is negligible when \(n\) is large.
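
In R both tests are available through the Box.test() function; a minimal sketch, using a simulated series and an arbitrary choice of \(h=10\), is:

```r
set.seed(6)
x <- rnorm(100)   # illustrative series; replace with your data

# Ljung-Box and Box-Pierce tests using the first h = 10 autocorrelations
Box.test(x, lag = 10, type = "Ljung-Box")
Box.test(x, lag = 10, type = "Box-Pierce")
```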

1.5.6 Example

Recall again the respiratory admissions data. Figure 1.17 is a plot of the de-trended data (see Chapter 2 for how to do this) and the correlogram, as well as the results of the Ljung–Box and Box–Pierce tests. All of these suggest the data exhibit short-term correlation. The correlogram also shows evidence of a weekly seasonal effect (period of 7 days) that is not obvious from the time plot.

Figure 1.17: Detrended time series data and its correlogram.

Notes

  • There are many other tests for independence.
  • None of these tests is very powerful, so it is advisable to apply several different tests before rejecting or accepting \(H_0\).
  • The best strategy (in my opinion) is to simply plot the correlogram and decide for yourself whether there is short-term correlation.

1.6 Stationarity

Definition

A time series process \(\{X_{t}~|~t\in T\}\) is strictly stationary (or strongly stationary) if the joint distribution \(f(X_{t_{1}},\ldots, X_{t_{k}})\) is identical to the joint distribution \(f(X_{t_{1}+r},\ldots, X_{t_{k}+r})\) for all collections \(t_{1},\ldots,t_{k}\) and separation values \(r\). In other words, shifting the time origin of the series by \(r\) has no effect on its joint distribution.

Notes

  1. When \(k=1\), strict stationarity implies that \(f(X_{t})=f(X_{t+r})\) for all \(r\), so that the marginal distributions are the same for all time points. This in turn implies that the mean and variance functions are constant, i.e. \(\mu_{t}=\mathbb{E}[X_{t}]=\mu\) and \(\sigma^{2}_{t}=\mathrm{Var}[X_{t}]=\sigma^{2}\).

  2. Again when \(k=1\), the distribution \(f(X_{t})\) is proper, so that both the mean and variance are finite, i.e. \(\mu, \sigma^{2}<\infty\).

  3. When \(k=2\) strict stationarity implies that \(f(X_{t_{1}},X_{t_{2}})=f(X_{t_{1}+r},X_{t_{2}+r})\), so that the joint distribution only depends on the lag \(\tau=|t_{2}-t_{1}|\). This in turn implies that the theoretical covariance and correlation functions only depend on the lag and not the original location, so that

\[\begin{eqnarray} \gamma_{t,t+\tau}&=&\mathrm{Cov}[X_{t},X_{t+\tau}]=\gamma_{\tau}\nonumber\\ \rho_{t,t+\tau}&=&\mathrm{Corr}[X_{t},X_{t+\tau}]=\rho_{\tau}\nonumber \end{eqnarray}\]

  4. Strict stationarity is very restrictive and few processes achieve it. In this course only the purely random process is strictly stationary.

Definition

A time series process \(\{X_{t}~|~t\in T\}\) is weakly stationary (or second-order stationary) if

  1. The mean function is constant and finite - \(\mu_{t}=\mathbb{E}[X_{t}]=\mu<\infty\).

  2. The variance function is constant and finite - \(\sigma^{2}_{t}=\mathrm{Var}[X_{t}]=\sigma^{2}<\infty\).

  3. The autocovariance and autocorrelation functions only depend on the lag -

\[\begin{eqnarray} \gamma_{t,t+\tau}&=&\mathrm{Cov}[X_{t},X_{t+\tau}]=\gamma_{\tau}\nonumber\\ \rho_{t,t+\tau}&=&\mathrm{Corr}[X_{t},X_{t+\tau}]=\rho_{\tau}\nonumber \end{eqnarray}\]

Notes

  1. If a time series process is strictly stationary then it is weakly stationary, although the converse is not true unless the process is normally distributed. This is because a normal distribution is completely defined by its first two moments.

  2. The difference between strict and weak stationarity is that the latter only assumes the first two moments are constant over time, whereas the former requires the entire joint distribution (and hence all higher moments) to be unchanged over time.

  3. A number of common time series models are weakly stationary, so can only be applied to data that appear to be stationary (i.e. contain no trend or seasonal variation).

Example

The purely random process \(\{X_{t}~|~t\in T\}\) which comprises independent random variables with \(\mathbb{E}[X_{t}]=\mu\), \(\mathrm{Var}[X_{t}]=\sigma^{2}\) is second order (and strictly) stationary. This is because the mean and variance are constant and finite, while independence implies that the autocovariance and autocorrelation functions are given by

\[\gamma_{\tau}=\left\{\begin{array}{cc}\sigma^{2}&\tau=0\\ 0&\tau\neq 0\\\end{array}\right.\hspace{2cm} \rho_{\tau}=\left\{\begin{array}{cc}1&\tau=0\\ 0&\tau\neq 0\\\end{array}\right.\]

Example

A random walk process \(\{X_{t}~|~t\in T\}\) defined by \(X_{0}=0\) and

\[X_{t}=X_{t-1}+Z_{t}\]

where \(Z_{t}\) is a purely random process with mean 0 and variance \(\sigma^{2}\) is not stationary. This is because its variance is given by

\[\begin{eqnarray} \mathrm{Var}[X_{t}]&=&\mathrm{Var}[X_{t-1}+Z_{t}]\nonumber\\ &=&\mathrm{Var}[X_{t-1}]+\sigma^{2}\nonumber\\ &=&\mathrm{Var}[X_{t-2}]+2\sigma^{2}~=~\cdots~=~\mathrm{Var}[X_{0}]+t\sigma^{2}\nonumber\\ &=&t\sigma^{2}\nonumber \end{eqnarray}\]

which depends on \(t\) and is not constant over time. This is highlighted in the following link.
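
A quick simulation illustrating this (the number of replicate walks and their length are arbitrary choices): the sample variance of \(X_{t}\) across replicate walks grows roughly linearly in \(t\).

```r
set.seed(5)
n.rep  <- 1000   # number of replicate random walks
n.time <- 100    # length of each walk

# Each column is one random walk built from independent N(0, 1) increments
walks <- apply(matrix(rnorm(n.rep * n.time), nrow = n.time), 2, cumsum)

# Sample variance of X_t across the replicates, for each t
var.t <- apply(walks, 1, var)
plot(1:n.time, var.t, xlab = "t", ylab = "Sample variance of X[t]",
     main = "Random walk: variance increases with t")
abline(0, 1, lty = 2)   # theoretical line t * sigma^2 with sigma^2 = 1
```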

1.7 Modelling strategy

A time series analysis typically has three stages.

  1. Model formulation - Using numerical and graphical summaries, determine an appropriate model for the data that addresses the relevant question of interest.

  2. Model fitting - Estimate the parameters of the chosen model, which is most easily done using computer software.

  3. Model checking - Determine how well your chosen model fits the data by looking at numerical and graphical summaries of the residuals. If your model appears to be adequate then stop and answer the questions of interest, otherwise return to stage 1 and reformulate your model.

There are two general approaches to modelling time series data that contain a trend and/or seasonal variation.

  1. First model the trend and seasonality in the data, and then use a stationary time series model to represent the short-term correlation.

  2. Model the trend, seasonality and short-term correlation in the data simultaneously using a non-stationary time series model.

In the remainder of this course we describe how to implement both approaches, starting with how to remove trend and seasonal variation from time series data.