Chapter 10 GARCH models

Typical models of stochastic processes (ARMA/ARIMA) rely on the assumption of homoscedasticity (conditional variance remaining constant over time). However, this assumption turns out to be unrealistic for some (if not most) financial time series: increased recent volatility typically signals that elevated volatility is likely to persist for some time, contradicting the constant variance assumption.

GARCH models are designed to enable modeling of volatility clusters. The acronym ARCH stands for Auto-Regressive Conditional Heteroskedasticity, while GARCH is its generalization (Generalized ARCH).

10.1 Constant and changing variance

Let us assume a model with constant conditional variance \(\mathbb{V}(Y_t|X_{1,t}, ..., X_{p,t}) = \sigma^2\). In this case, the general regression model with the dependent variable \(Y_t\) and explanatory variables \(X_{1,t}, ..., X_{p,t}\) takes the following form:

\[Y_t = f(X_{1,t}, ..., X_{p,t}) + \varepsilon_t, \tag{10.1} \] where \(\varepsilon_t\) represents an independent disturbance term with a mean of \(0\) and variance \(\sigma^2_\varepsilon\) uncorrelated with \(X_{1,t}, ..., X_{p,t}\). The function \(f(\cdot)\) describes the conditional expectation \(\mathbb{E}(Y_t|X_{1,t}, ..., X_{p,t})\), and the conditional variance of \(Y_t\) is \(\sigma^2_\varepsilon\).

Equation (10.1) can be generalized to allow for varying variance as follows:

\[Y_t = f(X_{1,t}, ..., X_{p,t}) + \varepsilon_t\sigma(X_{1,t}, ..., X_{p,t}), \tag{10.2} \] where the standardized disturbance \(\varepsilon_t\) follows a (conditional) normal distribution with mean \(0\) and standard deviation \(1\). The function \(\sigma(X_{1,t}, ..., X_{p,t})\) allows for modeling conditional variance and must return positive values, as it represents the conditional standard deviation. The value \(\sigma^2(X_{1,t}, ..., X_{p,t})\) is the conditional variance.

Models that enable modeling conditional variance are referred to as variance function models. Among these are models from the GARCH family.

10.2 Model ARCH(1)

Let us assume that \(\varepsilon_t\) is a strict (Gaussian) white noise process with unit variance.

In this case, the expected value is:

\[\mathbb{E}(\varepsilon_t|\varepsilon_{t-1}, ...) = 0, \tag{10.3} \]

and the conditional variance is:

\[\mathbb{V}(\varepsilon_t|\varepsilon_{t-1}, ...) = 1 \tag{10.4}\]

An ARCH(1) model can be formulated using the following equation:

\[a_t = \varepsilon_t\sqrt{\omega + \alpha a^2_{t-1}}, \tag{10.5} \] where \(\omega > 0\) and \(0 \leqslant \alpha < 1\).

Equation (10.5) can be considered a special case of model (10.2) where \(f(\cdot) = 0\) i \(\sigma(\cdot) = \sqrt{\omega + \alpha a^2_{t-1}}\).

It can also be written as:

\[a_t^2 = \varepsilon_t^2\left(\omega + \alpha a^2_{t-1}\right), \tag{10.6} \]

Let \(\sigma^2_t\) denote the conditional variance of the process described by \(a_t\):

\[\sigma_t^2 = \mathbb{V}(a_t|a_{t-1}, ...)\]

Then:

\[\sigma_t^2 = \omega + \alpha a^2_{t-1} \tag{10.7} \]

Equation (10.7) explains how GARCH models work. When \(a_{t-1}\) has a large absolute value, the standard deviation \(\sigma_t\) will also be larger than usual. This increases the likelihood of observing a large \(a_t\). Conversely, small \(a_{t-1}\) leads to a smaller and reduced variability in \(a_t\).

The marginal (unconditional) variance of the ARCH(1) process is given by:

\[\gamma_a(0) = \frac{\omega}{1-\alpha}\]

Autocorrelations in the ARCH(1) process are equal to zero.

Simulation example:

# individual series
set.seed(124)
T  <- 200
alpha <- .8
omega <- .4
at <- numeric(T)
at[1] <- omega/(1-alpha)
for (t in 2:T) {
  at[t] <- sqrt(omega + alpha*(at[t-1])^2) * rnorm(1, 0, 1) 
}
plot(at, type="l")

# autokorelacje w próbce
op<-par(mfrow=c(1,2))
acf(at)
acf(at^2)

par(op)


# multiple series and boxplots
nsim <- 1e3
w <- replicate(nsim,{ 
  at <- numeric(T)
  at[1] <- sqrt(omega/(1-alpha))
  for (t in 2:T) {
    at[t] <- sqrt(omega + alpha*(at[t-1])^2) * rnorm(1, 0, 1) 
  }
  at}
)
boxplot(t(w), pch='.', ylim=c(-5,5))

10.3 AR(1) + ARCH(1) model

The AR(1) model has a constant conditional variance but a variable conditional mean. The ARCH(1) process, on the other hand, has the opposite characteristics. Both models can be combined as follows:

\[y_t - \mu = \phi(y_{t-1}-\mu) + a_t\\ a_t = \varepsilon_t\sqrt{\omega + \alpha a^2_{t-1}}\\ \varepsilon_t \sim \text{i.i.d.}\:\mathcal{N}(0,1) \tag{10.8} \]

The following constraints need to be assumed: \(|\phi|<1\), \(\omega > 0\), \(0 \leqslant \alpha < 1\).

Simulation example:

# individual series
set.seed(124)
T  <- 200
alpha <- 0.55
omega <- 1
mu = 0.1
phi = 0.8

at <- numeric(T)
at[1] <- omega/(1-alpha)
for (t in 2:T) {
  at[t] <- sqrt(omega + alpha*(at[t-1])^2) * rnorm(1, 0, 1) 
}
yt <- numeric(T)
yt[1] <- mu/(1-phi)
for (t in 2:T) {
  yt[t] <- mu + phi*(yt[t-1]-mu) + at[t]
}

plot(yt, type="l")

# multiple series and boxplots
nsim <- 1e3
w <- replicate(nsim,{ 
  at <- numeric(T)
  at[1] <- omega/(1-alpha)
  for (t in 2:T) {
    at[t] <- sqrt(omega + alpha*(at[t-1])^2) * rnorm(1, 0, 1) 
  }
  yt <- numeric(T)
  yt[1] <- mu/(1-phi)
  for (t in 2:T) {
    yt[t] <- mu + phi*(yt[t-1]-mu) + at[t]
  }
  yt}
)
boxplot(t(w), pch='.', ylim=c(-7,7))

10.4 ARMA + GARCH models

The models described above can be further extended. For example, ARCH models with more lags can be introduced (denoted as “ARCH(p)”, where p represents the number of lags):

\[a_t = \varepsilon_t\sqrt{\omega + \sum_{i=1}^p\alpha_i a^2_{t-i}}\\ \varepsilon_t \sim \text{i.i.d.}\:\mathcal{N}(0,1) \tag{10.9} \]

However, a drawback of ARCH models is that increased fluctuations occur only over short periods. To address this, GARCH models are introduced, allowing for the modeling of more persistent volatility.

The GARCH(p, q) model is defined as follows:

\[a_t = \varepsilon_t\sqrt{\omega + \sum_{i=1}^p\alpha_i a^2_{t-i}+\sum_{j=1}^q\beta_j \sigma^2_{t-j}}\\ \varepsilon_t \sim \text{i.i.d.}\:\mathcal{N}(0,1) \tag{10.9} \]

A GARCH model can be combined with an ARMA model. For example, the AR(1) + GARCH(1,1) model is formulated as follows:

\[y_t - \mu = \phi(y_{t-1}-\mu) + a_t\\ a_t = \varepsilon_t\sqrt{\omega + \alpha a^2_{t-1}+\beta \sigma^2_{t-1}}\\\ \varepsilon_t \sim \text{i.i.d.}\:\mathcal{N}(0,1) \tag{10.8} \]

Simulation example:

# individual series
set.seed(111)
T  <- 200
alpha <- 0.08
beta <- 0.7
omega <- 1
mu = 0.1
phi = 0.8

at <- numeric(T)
at[1] <- omega/(1-alpha)
sigma2t <- numeric(T)
sigma2t[1] <- omega
for (t in 2:T) {
  sigma2t[t] <- omega + alpha*(at[t-1])^2 + beta*sigma2t[t-1]
  at[t] <- sqrt(sigma2t[t]) * rnorm(1, 0, 1) 
}
yt <- numeric(T)
yt[1] <- mu/(1-phi)
for (t in 2:T) {
  yt[t] <- mu + phi*(yt[t-1]-mu) + at[t]
}

plot(yt, type="l")

# multiple series and boxplots
nsim <- 1e3
w <- replicate(nsim,{ 
  at <- numeric(T)
  at[1] <- omega/(1-alpha)
  sigma2t <- numeric(T)
  sigma2t[1] <- omega
  for (t in 2:T) {
    sigma2t[t] <- omega + alpha*(at[t-1])^2 + beta*sigma2t[t-1]
    at[t] <- sqrt(sigma2t[t]) * rnorm(1, 0, 1) 
  }
  yt <- numeric(T)
  yt[1] <- mu/(1-phi)
  for (t in 2:T) {
    yt[t] <- mu + phi*(yt[t-1]-mu) + at[t]
  }
  yt}
)

boxplot(t(w), pch='.', ylim=c(-12,12))

In practice, the GARCH(1,1) model is most commonly used due to its empirical accuracy and sufficient flexibility.

The parameter \(\alpha\) (\(\alpha_1\)) captures the effect of shocks. The parameter \(\beta\) (\(\beta_1\)) accounts for the autoregressive behavior of volatility. The parameter \(\omega\) models the long-term volatility level.

If \(\alpha_1 + \beta_1<1\), the GARCH(1,1) model is stationary.

The GARCH(1,1) process is best interpreted through the autocorrelation of \(a^2_t\). It can be shown that the first-order autocorrelation is given by:

\[\rho_{a^2}(1) = \frac{\alpha\left(1-\alpha\beta-\beta^2\right)}{1-2\alpha\beta-\beta^2} \tag{10.10}\]

The autocorrelations of higher orders in the GARCH(1,1) process are given by:

\[\rho_{a^2}(h) = \left(\alpha+\beta\right)^{h-1}\rho_{a^2}(1) \text{ dla } h \geqslant 2 \tag{10.11}\]

As shown by the formula (10.10), the same first-order autocorrelation can be obtained for different combinations of \(\alpha\) and \(\beta\); the formula (10.11) indicates that for larger values of \(\alpha + \beta\), the autocorrelation decays more slowly. This allows for flexible modeling of volatility clustering.

10.5 GARCH models – residuals and heavy-tailed distributions

10.5.1 Two types of residuals in GARCH models

When fitting an ARIMA(\(p_M, d, q_M\))+GARCH(\(p_V, q_V\)) model to a time series \(y_t\), i there are two types of residuals to consider:

  • ordinary residuals, denoted here as \(\widehat{a}_t\); these are the differences between the actual values \(y_t\) and conditional expected values; \(\widehat{a}_t\) serves as an estimate of \(a_t\).

  • standardized residuals, denoted here as \(\widehat{\varepsilon_t}\); to calculate \({\varepsilon}_t\), divide the ordinary residuals \(\widehat{a}_t\) by the estimated conditional standard deviations \(\widehat{\sigma}_t\):

$$

$$

Standardized residuals \(\widehat{\varepsilon_t}\) should be used for model diagnostics: if the model fits the data well, neither \(\widehat{\varepsilon_t}\) nor \(\widehat{\varepsilon_t}^2\) should exhibit autocorrelation. Additionally, if it is assumed that \(\varepsilon_t\) follows Gaussian distribution, this assumption can be verified with a Q-Q plot of standardized residuals \(\widehat{\varepsilon_t}\).

10.5.2 GARCH models and „fat tails”

Financial returns exhibit “heavy” or “fat” tails, which means the tail values are more extreme than in the normal distribution. One reason for this phenomenon is that the conditional variance is not constant over time. Variations in variance can be effectively modeled using GARCH processes. Indeed, even if the disturbance term \(\varepsilon_t\) is Gaussian, time series generated by GARCH processes have fat tails.

However, many financial time series have tails that are even heavier than those produced by GARCH models with Gaussian disturbances.

To account for this, analysts often replace the assumption of Gaussian “white noise”:

\[\varepsilon_t \sim \text{i.i.d.}\:\mathcal{N}(0,1)\]

with a distribution that can better capture tail behavior. A common alternative is to assume:

\[\varepsilon_t \sim \text{i.i.d.}\:\mathcal{STD}(0,1, \nu)\]

where \(\mathcal{STD}\) represents a generalized standardized Student’s t t-distribution, with the parameter \(\nu\) controlling the heaviness of the tails. This approach provides a more flexible framework for modeling extreme events in financial time series.

10.6 Fitting a GARCH model – example

10.7 Forecasting and simulation using GARCH model