\( \newcommand{\bm}[1]{\boldsymbol{#1}}
\newcommand{\textm}[1]{\textsf{#1}}
\def\T{{\mkern-2mu\raise-1mu\mathsf{T}}}
\newcommand{\R}{\mathbb{R}} % real numbers
\newcommand{\E}{{\rm I\kern-.2em E}}
\newcommand{\w}{\bm{w}} % bold w
\newcommand{\bmu}{\bm{\mu}} % bold mu
\newcommand{\bSigma}{\bm{\Sigma}} % bold Sigma
\newcommand{\bigO}{O} %\mathcal{O}
\renewcommand{\d}[1]{\operatorname{d}\!{#1}} \)

4.2 Kalman filter

State space modeling provides a unified framework for treating a wide range of problems in time series analysis. It can be thought of as a universal and flexible modeling approach with a very efficient algorithm: the Kalman filter. The basic idea is to assume that the evolution of the system over time is driven by a series of unobserved or hidden values, which can only be measured indirectly through the observations of the system output. This modeling can be used for filtering, smoothing, and forecasting.

The Kalman filter is named after Rudolf E. Kalman, who was one of the primary developers of its theory (Kalman, 1960). It is sometimes called the Kalman-Bucy filter or even the Stratonovich-Kalman-Bucy filter, because Richard S. Bucy also contributed to the theory and Ruslan Stratonovich had earlier proposed a more general nonlinear version. Arguably, some of the most comprehensive and authoritative classical references for state space models and Kalman filtering include the textbooks (B. D. O. Anderson and Moore, 1979) and (Durbin and Koopman, 2012) (originally published in 2001). Other textbook references on time series and Kalman filtering include (Brockwell and Davis, 2002; A. Harvey, 1989; Shumway and Stoffer, 2017) and, in particular for financial time series, (A. Harvey and Koopman, 2009; Lütkepohl, 2007; Tsay, 2010; Zivot et al., 2004).

The Kalman filter, which was employed by NASA during the 1960s in the Apollo program, now boasts a vast array of technological applications. It is commonly utilized in the guidance, navigation, and control of vehicles, including aircraft, spacecraft, and maritime vessels. It has also found applications in time series analysis, signal processing, and econometrics. More recently, it has become a key component in robotic motion planning and control, as well as trajectory optimization.

Currently, software implementations of Kalman filtering are widespread and numerous libraries are available in most programming languages, e.g., (Helske, 2017; Holmes et al., 2012; Petris and Petrone, 2011; Tusell, 2011) for the R programming language.

4.2.1 State space model

Mathematically, the Kalman filter is based on the following linear Gaussian state space model with discrete-time \(t=1,\dots,T\) (Durbin and Koopman, 2012): \[\begin{equation} \qquad\qquad \begin{aligned} \bm{y}_t &= \bm{Z}\bm{\alpha}_t + \bm{\epsilon}_t\\ \bm{\alpha}_{t+1} &= \bm{T}\bm{\alpha}_t + \bm{\eta}_t \end{aligned} \quad \begin{aligned} & \textm{(observation equation)}\\ & \textm{(state equation)}, \end{aligned} \tag{4.2} \end{equation}\] where \(\bm{y}_t\) denotes the observations over time with observation matrix \(\bm{Z}\), \(\bm{\alpha}_t\) represents the unobserved or hidden internal state with state transition matrix \(\bm{T}\), and the two noise terms \(\bm{\epsilon}_t\) and \(\bm{\eta}_t\) are Gaussian distributed with zero mean and covariance matrices \(\bm{H}\) and \(\bm{Q}\), respectively, i.e., \(\bm{\epsilon}_t \sim \mathcal{N}(\bm{0},\bm{H})\) and \(\bm{\eta}_t \sim \mathcal{N}(\bm{0},\bm{Q})\). The initial state can be modeled as \(\bm{\alpha}_1 \sim \mathcal{N}(\bm{a}_1,\bm{P}_1)\). Mature and efficient software implementations of the model in (4.2) are readily available, e.g., (Helske, 2017).15
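To make the model concrete in code, the following is a minimal sketch in R using the package KFAS (Helske, 2017) for a scalar instance of (4.2) (a random walk observed in noise); the data and all parameter values are placeholders for illustration:

    library(KFAS)

    # Toy data: a random walk observed in noise (placeholder for real observations)
    set.seed(42)
    y <- cumsum(rnorm(100)) + rnorm(100)

    # Scalar instance of (4.2): Z = T = 1, with placeholder noise variances
    Z  <- matrix(1)    # observation matrix
    Tt <- matrix(1)    # state transition matrix
    H  <- matrix(1)    # observation noise variance
    Q  <- matrix(0.5)  # state noise variance

    # Build the state space model (4.2)
    model <- SSModel(y ~ -1 + SSMcustom(Z = Z, T = Tt, R = matrix(1), Q = Q,
                                        a1 = 0, P1 = matrix(10)), H = H)

Here the argument R of SSMcustom() is a selection matrix multiplying the state noise, set to the identity so that the model matches (4.2) exactly.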

It is worth mentioning that an alternative notation widespread in the literature for the state space model (4.2) is to shift the time index in the state equation by one: \(\bm{\alpha}_{t} = \bm{T}\bm{\alpha}_{t-1} + \bm{\eta}_t\). This change in notation only has a slight effect on the initial point of the system, which is now \(\bm{\alpha}_{0}\) (corresponding to the time before the first observation) instead of \(\bm{\alpha}_{1}\) (corresponding to the same time as the first observation); other than that, it is just a notational preference.

The parameters of the state space model in (4.2) (i.e., \(\bm{Z}\), \(\bm{T}\), \(\bm{H}\), \(\bm{Q}\), \(\bm{a}_1\), and \(\bm{P}_1\)) can be provided by the user (if known). Otherwise, they can be inferred from the data with algorithms based on maximum likelihood estimation. Again, mature and efficient software implementations are available for parameter fitting (Holmes et al., 2012).16
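As a sketch of what such fitting looks like in practice (here with the R package KFAS; the exact interface varies across packages), unknown elements of \(\bm{H}\) and \(\bm{Q}\) can be marked as NA and then estimated by maximum likelihood with fitSSM(); the initial values below are placeholders:

    # Same model as before, but with the noise variances marked as unknown (NA)
    model_unk <- SSModel(y ~ -1 + SSMcustom(Z = Z, T = Tt, R = matrix(1),
                                            Q = matrix(NA), a1 = 0,
                                            P1 = matrix(10)), H = matrix(NA))

    # ML estimation via numerical optimization (initial values on log scale)
    fit <- fitSSM(model_unk, inits = c(0, 0), method = "BFGS")
    fit$model["H"]  # ML estimate of the observation noise variance
    fit$model["Q"]  # ML estimate of the state noise variance

In order to build some intuition about state space models, let us now look at a simple yet illustrative example.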

Example 4.1 (Tracking via state space model) Suppose we want to track an object in one dimension over time, \(x_t\), from noisy measurements \(y_t= x_t + \epsilon_t\) taken at time intervals \(\Delta t\). We describe four different ways to model this system, from the simplest to the most advanced, based on the state space model in (4.2); a code sketch for the most advanced model is given after the list.

  1. If we define the internal state simply as the position, \(\alpha_t = x_t\), then (4.2) becomes: \[ \begin{aligned} y_t &= x_t + \epsilon_t\\ x_{t+1} &= x_t + \eta_t, \end{aligned} \] where it is tacitly assumed that the position \(x_t\) does not change much from one time step to the next.

  2. If we now incorporate the velocity \(v_t\) in the internal state, \(\bm{\alpha}_t = \begin{bmatrix} x_t\\ v_t \end{bmatrix}\), then the state space model becomes \[ \begin{aligned} y_t &= \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x_t\\ v_t \end{bmatrix} + \epsilon_t\\ \begin{bmatrix} x_{t+1}\\ v_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & \Delta t\\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_t\\ v_t \end{bmatrix} + \bm{\eta}_t, \end{aligned} \] where the position is now better modeled thanks to the explicit modeling of the velocity.

  3. We can further include the acceleration \(a_t\) into the internal state, \(\bm{\alpha}_t = \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix}\), leading to the state space model \[ \begin{aligned} y_t &= \begin{bmatrix} 1 & 0 & 0\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \epsilon_t\\ \begin{bmatrix} x_{t+1}\\ v_{t+1}\\ a_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & \Delta t & 0\\ 0 & 1 & \Delta t\\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \bm{\eta}_t. \end{aligned} \]

  4. Finally, we can further improve the model, especially if the sampling rate is not high enough, by also including the acceleration in the position update, \(x_{t+1} = x_t + \Delta t\, v_t + \frac{1}{2}\Delta t^2 a_t\), leading to \[ \begin{aligned} y_t &= \begin{bmatrix} 1 & 0 & 0\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \epsilon_t\\ \begin{bmatrix} x_{t+1}\\ v_{t+1}\\ a_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & \Delta t & \frac{1}{2}\Delta t^2\\ 0 & 1 & \Delta t\\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \bm{\eta}_t. \end{aligned} \]
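As a code sketch (again with the R package KFAS, reusing the toy observations y from the earlier sketch, and with placeholder values for the sampling interval and the noise covariances), this fourth model could be set up as follows:

    dt <- 0.1  # sampling interval (placeholder value)

    # Observation matrix: only the position is observed
    Z4 <- matrix(c(1, 0, 0), nrow = 1)

    # State transition matrix over [position, velocity, acceleration]
    T4 <- matrix(c(1, dt, 0.5 * dt^2,
                   0,  1, dt,
                   0,  0, 1), nrow = 3, byrow = TRUE)

    # State space model with 3-dimensional hidden state
    model4 <- SSModel(y ~ -1 + SSMcustom(Z = Z4, T = T4, R = diag(3),
                                         Q = diag(0.01, 3), a1 = rep(0, 3),
                                         P1 = diag(10, 3)), H = matrix(1))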

It is important to emphasize that the state space model in (4.2) is not the most general one. One trivial extension is to allow the parameters \(\bm{Z}\), \(\bm{T}\), \(\bm{H}\), and \(\bm{Q}\) to change over time: \(\bm{Z}_t\), \(\bm{T}_t\), \(\bm{H}_t\), and \(\bm{Q}_t\). More generally, one can relax the two key assumptions in (4.2): i) allowing nonlinear functions of \(\bm{\alpha}_t\) (instead of the linear functions \(\bm{Z}\bm{\alpha}_t\) and \(\bm{T}\bm{\alpha}_t\)) and ii) not assuming the noise distributions to be Gaussian. This leads to extensions proposed in the literature such as the extended Kalman filter, the unscented Kalman filter, and even the more general (albeit more computationally demanding) particle filtering (Durbin and Koopman, 2012).

4.2.2 Kalman filtering and smoothing

The Kalman filter is a very efficient algorithm that optimally solves the state space model in (4.2), which is linear and assumes Gaussian distributions for the noise terms. Its computational cost is manageable to the point that it was even used in the Apollo program by NASA in the 1960s: remarkably, it could be implemented on a tiny and rudimentary computer (2k words of magnetic-core RAM, 36k words of wire-rope memory, and a CPU built from ICs with a clock speed under 100 kHz).

The objective of Kalman filtering is to characterize the distribution of the hidden state at time \(t\), \(\bm{\alpha}_t\), given the observations up to (and including) time \(t\), \(\bm{y}_1,\dots,\bm{y}_t\), i.e., in a causal manner. Since the distribution of the noise terms is Gaussian, it follows that the conditional distribution of \(\bm{\alpha}_t\) is also Gaussian; therefore, it suffices to characterize the conditional mean and conditional covariance matrix: \[ \begin{aligned} \bm{a}_{t\mid t} &\triangleq \E\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_t)\right]\\ \bm{P}_{t\mid t} &\triangleq \textm{Cov}\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_t)\right]. \end{aligned} \] For forecasting purposes, one is really interested in the distribution of the hidden state at time \(t+1\), \(\bm{\alpha}_{t+1}\), given the observations up to (and including) time \(t\), denoted by \(\bm{a}_{t+1\mid t}\) and \(\bm{P}_{t+1\mid t}\). These filtering and forecasting quantities can be efficiently computed using a “forward pass” algorithm that goes from \(t=1\) to \(t=T\) in a sequential way, so that it can operate in real time (Durbin and Koopman, 2012).
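For reference, a standard form of these forward recursions (see Durbin and Koopman, 2012), initialized with \(\bm{a}_{1\mid 0}=\bm{a}_1\) and \(\bm{P}_{1\mid 0}=\bm{P}_1\), is \[ \begin{aligned} \bm{v}_t &= \bm{y}_t - \bm{Z}\bm{a}_{t\mid t-1}, \qquad \bm{F}_t = \bm{Z}\bm{P}_{t\mid t-1}\bm{Z}^\T + \bm{H},\\ \bm{a}_{t\mid t} &= \bm{a}_{t\mid t-1} + \bm{P}_{t\mid t-1}\bm{Z}^\T\bm{F}_t^{-1}\bm{v}_t,\\ \bm{P}_{t\mid t} &= \bm{P}_{t\mid t-1} - \bm{P}_{t\mid t-1}\bm{Z}^\T\bm{F}_t^{-1}\bm{Z}\bm{P}_{t\mid t-1},\\ \bm{a}_{t+1\mid t} &= \bm{T}\bm{a}_{t\mid t}, \qquad \bm{P}_{t+1\mid t} = \bm{T}\bm{P}_{t\mid t}\bm{T}^\T + \bm{Q}, \end{aligned} \] where \(\bm{v}_t\) is the prediction error (innovation) of \(\bm{y}_t\) and \(\bm{F}_t\) is its covariance matrix.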

On the other hand, the objective of Kalman smoothing is to characterize the distribution of the hidden state at time \(t\), \(\bm{\alpha}_t\), given all the observations, \(\bm{y}_1,\dots,\bm{y}_T\), i.e., in a non-causal manner. This distribution is also Gaussian and is fully characterized by the following conditional mean and conditional covariance matrix: \[ \begin{aligned} \bm{a}_{t\mid T} &\triangleq \E\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_T)\right]\\ \bm{P}_{t\mid T} &\triangleq \textm{Cov}\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_T)\right]. \end{aligned} \] Interestingly, these quantities can also be efficiently computed using a “backward pass” algorithm that goes from \(t=T\) to \(t=1\) (Durbin and Koopman, 2012). Since this requires all the observations, it is naturally a batch-processing algorithm rather than an online one.
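Again for reference, a standard form of these backward recursions (the state smoother in Durbin and Koopman, 2012), run for \(t=T,\dots,1\) and initialized with \(\bm{r}_T=\bm{0}\) and \(\bm{N}_T=\bm{0}\), is \[ \begin{aligned} \bm{r}_{t-1} &= \bm{Z}^\T\bm{F}_t^{-1}\bm{v}_t + \bm{L}_t^\T\bm{r}_t,\\ \bm{N}_{t-1} &= \bm{Z}^\T\bm{F}_t^{-1}\bm{Z} + \bm{L}_t^\T\bm{N}_t\bm{L}_t,\\ \bm{a}_{t\mid T} &= \bm{a}_{t\mid t-1} + \bm{P}_{t\mid t-1}\bm{r}_{t-1},\\ \bm{P}_{t\mid T} &= \bm{P}_{t\mid t-1} - \bm{P}_{t\mid t-1}\bm{N}_{t-1}\bm{P}_{t\mid t-1}, \end{aligned} \] where \(\bm{L}_t = \bm{T} - \bm{K}_t\bm{Z}\), with Kalman gain \(\bm{K}_t = \bm{T}\bm{P}_{t\mid t-1}\bm{Z}^\T\bm{F}_t^{-1}\), and \(\bm{v}_t\) and \(\bm{F}_t\) are reused from the forward pass.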

Overall, the full characterization of the hidden states requires both a forward pass and a backward pass, each of which is executed very efficiently. The choice between filtering and smoothing depends on whether the application requires a causal approach (real time) or a non-causal one (batch processing). Naturally, the characterization of the hidden states provided by smoothing is better than that provided by filtering, as it uses more information.

Figure 4.2 shows the results of Kalman filtering and Kalman smoothing for the four different state space models in Example 4.1 (with the variances of the noise terms properly fitted via maximum likelihood). In general, the more accurate the model, the better the performance; in this specific example, however, the differences are minimal. On the other hand, Kalman smoothing significantly outperforms Kalman filtering because it has access to all the observations simultaneously.
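As a usage sketch, with the R package KFAS both the filtered and the smoothed quantities are returned by a single call to KFS() on a (fitted) model like the ones sketched earlier:

    # Forward and backward passes in one call
    out <- KFS(fit$model, filtering = "state", smoothing = "state")

    out$att       # filtered means a_{t|t}
    out$Ptt       # filtered covariances P_{t|t}
    out$alphahat  # smoothed means a_{t|T}
    out$V         # smoothed covariances P_{t|T}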

Figure 4.2: Example of Kalman position tracking under the four different models from Example 4.1.

References

Anderson, B. D. O., and Moore, J. B. (1979). Optimal filtering. Prentice-Hall.
Brockwell, P. J., and Davis, R. A. (2002). Introduction to time series and forecasting. Springer.
Durbin, J., and Koopman, S. J. (2012). Time series analysis by state space methods. Oxford University Press.
Harvey, A. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Harvey, A., and Koopman, S. J. (2009). Unobserved components models in economics and finance: The role of the Kalman filter in time series econometrics. IEEE Control Systems Magazine, 29(6), 71–81.
Helske, J. (2017). KFAS: Exponential family state space models in R. Journal of Statistical Software, 78(10), 1–39.
Holmes, E. E., Ward, E. J., and Wills, K. (2012). MARSS: Multivariate autoregressive state-space models for analyzing time-series data. The R Journal, 4(1), 11–19.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 35–45.
Lütkepohl, H. (2007). New introduction to multiple time series analysis. Springer.
Petris, G., and Petrone, S. (2011). State space models in R. Journal of Statistical Software, 41(4), 1–25.
Shumway, R. H., and Stoffer, D. S. (2017). Time series analysis and its applications. Springer.
Tsay, R. S. (2010). Analysis of financial time series. John Wiley & Sons.
Tusell, F. (2011). Kalman filtering in R. Journal of Statistical Software, 39(2), 1–27.
Zivot, E., Wang, J., and Koopman, S. J. (2004). State space modeling in macroeconomics and finance using SsfPack for S+FinMetrics. In A. Harvey, S. J. Koopman, and N. Shephard, editors, State space and unobserved component models: Theory and applications, pages 284–335. Cambridge University Press.

  15. The R package KFAS implements the Kalman filter for the model (4.2) (Helske, 2017). The Python package filterpy also provides Kalman filtering methods. ↩︎

  16. The R package MARSS implements algorithms for fitting the unknown parameters of the state space model (4.2) based on observed time series data (Holmes et al., 2012). ↩︎