## 4.2 Kalman filter

*State space modeling* provides a unified framework for treating a wide range of problems in time series analysis. It can be thought of as a universal and flexible modeling approach with a very efficient algorithm: the *Kalman filter*. The basic idea is to assume that the evolution of the system over time is driven by a series of unobserved or hidden values, which can only be measured indirectly through the observations of the system output. This modeling can be used for filtering, smoothing, and forecasting.

The Kalman filter is named after Rudolf E. Kalman, who was one of the primary developers of its theory (Kalman, 1960). It is sometimes called the Kalman-Bucy filter or even the Stratonovich-Kalman-Bucy filter, because Richard S. Bucy also contributed to the theory and Ruslan Stratonovich had earlier proposed a more general nonlinear version. Arguably, some of the most comprehensive and authoritative classical references for state space models and Kalman filtering include the textbooks (B. D. O. Anderson and Moore, 1979) and (Durbin and Koopman, 2012) (originally published in 2001). Other textbook references on time series analysis and Kalman filtering include (Brockwell and Davis, 2002; A. Harvey, 1989; Shumway and Stoffer, 2017) and, in particular, for financial time series (A. Harvey and Koopman, 2009; Lütkepohl, 2007; Tsay, 2010; Zivot et al., 2004).

The Kalman filter, which was employed by NASA during the 1960s in the Apollo program, now boasts a vast array of technological applications. It is commonly utilized in the guidance, navigation, and control of vehicles, including aircraft, spacecraft, and maritime vessels. It has also found applications in time series analysis, signal processing, and econometrics. More recently, it has become a key component in robotic motion planning and control, as well as trajectory optimization.

Currently, software implementations of Kalman filtering are widespread, and numerous libraries are available in most programming languages, e.g., (Helske, 2017; Holmes et al., 2012; Petris and Petrone, 2011; Tusell, 2011) for the R programming language.

### 4.2.1 State space model

Mathematically, the Kalman filter is based on the following linear Gaussian state space model with discrete-time \(t=1,\dots,T\) (Durbin and Koopman, 2012):
\[\begin{equation}
\qquad\qquad
\begin{aligned}
\bm{y}_t &= \bm{Z}\bm{\alpha}_t + \bm{\epsilon}_t\\
\bm{\alpha}_{t+1} &= \bm{T}\bm{\alpha}_t + \bm{\eta}_t
\end{aligned}
\quad
\begin{aligned}
& \textm{(observation equation)}\\
& \textm{(state equation)},
\end{aligned}
\tag{4.2}
\end{equation}\]
where \(\bm{y}_t\) denotes the observations over time with observation matrix \(\bm{Z}\), \(\bm{\alpha}_t\) represents the unobserved or hidden internal state with state transition matrix \(\bm{T}\), and the two noise terms \(\bm{\epsilon}_t\) and \(\bm{\eta}_t\) are Gaussian distributed with zero mean and covariance matrices \(\bm{H}\) and \(\bm{Q}\), respectively, i.e., \(\bm{\epsilon}_t \sim \mathcal{N}(\bm{0},\bm{H})\) and \(\bm{\eta}_t \sim \mathcal{N}(\bm{0},\bm{Q})\). The initial state can be modeled as \(\bm{\alpha}_1 \sim \mathcal{N}(\bm{a}_1,\bm{P}_1)\). Mature and efficient software implementations of the model in (4.2) are readily available, e.g., (Helske, 2017).^{15}
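To make the model concrete, the following Python sketch simulates a trajectory from (4.2). It is only an illustration, not part of any of the cited packages; the particular matrices \(\bm{Z}\), \(\bm{T}\), \(\bm{H}\), and \(\bm{Q}\) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative model parameters (arbitrary choices for this sketch)
Z = np.array([[1.0, 0.0]])           # observation matrix (1 x 2)
T = np.array([[1.0, 1.0],
              [0.0, 1.0]])           # state transition matrix (2 x 2)
H = np.array([[0.5]])                # observation noise covariance
Q = 0.01 * np.eye(2)                 # state noise covariance
a1, P1 = np.zeros(2), np.eye(2)      # initial state distribution

n_obs = 100
alpha = rng.multivariate_normal(a1, P1)   # draw alpha_1 ~ N(a1, P1)
y = np.empty((n_obs, 1))
for t in range(n_obs):
    # observation equation: y_t = Z alpha_t + eps_t
    y[t] = Z @ alpha + rng.multivariate_normal(np.zeros(1), H)
    # state equation: alpha_{t+1} = T alpha_t + eta_t
    alpha = T @ alpha + rng.multivariate_normal(np.zeros(2), Q)
```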

It is worth mentioning that an alternative notation widespread in the literature for the state space model (4.2) is to shift the time index in the state equation by one: \(\bm{\alpha}_{t} = \bm{T}\bm{\alpha}_{t-1} + \bm{\eta}_t\). This change in notation only has a slight effect on the initial point of the system, which is now \(\bm{\alpha}_{0}\) (corresponding to the time before the first observation) instead of \(\bm{\alpha}_{1}\) (corresponding to the same time as the first observation); other than that, it is just a notational preference.

The parameters of the state space model in (4.2) (i.e., \(\bm{Z}\), \(\bm{T}\), \(\bm{H}\), \(\bm{Q}\), \(\bm{a}_1\), and \(\bm{P}_1\)) can be provided by the user (if known). Otherwise, they can be inferred from the data with algorithms based on maximum likelihood estimation. Again, mature and efficient software implementations are available for parameter fitting (Holmes et al., 2012).^{16} In order to build some intuition about state space models, let us look at a simple yet illustrative example.

**Example 4.1 (Tracking via state space model)** Suppose we want to track an object in one dimension over time, \(x_t\), from noisy measurements \(y_t= x_t + \epsilon_t\) taken at time intervals \(\Delta t\). We provide four different ways to model this system, from the simplest to the most advanced, based on the state space model in (4.2).

If we define the internal state simply as the position, \(\alpha_t = x_t\), then (4.2) reduces to: \[ \begin{aligned} y_t &= x_t + \epsilon_t\\ x_{t+1} &= x_t + \eta_t, \end{aligned} \] where it is tacitly assumed that the position \(x_t\) does not change much.

If now we incorporate the velocity \(v_t\) in the internal state, \(\bm{\alpha}_t = \begin{bmatrix} x_t\\ v_t \end{bmatrix}\), then the state space model becomes \[ \begin{aligned} y_t &= \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x_t\\ v_t \end{bmatrix} + \epsilon_t\\ \begin{bmatrix} x_{t+1}\\ v_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & \Delta t\\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_t\\ v_t \end{bmatrix} + \bm{\eta}_t, \end{aligned} \] where now the position is better modeled thanks to also modeling the velocity.

We can further include the acceleration \(a_t\) into the internal state, \(\bm{\alpha}_t = \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix}\), leading to the state space model \[ \begin{aligned} y_t &= \begin{bmatrix} 1 & 0 & 0\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \epsilon_t\\ \begin{bmatrix} x_{t+1}\\ v_{t+1}\\ a_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & \Delta t & 0\\ 0 & 1 & \Delta t\\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \bm{\eta}_t. \end{aligned} \]

Finally, we can further improve the model, especially if the sampling rate is not high enough, by including the acceleration into the position equation, \(x_{t+1} = x_t + \Delta t v_t + \frac{1}{2}\Delta t^2 a_t\), leading to \[ \begin{aligned} y_t &= \begin{bmatrix} 1 & 0 & 0\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \epsilon_t\\ \begin{bmatrix} x_{t+1}\\ v_{t+1}\\ a_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & \Delta t & \frac{1}{2}\Delta t^2\\ 0 & 1 & \Delta t\\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} x_t\\ v_t\\ a_t \end{bmatrix} + \bm{\eta}_t. \end{aligned} \]
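As a quick sanity check of this last constant-acceleration model, the following Python sketch builds its matrices and propagates the state one noiseless step; the value of \(\Delta t\) and the initial state are illustrative choices of ours.

```python
import numpy as np

dt = 0.1  # illustrative sampling interval

# Observation and transition matrices of the constant-acceleration model
Z = np.array([[1.0, 0.0, 0.0]])
T = np.array([[1.0, dt,  0.5 * dt**2],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])

alpha = np.array([0.0, 1.0, 0.2])   # state: position, velocity, acceleration
alpha_next = T @ alpha              # noiseless one-step state propagation
# position advances by dt*v + 0.5*dt^2*a = 0.1*1.0 + 0.005*0.2 = 0.101
```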

It is important to emphasize that the state space model in (4.2) is not the most general one. One trivial extension is to allow the parameters \(\bm{Z}\), \(\bm{T}\), \(\bm{H}\), and \(\bm{Q}\) to change over time: \(\bm{Z}_t\), \(\bm{T}_t\), \(\bm{H}_t\), and \(\bm{Q}_t\). More generally, one can relax the two key assumptions in (4.2): *i*) allowing nonlinear functions of \(\bm{\alpha}_t\) (instead of the linear functions \(\bm{Z}\bm{\alpha}_t\) and \(\bm{T}\bm{\alpha}_t\)) and *ii*) not assuming the noise distributions to be Gaussian. This leads to extensions proposed in the literature such as the *extended Kalman filter*, the *unscented Kalman filter*, and even the more general (albeit more computationally demanding) *particle filtering* (Durbin and Koopman, 2012).

### 4.2.2 Kalman filtering and smoothing

The Kalman filter is a very efficient algorithm to optimally solve the state space model in (4.2), which is linear and assumes Gaussian distributions for the noise terms. Its computational cost is manageable to the point that it was even used in the Apollo program by NASA in the 1960s: it was quite remarkable that it could be implemented in a tiny and rudimentary computer (2k of magnetic core RAM, 36k of wire rope memory, and a CPU built from ICs with a clock speed under 100 kHz).

The objective of *Kalman filtering* is to characterize the distribution of the hidden state at time \(t\), \(\bm{\alpha}_t\), given the observations up to (and including) time \(t\), \(\bm{y}_1,\dots,\bm{y}_t\), i.e., in a causal manner. Since the distribution of the noise terms is Gaussian, it follows that the conditional distribution of \(\bm{\alpha}_t\) is also Gaussian; therefore, it suffices to characterize the conditional mean and conditional covariance matrix:
\[
\begin{aligned}
\bm{a}_{t\mid t} &\triangleq \E\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_t)\right]\\
\bm{P}_{t\mid t} &\triangleq \textm{Cov}\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_t)\right].
\end{aligned}
\]
For forecasting purposes, one is really interested in the distribution of the hidden state at time \(t+1\), \(\bm{\alpha}_{t+1}\), given the observations up to (and including) time \(t\), denoted by \(\bm{a}_{t+1\mid t}\) and \(\bm{P}_{t+1\mid t}\). These filtering and forecasting quantities can be efficiently computed using a “forward pass” algorithm that goes from \(t=1\) to \(t=T\) in a sequential way, so that it can operate in real time (Durbin and Koopman, 2012).
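These forward-pass recursions can be written down compactly. The following Python sketch is a didactic implementation of the standard measurement update (yielding \(\bm{a}_{t\mid t}\), \(\bm{P}_{t\mid t}\)) and time update (yielding \(\bm{a}_{t+1\mid t}\), \(\bm{P}_{t+1\mid t}\)); it uses a plain matrix inverse for clarity, whereas production libraries employ numerically more careful variants. The toy demo with constant observations is our own choice.

```python
import numpy as np

def kalman_filter(y, Z, T, H, Q, a1, P1):
    """Forward pass: filtered means a_{t|t} and covariances P_{t|t}
    for the linear Gaussian state space model (4.2)."""
    n = y.shape[0]
    m = a1.shape[0]
    a_pred, P_pred = a1.copy(), P1.copy()     # a_{1|0}, P_{1|0}
    a_filt = np.empty((n, m))
    P_filt = np.empty((n, m, m))
    for t in range(n):
        # measurement update: condition on y_t
        F = Z @ P_pred @ Z.T + H              # innovation covariance
        K = P_pred @ Z.T @ np.linalg.inv(F)   # Kalman gain
        v = y[t] - Z @ a_pred                 # innovation
        a_filt[t] = a_pred + K @ v            # a_{t|t}
        P_filt[t] = P_pred - K @ Z @ P_pred   # P_{t|t}
        # time update: predict alpha_{t+1}
        a_pred = T @ a_filt[t]                # a_{t+1|t}
        P_pred = T @ P_filt[t] @ T.T + Q      # P_{t+1|t}
    return a_filt, P_filt

# Demo: filtering constant observations with a scalar local-level model
y_demo = np.full((50, 1), 3.0)
a_f, P_f = kalman_filter(
    y_demo,
    Z=np.array([[1.0]]), T=np.array([[1.0]]),
    H=np.array([[0.1]]), Q=np.array([[0.01]]),
    a1=np.zeros(1), P1=np.eye(1),
)
# the filtered state converges to the constant level 3
```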

On the other hand, the objective of *Kalman smoothing* is to characterize the distribution of the hidden state at time \(t\), \(\bm{\alpha}_t\), given all the observations, \(\bm{y}_1,\dots,\bm{y}_T\), i.e., in a non-causal manner. This distribution is also Gaussian and is fully characterized by the following conditional mean and conditional covariance matrix:
\[
\begin{aligned}
\bm{a}_{t\mid T} &\triangleq \E\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_T)\right]\\
\bm{P}_{t\mid T} &\triangleq \textm{Cov}\left[\bm{\alpha}_t \mid (\bm{y}_1,\dots,\bm{y}_T)\right].
\end{aligned}
\]
Interestingly, these quantities can also be efficiently computed using a “backward pass” algorithm that goes from \(t=T\) to \(t=1\) (Durbin and Koopman, 2012). Since this requires all the observations, it is inherently a batch-processing algorithm rather than an online one.
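One standard way to organize the two passes is the Rauch-Tung-Striebel form: a forward filtering pass that stores the predicted and filtered moments, followed by the backward smoothing recursion. The following Python sketch is our own minimal didactic version (not taken from the cited references), again demonstrated on a toy local-level model with constant observations.

```python
import numpy as np

def kalman_smoother(y, Z, T, H, Q, a1, P1):
    """Forward filtering pass followed by a backward (RTS) smoothing pass,
    returning the smoothed moments a_{t|T}, P_{t|T} for the model (4.2)."""
    n, m = y.shape[0], a1.shape[0]
    a_p = np.empty((n, m)); P_p = np.empty((n, m, m))   # a_{t|t-1}, P_{t|t-1}
    a_f = np.empty((n, m)); P_f = np.empty((n, m, m))   # a_{t|t},   P_{t|t}
    a, P = a1.copy(), P1.copy()
    for t in range(n):                                  # forward pass
        a_p[t], P_p[t] = a, P
        F = Z @ P @ Z.T + H                             # innovation covariance
        K = P @ Z.T @ np.linalg.inv(F)                  # Kalman gain
        a_f[t] = a + K @ (y[t] - Z @ a)
        P_f[t] = P - K @ Z @ P
        a, P = T @ a_f[t], T @ P_f[t] @ T.T + Q
    a_s = a_f.copy(); P_s = P_f.copy()                  # backward pass
    for t in range(n - 2, -1, -1):
        J = P_f[t] @ T.T @ np.linalg.inv(P_p[t + 1])    # smoother gain
        a_s[t] = a_f[t] + J @ (a_s[t + 1] - a_p[t + 1])
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_p[t + 1]) @ J.T
    return a_s, P_s

# Demo: smoothing constant observations with a scalar local-level model
y_demo = np.full((50, 1), 3.0)
a_s, P_s = kalman_smoother(
    y_demo,
    Z=np.array([[1.0]]), T=np.array([[1.0]]),
    H=np.array([[0.1]]), Q=np.array([[0.01]]),
    a1=np.zeros(1), P1=np.eye(1),
)
```

Note how the smoothed estimate at early times benefits from future observations, pulling it toward the true level faster than the filtered estimate can.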

Overall, the full characterization of the hidden states executes both forward and backward passes very efficiently. The choice between filtering and smoothing depends on whether the application requires a causal approach (real time) or a non-causal one (batch processing). Naturally, the hidden-state characterization obtained by smoothing is better than that of filtering, as it uses more information.

Figure 4.2 shows the result of Kalman filtering and Kalman smoothing for the four different state space models in Example 4.1 (properly fitting the variances of the noise terms via maximum likelihood). In general, the more accurate the model, the better the performance. In this specific example, however, the differences are minimal. On the other hand, Kalman smoothing significantly outperforms Kalman filtering because it has access to all observations simultaneously.
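The maximum likelihood fitting of the noise variances mentioned above can be sketched for the simplest (local-level) model of Example 4.1: the Kalman recursions yield the Gaussian log-likelihood via the prediction error decomposition, which is then maximized numerically. The following Python illustration is our own minimal sketch on simulated data, not the procedure of any cited package.

```python
import numpy as np
from scipy.optimize import minimize

def local_level_loglik(log_params, y, a1=0.0, p1=1e7):
    """Gaussian log-likelihood of the local-level model
    y_t = x_t + eps_t, x_{t+1} = x_t + eta_t,
    via the prediction error decomposition (scalar Kalman recursions)."""
    h, q = np.exp(log_params)          # noise variances (kept positive)
    a, p = a1, p1                      # near-diffuse prior on x_1
    ll = 0.0
    for yt in y:
        v = yt - a                     # innovation (one-step prediction error)
        f = p + h                      # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * f) + v**2 / f)
        k = p / f                      # Kalman gain
        a = a + k * v                  # predicted state mean for t+1
        p = p * (1 - k) + q            # predicted state variance for t+1
    return ll

# Fit by maximum likelihood on simulated data (true variances: h=0.25, q=0.01)
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0.0, 0.1, 300))   # hidden random walk
y = x + rng.normal(0.0, 0.5, 300)          # noisy observations
res = minimize(lambda lp: -local_level_loglik(lp, y), x0=np.zeros(2))
h_hat, q_hat = np.exp(res.x)               # estimated noise variances
```

In practice one would of course rely on the mature implementations in packages such as KFAS or MARSS rather than on this bare-bones sketch.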

### References

Anderson, B. D. O., and Moore, J. B. (1979). *Optimal filtering*. Englewood Cliffs: Prentice-Hall.

Brockwell, P. J., and Davis, R. A. (2002). *Introduction to time series and forecasting*. Springer.

Durbin, J., and Koopman, S. J. (2012). *Time series analysis by state space methods*. Oxford University Press.

Harvey, A. (1989). *Forecasting, structural time series models and the Kalman filter*. Cambridge University Press.

Harvey, A., and Koopman, S. J. (2009). Unobserved components models in economics and finance. *IEEE Control Systems Magazine*, *29*(6), 71–81.

Helske, J. (2017). KFAS: Exponential family state space models in R. *Journal of Statistical Software*, *78*(10), 1–39.

Holmes, E. E., Ward, E. J., and Wills, K. (2012). MARSS: Multivariate autoregressive state-space models for analyzing time-series data. *The R Journal*, *4*(1), 11–19.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. *Journal of Basic Engineering*, *82*, 35–45.

Lütkepohl, H. (2007). *New introduction to multiple time series analysis*. Springer.

Petris, G., and Petrone, S. (2011). State space models in R. *Journal of Statistical Software*, *41*(4), 1–25.

Shumway, R. H., and Stoffer, D. S. (2017). *Time series analysis and its applications*. Springer.

Tsay, R. S. (2010). *Analysis of financial time series*. John Wiley & Sons.

Tusell, F. (2011). Kalman filtering in R. *Journal of Statistical Software*, *39*(2), 1–27.

Zivot, E., Wang, J., and Koopman, S. J. (2004). State space modeling in macroeconomics and finance. In *State space and unobserved component models: Theory and applications*, pages 284–335. Cambridge University Press.

15. The R package `KFAS` implements the Kalman filter for the model (4.2) (Helske, 2017). The Python package `filterpy` provides Kalman methods. ↩︎

16. The R package `MARSS` implements algorithms for fitting the unknown parameters of the state space model (4.2) based on observed time series data (Holmes et al., 2012). ↩︎