Chapter 6 Kalman Filters
“I can see clearly now the rain is gone” -Johnny Nash
Motivation
Kalman Filters are not a new topic in Statistics, but they remain quite relevant: the potential uses for this tool vary widely. The crux of the Kalman Filter is to estimate the value of some ‘unknown state’ based on observed data. Kalman Filters are built upon Stochastic Processes, and this chapter is thus a natural opportunity for us to explore a practical application of our subject.
6.1 Introducing Kalman Filters
We began to introduce what a Kalman Filter is above: a tool that estimates an unknown state based on observed data, so that our estimate is a combination of the observed data and a prediction.
One intuitive example is estimating the location of a satellite orbiting Earth. We take the ‘unknown state’ to be the true, exact position of the satellite, which we are trying to estimate from the ground. First, we have a ‘prediction step’: we make a prediction about where the satellite will be in the future based on where it was launched and using knowledge about gravity, the Earth’s rotation, etc. Second, we have a ‘measurement step’: we have a sensor on the satellite that beams us back information about the satellite’s location (the information isn’t exactly perfect; if it was, we wouldn’t need to estimate the satellite’s location in the first place!), and we use this measurement to update our estimate of where the satellite truly is (again, it’s only an estimate because both our measurement and our prediction are noisy).
This is some simple intuition, but opens up a world of complexity: the unknown state, prediction and measurement can all have multiple dimensions (depending on the structure of the problem; what if, for example, we wanted the location of two satellites?). We can use Kalman Filters to rigorously define a useful approach to making these estimates.
As we will see, Kalman Filters assume Gaussian (Normally distributed) inputs. One convenient result of this is that Gaussian predictions, combined with Gaussian measurements, give Gaussian estimates. This links nicely into the next step of the Kalman Filter; if we make a Gaussian estimate at time t, we can feed that Gaussian estimate into the Kalman Filter to get another Gaussian estimate at time t+1, and so on.
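To make this concrete, here is a tiny numerical sketch (with made-up numbers) of combining a Gaussian prediction with a Gaussian measurement in one dimension; the resulting estimate is again Gaussian, with a precision-weighted mean:

# a minimal sketch with made-up numbers: combine a Gaussian prediction
# and a Gaussian measurement into a Gaussian estimate
pred_mean <- 10; pred_var <- 4    # prediction: N(10, 4)
meas_mean <- 12; meas_var <- 1    # measurement: N(12, 1)

# the combined (posterior) variance and mean: a precision-weighted average
post_var <- 1 / (1 / pred_var + 1 / meas_var)
post_mean <- post_var * (pred_mean / pred_var + meas_mean / meas_var)
post_mean; post_var   # the estimate sits closer to the less noisy measurement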
Finally, we will be orienting our discussion in a statistical sense, using Stochastic Processes, statistical distributions and the like to frame the Kalman Filter (traditionally, this is done with tools from physics). This explanation is adapted from this resource, and we will try to thoroughly explain some of the more technical aspects to aid understanding.
6.2 Deriving the Kalman Filter
By this point, you might be confused by the vague, qualitative descriptions of the Kalman Filter; let’s dive into how we actually do it!
Let’s denote the ‘unknown state’ (from above, the location of the satellite) as θt, and the data that we observe (from above, the information beamed back from the satellite) as Yt. Note that θt and Yt are stochastic processes; they are random variables that we observe over time! Also note that θt and Yt can be scalars (just a single number) or vectors (remember, we can construct a complicated problem with many dimensions) and that their dimensions are independent (θt could be a vector while Yt is a scalar). We can start there:
Equation 1.
$$\theta_t = G_t\theta_{t-1} + w_t$$
Here $G_t$ is some known quantity (more on this soon) and $w_t$ is a Gaussian random variable centered at 0 with a known variance (specifically, $w_t \sim N(0, \sigma^2_{w_t})$ for some known variance $\sigma^2_{w_t}$). Note that this is the equation for $\theta_t$ for $t = 1, 2, ...$, as we just let $\theta_0$ equal some constant.
The intuition here is that the $t^{th}$ value in the stochastic process, $\theta_t$, is just the $(t-1)^{th}$ value $\theta_{t-1}$, times $G_t$ (a ‘transformation’ scalar or matrix), plus some noise $w_t$.
This is the definition of the unobserved state θt. Let us now define Yt, the data that we observe, in a similar fashion:
Equation 2.
$$Y_t = F_t\theta_t + v_t$$
Where, again, $F_t$ is a known quantity and $v_t \sim N(0, \sigma^2_{v_t})$ for some known variance $\sigma^2_{v_t}$.
Note that $Y_t$ is a function of $\theta_t$. The intuition here is similar: we have some ‘true, unobserved state of nature’ $\theta_t$, which we multiply by some quantity $F_t$ (again, a ‘transformation’ scalar or matrix) and add on some random noise $v_t$ to get back to the data we actually observe, $Y_t$. Finally, note that $v_t$ and $w_t$ are independent (although they are not necessarily identically distributed, since their variances could be different).
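As a quick illustration, here is a small simulation sketch of Equations 1 and 2 in R, using made-up scalar values ($G_t = F_t = 1$ and fixed noise variances); the variable names (theta_sim, Y_sim) are just for illustration:

# simulate Equations 1 and 2 with illustrative scalar values
set.seed(110)
n <- 50
sigma_w <- 0.5   # sd of the transition noise w_t
sigma_v <- 1     # sd of the measurement noise v_t

theta_sim <- Y_sim <- numeric(n)
theta_prev <- 0                                       # theta_0 is just a constant
for (t in 1:n) {
  theta_sim[t] <- theta_prev + rnorm(1, 0, sigma_w)   # Equation 1: theta_t = G_t theta_{t-1} + w_t
  Y_sim[t] <- theta_sim[t] + rnorm(1, 0, sigma_v)     # Equation 2: Y_t = F_t theta_t + v_t
  theta_prev <- theta_sim[t]
}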
We now have these two - admittedly vague - equations, and can start to try to estimate θt based on earlier values in the stochastic process (θ1,θ2,...,θt−1) and our observed data Yt. We can take a Bayesian approach and start by studying the distribution of θt conditional on the data up to time t (which is Y1,Y2,...,Yt):
Equation 3.
$$P(\theta_t|Y_1,Y_2,...,Y_t) \propto P(Y_t|\theta_t,Y_1,Y_2,...,Y_{t-1})\,P(\theta_t|Y_1,Y_2,...,Y_{t-1})$$
A couple of things to note here: we use the $\propto$, or ‘proportional to,’ because, with respect to $\theta_t$, the denominator on the right hand side of Bayes’ rule, $P(Y_t|Y_1,Y_2,...,Y_{t-1})$, is a constant (it isn’t a function of $\theta_t$) and thus we will ignore it. Further, note that this relation is similar to $P(\theta_t|Y_t) \propto P(Y_t|\theta_t)P(\theta_t)$, just with extra conditioning on $Y_1,Y_2,...,Y_{t-1}$.
We can turn now towards exploring the two quantities on the right hand side of the above equation. We can start with finding the distribution of $\theta_t|Y_1,...,Y_{t-1}$. For this, we will have to assume:
Equation 4.
$$\theta_{t-1}|Y_1,...,Y_{t-1} \sim N(\hat{\theta}_{t-1}, \Sigma_{t-1})$$
This might seem strange. In English, this equation is saying: “in the previous period $t-1$, we have an estimate of $\theta_{t-1}$ based on the observed data $Y_1,...,Y_{t-1}$, and this estimate has a Normal distribution with some mean $\hat{\theta}_{t-1}$ and some variance $\Sigma_{t-1}$” (remember that the hat on $\hat{\theta}_{t-1}$ means ‘prediction’ or ‘estimate’; at this point in the derivation, the mean and variance are arbitrary and we will take them as given).
This strange step will become clearer as we solve out the posterior distribution for $\theta_t$. At this point in the derivation, we can simply think of $\hat{\theta}_{t-1}$ and $\Sigma_{t-1}$ as our best estimates from the previous period $t-1$ (later on in the derivation we will have a better idea of what these ‘best estimates’ actually are). We initialize these estimates at time $t = 0$ (we use our ‘best guesses’) so that we always have a ‘best estimate’ to use from the previous period (even at $t = 1$, we can use our best guess from $t = 0$). The reason we use the Normal distribution will become clearer later on in the derivation.
Now that we have a posterior distribution for $\theta_{t-1}$ (even if it seems we didn’t do much), we can write the distribution of $\theta_t|Y_1,...,Y_{t-1}$. Recall Equation 1:
$$\theta_t = G_t\theta_{t-1} + w_t$$
Let’s condition everything on $Y_1,...,Y_{t-1}$:
$$\theta_t|Y_1,...,Y_{t-1} = (G_t|Y_1,...,Y_{t-1})(\theta_{t-1}|Y_1,...,Y_{t-1}) + (w_t|Y_1,...,Y_{t-1})$$
Consider the right hand side of this equation. We have $G_t$, which is a known quantity and thus doesn’t change when we condition on the data, $w_t$, which does not depend on the data $Y$ and thus doesn’t change, and, of course, $\theta_{t-1}|Y_1,...,Y_{t-1}$, which we assumed above has a Normal distribution with mean $\hat{\theta}_{t-1}$ and variance $\Sigma_{t-1}$. We can therefore re-write:
$$\theta_t|Y_1,...,Y_{t-1} = G_t(\theta_{t-1}|Y_1,...,Y_{t-1}) + w_t$$
Since the two random variables left on the right hand side, wt and θt−1|Y1,...,Yt−1, are Gaussian, and the linear combination of Gaussian random variables is Gaussian, we have that θt|Y1,...,Yt−1 is also Gaussian. Let’s find the expectation and variance:
$$E(\theta_t|Y_1,...,Y_{t-1}) = E(G_t(\theta_{t-1}|Y_1,...,Y_{t-1}) + w_t)$$
$$= G_tE(\theta_{t-1}|Y_1,...,Y_{t-1}) + E(w_t)$$
We can factor out the $G_t$ because it is known. Recall that $E(w_t) = 0$ by definition, and $E(\theta_{t-1}|Y_1,...,Y_{t-1}) = \hat{\theta}_{t-1}$, by definition (well, we defined it!). We are thus left with:
$$E(\theta_t|Y_1,...,Y_{t-1}) = G_t\hat{\theta}_{t-1}$$
Now for the variance:
$$Var(\theta_t|Y_1,...,Y_{t-1}) = Var(G_t(\theta_{t-1}|Y_1,...,Y_{t-1}) + w_t)$$
$$= G_tVar(\theta_{t-1}|Y_1,...,Y_{t-1})G_t' + Var(w_t)$$
Note that the $G_t$ becomes $G_t$ and $G_t'$, to adjust for the cases when $G_t$ has more than one dimension (if $G_t$ is just a scalar, this happily reduces to $G_t^2$, like we are used to). Remember that $Var(w_t) = \sigma^2_{w_t}$ by definition, and, again, $Var(\theta_{t-1}|Y_1,...,Y_{t-1}) = \Sigma_{t-1}$ because we defined it that way! We are thus left with:
$$Var(\theta_t|Y_1,...,Y_{t-1}) = G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t}$$
So, we have shown that θt|Y1,...,Yt−1 is Gaussian, and we have found the expectation and variance, which means that we can write:
Equation 5.
$$\theta_t|Y_1,...,Y_{t-1} \sim N(G_t\hat{\theta}_{t-1},\; G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})$$
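As a sanity check on Equation 5, we can simulate the prediction step with arbitrary placeholder values and compare the simulated mean and variance to the formulas above (all numbers here are illustrative):

# simulate theta_{t-1} | data ~ N(theta_hat, Sigma), push it through Equation 1,
# and compare to the mean and variance in Equation 5
set.seed(110)
G <- 2; theta_hat_prev <- 1; Sigma_prev <- 3; sigma2_w <- 0.5
theta_prev <- rnorm(100000, mean = theta_hat_prev, sd = sqrt(Sigma_prev))
theta_t <- G * theta_prev + rnorm(100000, mean = 0, sd = sqrt(sigma2_w))

mean(theta_t); G * theta_hat_prev                  # both should be near 2
var(theta_t); G * Sigma_prev * G + sigma2_w        # both should be near 12.5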
Excellent; we now have half of the right hand side of Equation 3., and are left with needing to find the distribution of Yt|θt,Y1,...,Yt−1.
To begin, let’s define et, which is intuitively the ‘error’ of Yt:
$$e_t = Y_t - E(Y_t|Y_1,...,Y_{t-1})$$
This is intuitive because it is simply the actual value Yt minus the expected value of Yt using the history of Y up to time t−1. We can try to find this expectation by plugging in our definition of Y from Equation 2:
$$E(Y_t|Y_1,...,Y_{t-1}) = E(F_t\theta_t + v_t|Y_1,...,Y_{t-1})$$
We have that $E(v_t) = 0$ by definition and that $F_t$ is a known quantity, so this becomes:
$$= F_tE(\theta_t|Y_1,...,Y_{t-1})$$
Wait a second; we just solved for the distribution of $\theta_t|Y_1,...,Y_{t-1}$ in the previous part (Equation 5). We have that the expectation is $G_t\hat{\theta}_{t-1}$, so we can write:
$$E(Y_t|Y_1,...,Y_{t-1}) = F_tG_t\hat{\theta}_{t-1}$$
And thus we are left with:
Equation 6.
$$e_t = Y_t - F_tG_t\hat{\theta}_{t-1}$$
Here’s the nifty thing about $e_t$: since, at time $t$, $F_t$, $G_t$ and $\hat{\theta}_{t-1}$ are all known quantities ($F_t$ and $G_t$ are known by definition, and $\hat{\theta}_{t-1}$ is our best guess for $\theta_{t-1}$, which we know at time $t$), observing $e_t$ will allow us to calculate $Y_t$ (just add $F_tG_t\hat{\theta}_{t-1}$, which we can do since all of those terms are known). Therefore, let’s try finding the distribution of $e_t|\theta_t,Y_1,...,Y_{t-1}$ and use that to find the distribution of $Y_t|\theta_t,Y_1,...,Y_{t-1}$ (the transition should be easy!).
Let’s get started, then, on finding the distribution of $e_t|\theta_t,Y_1,...,Y_{t-1}$. We will start by re-writing Equation 6; specifically, we will plug Equation 2, which gives us an equation for $Y_t$, into the $Y_t$ of Equation 6:
$$e_t|\theta_t,Y_1,...,Y_{t-1} = Y_t - F_tG_t\hat{\theta}_{t-1}\,|\,\theta_t,Y_1,...,Y_{t-1}$$
Plugging in for $Y_t$:
$$= F_t\theta_t + v_t - F_tG_t\hat{\theta}_{t-1}\,|\,\theta_t,Y_1,...,Y_{t-1}$$
We can factor out $F_t$, and are left with:
$$e_t|\theta_t,Y_1,...,Y_{t-1} = F_t(\theta_t - G_t\hat{\theta}_{t-1}) + v_t\,|\,\theta_t,Y_1,...,Y_{t-1}$$
The only random variables left here are $\theta_t$ and $v_t$. Note, however, that we are also conditioning on $\theta_t$, so $\theta_t$ is known. Further, note that $v_t$ is independent of both $\theta_t$ and the history of $Y_t$. Therefore, we are simply left with a bunch of constants and a Normal random variable $v_t$, so we have that $e_t|\theta_t,Y_1,...,Y_{t-1}$ is Gaussian. The mean and variance are easy to find: the mean of $v_t$ is zero, so we are just left with the other constants, and those constants have variance zero, so we are left with the variance of $v_t$. We thus have:
Equation 7.
$$e_t|\theta_t,Y_1,...,Y_{t-1} \sim N(F_t(\theta_t - G_t\hat{\theta}_{t-1}),\; \sigma^2_{v_t})$$
Remember, the initial point here was to find the two densities on the right hand side of Equation 3:
$$P(\theta_t|Y_1,Y_2,...,Y_t) \propto P(Y_t|\theta_t,Y_1,Y_2,...,Y_{t-1})\,P(\theta_t|Y_1,Y_2,...,Y_{t-1})$$
We have found them with Equation 5:
$$\theta_t|Y_1,...,Y_{t-1} \sim N(G_t\hat{\theta}_{t-1},\; G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})$$
And with Equation 7:
$$e_t|\theta_t,Y_1,...,Y_{t-1} \sim N(F_t(\theta_t - G_t\hat{\theta}_{t-1}),\; \sigma^2_{v_t})$$
Well, we didn’t quite find it with Equation 7, but remember that observing et is enough to give us Yt (since et is just Yt minus some constants).
Anyways, we now have the two densities that we need to find $P(\theta_t|Y_1,Y_2,...,Y_t)$, which is the crux of the Kalman Filter. To finish the calculation, we would need to calculate a difficult integral; instead of doing all of that taxing math, we can use a trick embedded in the construction of a Bivariate Normal (BVN) distribution.
Note these two important results for the BVN distribution; for random variables $X$ and $Y$, we have:
Equation 8.
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \sigma^2_x & \sigma_{xy} \\ \sigma_{xy} & \sigma^2_y \end{pmatrix}\right)$$
This simply means that $E(X) = \mu_x$ and $Var(X) = \sigma^2_x$ (and symmetrically for $Y$), and that the covariance of $X$ and $Y$ is $\sigma_{xy}$. One really nice property of the Bivariate Normal is that we have this result for the conditional distribution of $X$ given $Y$:
Equation 9.
$$X|Y = y \sim N\left(\mu_x + \sigma_{xy}(\sigma^2_y)^{-1}(y - \mu_y),\; \sigma^2_x - \sigma_{xy}(\sigma^2_y)^{-1}\sigma_{xy}\right)$$
As we mentioned, Equation 8 implies Equation 9. However, if Equation 9 holds and the marginal distribution of $Y$ is $N(\mu_y, \sigma^2_y)$, then Equation 8 holds. We will try to show that the latter is true by using the random variables $\theta_t$ and $e_t$ (remember, if we observe $e_t$ then we can get to $Y_t$).
Throughout this exercise, we are going to use versions of Equation 8 and Equation 9, but with extra conditioning on Y1,...,Yt−1.
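Before applying this, here is a quick Monte Carlo sketch of the conditional-mean part of Equation 9, with arbitrary parameter values chosen for illustration (it is not part of the derivation, just a check that the formula behaves as advertised):

# simulate from a BVN and compare the empirical conditional mean of X given
# Y near some value y0 to the formula in Equation 9
library(MASS)   # for mvrnorm
set.seed(110)
mu_x <- 1; mu_y <- 2; s2_x <- 2; s2_y <- 3; s_xy <- 1.5
draws <- mvrnorm(100000, mu = c(mu_x, mu_y),
                 Sigma = matrix(c(s2_x, s_xy, s_xy, s2_y), nrow = 2))
x <- draws[, 1]; y <- draws[, 2]

y0 <- 3
mean(x[abs(y - y0) < 0.05])                 # empirical E(X | Y near y0)
mu_x + s_xy * (1 / s2_y) * (y0 - mu_y)      # Equation 9: both should be close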
Using Equations 8 and 9, then, let’s let $X = e_t$ and $Y = \theta_t$, and let’s start with considering $e_t|\theta_t,Y_1,...,Y_{t-1}$ in the paradigm above (eventually, we are going to go after the posterior distribution of $\theta_t$ instead of $e_t$, but we can start here nonetheless). Since we are conditioning on $Y_1,...,Y_{t-1}$ throughout, we have that $\mu_y = G_t\hat{\theta}_{t-1}$ and $\sigma^2_y = G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t}$ (that is, the mean and variance of $\theta_t|Y_1,...,Y_{t-1}$, which we found in Equation 5).
We can also use Equation 7, which gives the mean and variance of et|θt,Y1,...,Yt−1. This means that we have the mean of X (which is et here) conditional on Y (which is θt here). We can thus set the mean of et|θt,Y1,...,Yt−1 from Equation 7 equal to the conditional mean of X above:
$$F_t(\theta_t - G_t\hat{\theta}_{t-1}) = \mu_x + \sigma_{xy}(\sigma^2_y)^{-1}(y - \mu_y)$$
Plugging in $\sigma^2_y = G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t}$, $\mu_y = G_t\hat{\theta}_{t-1}$ and $y = \theta_t$ above, we have:
$$F_t(\theta_t - G_t\hat{\theta}_{t-1}) = \mu_x + \sigma_{xy}(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})^{-1}(\theta_t - G_t\hat{\theta}_{t-1})$$
The unknowns are $\mu_x$ and $\sigma_{xy}$, and it’s clear that $\mu_x = 0$ and $\sigma_{xy} = F_t(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})$ give us a solution here. Therefore, we were able to use the mean of $X|Y$ to find $\mu_x$ and $\sigma_{xy}$; let’s try to use the variance now. Just like we did above, we can set our variance in Equation 7 equal to the variance in Equation 9:
$$\sigma^2_{v_t} = \sigma^2_x - \sigma_{xy}(\sigma^2_y)^{-1}\sigma_{xy}$$
Plugging in for $\sigma^2_y$ and $\sigma_{xy}$, we have:
$$\sigma^2_{v_t} = \sigma^2_x - \left(F_t(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})\right)(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})^{-1}\left(F_t(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})\right)'$$
This simplifies nicely:
$$\sigma^2_{v_t} = \sigma^2_x - F_t(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})F_t'$$
Note again that we include $F_t'$ in case $F_t$ has more than one dimension. We can easily solve for $\sigma^2_x$:
$$\sigma^2_x = \sigma^2_{v_t} + F_t(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})F_t'$$
Well now…we have shown that Equation 9 holds when we have marginally:
$$\theta_t|Y_1,...,Y_{t-1} \sim N(G_t\hat{\theta}_{t-1},\; G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})$$
Which, as we discussed above, means that Equation 8 holds for $e_t|Y_1,...,Y_{t-1}$ and $\theta_t|Y_1,...,Y_{t-1}$. What’s more, we solved for all of the parameters ($\mu_y$, $\mu_x$, $\sigma^2_x$, $\sigma^2_y$ and $\sigma_{xy}$) above! We can thus write:
Equation 10.
$$\begin{pmatrix} \theta_t|Y_1,...,Y_{t-1} \\ e_t|Y_1,...,Y_{t-1} \end{pmatrix} \sim N\left(\begin{pmatrix} G_t\hat{\theta}_{t-1} \\ 0 \end{pmatrix}, \begin{pmatrix} G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t} & (G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})F_t' \\ F_t(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t}) & \sigma^2_{v_t} + F_t(G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t})F_t' \end{pmatrix}\right)$$
And, at long last, we can use Equation 9 to write the posterior distribution of θt given the data, which is what we have been after all along:
Equation 11.
$$\theta_t|Y_1,...,Y_{t-1},e_t \sim N\left(G_t\hat{\theta}_{t-1} + R_tF_t'(\sigma^2_{v_t} + F_tR_tF_t')^{-1}e_t,\; R_t - R_tF_t'(\sigma^2_{v_t} + F_tR_tF_t')^{-1}F_tR_t\right)$$
Where we let $R_t = G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t}$ to save space.
Wow! That was a roller coaster of algebra. Let’s take a step back and try to consider the intuition behind this posterior distribution. By conditioning on all of the data up to time $t$ (remember, observing $e_t$ is akin to observing the data $Y_t$), we have a distribution for $\theta_t$, the unknown state we are trying to estimate. The parameters of this distribution are simply functions of $F_t$, $G_t$, $\sigma^2_{w_t}$, $\sigma^2_{v_t}$ and $Y_t$, all of which are known at time $t$, and $\hat{\theta}_{t-1}$, which is our known estimate of $\theta_{t-1}$.
We can actually try to see some intuition from the result. Consider the mean of the posterior distribution:
$$G_t\hat{\theta}_{t-1} + R_tF_t'(\sigma^2_{v_t} + F_tR_tF_t')^{-1}e_t$$
The first term, $G_t\hat{\theta}_{t-1}$, is just the estimate from the previous period $\hat{\theta}_{t-1}$ scaled by $G_t$, which is how we related $\theta_t$ and $\theta_{t-1}$ all the way back in Equation 1. We then add some ‘correction’ $R_tF_t'(\sigma^2_{v_t} + F_tR_tF_t')^{-1}$ to our previous guess and scale that correction by $e_t$. Remember that $e_t$ is the error in predicting $Y_t$, so if we do a good job of predicting $Y_t$ at time $t-1$, then $e_t$ will be small and our ‘correction’ will be small as well. This should reinforce the concept that we make a prediction about the future unknown state and then adjust that prediction with a measurement that we observe.
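To make Equation 11 less intimidating, here is a short sketch of a single filtering step in R for the one-dimensional case; the function and argument names are made up for illustration, and the body is simply Equation 11 written in code:

# one step of the filter: prediction, error, and correction (Equation 11)
kalman_step <- function(y_t, theta_hat_prev, Sigma_prev, G_t, F_t, sigma2_w, sigma2_v) {
  R_t <- G_t * Sigma_prev * G_t + sigma2_w                  # R_t = G_t Sigma_{t-1} G_t' + sigma^2_{w_t}
  e_t <- y_t - F_t * G_t * theta_hat_prev                   # Equation 6: the prediction error
  correction <- R_t * F_t / (sigma2_v + F_t * R_t * F_t)    # the 'correction' weight
  list(theta_hat = G_t * theta_hat_prev + correction * e_t, # posterior mean
       Sigma = R_t - correction * F_t * R_t)                # posterior variance
}

We will effectively re-run this same computation in a loop when we brute-force the filter later in the chapter.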
6.3 Exploring the Kalman Filter
It can definitely be hard to maintain a sense of intuition throughout all of that tedious algebra (especially with the complicated-looking result). Let’s try to work on our understanding of Kalman Filters by working through a simple example in R. We can use the FKF
package (which stands for Fast Kalman Filter) to ‘run’ the filter, and then perform the calculations brute force from the ground up to see if we can match the output of the package.
Let’s use a simple dataset: the ‘Average Yearly Temperatures in New Haven,’ which is available as nhtemp
in R:
# print data
nhtemp
## Time Series:
## Start = 1912
## End = 1971
## Frequency = 1
## [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4 50.7 50.9
## [18] 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7 51.0 50.6 51.7 51.5
## [35] 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9 52.6 50.2 52.6 51.6 51.9 50.5
## [52] 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0
# visualize (assumes the data.table and ggplot2 packages; load them if needed)
library(data.table)
library(ggplot2)
data <- data.table(x = seq(from = 1912, to = 1971, by = 1),
                   y = as.numeric(nhtemp))
ggplot(data, aes(x = x, y = y)) +
  geom_line() +
  xlab("year") + ylab("temperature") +
  ggtitle("Measured Yearly Average Temperature in New Haven") +
  theme_bw()
So, this dataset simply gives what the average temp for the year was from 1912 to 1971. Let’s think of this phenomenon with a Kalman Filter paradigm: the unknown state we are trying to observe is the true average yearly temperature in New Haven. This dataset provides us with our measurements of the temperature; since these measurements may be noisy, we will use the Kalman Filter to try to enhance our estimate.
We can set up our simple Kalman Filter model that we established in Equations 1 and 2:
$$\theta_t = \theta_{t-1} + w_t$$
$$Y_t = \theta_t + v_t$$
Where, again, $w_t \sim N(0, \sigma^2_{w_t})$ and $v_t \sim N(0, \sigma^2_{v_t})$, each of the $w_t$ and $v_t$ terms is independent of the others through time (and $w_t$ is independent of $v_t$). We will refer to these as the transition and measurement equations, respectively (transition because we are transitioning from $\theta_{t-1}$ to $\theta_t$, and measurement because we are measuring the data $Y_t$).
Notice here that, for simplicity, we have $F_t = G_t = 1$ (we won’t have to worry about those constants!). Again, we can think of $\theta_t$ as the true yearly temperature in year $t$, which moves through time as we add the random noise $w_t$ at each period. We can think of $Y_t$ as the measurement of the yearly temperature that is reported in our dataset in each period (again, note that the measurement is noisy: we have the true value $\theta_t$ plus some noise $v_t$; the measurements are not totally accurate!).
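For this simple model, Equation 11 reduces to a particularly compact update. Here is a sketch of one step in R; the function name and arguments are just illustrative, with sigma2_w as the transition noise variance and sigma2_v as the measurement noise variance:

# one filtering step for the simple model theta_t = theta_{t-1} + w_t, Y_t = theta_t + v_t
local_level_step <- function(y_t, theta_hat_prev, Sigma_prev, sigma2_w, sigma2_v) {
  R_t <- Sigma_prev + sigma2_w                # prediction variance
  e_t <- y_t - theta_hat_prev                 # prediction error
  K_t <- R_t / (R_t + sigma2_v)               # the 'correction' weight
  c(theta_hat = theta_hat_prev + K_t * e_t,   # new estimate
    Sigma = R_t * (1 - K_t))                  # new variance
}

This is essentially the loop we will write out by hand below when we check the package output against our algebra.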
Let’s now turn to the FKF
package (specifically, the fkf
function) to try to run a Kalman filter. Let’s start by matching up our variables in R.
library(FKF)  # provides the fkf() function

y <- nhtemp
a0 <- y[1]
P0 <- matrix(1)
Naturally, y, our data, is coming from nhtemp. Following the fkf documentation (which you can read by typing ?fkf), we let a0 be our initial guess for θ; a natural choice is just the temperature in the first period, y[1]. We let P0 be our best guess for the variance of θ; we use 1 here, which is close to the sample variance (var(y) gives 1.6) and is a nice round number.
We now define our next set of parameters:
dt <- ct <- matrix(0)
According to the documentation of the fkf
function, these parameters give the ‘intercepts’ of the transition and measurement equations, respectively. The idea here is that we can extend our Kalman equations to add a constant that varies with time (i.e., θt=θt−1+dt+vt, where dt is a time-varying constant) in more complicated cases. For example, if we were working with monthly average temperatures, we might want to add a time-varying constant to adjust for seasonality; however, we don’t have this sort of construction in the problem at hand (temperatures are yearly), so we simply set these variables to 0.
Next we set our Ft and Gt values:
Zt <- Tt <- matrix(1)
These are labeled in the fkf documentation as the ‘factors’ of the transition and measurement equations; Tt plays the role of Gt and Zt plays the role of Ft in our parlance. We defined these to be 1 earlier on, so we simply set them to 1 here.
Finally, we have to set the parameters HHt and GGt, which give the variances of the transition and measurement equations, or $\sigma^2_{w_t}$ and $\sigma^2_{v_t}$ in our parlance. If we peek at the documentation of the fkf function, we see that these variances are estimated by maximizing the log-likelihood of the Kalman Filter (search for fit.fkf). We can borrow this approach here:
# Estimate parameters:
fit.fkf <- optim(c(HHt = var(y, na.rm = TRUE) * .5,
                   GGt = var(y, na.rm = TRUE) * .5),
                 fn = function(par, ...)
                   -fkf(HHt = matrix(par[1]), GGt = matrix(par[2]), ...)$logLik,
                 yt = rbind(y), a0 = a0, P0 = P0, dt = dt, ct = ct,
                 Zt = Zt, Tt = Tt)
# recover values
HHt <- as.numeric(fit.fkf$par[1])
GGt <- as.numeric(fit.fkf$par[2])
HHt; GGt
## [1] 0.05051545
## [1] 1.032562
This essentially just initializes the optimizer search for HHt and GGt at half of the variance of the data y (this is the var(y, na.rm = TRUE) * .5) and searches for the values of HHt and GGt that maximize the log-likelihood (using all of the other pre-defined parameters).
Finally, then, our parameters are set, and we can run our Kalman Filter:
y_fkf <- fkf(a0, P0, dt, ct, Tt, Zt,
             HHt = matrix(HHt), GGt = matrix(GGt),
             yt = rbind(y))
From the documentation of fkf, we have that y_fkf$att holds the filtered state variables (our guess for θt, the true temperature, given the data Y1,...,Yt that we observe) and y_fkf$at holds the predicted state variables (our guess for θt+1 given the data Y1,...,Yt that we observe). We are interested here in the filtered variable, so we can plot it:
data <- data.table(x = seq(from = 1912, to = 1971, by = 1),
                   y = as.numeric(nhtemp),
                   y_kalman = as.numeric(y_fkf$att))
ggplot(data, aes(x = x, y = y)) +
  geom_line() +
  geom_line(data = data, aes(x = x, y = y_kalman), col = "blue") +
  xlab("year") + ylab("temperature") +
  ggtitle("Measured Yearly Average Temperature in New Haven (with Kalman Filter)") +
  theme_bw()
Excellent! The blue line is the estimate of the true temperature, while the black line shows the (potentially noisy) measurements that we observe.
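As a quick aside, we can also sanity check the relationship between the filtered and predicted output: since Tt = 1 and dt = 0 here, the prediction for each period should simply be the previous period’s filtered estimate (the at matrix carries one extra column for the initial value), so a comparison along the lines of the following should come back TRUE (this is just an illustrative check, not part of the analysis):

# check that the one-step-ahead predictions equal the lagged filtered estimates
all.equal(as.numeric(y_fkf$at[1, -1]), as.numeric(y_fkf$att[1, ]))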
It is great that we can use the fkf
package to run a Kalman Filter, but we should try brute forcing a solution from the ground up using our algebraic derivation. Recall our final result:
$$\theta_t|Y_1,...,Y_{t-1},e_t \sim N\left(G_t\hat{\theta}_{t-1} + R_tF_t'(\sigma^2_{v_t} + F_tR_tF_t')^{-1}e_t,\; R_t - R_tF_t'(\sigma^2_{v_t} + F_tR_tF_t')^{-1}F_tR_t\right)$$
Where $R_t = G_t\Sigma_{t-1}G_t' + \sigma^2_{w_t}$.
We can try to use this ground-up derivation to reproduce the output from the fkf function. Let’s set up paths for the state variable (theta) and its variance (theta_var), loop through, and calculate the estimates. Note that we can employ the parameters a0, P0, Zt, Tt, HHt, and GGt that we used in the fkf output (we will even convert them to the notation of Equation 11 so it’s easier to match with our algebraic solution).
# initialize
theta <- theta_var <- rep(NA, length(y) + 1)

# set our initial guess
theta[1] <- a0
theta_var[1] <- P0

# define parameters (notation consistent with Equation 11)
sigma_w <- sqrt(HHt)
sigma_v <- sqrt(GGt)
G_t <- Tt
F_t <- Zt

# iterate and make estimates
for (i in 1:length(y)) {

  # Equation 6:
  # use previous theta value for theta_hat and calculate e_t
  theta_hat <- theta[i]
  e_t <- y[i] - theta_hat * G_t * F_t

  # calculate R_t
  R_t <- G_t * theta_var[i] * G_t + sigma_w ^ 2

  # generate estimates from Equation 11
  theta[i + 1] <- G_t * theta_hat + R_t * F_t * (sigma_v ^ 2 + F_t ^ 2 * R_t) ^ (-1) * e_t
  theta_var[i + 1] <- R_t - R_t * F_t * (sigma_v ^ 2 + F_t ^ 2 * R_t) ^ (-1) * F_t * R_t
}

# adjust by one (drop the initial guesses so the vectors line up with y)
theta <- theta[-1]
theta_var <- theta_var[-1]
We thus were able to run the Kalman Filter from the ground up, using our algebraic result. Does it match what we got from the fkf
package?
data <- data.table(x = seq(from = 1912, to = 1971, by = 1),
                   y = as.numeric(nhtemp),
                   y_kalman = as.numeric(y_fkf$att),
                   y_algebra = theta)
ggplot(data, aes(x = x, y = y)) +
  geom_line() +
  geom_line(data = data, aes(x = x, y = y_kalman), col = "blue", alpha = 1 / 2) +
  geom_line(data = data, aes(x = x, y = y_algebra), col = "red", alpha = 1 / 2) +
  xlab("year") + ylab("temperature") +
  ggtitle("Measured Yearly Average Temperature in New Haven (with Kalman Filter)") +
  theme_bw()
Excellent! Our output is exactly the same as the fkf output (the red and blue lines are overlaid with alpha = 1/2, which is why we just see a purple line). This validates all of that intense algebra that we did; it’s easy to code up the result and show that it matches what’s going on behind the scenes in the fkf package!
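Beyond the visual check, we can also compare the two sets of estimates directly; if the algebra and the code line up, the difference should be nothing more than floating-point noise (at worst, a tiny discrepancy from how the very first step is initialized):

# numerical comparison of the fkf output and our ground-up estimates
all.equal(as.numeric(y_fkf$att), theta)
max(abs(as.numeric(y_fkf$att) - theta))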