5.3 Deriving the One-dimensional Case
There are a variety of ways to derive the Kalman filtering equations, but for statisticians it’s probably easiest to think of everything in terms of Normal distributions. Keep in mind that we generally won’t believe that the data are truly Normally distributed; rather, we can think of the Normal distribution as a working model. We will continue with the local level model described above, which has
$$
x_t = \theta x_{t-1} + w_t, \qquad y_t = x_t + v_t
$$
where $w_t \sim \mathcal{N}(0, \tau^2)$ and $v_t \sim \mathcal{N}(0, \sigma^2)$.
Let’s start at $t=1$, where we will observe $y_1$. Assume we have our initial state $x_0 \sim \mathcal{N}(x_0^0, P_0^0)$. First, we want to get the marginal distribution of $x_1$, i.e. $p(x_1)$. Because there is no $y_0$, we cannot condition on any observed information yet. We can compute $p(x_1)$ as
$$
p(x_1) = \int p(x_1\mid x_0)\,p(x_0)\,dx_0
= \int \mathcal{N}(\theta x_0, \tau^2)\times\mathcal{N}(x_0^0, P_0^0)\,dx_0
= \mathcal{N}(\theta x_0^0,\ \theta^2 P_0^0 + \tau^2)
= \mathcal{N}(x_1^0, P_1^0).
$$
Note that we have defined $x_1^0 \stackrel{\Delta}{=} \theta x_0^0$ and $P_1^0 \stackrel{\Delta}{=} \theta^2 P_0^0 + \tau^2$. We can think of $x_1^0$ as the best prediction we can make based on our knowledge of the system and no data.
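As a quick numerical sketch, the prediction step above amounts to two lines of arithmetic. The variable names (`x0_0` for $x_0^0$, `P1_0` for $P_1^0$, and so on) and the numerical values of $\theta$, $\tau^2$, and the initial state are all illustrative choices, not values from the text:

```python
# Prediction step of the one-dimensional Kalman filter: starting from the
# initial state distribution N(x0_0, P0_0), compute the marginal of x_1.
# All numerical values below are illustrative assumptions.
theta = 0.9   # state transition coefficient (assumed)
tau2 = 0.5    # state noise variance tau^2 (assumed)

x0_0, P0_0 = 0.0, 1.0            # initial state mean and variance (assumed)

x1_0 = theta * x0_0              # x_1^0 = theta * x_0^0
P1_0 = theta**2 * P0_0 + tau2    # P_1^0 = theta^2 * P_0^0 + tau^2
```

With these numbers, the prediction mean stays at 0 while the variance inflates from $P_0^0 = 1$ to $P_1^0 = \theta^2 P_0^0 + \tau^2 = 1.31$, reflecting the added state noise.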
Given the new observation $y_1$, we want to use this information to estimate $x_1$. For that we need the conditional distribution $p(x_1\mid y_1)$, which is called the filter density. We can figure that out with Bayes’ rule:
$$
p(x_1\mid y_1) \propto p(y_1\mid x_1)\,p(x_1).
$$
We know from the observation equation that $p(y_1\mid x_1) = \mathcal{N}(x_1, \sigma^2)$, and we just computed $p(x_1)$ above. Therefore, using the basic properties of the Normal distribution, we have
$$
p(x_1\mid y_1) \propto p(y_1\mid x_1)\,p(x_1)
= \varphi(y_1\mid x_1, \sigma^2)\times\varphi(x_1\mid x_1^0, P_1^0)
= \mathcal{N}\!\left(x_1^0 + K_1(y_1 - x_1^0),\ (1-K_1)P_1^0\right)
$$
where
$$
K_1 = \frac{P_1^0}{P_1^0 + \sigma^2}
$$
is the Kalman gain coefficient. Then for $t=1$ we have our new estimates $x_1^1 = \mathbb{E}[x_1\mid y_1] = x_1^0 + K_1(y_1 - x_1^0)$ and $P_1^1 = \text{Var}(x_1\mid y_1) = (1-K_1)P_1^0$. So the filter density is $p(x_1\mid y_1) = \mathcal{N}(x_1^1, P_1^1)$.
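Continuing the numerical sketch, the update step at $t=1$ folds in a hypothetical observation $y_1$ through the Kalman gain. Again, every numerical value here (including $y_1 = 2$) is an assumption made for illustration:

```python
# Update (filtering) step at t = 1: combine the prediction N(x1_0, P1_0)
# with the observation y_1. All numerical values are illustrative.
sigma2 = 1.0          # observation noise variance sigma^2 (assumed)
x1_0, P1_0 = 0.0, 1.31   # prediction mean and variance from the previous step
y1 = 2.0              # a hypothetical observation

K1 = P1_0 / (P1_0 + sigma2)       # Kalman gain: K_1 = P_1^0 / (P_1^0 + sigma^2)
x1_1 = x1_0 + K1 * (y1 - x1_0)    # filtered mean E[x_1 | y_1]
P1_1 = (1 - K1) * P1_0            # filtered variance Var(x_1 | y_1)
```

Note that the filtered mean is a compromise between the prediction $x_1^0$ and the observation $y_1$, with the gain $K_1 \approx 0.57$ determining how much weight the observation gets, and the filtered variance $P_1^1$ is smaller than the prediction variance $P_1^0$.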
Let’s now iterate through this process for $t=2$, where we will now have a new observation $y_2$. We want to compute the new filter density for $x_2$,
$$
p(x_2\mid y_1, y_2) \propto p(y_2\mid x_2)\,p(x_2\mid y_1).
$$
Implicit in the statement above is that $y_2$ does not depend on $y_1$ conditional on the value of $x_2$. The new filter density $p(x_2\mid y_1, y_2)$ is a function of the history of the observed data and is a product of the observation density $p(y_2\mid x_2)$ and the forecast density $p(x_2\mid y_1)$.
In this case, the observation density is simply $\mathcal{N}(x_2, \sigma^2)$. The forecast density can be derived by augmenting $p(x_2\mid y_1)$ with the previous state value $x_1$,
$$
p(x_2\mid y_1) = \int p(x_2, x_1\mid y_1)\,dx_1 \propto \int p(x_2\mid x_1)\,p(x_1\mid y_1)\,dx_1.
$$
Inside the integral, we have the product of the state equation density and the filter density for $x_1\mid y_1$, which we just computed for $t=1$. The state equation density is $\mathcal{N}(\theta x_1, \tau^2)$ and the filter density is $\mathcal{N}(x_1^1, P_1^1)$. Integrating out $x_1$, we get
$$
p(x_2\mid y_1) \propto \int \varphi(x_2\mid \theta x_1, \tau^2)\,\varphi(x_1\mid x_1^1, P_1^1)\,dx_1
= \varphi(x_2\mid \theta x_1^1,\ \theta^2 P_1^1 + \tau^2)
= \varphi(x_2\mid x_2^1, P_2^1).
$$
Putting the forecast density we just computed together with the observation density, we get
$$
p(x_2\mid y_1, y_2) \propto \varphi(y_2\mid x_2, \sigma^2)\,\varphi(x_2\mid x_2^1, P_2^1)
= \mathcal{N}\!\left(x_2^1 + K_2(y_2 - x_2^1),\ (1-K_2)P_2^1\right)
$$
where
$$
K_2 = \frac{P_2^1}{P_2^1 + \sigma^2}
$$
is the new Kalman gain coefficient. If we define $x_2^2 = x_2^1 + K_2(y_2 - x_2^1)$ and $P_2^2 = (1-K_2)P_2^1$, then we have that $p(x_2\mid y_1, y_2) = \mathcal{N}(x_2^2, P_2^2)$.
Shall we do $t=3$ just for fun? Given a new observation $y_3$, we want the new filter density
$$
p(x_3\mid y_1, y_2, y_3) \propto p(y_3\mid x_3)\,p(x_3\mid y_1, y_2).
$$
Using the same ideas as before, we know the observation density $p(y_3\mid x_3)$, and the forecast density is
$$
p(x_3\mid y_1, y_2) = \int p(x_3, x_2\mid y_1, y_2)\,dx_2
\propto \int p(x_3\mid x_2)\,p(x_2\mid y_1, y_2)\,dx_2
= \int \varphi(x_3\mid \theta x_2, \tau^2)\,\varphi(x_2\mid x_2^2, P_2^2)\,dx_2
= \varphi(x_3\mid \theta x_2^2,\ \theta^2 P_2^2 + \tau^2)
= \varphi(x_3\mid x_3^2, P_3^2).
$$
Now the new filter density is
$$
p(x_3\mid y_1, y_2, y_3) \propto p(y_3\mid x_3)\,p(x_3\mid y_1, y_2)
= \varphi(y_3\mid x_3, \sigma^2)\,\varphi(x_3\mid x_3^2, P_3^2)
= \mathcal{N}\!\left(x_3^2 + K_3(y_3 - x_3^2),\ (1-K_3)P_3^2\right)
$$
where $K_3 = P_3^2/(P_3^2 + \sigma^2)$.
To summarize, for each $t = 1, 2, 3, \dots$, the estimate of $x_t$ is the mean of the filter density $p(x_t\mid y_1, \dots, y_t)$, and the filter density is a product of the observation density and the forecast density, i.e.
$$
p(x_t\mid y_1, \dots, y_t) \propto p(y_t\mid x_t)\,p(x_t\mid y_1, \dots, y_{t-1}).
$$
The benefit of the Kalman filtering algorithm is that we compute each estimate recursively, so there is no need to “save” information from previous iterations. Each iteration has built into it all of the information from previous iterations.
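The full recursion can be sketched as a short loop that alternates the forecast and update steps derived above. The function name `kalman_filter_1d` and its interface are my own illustrative choices; the arithmetic inside the loop follows the derivation directly:

```python
import numpy as np

def kalman_filter_1d(y, theta, tau2, sigma2, x0, P0):
    """One-dimensional Kalman filter for the local level model
    x_t = theta * x_{t-1} + w_t,  y_t = x_t + v_t,
    with w_t ~ N(0, tau2) and v_t ~ N(0, sigma2).
    Returns the filtered means x_t^t and variances P_t^t for t = 1, ..., n.
    (Function name and interface are illustrative, not from the text.)"""
    x, P = x0, P0
    means, variances = [], []
    for yt in y:
        # Forecast step: p(x_t | y_1, ..., y_{t-1}) = N(x_pred, P_pred)
        x_pred = theta * x
        P_pred = theta**2 * P + tau2
        # Update step: fold in y_t via the Kalman gain
        K = P_pred / (P_pred + sigma2)
        x = x_pred + K * (yt - x_pred)
        P = (1 - K) * P_pred
        means.append(x)
        variances.append(P)
    return np.array(means), np.array(variances)

# Usage on a short made-up series (values are arbitrary):
y = [2.0, 1.2, 0.7]
means, variances = kalman_filter_1d(y, theta=0.9, tau2=0.5, sigma2=1.0,
                                    x0=0.0, P0=1.0)
```

Notice that each pass through the loop uses only the current observation `yt` and the running pair `(x, P)` carried over from the previous iteration, which is exactly the recursive property described above: no earlier observations need to be stored.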