Chapter 4 Forward Filtering Backward Sampling - Shared Variance
This chapter discusses the Dynamic Linear Model with a variance scale factor shared across time, together with the derivations at each step. The approach is borrowed from West and Harrison (1997), with some details drawn from Petris et al. (2009). We estimate the parameters of this model via Forward Filtering Backward Sampling.
For full generality, and to keep the system multivariate normal in both the data and the parameters, we assume $Y_t \in \mathbb{R}^n$ and $\beta_t \in \mathbb{R}^p$ for all $t \in \{1, \dots, T\}$, for some integer $T$. Consequently $F_t$ is $p \times n$, $G_t$ and $W_t$ are $p \times p$, and $V_t$ is $n \times n$.
4.1 Background
The model we study belongs to a class of time-varying models called Dynamic Linear Models. It is specified as follows:
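$$
\begin{aligned}
Y_t \mid \beta_t, \sigma^2 &\sim N(F_t^T \beta_t,\ \sigma^2 V_t) \\
\beta_t \mid \beta_{t-1}, \sigma^2 &\sim N(G_t \beta_{t-1},\ \sigma^2 W_t) \\
\sigma^{-2} &\sim \Gamma(a_{t-1}, b_{t-1}) \\
\beta_{t-1} \mid \sigma^2 &\sim N(m_{t-1},\ \sigma^2 C_{t-1})
\end{aligned}
$$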
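Alternatively, note that $\sigma^{-2} \sim \Gamma(a_{t-1}, b_{t-1})$ is equivalent to $\sigma^2 \sim \mathrm{IG}(a_{t-1}, b_{t-1})$, where $\mathrm{IG}$ denotes the inverse Gamma distribution. Using this, we may write the system above in Normal-Inverse Gamma (NIG) notation:
$$
\begin{aligned}
Y_t, \sigma^2 \mid \beta_t &\sim \mathrm{NIG}(F_t^T \beta_t, V_t, a_{t-1}, b_{t-1}) \\
\beta_t, \sigma^2 \mid \beta_{t-1} &\sim \mathrm{NIG}(G_t \beta_{t-1}, W_t, a_{t-1}, b_{t-1}) \\
\beta_{t-1}, \sigma^2 &\sim \mathrm{NIG}(m_{t-1}, C_{t-1}, a_{t-1}, b_{t-1})
\end{aligned}
$$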
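To make the generative structure concrete, here is a minimal NumPy sketch that draws $\sigma^2$, the states, and the observations from the model above. Several choices are illustrative assumptions, not part of the chapter:

- the hypothetical function name `simulate_dlm`;
- time-invariant $F$, $G$, $V$, $W$;
- a prior $(m_0, C_0, a_0, b_0)$ at $t = 0$;
- the rate parameterization of $\Gamma(a, b)$, i.e. density proportional to $x^{a-1}e^{-bx}$. This matches the $\exp(-b_{t-1}\sigma^{-2})$ factor used in Section 4.2.2.

```python
import numpy as np

def simulate_dlm(F, G, V, W, m0, C0, a0, b0, T, rng):
    """Draw (sigma2, beta_0..beta_T, Y_1..Y_T) from the shared-variance DLM.

    Assumption of this sketch: F (p x n), G (p x p), V (n x n), W (p x p)
    do not vary with t.
    """
    p, n = F.shape
    # sigma^{-2} ~ Gamma(shape=a0, rate=b0); NumPy's gamma takes scale = 1 / rate
    sigma2 = 1.0 / rng.gamma(shape=a0, scale=1.0 / b0)
    beta = np.empty((T + 1, p))
    Y = np.empty((T, n))
    beta[0] = rng.multivariate_normal(m0, sigma2 * C0)
    for t in range(1, T + 1):
        # State equation: beta_t | beta_{t-1}, sigma2 ~ N(G beta_{t-1}, sigma2 W)
        beta[t] = rng.multivariate_normal(G @ beta[t - 1], sigma2 * W)
        # Observation equation: Y_t | beta_t, sigma2 ~ N(F^T beta_t, sigma2 V)
        Y[t - 1] = rng.multivariate_normal(F.T @ beta[t], sigma2 * V)
    return sigma2, beta, Y
```

For example, `F = np.array([[1.0], [0.0]])` with `G = np.array([[1.0, 1.0], [0.0, 1.0]])` gives a local linear trend observed through its level. This is again only an illustrative choice.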
The task is to obtain estimates of $\beta_0, \dots, \beta_T$ and $\sigma^2$. This task divides into two steps, collectively referred to as the Forward Filtering Backward Sampling (FFBS) algorithm:

- the forward filter, which produces sequential estimates;
- the backward sampling step, which retroactively "smooths" those initial estimates given the estimates at the last time point.

We are given a set of observations $Y_{t,j}$ and known parameters $F_t$, $G_t$, $V_t$, $W_t$, and $n_{t-1}$. Frankenburg and Banerjee also apply FFBS to cases where $F_t$ and $G_t$ are not pre-specified.
4.2 Derivation of the Forward Filter
We proceed for some arbitrary t:
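$$
\begin{aligned}
\beta_t &= G_t \beta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, \sigma^2 W_t) \\
\beta_t \mid \sigma^2 &\sim N\big(G_t m_{t-1},\ \sigma^2 (G_t C_{t-1} G_t^T + W_t)\big)
\end{aligned}
$$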
Now, let $m_t^* = G_t m_{t-1}$ and $R_t = G_t C_{t-1} G_t^T + W_t$. We then have:
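$$
\begin{aligned}
Y_t &= F_t^T \beta_t + \nu_t, \qquad \nu_t \sim N(0, \sigma^2 V_t) \\
Y_t \mid \sigma^2 &\sim N\big(F_t^T m_t^*,\ \sigma^2 (F_t^T R_t F_t + V_t)\big)
\end{aligned}
$$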
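Since $\sigma^2 \sim \mathrm{IG}(a_{t-1}, b_{t-1})$, marginalizing it out of $Y_t \mid \sigma^2$ gives
$$
Y_t \sim T_{2a_{t-1}}\left(F_t^T m_t^*,\ \frac{b_{t-1}}{a_{t-1}}\,(F_t^T R_t F_t + V_t)\right)
$$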
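The prior moments $m_t^*$ and $R_t$, together with the parameters of the Student-$t$ predictive, follow directly from the quantities at $t-1$. Here is a minimal NumPy sketch (the function name `one_step_forecast` is hypothetical). It returns:

- the location $F_t^T m_t^*$;
- the matrix $F_t^T R_t F_t + V_t$;
- the degrees of freedom $2a_{t-1}$;
- the scale $\frac{b_{t-1}}{a_{t-1}}(F_t^T R_t F_t + V_t)$.

```python
import numpy as np

def one_step_forecast(m_prev, C_prev, a_prev, b_prev, F, G, V, W):
    """Prior for beta_t and the marginal one-step predictive for Y_t.

    Returns (m_star, R, f, Q, df, scale) with
      beta_t | sigma2 ~ N(m_star, sigma2 * R)   and   Y_t ~ T_df(f, scale).
    """
    m_star = G @ m_prev              # m*_t = G_t m_{t-1}
    R = G @ C_prev @ G.T + W         # R_t = G_t C_{t-1} G_t^T + W_t
    f = F.T @ m_star                 # predictive location F_t^T m*_t
    Q = F.T @ R @ F + V              # F_t^T R_t F_t + V_t
    df = 2.0 * a_prev                # degrees of freedom 2 a_{t-1}
    scale = (b_prev / a_prev) * Q    # Student-t scale matrix
    return m_star, R, f, Q, df, scale
```

You can sanity-check the marginalization by simulation: draw $\sigma^{-2}$ from $\Gamma(a_{t-1}, b_{t-1})$, then $Y_t$ from $N(F_t^T m_t^*, \sigma^2 Q)$. When $2a_{t-1} > 2$, the sample covariance should approach $\frac{2a_{t-1}}{2a_{t-1} - 2}$ times the scale matrix.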
We now have the apparatus needed to compute the sequential posteriors $\beta_t \mid Y_t$ and $\sigma^2 \mid Y_t$.
4.2.1 Deriving $\beta_t \mid Y_t$
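$$
\begin{aligned}
p(\beta_t \mid Y_t, \sigma^2) &\propto p(\beta_t, Y_t \mid \sigma^2) \\
&\propto p(Y_t \mid \beta_t, \sigma^2)\, p(\beta_t \mid \sigma^2) \\
&\propto \sigma^{-n} \exp\left(-\frac{1}{2\sigma^2}(y_t - F_t^T \beta_t)^T V_t^{-1} (y_t - F_t^T \beta_t)\right) \sigma^{-p} \exp\left(-\frac{1}{2\sigma^2}(\beta_t - m_t^*)^T R_t^{-1} (\beta_t - m_t^*)\right) \\
&\propto \sigma^{-(n+p)} \exp\left(-\frac{1}{2\sigma^2}\left[(y_t - F_t^T \beta_t)^T V_t^{-1} (y_t - F_t^T \beta_t) + (\beta_t - m_t^*)^T R_t^{-1} (\beta_t - m_t^*)\right]\right)
\end{aligned}
$$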
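Note next that
$$
\begin{bmatrix} Y_t \\ \beta_t \end{bmatrix} \,\Big|\, \sigma^2 \sim N\left(\begin{bmatrix} F_t^T m_t^* \\ m_t^* \end{bmatrix},\ \sigma^2 \begin{bmatrix} F_t^T R_t F_t + V_t & F_t^T R_t \\ R_t F_t & R_t \end{bmatrix}\right)
$$
The cross-terms follow from the independence of $\nu_t$ and $\beta_t$: $\mathrm{Cov}(Y_t, \beta_t \mid \sigma^2) = \mathrm{Cov}(F_t^T \beta_t + \nu_t, \beta_t \mid \sigma^2) = F_t^T \mathrm{Cov}(\beta_t, \beta_t \mid \sigma^2) = \sigma^2 F_t^T R_t$.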
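Recall that for the block-normal system
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},\ \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)
$$
we have
$$
x_2 \mid x_1 \sim N\left(\mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1),\ \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)
$$
(The derivation of the density of $x_2 \mid x_1$ can be found in the Appendix.)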
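The conditioning formula is short enough to code directly. Here is a minimal NumPy sketch; the name `condition_gaussian` is hypothetical. It uses linear solves rather than forming $\Sigma_{11}^{-1}$ explicitly. That is a common numerical choice, not something the derivation requires.

```python
import numpy as np

def condition_gaussian(mu1, mu2, S11, S12, S21, S22, x1):
    """Mean and covariance of x2 | x1 for jointly Gaussian (x1, x2)."""
    # Sigma_21 Sigma_11^{-1} (x1 - mu1), computed with a solve instead of an inverse
    mean = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
    # Schur complement Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12
    cov = S22 - S21 @ np.linalg.solve(S11, S12)
    return mean, cov
```

For a standard bivariate normal with correlation $0.6$, conditioning on $x_1 = 2$ gives mean $1.2$ and variance $1 - 0.6^2 = 0.64$. In the forward filter, $x_1$ plays the role of $Y_t$ and $x_2$ that of $\beta_t$.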
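Applying the block-normal conditioning result with $x_1 = Y_t$ and $x_2 = \beta_t$, we arrive at
$$
\begin{aligned}
\beta_t \mid \sigma^2, Y_t &\sim N\Big(m_t^* + R_t F_t (F_t^T R_t F_t + V_t)^{-1} (Y_t - F_t^T m_t^*),\ \sigma^2 \big(R_t - R_t F_t (F_t^T R_t F_t + V_t)^{-1} F_t^T R_t\big)\Big) \\
&\sim N\Big(m_t^* + R_t F_t Q_t^{-1} (Y_t - F_t^T m_t^*),\ \sigma^2 \big(R_t - R_t F_t Q_t^{-1} F_t^T R_t\big)\Big)
\end{aligned}
$$
where $Q_t = F_t^T R_t F_t + V_t$. We write this posterior as $N(m_t, \sigma^2 C_t)$, with $m_t = m_t^* + R_t F_t Q_t^{-1}(Y_t - F_t^T m_t^*)$ and $C_t = R_t - R_t F_t Q_t^{-1} F_t^T R_t$. It has the same form as the distribution of $\beta_{t-1} \mid \sigma^2$ assumed at the previous step.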
(Note that Petris et al.'s expression for this variance contains a typo. To see this, simply take the transpose $\tilde{C}_t^T$ of their expression.)
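Write $\beta_t \mid \sigma^2, Y_t \sim N(m_t, \sigma^2 C_t)$. Completing the square in the exponent of $p(\beta_t \mid Y_t, \sigma^2)$ gives an equivalent information form:
$$C_t^{-1} = R_t^{-1} + F_t V_t^{-1} F_t^T, \qquad m_t = C_t (R_t^{-1} m_t^* + F_t V_t^{-1} y_t).$$
The NumPy sketch below implements both forms; the names `update_beta` and `update_beta_information` are hypothetical. The two forms should agree up to floating-point error on any valid inputs.

```python
import numpy as np

def update_beta(m_star, R, F, V, y):
    """Covariance (gain) form: m_t, C_t with beta_t | sigma2, Y_t ~ N(m_t, sigma2 C_t)."""
    Q = F.T @ R @ F + V                   # Q_t = F_t^T R_t F_t + V_t
    K = np.linalg.solve(Q, F.T @ R).T     # R_t F_t Q_t^{-1}, using symmetry of R_t and Q_t
    m = m_star + K @ (y - F.T @ m_star)   # m*_t + R_t F_t Q_t^{-1} (y_t - F_t^T m*_t)
    C = R - K @ F.T @ R                   # R_t - R_t F_t Q_t^{-1} F_t^T R_t
    return m, C

def update_beta_information(m_star, R, F, V, y):
    """Information form obtained by completing the square in beta_t."""
    P = np.linalg.inv(R) + F @ np.linalg.solve(V, F.T)    # C_t^{-1}
    C = np.linalg.inv(P)
    m = C @ (np.linalg.solve(R, m_star) + F @ np.linalg.solve(V, y))
    return m, C
```

The gain form only inverts an $n \times n$ matrix $Q_t$. The information form inverts $p \times p$ matrices, so which is cheaper depends on whether $n$ or $p$ is larger.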
4.2.2 Deriving $\sigma^2 \mid Y_t$
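We next deduce the density of $\sigma^2 \mid Y_t$. Before we begin, note that
$$
Y_t \sim T_{2a_{t-1}}\left(F_t^T m_t^*,\ \frac{b_{t-1}}{a_{t-1}} Q_t\right) = \int \mathrm{NIG}_{Y_t}(F_t^T m_t^*, Q_t, a_{t-1}, b_{t-1})\, d\sigma^2,
$$
so we can write $Y_t \mid \sigma^2 \sim N(F_t^T m_t^*, \sigma^2 Q_t)$. Hence: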
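$$
\begin{aligned}
p(\sigma^2 \mid Y_t) &\propto p(Y_t \mid \sigma^2)\, p(\sigma^2) \\
&\propto \sigma^{-n} \exp\left(-\frac{1}{2\sigma^2}(y_t - F_t^T m_t^*)^T Q_t^{-1} (y_t - F_t^T m_t^*)\right) \sigma^{-2(a_{t-1}+1)} \exp\left(-b_{t-1}\sigma^{-2}\right) \\
&\propto \sigma^{-2\left(a_{t-1} + \frac{n}{2} + 1\right)} \exp\left(-\sigma^{-2}\left[\frac{1}{2}(y_t - F_t^T m_t^*)^T Q_t^{-1} (y_t - F_t^T m_t^*) + b_{t-1}\right]\right)
\end{aligned}
$$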
We conclude that $\sigma^{-2} \mid Y_t \sim \Gamma(a_t, b_t)$, where
$$
a_t = a_{t-1} + \frac{n}{2}, \qquad b_t = b_{t-1} + \frac{1}{2}(y_t - F_t^T m_t^*)^T Q_t^{-1} (y_t - F_t^T m_t^*).
$$
This gives the set of updating equations in Proposition 4.1 of Petris et al. (2009).
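The variance update only needs the forecast error and $Q_t$. Here is a minimal NumPy sketch; the name `update_sigma` is hypothetical. The Gamma is again in its rate parameterization, matching the $\exp(-b_{t-1}\sigma^{-2})$ factor above.

```python
import numpy as np

def update_sigma(a_prev, b_prev, f, Q, y):
    """Shape a_t and rate b_t of sigma^{-2} | Y_t ~ Gamma(a_t, b_t)."""
    e = y - f                                   # forecast error y_t - F_t^T m*_t
    n = e.shape[0]
    a = a_prev + n / 2.0                        # a_t = a_{t-1} + n/2
    b = b_prev + 0.5 * e @ np.linalg.solve(Q, e)  # b_t = b_{t-1} + e^T Q_t^{-1} e / 2
    return a, b
```

For instance, with $a_{t-1} = 2$, $b_{t-1} = 1$, $Q_t = \mathrm{diag}(2, 4)$ and forecast error $(2, 2)$, the quadratic form is $4/2 + 4/4 = 3$. This gives $a_t = 3$ and $b_t = 2.5$.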
4.2.3 Commentary
Note that we have derived the forward filtering step at time $t$ given the parameters of the distributions at time $t-1$. The setup is therefore Markovian: the distributions at time $t$ depend on the past only through those at the preceding time point. In applications, however, the forward filter propagates from an initial time point $t = 0$. There the equations are usually written to make explicit that $\beta_t$ and $\sigma^2$ depend on the data observed up to time $t-1$ or time $t$. Specifically, letting $D_t = \{Y_\tau\}_{\tau=1}^{t}$, we may write the system as:
$$
\begin{aligned}
Y_t, \sigma^2 \mid \beta_t, D_{t-1} &\sim \mathrm{NIG}(F_t^T \beta_t, V_t, a_{t-1}, b_{t-1}) \\
\beta_t, \sigma^2 \mid \beta_{t-1}, D_{t-1} &\sim \mathrm{NIG}(G_t \beta_{t-1}, W_t, a_{t-1}, b_{t-1}) \\
\beta_{t-1}, \sigma^2 \mid D_{t-1} &\sim \mathrm{NIG}(m_{t-1}, C_{t-1}, a_{t-1}, b_{t-1})
\end{aligned}
$$
In the same notation, the sequential posteriors derived above, $\beta_t \mid Y_t$ and $\sigma^2 \mid Y_t$, become $\beta_t \mid D_t$ and $\sigma^2 \mid D_t$, respectively.
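Chaining the two updates over $t = 1, \dots, T$ gives the whole forward filter. Here is a minimal NumPy sketch; the name `forward_filter` is hypothetical. It assumes:

- time-invariant $F$, $G$, $V$, $W$;
- a known starting point $(m_0, C_0, a_0, b_0)$, which the chapter does not specify.

Row `Y[t - 1]` holds $Y_t$ because Python indexes from zero.

```python
import numpy as np

def forward_filter(Y, F, G, V, W, m0, C0, a0, b0):
    """Filtered parameters with beta_t, sigma2 | D_t ~ NIG(m_t, C_t, a_t, b_t), t = 0..T."""
    T, n = Y.shape
    p = G.shape[0]
    m = np.empty((T + 1, p)); C = np.empty((T + 1, p, p))
    a = np.empty(T + 1); b = np.empty(T + 1)
    m[0], C[0], a[0], b[0] = m0, C0, a0, b0
    for t in range(1, T + 1):
        m_star = G @ m[t - 1]                  # m*_t = G m_{t-1}
        R = G @ C[t - 1] @ G.T + W             # R_t
        Q = F.T @ R @ F + V                    # Q_t
        e = Y[t - 1] - F.T @ m_star            # forecast error for Y_t
        K = np.linalg.solve(Q, F.T @ R).T      # R_t F_t Q_t^{-1}
        m[t] = m_star + K @ e
        C[t] = R - K @ F.T @ R
        a[t] = a[t - 1] + n / 2.0
        b[t] = b[t - 1] + 0.5 * e @ np.linalg.solve(Q, e)
    return m, C, a, b
```

Because each step uses only the output of the previous one, the loop is a direct transcription of the Markovian structure noted above. Note that $a_T = a_0 + nT/2$ regardless of the data.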
4.3 Derivation of the Backwards Sampling
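Now that we have the filtered distributions of $(\theta_t, \sigma^2) \mid D_t$ for $t = 1, \dots, T$, where $\theta_t$ denotes the state vector (written $\beta_t$ above), we would like to work backward. The goal is to derive $\theta_t \mid \theta_{t+1}, \sigma^2, D_T$ for $t = 1, \dots, T-1$ in order to smooth our initial estimates.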
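By the Markov structure of the model, $\theta_t$ is conditionally independent of $\theta_{(t+2):T}$ and $Y_{(t+1):T}$ given $\theta_{t+1}$ and $\sigma^2$. Hence
$$
\begin{aligned}
p(\theta_t \mid \theta_{(t+1):T}, \sigma^2, D_T) &= p(\theta_t \mid \theta_{t+1}, \sigma^2, D_t) \\
&= \frac{p(\theta_{t+1} \mid \theta_t, \sigma^2, D_t)\, p(\theta_t \mid \sigma^2, D_t)}{p(\theta_{t+1} \mid \sigma^2, D_t)} \\
&\propto p(\theta_{t+1} \mid \theta_t, \sigma^2, D_t)\, p(\theta_t \mid \sigma^2, D_t) \\
&\propto \exp\left(-\frac{1}{2\sigma^2}\left[(\theta_{t+1} - G_{t+1}\theta_t)^T W_{t+1}^{-1} (\theta_{t+1} - G_{t+1}\theta_t) + (\theta_t - m_t)^T C_t^{-1} (\theta_t - m_t)\right]\right) \\
&\propto \exp\left(-\frac{1}{2\sigma^2}\left[\theta_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - 2\theta_{t+1}^T W_{t+1}^{-1} G_{t+1} \theta_t + \theta_t^T G_{t+1}^T W_{t+1}^{-1} G_{t+1} \theta_t + \theta_t^T C_t^{-1} \theta_t - 2 m_t^T C_t^{-1} \theta_t + m_t^T C_t^{-1} m_t\right]\right) \\
&\propto \exp\left(-\frac{1}{2\sigma^2}\left[\theta_t^T (G_{t+1}^T W_{t+1}^{-1} G_{t+1} + C_t^{-1}) \theta_t - 2 (C_t^{-1} m_t + G_{t+1}^T W_{t+1}^{-1} \theta_{t+1})^T \theta_t\right]\right)
\end{aligned}
$$
Completing the square in $\theta_t$ gives
$$
\theta_t \mid \theta_{t+1}, \sigma^2, D_T \sim N\Big((G_{t+1}^T W_{t+1}^{-1} G_{t+1} + C_t^{-1})^{-1} (C_t^{-1} m_t + G_{t+1}^T W_{t+1}^{-1} \theta_{t+1}),\ \sigma^2 (G_{t+1}^T W_{t+1}^{-1} G_{t+1} + C_t^{-1})^{-1}\Big)
$$
The Woodbury identity gives $(C_t^{-1} + G_{t+1}^T W_{t+1}^{-1} G_{t+1})^{-1} = C_t - C_t G_{t+1}^T (W_{t+1} + G_{t+1} C_t G_{t+1}^T)^{-1} G_{t+1} C_t$. Applying it and writing $R_{t+1} = G_{t+1} C_t G_{t+1}^T + W_{t+1}$, this becomes
$$
\theta_t \mid \theta_{t+1}, \sigma^2, D_T \sim N\Big(m_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} m_t + C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1},\ \sigma^2 \big(C_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t\big)\Big)
$$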
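Before simplifying further, the Woodbury step can be checked numerically. The NumPy sketch below computes the conditional mean and scale matrix of $\theta_t \mid \theta_{t+1}, \sigma^2, D_T$ in two ways: the precision form obtained by completing the square, and the expanded form written with $R_{t+1}$. The function names are hypothetical, and time-invariant $G$ and $W$ are assumed for brevity.

```python
import numpy as np

def smoothing_precision(m, C, G, W, theta_next):
    """Mean and scale from completing the square: precision G^T W^{-1} G + C^{-1}."""
    P = G.T @ np.linalg.solve(W, G) + np.linalg.inv(C)
    H = np.linalg.inv(P)
    h = H @ (np.linalg.solve(C, m) + G.T @ np.linalg.solve(W, theta_next))
    return h, H

def smoothing_woodbury(m, C, G, W, theta_next):
    """Same moments after the Woodbury identity, with R = G C G^T + W."""
    R = G @ C @ G.T + W
    Rinv = np.linalg.inv(R)
    CGt = C @ G.T
    Winv_theta = np.linalg.solve(W, theta_next)
    h = (m - CGt @ Rinv @ G @ m + CGt @ Winv_theta
         - CGt @ Rinv @ G @ CGt @ Winv_theta)
    H = C - CGt @ Rinv @ G @ C
    return h, H
```

On random symmetric positive definite $C_t$ and $W_{t+1}$, the two functions should return the same mean and scale up to round-off.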
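Notice that
$$
\begin{aligned}
C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} &= C_t G_{t+1}^T R_{t+1}^{-1} (G_{t+1} C_t G_{t+1}^T + W_{t+1} - W_{t+1}) W_{t+1}^{-1} \theta_{t+1} \\
&= C_t G_{t+1}^T R_{t+1}^{-1} (R_{t+1} - W_{t+1}) W_{t+1}^{-1} \theta_{t+1} \\
&= C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - C_t G_{t+1}^T R_{t+1}^{-1} \theta_{t+1}
\end{aligned}
$$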
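Hence,
$$
\begin{aligned}
\theta_t \mid \theta_{t+1}, \sigma^2, D_T &\sim N\Big(m_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} m_t + C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} + C_t G_{t+1}^T R_{t+1}^{-1} \theta_{t+1},\ \sigma^2 \big(C_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t\big)\Big) \\
&\sim N\Big(m_t + C_t G_{t+1}^T R_{t+1}^{-1} (\theta_{t+1} - m_{t+1}^*),\ \sigma^2 \big(C_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t\big)\Big)
\end{aligned}
$$
where $m_{t+1}^* = G_{t+1} m_t$, as in the forward filter.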
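One way to assemble the pieces into a complete sampler is sketched below in NumPy. It first runs the forward filter, then draws $\sigma^{-2} \sim \Gamma(a_T, b_T)$ and $\theta_T \sim N(m_T, \sigma^2 C_T)$. It then steps backward using the simplified conditional above.

This sketch makes several assumptions:

- time-invariant $F$, $G$, $V$, $W$;
- a known $(m_0, C_0, a_0, b_0)$;
- a sampling order for $\sigma^2$ chosen for this sketch rather than stated in the chapter;
- the backward pass also continues to $t = 0$, treating $(m_0, C_0)$ as the "filtered" moments at time 0.

The function names `forward_filter`, `backward_moments`, and `ffbs` are hypothetical.

```python
import numpy as np

def forward_filter(Y, F, G, V, W, m0, C0, a0, b0):
    """Filtered NIG parameters (m_t, C_t, a_t, b_t), t = 0..T; Y[t-1] holds Y_t."""
    T, n = Y.shape
    p = G.shape[0]
    m, C = np.empty((T + 1, p)), np.empty((T + 1, p, p))
    a, b = np.empty(T + 1), np.empty(T + 1)
    m[0], C[0], a[0], b[0] = m0, C0, a0, b0
    for t in range(1, T + 1):
        m_star = G @ m[t - 1]
        R = G @ C[t - 1] @ G.T + W
        Q = F.T @ R @ F + V
        e = Y[t - 1] - F.T @ m_star
        K = np.linalg.solve(Q, F.T @ R).T          # R_t F_t Q_t^{-1}
        m[t], C[t] = m_star + K @ e, R - K @ F.T @ R
        a[t] = a[t - 1] + n / 2.0
        b[t] = b[t - 1] + 0.5 * e @ np.linalg.solve(Q, e)
    return m, C, a, b

def backward_moments(m_t, C_t, G, W, theta_next):
    """h_t, H_t with theta_t | theta_{t+1}, sigma2, D_t ~ N(h_t, sigma2 H_t)."""
    R_next = G @ C_t @ G.T + W                     # R_{t+1}
    J = np.linalg.solve(R_next, G @ C_t).T         # C_t G^T R_{t+1}^{-1}
    h = m_t + J @ (theta_next - G @ m_t)           # G m_t = m*_{t+1}
    H = C_t - J @ G @ C_t
    return h, 0.5 * (H + H.T)                      # symmetrize against round-off

def ffbs(Y, F, G, V, W, m0, C0, a0, b0, rng):
    """One joint draw of (sigma2, theta_0..theta_T) given D_T."""
    m, C, a, b = forward_filter(Y, F, G, V, W, m0, C0, a0, b0)
    T = Y.shape[0]
    # sigma^{-2} | D_T ~ Gamma(a_T, rate b_T); NumPy's gamma takes scale = 1 / rate
    sigma2 = 1.0 / rng.gamma(shape=a[T], scale=1.0 / b[T])
    theta = np.empty_like(m)
    theta[T] = rng.multivariate_normal(m[T], sigma2 * C[T])
    for t in range(T - 1, -1, -1):
        h, H = backward_moments(m[t], C[t], G, W, theta[t + 1])
        theta[t] = rng.multivariate_normal(h, sigma2 * H)
    return sigma2, theta
```

Calling `ffbs` repeatedly with the same `rng` yields independent joint draws of $(\sigma^2, \theta_{0:T})$ given $D_T$. Summaries such as posterior means of the states can then be computed from these draws.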