Chapter 4 Forward Filtering Backward Sampling - Shared Variance

This chapter discusses the Dynamic Linear Model with a variance scale factor shared across time, and derives each step of its estimation. The approach is borrowed from West and Harrison (1997), with some details derived from Petris et al. (2009). We estimate the parameters of this model using Forward Filtering Backward Sampling (FFBS).

For full generality and to maintain a multivariate normal system in both the data and parameter matrices, we assume $Y_t \in \mathbb{R}^n$ and $\beta_t \in \mathbb{R}^p$ for all $t \in \{1, \ldots, T\}$, for some integer $T$.

4.1 Background

The model we study belongs to a class of time-varying models called Dynamic Linear Models. The setup is as follows:

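$$
\begin{aligned}
Y_t \mid \beta_t, \sigma^2 &\sim N(F_t^T \beta_t, \sigma^2 V_t) \\
\beta_t \mid \beta_{t-1}, \sigma^2 &\sim N(G_t \beta_{t-1}, \sigma^2 W_t) \\
\sigma^{-2} &\sim \Gamma(a_{t-1}, b_{t-1}) \\
\beta_{t-1} \mid \sigma^2 &\sim N(m_{t-1}, \sigma^2 C_{t-1})
\end{aligned}
$$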

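Alternatively, we may use Normal-Inverse Gamma (NIG) notation. If $\sigma^{-2} \sim \Gamma(a_{t-1}, b_{t-1})$, then $\sigma^2 \sim IG(a_{t-1}, b_{t-1})$, where $IG$ denotes an inverse Gamma distribution, and we may write the above set of equations as:

$$
\begin{aligned}
Y_t, \sigma^2 \mid \beta_t &\sim NIG(F_t^T \beta_t, V_t, a_{t-1}, b_{t-1}) \\
\beta_t, \sigma^2 \mid \beta_{t-1} &\sim NIG(G_t \beta_{t-1}, W_t, a_{t-1}, b_{t-1}) \\
\beta_{t-1}, \sigma^2 &\sim NIG(m_{t-1}, C_{t-1}, a_{t-1}, b_{t-1})
\end{aligned}
$$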
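To make the generative structure concrete, the following is a minimal simulation sketch of the model above. It assumes, purely for illustration, that $F_t$, $G_t$, $V_t$ and $W_t$ are constant in time and that $\Gamma(a, b)$ is in shape-rate form; the function name `simulate_dlm` and all numerical values are hypothetical.

```python
import numpy as np

def simulate_dlm(F, G, V, W, m0, C0, a0, b0, T, seed=0):
    """Draw sigma^2, beta_0..beta_T and Y_1..Y_T from the shared-variance DLM.

    Assumes time-constant F (p x n), G (p x p), V (n x n), W (p x p).
    """
    rng = np.random.default_rng(seed)
    p, n = F.shape
    # sigma^{-2} ~ Gamma(shape=a0, rate=b0); NumPy's gamma takes scale = 1 / rate.
    sigma2 = 1.0 / rng.gamma(shape=a0, scale=1.0 / b0)
    betas = np.empty((T + 1, p))
    ys = np.empty((T, n))
    betas[0] = rng.multivariate_normal(m0, sigma2 * C0)
    for t in range(1, T + 1):
        # State equation: beta_t | beta_{t-1}, sigma^2 ~ N(G beta_{t-1}, sigma^2 W)
        betas[t] = rng.multivariate_normal(G @ betas[t - 1], sigma2 * W)
        # Observation equation: Y_t | beta_t, sigma^2 ~ N(F^T beta_t, sigma^2 V)
        ys[t - 1] = rng.multivariate_normal(F.T @ betas[t], sigma2 * V)
    return sigma2, betas, ys

# Illustrative values only: p = 1 state, n = 2 observations per time point.
F = np.array([[1.0, 0.5]])
G = np.array([[0.9]])
sigma2, betas, ys = simulate_dlm(F, G, np.eye(2), 0.1 * np.eye(1),
                                 m0=np.zeros(1), C0=np.eye(1),
                                 a0=3.0, b0=2.0, T=50)
print(betas.shape, ys.shape)
```

Note that a single draw of $\sigma^2$ scales every covariance in the system, which is what "shared variance" refers to.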

The task is to estimate $\beta_{0, \ldots, T}$ and $\sigma^2$. The Forward Filtering Backward Sampling (FFBS) algorithm divides this task into two steps: the forward filter acquires sequential estimates, and the backward sampling step retroactively “smooths” those initial estimates given the estimates at the final time point. We are given a set of observations $Y_{t,j}$ and known parameters $F_t$, $G_t$, $V_t$, $W_t$, and $n_{t-1}$, although Frankenburg and Banerjee also apply FFBS to cases where $F_t$ and $G_t$ are not pre-specified.

4.2 Derivation of the Forward Filter

We proceed for some arbitrary t:

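$$
\begin{aligned}
\beta_t &= G_t \beta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, \sigma^2 W_t) \\
\implies \beta_t \mid \sigma^2 &\sim N\left(G_t m_{t-1}, \sigma^2 (G_t C_{t-1} G_t^T + W_t)\right)
\end{aligned}
$$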

Now, let $m_t = G_t m_{t-1}$ (the prior mean of $\beta_t$) and $R_t = G_t C_{t-1} G_t^T + W_t$. We then have:

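$$
\begin{aligned}
Y_t &= F_t^T \beta_t + \nu_t, \qquad \nu_t \sim N(0, \sigma^2 V_t) \\
\implies Y_t \mid \sigma^2 &\sim N\left(F_t^T m_t, \sigma^2 (F_t^T R_t F_t + V_t)\right)
\end{aligned}
$$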

Since $\sigma^2 \sim IG(a_{t-1}, b_{t-1})$, we marginalize it out of $Y_t \mid \sigma^2$ to get

$$
Y_t \sim T_{2a_{t-1}}\left(F_t^T m_t, \frac{b_{t-1}}{a_{t-1}} (F_t^T R_t F_t + V_t)\right)
$$
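The prior and one-step forecast quantities derived so far can be collected in a few lines. The sketch below is illustrative only: the function name `one_step_forecast` and the numerical inputs are hypothetical, and `m_prior` stands for the quantity written $m_t = G_t m_{t-1}$ in the text.

```python
import numpy as np

def one_step_forecast(m_prev, C_prev, a_prev, b_prev, F, G, V, W):
    """Prior moments of beta_t and parameters of the marginal Student-t of Y_t."""
    m_prior = G @ m_prev                  # m_t = G_t m_{t-1}
    R = G @ C_prev @ G.T + W              # R_t = G_t C_{t-1} G_t^T + W_t
    f = F.T @ m_prior                     # location F_t^T m_t
    Q = F.T @ R @ F + V                   # F_t^T R_t F_t + V_t
    dof = 2.0 * a_prev                    # degrees of freedom 2 a_{t-1}
    scale = (b_prev / a_prev) * Q         # Student-t scale matrix
    return m_prior, R, f, Q, dof, scale

# Scalar illustration (n = p = 1) that can be checked by hand.
m_prior, R, f, Q, dof, scale = one_step_forecast(
    m_prev=np.array([1.0]), C_prev=np.array([[2.0]]), a_prev=3.0, b_prev=6.0,
    F=np.array([[2.0]]), G=np.array([[0.5]]),
    V=np.array([[1.0]]), W=np.array([[1.0]]))
print(m_prior, R, f, Q, dof, scale)
```

With these inputs, $R_t = 0.5^2 \cdot 2 + 1 = 1.5$, the forecast scale factor is $2^2 \cdot 1.5 + 1 = 7$, and the Student-t scale is $(6/3) \cdot 7 = 14$ with $6$ degrees of freedom.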

We now have the apparatus needed to compute the sequential posteriors $\beta_t \mid Y_t$ and $\sigma^2 \mid Y_t$:

4.2.1 Deriving $\beta_t \mid Y_t$

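$$
\begin{aligned}
p(\beta_t \mid Y_t, \sigma^2) &\propto p(\beta_t, Y_t \mid \sigma^2) \\
&= p(Y_t \mid \beta_t, \sigma^2)\, p(\beta_t \mid \sigma^2) \\
&\propto \sigma^{-n} \exp\left(-\frac{1}{2\sigma^2} (y_t - F_t^T \beta_t)^T V_t^{-1} (y_t - F_t^T \beta_t)\right) \\
&\qquad \times \sigma^{-p} \exp\left(-\frac{1}{2\sigma^2} (\beta_t - m_t)^T R_t^{-1} (\beta_t - m_t)\right) \\
&= \sigma^{-(n+p)} \exp\left(-\frac{1}{2\sigma^2} \left[(y_t - F_t^T \beta_t)^T V_t^{-1} (y_t - F_t^T \beta_t) + (\beta_t - m_t)^T R_t^{-1} (\beta_t - m_t)\right]\right)
\end{aligned}
$$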

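Note next that, conditional on $\sigma^2$, $Y_t$ and $\beta_t$ are jointly normal:

$$
\begin{bmatrix} Y_t \\ \beta_t \end{bmatrix} \Bigm| \sigma^2 \sim N\left(\begin{bmatrix} F_t^T m_t \\ m_t \end{bmatrix}, \sigma^2 \begin{bmatrix} F_t^T R_t F_t + V_t & F_t^T R_t \\ R_t F_t & R_t \end{bmatrix}\right)
$$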

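with the cross-terms $\operatorname{Cov}(Y_t, \beta_t \mid \sigma^2) = \operatorname{Cov}(F_t^T \beta_t + \nu_t, \beta_t \mid \sigma^2) = F_t^T \operatorname{Cov}(\beta_t, \beta_t \mid \sigma^2) = \sigma^2 F_t^T R_t$.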

Recall that, for the block-normal system

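$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)
$$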

we have

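$$
x_2 \mid x_1 \sim N\left(\mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1),\; \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)
$$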

(The derivation of the density of x2|x1 can be found in the Appendix.)
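As a small numerical illustration of the block-normal conditioning formula, the sketch below computes the mean and covariance of $x_2 \mid x_1$. The helper name `condition_gaussian` and the inputs are hypothetical; the code assumes $\Sigma_{11}$ is symmetric and invertible and that $\Sigma_{21} = \Sigma_{12}^T$.

```python
import numpy as np

def condition_gaussian(mu1, mu2, S11, S12, S22, x1):
    """Return the mean and covariance of x2 | x1 for a block-normal vector."""
    # Sigma_21 Sigma_11^{-1}, computed with a linear solve instead of an explicit inverse.
    K = np.linalg.solve(S11, S12).T
    mean = mu2 + K @ (x1 - mu1)
    cov = S22 - K @ S12
    return mean, cov

# Scalar blocks so the result can be checked by hand.
mean, cov = condition_gaussian(
    mu1=np.array([0.0]), mu2=np.array([1.0]),
    S11=np.array([[2.0]]), S12=np.array([[1.0]]), S22=np.array([[3.0]]),
    x1=np.array([2.0]))
print(mean, cov)
```

Here $\Sigma_{21} \Sigma_{11}^{-1} = 0.5$, so the conditional mean is $1 + 0.5 \cdot 2 = 2$ and the conditional variance is $3 - 0.5 \cdot 1 = 2.5$.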

Applying this result with $x_1 = Y_t$ and $x_2 = \beta_t$, we arrive at

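$$
\begin{aligned}
\beta_t \mid \sigma^2, Y_t &\sim N\left(m_t + R_t F_t (F_t^T R_t F_t + V_t)^{-1} (Y_t - F_t^T m_t),\; \sigma^2 \left[R_t - R_t F_t (F_t^T R_t F_t + V_t)^{-1} F_t^T R_t\right]\right) \\
&\sim N\left(m_t + R_t F_t Q_t^{-1} (Y_t - F_t^T m_t),\; \sigma^2 \left[R_t - R_t F_t Q_t^{-1} F_t^T R_t\right]\right)
\end{aligned}
$$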

where $Q_t = F_t^T R_t F_t + V_t$.

(Note that the expression for the variance in Petris et al. suffers from a typo; to see this, simply take their $\tilde{C}_t^T$.)
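The update for $\beta_t$ can be checked numerically against the precision form obtained by completing the square in the exponent of $p(\beta_t \mid Y_t, \sigma^2)$. The following is a sketch under illustrative inputs; `update_beta` is a hypothetical name, and the returned matrix is the posterior covariance divided by $\sigma^2$.

```python
import numpy as np

def update_beta(m_prior, R, F, V, y):
    """Posterior location and scale matrix of beta_t | sigma^2, Y_t.

    m_prior and R are the prior moments (m_t and R_t in the text); F is p x n.
    The posterior covariance is sigma^2 times the returned scale matrix.
    """
    Q = F.T @ R @ F + V                       # Q_t = F_t^T R_t F_t + V_t
    gain = np.linalg.solve(Q, F.T @ R).T      # R_t F_t Q_t^{-1}, via a linear solve
    mean = m_prior + gain @ (y - F.T @ m_prior)
    scale = R - gain @ F.T @ R                # R_t - R_t F_t Q_t^{-1} F_t^T R_t
    return mean, scale

# Illustrative values with p = 2 states and n = 1 observation.
m_prior = np.array([0.5, -1.0])
R = np.array([[1.5, 0.2], [0.2, 1.0]])
F = np.array([[2.0], [1.0]])
V = np.array([[1.0]])
y = np.array([3.0])
mean, scale = update_beta(m_prior, R, F, V, y)

# Precision form: (R^{-1} + F V^{-1} F^T)^{-1} and the matching mean.
prec = np.linalg.inv(R) + F @ np.linalg.inv(V) @ F.T
mean_check = np.linalg.solve(prec, np.linalg.inv(R) @ m_prior + F @ np.linalg.inv(V) @ y)
print(np.allclose(scale, np.linalg.inv(prec)), np.allclose(mean, mean_check))
```

Both comparisons should agree up to floating-point error, since the two forms are related by the Woodbury matrix identity. The gain form only requires solving an $n \times n$ system, which is convenient when $n$ is smaller than $p$.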

4.2.2 Deriving $\sigma^2 \mid Y_t$

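We next deduce the density of $\sigma^2 \mid Y_t$. Note before we begin that since $Y_t \sim T_{2a_{t-1}}\left(F_t^T m_t, \frac{b_{t-1}}{a_{t-1}} Q_t\right) = \int NIG_{Y_t}(F_t^T m_t, Q_t, a_{t-1}, b_{t-1})\, d\sigma^2$, where $Q_t = F_t^T R_t F_t + V_t$, we can write $Y_t \mid \sigma^2 \sim N(F_t^T m_t, \sigma^2 Q_t)$. Hence: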

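$$
\begin{aligned}
p(\sigma^2 \mid Y_t) &\propto p(Y_t \mid \sigma^2)\, p(\sigma^2) \\
&\propto \sigma^{-n} \exp\left(-\frac{1}{2\sigma^2} (y_t - F_t^T m_t)^T Q_t^{-1} (y_t - F_t^T m_t)\right) (\sigma^2)^{-(a_{t-1} + 1)} \exp\left(-b_{t-1} \sigma^{-2}\right) \\
&\propto (\sigma^2)^{-\left(a_{t-1} + \frac{n}{2} + 1\right)} \exp\left(-\sigma^{-2} \left[\frac{1}{2} (y_t - F_t^T m_t)^T Q_t^{-1} (y_t - F_t^T m_t) + b_{t-1}\right]\right)
\end{aligned}
$$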

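We conclude that $\sigma^{-2} \mid Y_t \sim \Gamma(a_t, b_t)$, or equivalently $\sigma^2 \mid Y_t \sim IG(a_t, b_t)$, where $a_t = a_{t-1} + \frac{n}{2}$ and $b_t = b_{t-1} + \frac{1}{2} (y_t - F_t^T m_t)^T Q_t^{-1} (y_t - F_t^T m_t)$.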

This gives us the set of updating equations stated in Proposition 4.1 of Petris et al. (2009).
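The update of the Gamma parameters is short enough to sketch directly. The function name `update_scale` and the inputs are hypothetical; `m_prior` again denotes the prior mean $m_t = G_t m_{t-1}$ and `Q` the matrix $Q_t$.

```python
import numpy as np

def update_scale(a_prev, b_prev, m_prior, Q, F, y):
    """Update (a_{t-1}, b_{t-1}) to (a_t, b_t) after observing y in R^n."""
    e = y - F.T @ m_prior                          # forecast error y_t - F_t^T m_t
    a = a_prev + y.shape[0] / 2.0                  # a_t = a_{t-1} + n / 2
    b = b_prev + 0.5 * e @ np.linalg.solve(Q, e)   # b_t = b_{t-1} + e^T Q_t^{-1} e / 2
    return a, b

# Scalar illustration: forecast error 2 and Q_t = 7.
a, b = update_scale(a_prev=3.0, b_prev=6.0, m_prior=np.array([0.5]),
                    Q=np.array([[7.0]]), F=np.array([[2.0]]), y=np.array([3.0]))
print(a, b)
```

In this example $a_t = 3 + 1/2 = 3.5$ and $b_t = 6 + \tfrac{1}{2} \cdot 4/7 \approx 6.286$. The shape grows by a fixed $n/2$ per time point, while the rate grows with the squared forecast error standardized by $Q_t$.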

4.2.3 Commentary

Note that we have derived the forward filtering step for time $t$ given the parameters of the distributions at time $t-1$. Hence the setup is Markovian, i.e. the state of the system depends only on that of the preceding time point. Nevertheless, applications where the forward filter's equations propagate from an initial time point $t = 0$ are written so that the dependence of $\beta_t$ and $\sigma^2$ on the data up to time $t-1$ or time $t$ is made explicit. Specifically, letting $D_t = \{Y_\tau\}_{\tau = 1, \ldots, t}$, we may write the set of equations in our setup as:

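$$
\begin{aligned}
Y_t, \sigma^2 \mid \beta_t, D_{t-1} &\sim NIG(F_t^T \beta_t, V_t, a_{t-1}, b_{t-1}) \\
\beta_t, \sigma^2 \mid \beta_{t-1}, D_{t-1} &\sim NIG(G_t \beta_{t-1}, W_t, a_{t-1}, b_{t-1}) \\
\beta_{t-1}, \sigma^2 \mid D_{t-1} &\sim NIG(m_{t-1}, C_{t-1}, a_{t-1}, b_{t-1})
\end{aligned}
$$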

and the sequential posteriors we have derived, $\beta_t \mid Y_t$ and $\sigma^2 \mid Y_t$, as $\beta_t \mid D_t$ and $\sigma^2 \mid D_t$ respectively.
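Chaining the one-step updates from $t = 0$ gives the full forward filter. The sketch below is a minimal illustration, assuming time-constant $F$, $G$, $V$, $W$; the name `forward_filter` and the data are hypothetical. In the code, `m[t]` and `C[t]` store the filtered (posterior) moments given $D_t$, while `m_prior` is the prior mean obtained by propagating the previous filtered mean through $G$.

```python
import numpy as np

def forward_filter(ys, F, G, V, W, m0, C0, a0, b0):
    """Run the forward filter over ys (shape T x n).

    Returns filtered means m[0..T], scale matrices C[0..T] and Gamma
    parameters a[0..T], b[0..T]; index 0 holds the initial values.
    """
    T, n = ys.shape
    m, C, a, b = [m0], [C0], [a0], [b0]
    for t in range(T):
        m_prior = G @ m[-1]                      # prior mean of the state
        R = G @ C[-1] @ G.T + W                  # prior scale matrix R_t
        Q = F.T @ R @ F + V                      # forecast scale matrix Q_t
        e = ys[t] - F.T @ m_prior                # forecast error
        gain = np.linalg.solve(Q, F.T @ R).T     # R_t F_t Q_t^{-1}
        m.append(m_prior + gain @ e)
        C.append(R - gain @ F.T @ R)
        a.append(a[-1] + n / 2.0)
        b.append(b[-1] + 0.5 * e @ np.linalg.solve(Q, e))
    return np.array(m), np.array(C), np.array(a), np.array(b)

# Illustrative local-level example with n = p = 1.
ys = np.array([[1.0], [2.0], [1.5]])
one = np.array([[1.0]])
m, C, a, b = forward_filter(ys, F=one, G=one, V=one, W=0.5 * one,
                            m0=np.zeros(1), C0=one, a0=2.0, b0=1.0)
print(m.ravel(), C.ravel(), a, b)
```

For the first step of this example, $R_1 = 1.5$, $Q_1 = 2.5$ and the forecast error is $1$, so the filtered mean is $0.6$, the filtered scale is $0.6$, and $b_1 = 1 + 0.5/2.5 = 1.2$. Each iteration uses only the values stored at the previous index, reflecting the Markovian structure noted above.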

4.3 Derivation of the Backwards Sampling

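Now that we have the parameters $\{\theta_t, \phi \mid D_t\}_{t = 1, \ldots, T}$, we would like to work backwards and derive $\{\theta_t, \phi \mid \theta_{t+1}, D_T\}_{t = 1, \ldots, T-1}$ to smooth our initial estimates. (Here $\theta_t$ denotes the state vector written $\beta_t$ above, and $m_t$, $C_t$ denote the filtered moments, so that $\theta_t \mid \sigma^2, D_t \sim N(m_t, \sigma^2 C_t)$.) We have: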

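$$
\begin{aligned}
p(\theta_t \mid \theta_{(t+1):T}, \sigma^2, D_T) &= p(\theta_t \mid \theta_{t+1}, \sigma^2, D_t) \\
&= \frac{p(\theta_{t+1} \mid \theta_t, \sigma^2, D_t)\, p(\theta_t \mid \sigma^2, D_t)}{p(\theta_{t+1} \mid \sigma^2, D_t)} \\
&\propto p(\theta_{t+1} \mid \theta_t, \sigma^2, D_t)\, p(\theta_t \mid \sigma^2, D_t) \\
&\propto \exp\left(-\frac{1}{2\sigma^2} \left[(\theta_{t+1} - G_{t+1} \theta_t)^T W_{t+1}^{-1} (\theta_{t+1} - G_{t+1} \theta_t) + (\theta_t - m_t)^T C_t^{-1} (\theta_t - m_t)\right]\right) \\
&= \exp\Big(-\frac{1}{2\sigma^2} \Big[\theta_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - 2 \theta_{t+1}^T W_{t+1}^{-1} G_{t+1} \theta_t + \theta_t^T G_{t+1}^T W_{t+1}^{-1} G_{t+1} \theta_t \\
&\qquad\qquad + \theta_t^T C_t^{-1} \theta_t - 2 m_t^T C_t^{-1} \theta_t + m_t^T C_t^{-1} m_t\Big]\Big) \\
&\propto \exp\left(-\frac{1}{2\sigma^2} \left[\theta_t^T (G_{t+1}^T W_{t+1}^{-1} G_{t+1} + C_t^{-1}) \theta_t - 2 (C_t^{-1} m_t + G_{t+1}^T W_{t+1}^{-1} \theta_{t+1})^T \theta_t\right]\right)
\end{aligned}
$$

Completing the square in $\theta_t$ gives

$$
\theta_t \mid \theta_{t+1}, \sigma^2, D_T \sim N\left((G_{t+1}^T W_{t+1}^{-1} G_{t+1} + C_t^{-1})^{-1} (C_t^{-1} m_t + G_{t+1}^T W_{t+1}^{-1} \theta_{t+1}),\; \sigma^2 (G_{t+1}^T W_{t+1}^{-1} G_{t+1} + C_t^{-1})^{-1}\right)
$$

Applying the Woodbury matrix identity, $(G_{t+1}^T W_{t+1}^{-1} G_{t+1} + C_t^{-1})^{-1} = C_t - C_t G_{t+1}^T (W_{t+1} + G_{t+1} C_t G_{t+1}^T)^{-1} G_{t+1} C_t$, and writing $R_{t+1} = G_{t+1} C_t G_{t+1}^T + W_{t+1}$, the mean and covariance become

$$
\begin{aligned}
E[\theta_t \mid \theta_{t+1}, \sigma^2, D_T] &= m_t - C_t G_{t+1}^T (W_{t+1} + G_{t+1} C_t G_{t+1}^T)^{-1} G_{t+1} m_t + C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} \\
&\qquad - C_t G_{t+1}^T (W_{t+1} + G_{t+1} C_t G_{t+1}^T)^{-1} G_{t+1} C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} \\
&= m_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} m_t + C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1}, \\
\operatorname{Var}[\theta_t \mid \theta_{t+1}, \sigma^2, D_T] &= \sigma^2 \left[C_t - C_t G_{t+1}^T (W_{t+1} + G_{t+1} C_t G_{t+1}^T)^{-1} G_{t+1} C_t\right] \\
&= \sigma^2 \left[C_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t\right].
\end{aligned}
$$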

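Notice that, with $R_{t+1} = G_{t+1} C_t G_{t+1}^T + W_{t+1}$,

$$
\begin{aligned}
C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} &= C_t G_{t+1}^T R_{t+1}^{-1} (G_{t+1} C_t G_{t+1}^T + W_{t+1} - W_{t+1}) W_{t+1}^{-1} \theta_{t+1} \\
&= C_t G_{t+1}^T R_{t+1}^{-1} (R_{t+1} - W_{t+1}) W_{t+1}^{-1} \theta_{t+1} \\
&= C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - C_t G_{t+1}^T R_{t+1}^{-1} \theta_{t+1}
\end{aligned}
$$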

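Hence,

$$
\begin{aligned}
\theta_t \mid \theta_{t+1}, \sigma^2, D_T &\sim N\Big(m_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} m_t + C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} - C_t G_{t+1}^T W_{t+1}^{-1} \theta_{t+1} + C_t G_{t+1}^T R_{t+1}^{-1} \theta_{t+1}, \\
&\qquad\quad \sigma^2 \left[C_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t\right]\Big) \\
&\sim N\left(m_t + C_t G_{t+1}^T R_{t+1}^{-1} (\theta_{t+1} - a_{t+1}),\; \sigma^2 \left[C_t - C_t G_{t+1}^T R_{t+1}^{-1} G_{t+1} C_t\right]\right)
\end{aligned}
$$

where $a_{t+1} = G_{t+1} m_t$ is the prior mean of $\theta_{t+1}$ given $D_t$ (not to be confused with the Gamma shape parameter $a_t$ of Section 4.2.2), and $R_{t+1} = G_{t+1} C_t G_{t+1}^T + W_{t+1}$.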
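The backward recursion above can be sketched as follows. This is an illustration rather than a definitive implementation: `backward_sample` is a hypothetical name, $G$ and $W$ are assumed constant in time, the filtered moments `m` and `C` are illustrative numbers rather than output of a real filter, and $\sigma^2$ is passed in as a fixed value (in a full sampler it would be drawn from its posterior given $D_T$). The loop also runs down to index $0$, using the initial moments stored there.

```python
import numpy as np

def backward_sample(m, C, G, W, sigma2, rng):
    """Draw theta_T, ..., theta_0 given filtered moments m[t], C[t] and sigma^2."""
    T = m.shape[0] - 1
    theta = np.empty_like(m)
    # Start from the last filtered distribution N(m_T, sigma^2 C_T).
    theta[T] = rng.multivariate_normal(m[T], sigma2 * C[T])
    for t in range(T - 1, -1, -1):
        a_next = G @ m[t]                          # a_{t+1} = G_{t+1} m_t
        R_next = G @ C[t] @ G.T + W                # R_{t+1}
        B = np.linalg.solve(R_next, G @ C[t]).T    # C_t G_{t+1}^T R_{t+1}^{-1}
        mean = m[t] + B @ (theta[t + 1] - a_next)
        cov = sigma2 * (C[t] - B @ G @ C[t])
        cov = 0.5 * (cov + cov.T)                  # symmetrize against round-off
        theta[t] = rng.multivariate_normal(mean, cov)
    return theta

# Illustrative filtered moments for n = p = 1 and T = 2.
m = np.array([[0.0], [0.6], [1.2]])
C = np.array([[[1.0]], [[0.6]], [[0.5]]])
G = np.array([[1.0]])
W = np.array([[0.5]])
theta = backward_sample(m, C, G, W, sigma2=0.8, rng=np.random.default_rng(1))
print(theta.ravel())

# With a negligible sigma^2 the draws collapse onto the conditional means.
theta_mean = backward_sample(m, C, G, W, sigma2=1e-16, rng=np.random.default_rng(1))
```

In the near-deterministic run, the value at $t = 1$ is approximately $0.6 + \frac{0.6}{1.1}(1.2 - 0.6) \approx 0.927$, which matches the simplified mean $m_t + C_t G_{t+1}^T R_{t+1}^{-1}(\theta_{t+1} - a_{t+1})$.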