4.4 Temporal Confounding

Let’s first consider a simple linear regression model,

$$
y_t = \beta x_t + \varepsilon_t
$$

where $t = 0, \dots, n-1$ and $\varepsilon_t$ is a Gaussian process with mean 0 and autocovariance $\sigma^2\rho(u)$. For now, without loss of generality, let’s assume that $E[y_t] = E[x_t] = 0$ and that $\text{Var}(x_t) = 1$. The least squares estimate for $\beta$ is

$$
\hat{\beta} = \frac{1}{n}\sum_{t=0}^{n-1} y_t x_t.
$$

Strictly speaking, $\beta$ quantifies the strength of the association between $x_t$ and $y_t$ at lag 0. If $t$ represents the time unit of a “day”, then $\beta$ is the 0-day lag association.

Based on what we learned in the chapter on frequency and time scale analysis, we can decompose $x_t$ into its Fourier components,

$$
x_t = \frac{1}{n}\sum_{p=0}^{n-1} z_x(p)\exp(2\pi i p t/n)
$$

(assuming that $n$ is even for now), where $z_x(p)$ is the complex Fourier coefficient associated with the frequency $p$ and the series $x_t$, i.e.

$$
z_x(p) = \sum_{t=0}^{n-1} x_t \exp(-2\pi i p t/n).
$$

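Plugging into the formula for the least squares estimate gives us an estimate of $\beta$ as

$$
\begin{aligned}
\hat{\beta} &= \frac{1}{n}\sum_{t=0}^{n-1} y_t x_t \\
&= \frac{1}{n}\sum_{t=0}^{n-1} y_t \left[\frac{1}{n}\sum_{p=0}^{n-1} z_x(p)\exp(2\pi i p t/n)\right] \\
&= \frac{1}{n^2}\sum_{t=0}^{n-1}\sum_{p=0}^{n-1} z_x(p)\, y_t \exp(2\pi i p t/n) \\
&= \frac{1}{n^2}\sum_{p=0}^{n-1}\left[z_x(p)\sum_{t=0}^{n-1} y_t \exp(2\pi i p t/n)\right] \\
&= \frac{1}{n^2}\sum_{p=0}^{n-1} z_x(p)\,\bar{z}_y(p)
\end{aligned}
$$

where $\bar{z}_y(p)$ is the complex conjugate of the Fourier coefficient associated with the series $y_t$ at frequency $p$.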

The derivation above shows that the least squares estimate of the coefficient $\beta$ can be written as a sum, over all of the frequencies, of the products of the Fourier coefficients of the $y_t$ and $x_t$ series. While it’s not advisable to compute $\hat{\beta}$ in this manner, as it would require two separate FFTs (one for $y_t$ and one for $x_t$), it does show the dependence of $\hat{\beta}$ on all of the time scales of variation in both $y_t$ and $x_t$.
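The identity can be checked numerically. The following is a minimal sketch, assuming NumPy and a simulated pair of series (the sample size, seed, and true coefficient are arbitrary choices for illustration). It relies on `np.fft.fft` using the same sign convention as $z_x(p)$ above, i.e. a negative exponent in the forward transform.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256          # even sample size, chosen arbitrarily
beta = 0.5       # assumed "true" coefficient for the simulation

# Simulated x_t with mean 0 and empirical variance 1
x = rng.standard_normal(n)
x = (x - x.mean()) / x.std()

# Simulated y_t, centered so that its mean is 0
y = beta * x + rng.standard_normal(n)
y = y - y.mean()

# Time-domain least squares estimate: (1/n) * sum_t y_t x_t
beta_time = np.sum(y * x) / n

# Frequency-domain version: (1/n^2) * sum_p z_x(p) * conj(z_y(p))
z_x = np.fft.fft(x)
z_y = np.fft.fft(y)
beta_freq = np.sum(z_x * np.conj(z_y)) / n**2

print(beta_time, beta_freq.real, beta_freq.imag)
```

The two estimates should agree up to floating point error, and the imaginary part of the frequency-domain sum should be numerically zero because both series are real.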

One of the primary advantages of doing regression of time series data is that we can decide for ourselves which time scales of variation we are in fact interested in. It is not necessary for us to focus our attention on all of the time scales of variation in the data (as the naive least squares estimate would have us do) if some of the time scales do not have a meaningful interpretation. For example, if we are primarily interested in long-term trends in $x_t$ and $y_t$, then we do not need to focus on the high-frequency components. Similarly, if we are interested in short-term fluctuations between $x_t$ and $y_t$, then we can discard the low-frequency components of $\hat{\beta}$.

We can think of $\hat{\beta}$ as being a mixture of associations at different time scales, and we may focus our attention on only those associations/time scales that are of interest to us. If we let

$$
\hat{\beta}_p = \frac{z_x(p)}{n}\,\frac{\bar{z}_y(p)}{n}
$$

then we can write the least squares estimate as

$$
\hat{\beta}_{LS} = \sum_{p=0}^{n-1} \hat{\beta}_p.
$$

The point of showing this is that we are rarely focused on the association between $x_t$ and $y_t$ at all time scales at the same time. In any given analysis, we may be focused on short-term associations (e.g. “acute” effects) or we may be focused on long-term associations. But it’s rarer to want to mix all of those associations together. However, that’s exactly what the least squares estimate does.
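To make the mixture concrete, here is a rough sketch that computes the per-frequency terms (each equal to $z_x(p)\bar{z}_y(p)/n^2$) and adds them up within a low-frequency and a high-frequency band. The simulated series, the 30-time-unit cutoff, and the helper name `beta_components` are all assumptions made for illustration and are not part of the original text.

```python
import numpy as np

def beta_components(x, y):
    """Per-frequency terms z_x(p) * conj(z_y(p)) / n^2 (hypothetical helper)."""
    n = len(x)
    return np.fft.fft(x) * np.conj(np.fft.fft(y)) / n**2

rng = np.random.default_rng(1)
n = 730
t = np.arange(n)

# Simulated x_t: a slow cycle plus noise, scaled to mean 0 and variance 1
x = np.sin(2 * np.pi * t / 365) + rng.standard_normal(n)
x = (x - x.mean()) / x.std()

# Simulated y_t, centered
y = 0.3 * x + rng.standard_normal(n)
y = y - y.mean()

beta_p = beta_components(x, y)

# Folded frequencies in cycles per time unit, so p and n - p land in the same band
freq = np.abs(np.fft.fftfreq(n))
low = freq <= 1 / 30   # assumed cutoff: periods of 30 time units or longer

beta_low = beta_p[low].sum().real     # contribution of long time scales
beta_high = beta_p[~low].sum().real   # contribution of short time scales
beta_ls = np.mean(x * y)              # ordinary least squares estimate

print(beta_low, beta_high, beta_low + beta_high, beta_ls)
```

Note that `beta_low` and `beta_high` are contributions to the overall least squares estimate, not slopes estimated within each band. A band-specific slope would also need to be normalized by the variance of $x_t$ in that band.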

4.4.1 Bias from Omitted Temporal Confounders

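A well-known result in the theory of least squares estimators concerns omitted variable bias. Suppose the true model for $y_t$ is

$$
y_t = \beta x_t + \gamma w_t + \varepsilon_t
$$

where $w_t$ is some other time series with mean 0 and variance 1. If we include $x_t$ but omit $w_t$ from our modeling and simply estimate $\beta$ via least squares, we will get

$$
\begin{aligned}
\hat{\beta}_{LS} &= \frac{1}{n}\sum_{t=0}^{n-1} y_t x_t \\
&= \frac{1}{n}\sum_{t=0}^{n-1} (\beta x_t + \gamma w_t + \varepsilon_t)\, x_t \\
&= \beta\,\widehat{\text{Var}}(x_t) + \gamma\,\frac{1}{n}\sum_{t=0}^{n-1} w_t x_t + \frac{1}{n}\sum_{t=0}^{n-1} \varepsilon_t x_t \\
&\approx \beta + \gamma\,\frac{1}{n}\sum_{t=0}^{n-1} w_t x_t
\end{aligned}
$$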

So the bias in the least squares estimate of $\beta$ is equal to $\gamma\,\frac{1}{n}\sum_{t=0}^{n-1} w_t x_t$. However, notice that the quantity $\frac{1}{n}\sum_{t=0}^{n-1} w_t x_t$ is essentially the least squares estimate of the coefficient in the simple linear regression model relating $w_t$ to $x_t$, i.e. it is the empirical covariance between $w_t$ and $x_t$. Of course, this implies that if the covariance between $w_t$ and $x_t$ is zero, then there is no bias in $\hat{\beta}_{LS}$ and there is no danger in omitting $w_t$ from the model for $y_t$. But we can also write the bias as a sum of products of Fourier coefficients,

$$
\text{Bias}(\hat{\beta}_{LS}) = \gamma\,\frac{1}{n^2}\sum_{p=0}^{n-1} z_w(p)\,\bar{z}_x(p)
$$

From this representation, we can see that one way to eliminate (or at least minimize) the bias from omitting the variable $w_t$ is to focus on time scales $p$ where $z_w(p) \approx 0$. As long as there is no time scale where the Fourier coefficients for both $x_t$ and $w_t$ are large, the bias should be close to zero. Also, it’s worth noting that if $\gamma = 0$ there is no bias, but for now we will assume that $\gamma \ne 0$.
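The frequency-domain form of the bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: the omitted series $w_t$ is taken to be a pure slow sinusoid, $x_t$ is built to be correlated with it, and the values of $\gamma$, the cutoff, and the seed are arbitrary. Only the bias term is computed, so $y_t$ is not needed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1024
t = np.arange(n)
gamma = 1.0   # assumed coefficient on the omitted variable

# Omitted confounder w_t: two slow cycles over the whole record, mean 0, variance 1
w = np.sin(2 * np.pi * 2 * t / n)
w = (w - w.mean()) / w.std()

# Exposure x_t: correlated with w_t, plus white noise, scaled to variance 1
x = 0.6 * w + rng.standard_normal(n)
x = (x - x.mean()) / x.std()

# Bias in the time domain: gamma * (1/n) * sum_t w_t x_t
bias_time = gamma * np.mean(w * x)

# Bias in the frequency domain: gamma * (1/n^2) * sum_p z_w(p) * conj(z_x(p))
z_w, z_x = np.fft.fft(w), np.fft.fft(x)
bias_p = gamma * z_w * np.conj(z_x) / n**2
bias_freq = bias_p.sum().real

# Part of the bias coming from time scales shorter than 64 time units
freq = np.abs(np.fft.fftfreq(n))
high = freq > 1 / 64
bias_high = bias_p[high].sum().real

print(bias_time, bias_freq, bias_high)
```

In this setup essentially all of the bias sits at the single low frequency where $w_t$ varies, so the high-frequency portion of the sum is numerically zero even though the overall bias is substantial.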

4.4.2 Example: Confounding by Smoothly Varying Factors

Suppose we are looking at temperature and mortality and we are concerned about possible confounding by economic development. Urban economic development is correlated with the “urban heat island” effect, as more roads and buildings can cause increased temperatures in urban environments. Furthermore, economic development can be related to mortality in a variety of ways.

One assumption we might be willing to make is that economic development largely occurs over broad time scales, on the order of months to years or even decades. As such, most of the variation in any variable that tracked economic development would be concentrated in the low frequencies. Therefore, if we focus our interest on variation in temperature and mortality in the higher frequency ranges, where we might assume that the Fourier coefficients for economic development are close to 0 (but the coefficients for temperature and mortality are non-zero), then we may be less concerned about confounding from that particular factor.

As another example, suppose we are interested in studying air pollution and mortality and are worried about possible confounding by seasonally varying factors. Seasonal factors vary strongly at 1 cycle per year (a period of 365 days), and so we might want to focus on variation in air pollution and mortality at either shorter or longer time scales.
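One possible way to act on this idea is sketched below: estimate the slope using only the frequencies outside a band around 1 cycle per year. This is an illustration under strong assumptions, not a recommended analysis. The data are simulated, the seasonal confounder is a pure annual cosine, the width of the excluded band is arbitrary, and `band_slope` is a hypothetical helper that divides the band-limited cross-product by the band-limited variance of $x_t$.

```python
import numpy as np

def band_slope(x, y, keep):
    """Least squares slope of y on x using only the frequencies flagged in `keep`
    (hypothetical helper for illustration)."""
    z_x, z_y = np.fft.fft(x), np.fft.fft(y)
    num = np.sum(z_x[keep] * np.conj(z_y[keep])).real
    den = np.sum(np.abs(z_x[keep]) ** 2)
    return num / den

rng = np.random.default_rng(3)
n = 365 * 4               # four years of daily data
t = np.arange(n)
beta, gamma = 0.2, 1.0    # assumed true coefficients

# Simulated seasonal confounder with exactly one cycle per year
season = np.cos(2 * np.pi * t / 365)

# Simulated "air pollution" and "mortality" series, both affected by season
x = 0.8 * season + rng.standard_normal(n)
y = beta * x + gamma * season + rng.standard_normal(n)
x = x - x.mean()
y = y - y.mean()

# Folded frequencies in cycles per day
freq = np.abs(np.fft.fftfreq(n))
annual = 1 / 365

# Keep every frequency except a band around the annual cycle (band width is arbitrary)
keep = np.abs(freq - annual) > 0.5 * annual
keep_all = np.ones(n, dtype=bool)

beta_all = band_slope(x, y, keep_all)   # ordinary least squares slope, confounded
beta_notch = band_slope(x, y, keep)     # slope with the annual band removed

print(beta_all, beta_notch)
```

With all frequencies included, the estimate is pulled well away from the assumed value of 0.2 by the shared seasonal cycle. Excluding the annual band should bring it back close to 0.2 in this idealized setting. Real seasonal confounders are rarely a single clean sinusoid, so in practice the excluded band would need more thought.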