Chapter 4 Time Series Regression Modeling

The basic setup for time series regression modeling is that we have an observed outcome or output $y_t$ and an input or exposure $x_t$, with $t=1,\dots,n$. Typically, we want to know how changes in the input $x_t$ are associated with changes in the output $y_t$. For example, we may be managing a system like a car or a spacecraft, and we want to know how the system will respond when we apply a force or increase an input. In other settings, we may be monitoring the levels of a toxic pollutant in the environment and may be interested in how increases or decreases in the pollutant levels will affect the health of the neighboring population.

Another concern with models of this nature is the lag structure of the relationship between $x_t$ and $y_t$. That is, given a unit change in $x_t$ at time $t$, what is the change in the output/outcome at times $t, t+1, t+2, \dots$? With traditional regression models we generally think of relationships as being cross-sectional (comparing one person to the next) or as being concurrent in timing. With time series data, however, we can examine whether an effect propagates across time. In some of the literature, these kinds of models are known as distributed lag models because the effect of a change in $x_t$ is “distributed” across multiple future time points (for daily data, several subsequent days).
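To make the lag structure concrete, one common way to write a distributed lag model is $y_t = \alpha + \sum_{\ell=0}^{L} \beta_\ell x_{t-\ell} + \varepsilon_t$, where $\beta_\ell$ is the change in $y_{t+\ell}$ associated with a unit change in $x_t$, and $\sum_{\ell} \beta_\ell$ is the cumulative effect over lags $0,\dots,L$. The Python sketch below builds the lagged design matrix and fits the model by ordinary least squares with NumPy. The helper names `lag_matrix` and `fit_distributed_lag`, the maximum lag, and the simulated data are all hypothetical. It assumes a finite maximum lag $L$ and simply drops the first $L$ observations, whose lagged inputs are unobserved.

```python
import numpy as np


def lag_matrix(x, max_lag):
    """Hypothetical helper: columns are x_t, x_{t-1}, ..., x_{t-max_lag}.

    Row i corresponds to time t = max_lag + i; the first max_lag time
    points are dropped because their lagged values are not observed.
    """
    n = len(x)
    cols = [x[max_lag - lag : n - lag] for lag in range(max_lag + 1)]
    return np.column_stack(cols)


def fit_distributed_lag(x, y, max_lag):
    """Hypothetical helper: OLS fit of y_t = alpha + sum_l beta_l * x_{t-l} + e_t.

    Returns (alpha, beta), where beta[l] is the lag-l coefficient.
    """
    X = lag_matrix(np.asarray(x, dtype=float), max_lag)
    X = np.column_stack([np.ones(X.shape[0]), X])  # add intercept column
    y_trim = np.asarray(y, dtype=float)[max_lag:]  # align y with usable rows
    coef, *_ = np.linalg.lstsq(X, y_trim, rcond=None)
    return coef[0], coef[1:]


# Usage on simulated (hypothetical) data with known lag effects.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
true_beta = np.array([0.5, 0.3, 0.1])
# y_t = 1 + 0.5 x_t + 0.3 x_{t-1} + 0.1 x_{t-2} + noise
y = 1.0 + np.convolve(x, true_beta)[:n] + rng.normal(scale=0.2, size=n)

alpha_hat, beta_hat = fit_distributed_lag(x, y, max_lag=2)
print("lag coefficients:", np.round(beta_hat, 2))
print("cumulative effect:", round(beta_hat.sum(), 2))
```

Giving each lag its own unconstrained coefficient is the simplest choice; when $L$ is large and the lagged inputs are strongly correlated, the individual estimates can become noisy, which is one reason smoother, constrained lag structures are sometimes preferred.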

Consider one example: it is now generally accepted that ambient air pollution is associated with a variety of outcomes, including mortality. In particular, numerous time series studies have shown strong evidence of an association between daily changes in pollution levels and daily mortality counts. However, a question remains as to the nature of the deaths that occur after a spike in pollution levels. One possibility is that these deaths would otherwise have occurred a few days later. If that is true, it would suggest that the deaths have only been “displaced” by a few days, rather than by months or years. While the deaths are still real, one might hypothesize that the air pollution is primarily affecting an already frail population. One can explore this “mortality displacement” hypothesis with distributed lag models. If the hypothesis were true, then the association between air pollution and mortality might be positive shortly after a spike, but then become negative a few days later as the pool of susceptible individuals is depleted.
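The displacement pattern can be illustrated with a small simulation, sketched below in Python under clearly hypothetical assumptions: daily pollution anomalies are standard normal, daily death counts are Poisson with a baseline mean of 50, and the true lag profile is positive on lags 0 and 1 and negative on lags 2 through 4, summing to zero. For simplicity the sketch fits the lag coefficients by ordinary least squares on the counts rather than by the Poisson regression commonly used for mortality data. Under these assumptions the early lag coefficients come out positive and the later ones negative, so the lag-0 estimate alone suggests a harmful effect while the cumulative effect over lags 0 to 4 is close to zero.

```python
import numpy as np


def fit_lags(x, y, max_lag):
    """OLS fit of y_t = alpha + sum_{l=0}^{max_lag} beta_l x_{t-l}; returns beta."""
    n = len(x)
    X = np.column_stack([x[max_lag - lag : n - lag] for lag in range(max_lag + 1)])
    X = np.column_stack([np.ones(n - max_lag), X])  # intercept column
    coef, *_ = np.linalg.lstsq(X, y[max_lag:], rcond=None)
    return coef[1:]  # drop the intercept


rng = np.random.default_rng(42)
n_days = 2000
# Hypothetical daily pollution anomalies (standardized units).
pollution = rng.normal(size=n_days)
# Hypothetical "displacement" lag profile: deaths rise on lags 0-1,
# then fall below baseline on lags 2-4 as the frail pool is depleted.
true_beta = np.array([2.0, 1.0, -1.0, -1.0, -1.0])
expected = 50.0 + np.convolve(pollution, true_beta)[:n_days]
deaths = rng.poisson(expected)

beta_hat = fit_lags(pollution, deaths.astype(float), max_lag=4)
print("lag coefficients:", np.round(beta_hat, 2))
print("lag-0 only:", round(beta_hat[0], 2))
print("cumulative over lags 0-4:", round(beta_hat.sum(), 2))
```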

Confounding is another important issue to consider, as with all regression modeling. In a time series model, the relevant confounders are those that vary over time. As a result, when we do not have direct measurements of relevant confounders, it may be possible to use time itself as a stand-in, or surrogate, for them.
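As a minimal illustration of using time as a surrogate, the Python sketch below simulates an unmeasured seasonal confounder that drives both the exposure $x_t$ and the outcome $y_t$; all data and coefficients are hypothetical, and the true effect of $x_t$ is set to 0.5. Regressing $y_t$ on $x_t$ alone gives a biased slope (about 1.5 under these settings). Adding smooth functions of time, here a linear trend plus sine and cosine terms for the first two annual harmonics, absorbs the seasonal variation and brings the slope back near 0.5. The Fourier basis is only one possible choice; splines in time are another common option.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 3 * 365  # three years of hypothetical daily data
t = np.arange(n)
# Unmeasured seasonal confounder (e.g., a temperature-like annual cycle).
season = np.sin(2 * np.pi * t / 365)
x = 2.0 * season + rng.normal(size=n)            # exposure tracks the season
y = 0.5 * x + 3.0 * season + rng.normal(size=n)  # true effect of x is 0.5

ones = np.ones(n)
# Naive model: y on x only; the seasonal confounder biases the slope.
naive = ols(np.column_stack([ones, x]), y)[1]

# Adjusted model: smooth functions of time stand in for the confounder.
time_terms = [t / n]  # linear trend
for k in (1, 2):  # first two annual harmonics
    time_terms += [np.sin(2 * np.pi * k * t / 365), np.cos(2 * np.pi * k * t / 365)]
adjusted = ols(np.column_stack([ones, x] + time_terms), y)[1]

print(f"naive slope: {naive:.2f}, time-adjusted slope: {adjusted:.2f}")
```

Adjustment works almost perfectly here because the simulated confounder is exactly one of the sine terms in the basis. With real data the shape of the confounder is unknown, so how flexible the functions of time are (the number of harmonics, or the degrees of freedom of a spline) becomes an important modeling choice.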