7.14 Time-varying predictors

In all the examples we have looked at, the predictors did not vary over time. Each individual had only one value for each predictor. Since survival data is the result of follow-up over time, it is possible to have predictors that do vary over time. For example, in a study of time to heart attack, researchers could record various other conditions’ occurrence over time, such as hypertension or angina. In a study of juvenile recidivism, researchers could record how education or employment status change over time. Cox regression is able to handle such time-varying predictors (also known as “time-dependent covariates”).

The datasets we have analyzed so far contained one row per individual, including their event time, an indicator of whether the event occurred at that time or the time is censored, and time-invariant predictor values. A dataset with time-varying predictors will have multiple rows per individual, with different rows having different values for the time-varying predictors, reflecting how they change over time. Additionally, rather than having a single event time variable, each row will have two time variables indicating the beginning and end of the time interval represented by that row of data.

Example 7.8: The Opioid teaching dataset (see Appendix A.7) contains longitudinal information for 362 individuals who at baseline had used non-prescribed pharmaceutical opioids (NPPO, “pain pills”), but were not dependent on NPPOs and had never used heroin (Carlson et al. 2016). The dataset contains 1853 observations. Each row contains the time variables START and STOP which define the time interval (years from initiation of NPPO use) associated with that row. Time-invariant variables in the dataset are constant over all rows for the same individual, while time-varying variables can change between rows. Each (START, STOP] interval defines a period of time during which no variables changed. Two time-varying variables in the dataset are heroin use (heroin) (the event indicator variable) and lifetime opioid dependence (dep_lifetime) based on DSM-IV criteria (Forman et al. 2004; Hudziak et al. 1993). Other than heroin use, which is the event of interest, time-dependent variables were lagged to ensure that they temporally preceded the outcome.

Look at the rows of data for a few variables for one individual.

load("Data/opioid_rmph.rData")

opioid %>% 
  filter(RANDID == 10) %>%
  select(RANDID, wave, START, STOP, heroin, age_at_init, sex, dep_lifetime)

##   RANDID wave START STOP heroin age_at_init  sex dep_lifetime
## 1     10    1  3.78 4.26      0          19 Male            0
## 2     10    2  4.26 4.78      0          19 Male            1
## 3     10    3  4.78 5.29      0          19 Male            1
## 4     10    4  5.29 5.84      0          19 Male            1
## 5     10    5  5.84 6.27      0          19 Male            1
## 6     10    6  6.27 6.79      1          19 Male            1

This individual is male and started using NPPOs at age 19 years. These two variables are time-invariant and so have the same value in every row. His first interview took place 4.26 years after NPPO initiation and asked about his experiences and behaviors going back 6 months. Thus, the time interval for his first row is (START, STOP] = (3.78, 4.26]. He first reported using heroin at his 6th interview. At baseline (wave = 0, but remember time-varying variables were lagged so wave = 1 values in the dataset actually are the values from wave = 0), he did not meet the criteria for lifetime opioid dependence, but he did at the next interview. Thus, the associated time-varying variable (dep_lifetime) changes from the first to second row.

We will work with datasets that are already in this format. If, in your future work, you have a dataset in a different format, see T. Therneau, Crowson, and Atkinson (2023) for guidance on transforming it into the correct format. Given data set up in this manner, we can fit a Cox regression model with a slight modification to the coxph() syntax we have been using. Instead of Surv(TIME, EVENT), use Surv(START, STOP, EVENT). In this example, the START and STOP variables are named START and STOP, but if they have different names in a future dataset you are working with then use those names instead.

cox.ex7.8 <- coxph(Surv(START, STOP, heroin) ~
                     age_at_init + sex + dep_lifetime,
                   data = opioid)

summary(cox.ex7.8)

## Call:
## coxph(formula = Surv(START, STOP, heroin) ~ age_at_init + sex + 
##     dep_lifetime, data = opioid)
## 
##   n= 1853, number of events= 27 
## 
##                coef exp(coef) se(coef)     z Pr(>|z|)   
## age_at_init  -0.393     0.675    0.135 -2.91   0.0036 **
## sexMale       0.151     1.164    0.392  0.39   0.6996   
## dep_lifetime  1.058     2.881    0.399  2.65   0.0080 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##              exp(coef) exp(-coef) lower .95 upper .95
## age_at_init      0.675      1.482     0.518     0.879
## sexMale          1.164      0.859     0.539     2.511
## dep_lifetime     2.881      0.347     1.318     6.298
## 
## Concordance= 0.716  (se = 0.042 )
## Likelihood ratio test= 15.2  on 3 df,   p=0.002
## Wald test            = 14.5  on 3 df,   p=0.002
## Score (logrank) test = 15.4  on 3 df,   p=0.001

cbind("AHR"      = exp(summary(cox.ex7.8)$coef[, "coef"]),
                   exp(confint(cox.ex7.8)),
      "p-value"  = summary(cox.ex7.8)$coef[, "Pr(>|z|)"])

##                 AHR  2.5 % 97.5 %  p-value
## age_at_init  0.6749 0.5180 0.8794 0.003597
## sexMale      1.1635 0.5391 2.5111 0.699560
## dep_lifetime 2.8807 1.3176 6.2981 0.008022

car::Anova(cox.ex7.8, type = 3, test.statistic = "Wald")

## Analysis of Deviance Table (Type III tests)
## 
## Response: Surv(START, STOP, heroin)
##              Df Chisq Pr(>Chisq)   
## age_at_init   1  8.48     0.0036 **
## sex           1  0.15     0.6996   
## dep_lifetime  1  7.03     0.0080 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Whether a predictor is time-varying or time-invariant, its HR can be interpreted as a comparison of the hazard between groups of individuals with different values of that predictor. Thus, in this example, we could conclude that after adjusting for age at NPPO initiation and sex, those with lifetime opioid dependence have 2.88 times the hazard of using heroin as those who do not (AHR = 2.88; 95% CI = 1.32, 6.30; p = .008). The HR for a time-varying predictor, however, can also be interpreted as the effect of within-individual change on the hazard. If an individual without opioid dependence transitions to dependence, their hazard of transitioning to heroin is multiplied by 2.88.

NOTES:

If an observation has the same start and end time, coxph will drop it from the analysis since there is no time at risk. However, this may not be appropriate. For example, suppose times are recorded as whole number days. An interval could start on a certain day and the event could have happened that same day. So there is time at risk, it is just less than a whole day. In this case, an approximate solution is to add a fractional number to the stop time for observations with START = STOP, as suggested in T. Therneau, Crowson, and Atkinson (2023). For example, if START and STOP are both day 10, change STOP to 10.5 (making the assumption that the person was at risk in this interval for 12 hours). Make sure to also increase the START time of the next time interval for this person by the same amount to avoid having overlapping intervals.
As mentioned previously, the event time distribution in this dataset is left-truncated – those who transitioned to heroin before the study started were not eligible for inclusion in the study. Fortunately, the Surv(START, STOP, EVENT) syntax automatically handles the left-truncation when, as in this dataset, the start time for the first interval for each individual is the time from the time origin (e.g., initiation of pain pill use) to the start of the study. Having the start time of each individual’s first interval be 0 would lead to bias, as it would assume that everyone was under observation since their time origin and that those in the study were sampled from those at risk since the time origin. In fact, participants were only observed after the study started and those who experienced the event before then were excluded.

References

Carlson, Robert G., Ramzi W. Nahhas, Silvia S. Martins, and Raminta Daniulaityte. 2016. “Predictors of Transition to Heroin Use Among Initially Non-Opioid Dependent Illicit Pharmaceutical Opioid Users: A Natural History Study.” Drug and Alcohol Dependence 160: 127–34. https://doi.org/10.1016/j.drugalcdep.2015.12.026.

Forman, Robert F, Dace Svikis, Ivan D Montoya, and Jack Blaine. 2004. “Selection of a Substance Use Disorder Diagnostic Instrument by the National Drug Abuse Treatment Clinical Trials Network.” Journal of Substance Abuse Treatment 27 (1): 1–8. https://doi.org/10.1016/j.jsat.2004.03.012.

Hudziak, James J., John E. Helzer, Martin W. Wetzel, Keith B. Kessel, Barbara McGee, Aleksandar Janca, and Thomas Przybeck. 1993. “The Use of the DSM-III-R Checklist for Initial Diagnostic Assessments.” Comprehensive Psychiatry 34 (6): 375–83. https://doi.org/10.1016/0010-440X(93)90061-8.

———. 2023. “Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model.” cran.r-project.org/web/packages/survival/vignettes/timedep.pdf.