Studies of Household COVID-19 Infections in the Respiratory Virus Transmission Network

Statistical consulting practicum & collaboration with VU Department of Health Policy

1 Summary of Project Requests

Study Lead(s): Carlos Grijalva, H. Keipp Talbot, Yuwei Zhu

Database: Respiratory Virus Transmission Network (RVTN)

Biostatisticians: Julia Whitman, Dandan Liu (supervisor)

2 Study Goals

The purpose of this statistical consulting practicum was to work with the Vanderbilt Health Policy Department to develop statistical analysis plans (SAPs) and conduct initial analyses for key research questions using the Respiratory Virus Transmission Network (RVTN) data. I will highlight certain elements of my analysis for two of these questions in my presentation.

Project 1: Identify factors associated with COVID-19 symptom severity

Project 2: Characterize the association between index case vaccination status and time to recovery from COVID-19

2.1 Summary of Findings

  • Female sex and Hispanic ethnicity were associated with lower symptom severity during the study

  • Immunocompromised individuals were more likely to experience severe symptoms

  • Participants who were asymptomatic at baseline recovered at higher rates than those with acute respiratory infections at baseline.

  • Counterintuitively, participants reporting non-respiratory comorbidities recovered at higher rates than individuals reporting no comorbidities. Further, a lower rate of recovery was observed in the most vaccinated participants (4+ SARS-CoV-2 vaccines) than in unvaccinated participants.

3 Background

3.1 Respiratory Virus Transmission Network

  • CDC and FDA-sponsored, multi-center study on factors influencing household transmission of influenza and SARS-CoV-2 infections

3.2 Study Design

4 Project 1: Factors Associated with Daily COVID-19 Symptom Severity

  • Prompt. Identify factors associated with asymptomatic, paucisymptomatic, and symptomatic COVID-19 infections

  • Hypothesis. The intensity of SARS-CoV-2 symptoms is positively correlated with “infection-promoting variables” in a participant’s person or household, primarily vaccination status

4.1 Group decisions

Based on our collaborators’ goals for the study, our group came to a consensus about certain elements that should be common to each individual’s analysis. These included:

  • Primary outcome and covariates of interest

  • Definitions of disease severity based on patient self-reporting over the 10-day follow-up

  • Eligibility criteria

  • Broadly, the analysis should account for repeated measures but we were free to implement our model of choice

4.2 Variables

4.2.1 Outcome

  • Daily SARS-CoV-2 infection severity, calculated using the aggregate daily number of symptoms:

    • Asymptomatic: no symptoms reported on any day of follow-up

    • Paucisymptomatic: at least one symptom reported during follow-up, but no more than one symptom on any single day

    • ARI (acute respiratory infection): two or more symptoms reported on at least one day

## Deriving infection severity groups

# sum across daily symptom columns for all participants
dt.combine2$sumd1 <- rowSums(dt.combine2[ ,c(47:61)], na.rm = TRUE)
dt.combine2$sumd2 <- rowSums(dt.combine2[ ,c(63:77)], na.rm = TRUE)
dt.combine2$sumd3 <- rowSums(dt.combine2[ ,c(79:93)], na.rm = TRUE)
dt.combine2$sumd4 <- rowSums(dt.combine2[ ,c(95:109)], na.rm = TRUE)
dt.combine2$sumd5 <- rowSums(dt.combine2[ ,c(111:125)], na.rm = TRUE)
dt.combine2$sumd6 <- rowSums(dt.combine2[ ,c(127:141)], na.rm = TRUE)
dt.combine2$sumd7 <- rowSums(dt.combine2[ ,c(143:157)], na.rm = TRUE)
dt.combine2$sumd8 <- rowSums(dt.combine2[ ,c(159:173)], na.rm = TRUE)
dt.combine2$sumd9 <- rowSums(dt.combine2[ ,c(175:189)], na.rm = TRUE)
dt.combine2$sumd10 <- rowSums(dt.combine2[ ,c(191:205)], na.rm = TRUE)

# count total number of symptoms across days 
dt.combine2$daysums <- rowSums(dt.combine2[ ,c(207:216)], na.rm = TRUE)

# assign symptom group to patients based on number of symptoms per day
# (requires dplyr for %>%, mutate() and case_when())
dt.combine2 <- dt.combine2 %>%
  mutate(sxstat = case_when(
    # no symptoms on any day of follow-up
    daysums == 0 ~ 'asymptomatic',
    # two or more symptoms on at least one day
    sumd1 >1 | sumd2 >1 | sumd3 >1 | sumd4 >1 | sumd5 >1 | sumd6 >1 | sumd7 >1 | sumd8 >1 | sumd9 >1 | sumd10 >1 ~ 'ARI',
    # remaining participants: at most one symptom on every day
    # (the daily sums are never NA because rowSums() used na.rm = TRUE)
    TRUE ~ 'paucisymptomatic'))

4.3 Eligibility Criteria

  • Household members with SARS-CoV-2 confirmed on laboratory test who enrolled in the RVTN study

4.4 Statistical Analysis

  • Ordinal transition model used to estimate probability of individuals’ symptoms improving, worsening or staying the same on any given day
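A first-order transition model conditions each day's state on the state from the day before. The sketch below, on made-up data, shows one way the previous-state covariate (yprev in the model in Section 4.6) could be derived from long-format data. The toy data frame, its values and the helper name add_prev_state are illustrative assumptions; the column names id, day and y follow the code in this report. It also assumes each participant has a row for every consecutive day.

```r
# Toy long-format data: one row per participant per day (made-up values)
states <- c("asymptomatic", "paucisymptomatic", "ARI")
toy <- data.frame(
  id  = rep(c("A", "B"), each = 4),
  day = rep(1:4, times = 2),
  y   = c("ARI", "ARI", "paucisymptomatic", "asymptomatic",
          "asymptomatic", "paucisymptomatic", "ARI", "ARI")
)
toy$y <- factor(toy$y, levels = states, ordered = TRUE)

# Hypothetical helper: attach the previous day's state within each participant
add_prev_state <- function(d) {
  d <- d[order(d$id, d$day), ]
  # shift the state column down by one row
  prev <- c(NA, as.character(d$y)[-nrow(d)])
  # the first row of each participant has no previous day
  prev[!duplicated(d$id)] <- NA
  d$yprev <- factor(prev, levels = states)
  d
}

toy_lag <- add_prev_state(toy)

# rows without a previous state (each participant's first day) cannot
# contribute a transition, so they are dropped before model fitting
toy_model <- toy_lag[!is.na(toy_lag$yprev), ]
toy_model
```

Each remaining row then represents one observed transition (yprev to y), which is the unit of analysis for a transition model.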

4.5 Exploratory Analysis

## Random sample of patient trajectories

ssamp <- sample(unique(dta.long$id), 30, FALSE)
dr <- subset(dta.long, id %in% ssamp)
dr$id <- as.integer(as.factor(dr$id))
dr$y <- factor(dr$y, levels=rev(levels(dr$y)))
dr$day    <- dr$day - 1

# sx chart, ordered by sx at last f/u
p1_mec <- multEventChart(y ~ day + id, palette = 'Spectral', data=dr,
               sortbylast = TRUE) +
  theme_classic() +
  theme(legend.position='bottom')

[Figure: event chart of daily symptom state for 30 sampled participants, sorted by state at last follow-up. x-axis: Day (0 to 9). Legend: asymptomatic, paucisymptomatic, ARI.]

Figure 1. Trajectories for a random sample of 30 study participants

The figure shows an individual’s symptom trajectory over ten days for a random selection of 30 patients. Notably, it suggests that:

  • the majority of patients are asymptomatic at the end of follow up.
  • patients spend the majority of days in one of the three states rather than transitioning equally between them.
  • there is no absorbing state, meaning that patients freely transitioned between states.

## Category proportions over study by vaccine group

p1_proppo <- propsPO(y ~ tx + day, data=dta.long, nrow=1) +
  theme(legend.position='bottom', axis.text.x=element_text(angle=90, hjust=1))

For clarity, we show how the mix of symptom outcomes changes over time, also stratified by vaccination status.

[Figure: stacked proportions of symptom state (y: ARI, paucisymptomatic, asymptomatic) by day (1 to 10), with bars for tx = 1 and tx = 2 within each day. y-axis: Proportion (0.00 to 1.00).]

Figure 2. Observed state occupancy proportions for each symptom state over time, stratified by vaccination status (1 = Unvaccinated, 2 = Vaccinated)

The figure shows that the vast majority of participants were categorized as either asymptomatic or ARI, while few were paucisymptomatic. The proportion of ARI cases also decreases while asymptomatic cases increase as the study progresses.

## Successive state transitions

p1_proptrans <- propsTrans(y ~ day + id, data=dta.long, maxsize=6) +
  theme(legend.position='bottom', axis.text.x=element_text(angle=90, hjust=1))

[Figure: grid of panels, one per consecutive-day transition (Day 1 ➔ 2 through Day 9 ➔ 10). Axes: Previous State and Current State (asymptomatic, paucisymptomatic, ARI). Legend: Proportion (0.25, 0.50, 0.75).]

Figure 3. Proportion of each transition type over time

Here we observe the proportion of participants who transitioned between the three states on each consecutive day. There are some notable patterns, including:

  • Larger proportions along the diagonals suggest participants are more likely to remain in their current state than transition states on the next day.
  • As the study progresses, participants are progressively less likely to transition to a higher state. For example, the proportion who transition from asymptomatic to ARI between days 1 and 2 is appreciably higher than between days 9 and 10. This implies people are recovering towards the end of the study period.
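One common way to summarize a single day-to-day step is to cross-tabulate previous state against current state and normalize each row, so that every row gives the distribution of next-day states among participants who started in the same state. The base R sketch below uses made-up vectors (prev and curr are illustrative, not RVTN data); whether this matches the exact normalization behind Figure 3 is an assumption.

```r
states <- c("asymptomatic", "paucisymptomatic", "ARI")

# Made-up (previous-day, current-day) state pairs for one day-to-day step
prev <- factor(c("ARI", "ARI", "ARI", "paucisymptomatic",
                 "asymptomatic", "asymptomatic"), levels = states)
curr <- factor(c("ARI", "ARI", "paucisymptomatic", "asymptomatic",
                 "asymptomatic", "asymptomatic"), levels = states)

# counts of each transition: rows = previous state, columns = current state
trans_counts <- table(previous = prev, current = curr)

# proportion of each transition among participants starting in the same state
trans_props <- prop.table(trans_counts, margin = 1)
round(trans_props, 2)
```

Large values on the diagonal of such a table correspond to participants remaining in their current state from one day to the next.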

4.6 Results

## Model specification
mod.1 <- orm(y ~ yprev + yprev * tx + rcs(day, 4) + tx + age + sex + smoking + asthma + other_lung + immune + immune_meds + race.eth, data = dta.long, maxit=30, x=TRUE, y=TRUE) 

## Covariate plot
plot <- sjPlot::plot_model(mod.1) + scale_x_discrete(labels=c("ARI*trt (prior)","Pauci*trt (prior)", "Race/eth: white", "Race/eth: unk", "Race/eth: NH or PI", "Race/eth: Hispanic", "Race/eth: Black/Af. Am.", "Immune meds: unk", "Immune meds: yes","Immune: unk", "Immune: yes", "Lung dis: unk", "Lung dis: yes", "Asthma: unk", "Asthma: yes", "Smok: unk", "Smok: yes", "Female", "Age", "Day Spl. 1", "Day Spl. 2", "Day", "Vaccinated", "Prev. ARI", "Prev. Pauci.", "Prev. Asymp."))
plot + theme_sjplot2() 
p1_covs <- plot + theme_sjplot2()

[Figure: forest plot of odds ratios (log scale, 0.01 to 1000) for outcome y, one row per model term: prior-state-by-vaccination interactions, race/ethnicity, immune medications, immunocompromised status, lung disease, asthma, smoking, female sex, age, day spline terms, vaccination, and previous state.]

  • In our ordinal transition model, female sex and Hispanic ethnicity were associated with a higher probability of transitioning to a lower symptom state (i.e., from ARI to paucisymptomatic/asymptomatic or from paucisymptomatic to asymptomatic).

  • Conversely, immunocompromised individuals and individuals with unknown immunocompromised status were more likely to transition to more severe symptom states.

  • The proportional odds assumption did not appear to be violated with respect to time in our data.

4.7 Limitations

  • The analysis accounted for within-person correlation but not for intra-household correlation in the data.
  • There was a high number of household contacts per index case, suggesting that intra-household correlation may play an important role in the structure of our data and should be addressed in future work.

5 Project 2: Survival Analysis for Primary SARS-CoV-2 Infections

  • Prompt. Characterize the association between index case vaccination status and time until last positive result over the 10-day study period.

  • Hypothesis. The time to last positive SARS-CoV-2 test (i.e., alleviation of COVID-19 infection) is inversely associated with positive vaccination status, number of vaccines received, and/or temporal proximity of last vaccine to SARS-CoV-2 infection

5.1 Group Decisions

  • Primary outcome and covariates of interest
  • Definition of recovery
  • Variables used to define survival outcome
  • Event indicator: 0 (censored) if the last available lab test is positive, 1 (recovered) otherwise
  • We were asked to conduct a survival analysis with the “hazard” being recovery from COVID-19

5.2 Variables

5.2.1 Outcome

  • Number of days from disease onset to alleviation, defined as last day of positive lab result
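As a rough illustration of how the outcome and event indicator described above could be derived from daily test results, the sketch below uses made-up data. The data frame tests, the helper derive_outcome and the choice to count time in study days are assumptions for illustration, not the derivation used in this analysis. It also assumes every participant has at least one positive test, in line with the eligibility criteria.

```r
# Made-up daily test results for three index cases
# (1 = positive, 0 = negative, NA = no test that day)
tests <- data.frame(
  id     = rep(c("A", "B", "C"), each = 5),
  day    = rep(1:5, times = 3),
  result = c(1, 1, 0, 0, 0,    # A: last positive on day 2, then negative
             1, 1, 1, NA, 1,   # B: still positive at last available test
             1, 0, 1, 0, NA)   # C: alternating results, last available test negative
)

# Hypothetical helper: one row per participant with time and event indicator
derive_outcome <- function(d) {
  d <- d[!is.na(d$result), ]               # keep days with a test result
  d <- d[order(d$day), ]
  last_pos  <- max(d$day[d$result == 1])   # last day with a positive test
  last_test <- d$result[nrow(d)]           # result of the last available test
  data.frame(id = d$id[1],
             time = last_pos,
             event = as.integer(last_test == 0))  # 1 = recovered, 0 = censored
}

outcome <- do.call(rbind, lapply(split(tests, tests$id), derive_outcome))
outcome
```

Participant C shows why this definition is sensitive to alternating results: a single negative test at the end of follow-up is enough to count as recovered.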

5.3 Eligibility Criteria

  • Index cases with confirmed SARS-CoV-2 test(s) on at least 1 on-study day and non-missing vaccine information

5.4 Statistical Analysis

  • Time-to-event analysis: A Cox proportional hazards model, stratified by study site to allow for differences in baseline hazard across sites, was used to model time to last positive test for index cases.
    • Main exposures:

      • Vaccination status, number of prior vaccines, temporal proximity of last vaccine
    • Confounding variables:

      • Possible nonlinear effect of age

      • Participant clinico-demographic variables

5.5 Exploratory Analysis

# check which vax dates fall within 14 days of symptom onset
# ...for each vaccine number: negative numbers indicate vaccine received after covid infection
prac$diffvax1 <- difftime(as.POSIXlt(prac$symptom_or_test_date), as.POSIXlt(prac$bestdate.dose1), units = "days")
prac$diffvax2 <- difftime(as.POSIXlt(prac$symptom_or_test_date), as.POSIXlt(prac$bestdate.dose2), units = "days")
prac$diffvax3 <- difftime(as.POSIXlt(prac$symptom_or_test_date), as.POSIXlt(prac$bestdate.dose3), units = "days")
prac$diffvax4 <- difftime(as.POSIXlt(prac$symptom_or_test_date), as.POSIXlt(prac$bestdate.dose4), units = "days")
prac$diffvax5 <- difftime(as.POSIXlt(prac$symptom_or_test_date), as.POSIXlt(prac$bestdate.dose5), units = "days")

# count those who got each dose number (binary)
# subtract vaccine if within 14 days (or negative number = received after covid infection) of symptom onset
prac <- prac %>%
  mutate(dose.num1 = case_when((!is.na(date.dose1) | !is.na(covid_year_dose1)) & (diffvax1 > 14) ~ 1,
                               is.na(date.dose1) & is.na(covid_year_dose1) ~ 0)) 
prac <- prac %>% mutate(dose.num1 = ifelse(is.na(dose.num1), 0, dose.num1))

prac <- prac %>%
  mutate(dose.num2 = case_when((!is.na(date.dose2) | !is.na(covid_year_dose2)) & (diffvax2 > 14) ~ 1,
                               is.na(date.dose2) & is.na(covid_year_dose2) ~ 0)) 
prac <- prac %>% mutate(dose.num2 = ifelse(is.na(dose.num2), 0, dose.num2))

prac <- prac %>%
  mutate(dose.num3 = case_when((!is.na(date.dose3) | !is.na(covid_year_dose3)) & (diffvax3 > 14) ~ 1,
                               is.na(date.dose3) & is.na(covid_year_dose3) ~ 0)) 
prac <- prac %>% mutate(dose.num3 = ifelse(is.na(dose.num3), 0, dose.num3))

prac <- prac %>%
  mutate(dose.num4 = case_when((!is.na(date.dose4) | !is.na(covid_year_dose4)) & (diffvax4 > 14) ~ 1,
                               is.na(date.dose4) & is.na(covid_year_dose4) ~ 0)) 
prac <- prac %>% mutate(dose.num4 = ifelse(is.na(dose.num4), 0, dose.num4))

prac <- prac %>%
  mutate(dose.num5 = case_when((!is.na(date.dose5) | !is.na(covid_year_dose5)) & (diffvax5 > 14) ~ 1,
                               is.na(date.dose5) & is.na(covid_year_dose5) ~ 0)) 
prac <- prac %>% mutate(dose.num5 = ifelse(is.na(dose.num5), 0, dose.num5))

# sum up all "relevant" (>14 days) vaccines
prac <- prac %>%
    rowwise() %>% 
    mutate(sum.vax = sum(c_across(starts_with("dose.num")), na.rm = T)) %>% ungroup() # ungroup after using rowwise()

[Figure: Kaplan-Meier curves, "Time to Last Positive Result". x-axis: Time (Days), 0 to 8. y-axis: Proportion not recovered (0.4 to 1.0). Groups: Vaccination Status (Current, Past, Unvaccinated).]

Figure 1. KM by Vaccination Status

  • Patients with current vaccinations (received within 6 months of infection) and past vaccinations (received more than 6 months before infection) appear to have largely similar recovery trajectories.
  • Counterintuitively, unvaccinated individuals appear to reach recovery slightly sooner than both vaccinated groups.
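Kaplan-Meier curves of this kind can be produced with survfit() from the survival package. The sketch below is a minimal example on made-up data; the variable names time, result and vax.group follow the Cox model code in Section 5.6, but the values are invented and the plotting choices are assumptions rather than the code used for the figures in this report.

```r
library(survival)

# Made-up data: time = day of last positive test,
# result = 1 if recovered, 0 if censored (still positive at last test)
toy <- data.frame(
  time      = c(2, 3, 3, 5, 8, 1, 4, 6, 8, 8),
  result    = c(1, 1, 0, 1, 0, 1, 1, 1, 0, 0),
  vax.group = rep(c("Unvaccinated", "Current"), each = 5)
)

# Kaplan-Meier estimate of the proportion not yet recovered, by vaccination group
km <- survfit(Surv(time, result) ~ vax.group, data = toy)
summary(km)

# Step curves with censoring marks; line types distinguish the groups
plot(km, lty = 1:2, mark.time = TRUE,
     xlab = "Time (Days)", ylab = "Proportion not recovered")
legend("bottomleft", legend = names(km$strata), lty = 1:2)
```

Because the event here is recovery, a curve that drops faster indicates faster recovery, which is the reverse of the usual reading of a survival curve.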

[Figure: Kaplan-Meier curves, "Time to Last Positive Result". x-axis: Time (Days), 0 to 8. y-axis: Proportion not recovered (0.4 to 1.0). Groups: Number of Vaccinations Received (0, 1, 2, 3, 4, 5).]

Figure 2. KM by Vaccination Number

  • No discernible pattern exists when groups are divided into number of vaccines received.
  • Although patients with 5 recorded vaccinations appear to recover more quickly, this is most likely due to the small number of individuals in this category.

[Figure: event chart of daily test status for 40 sampled participants. x-axis: time1 (0 to 9). Legend: 1, 0.]

Figure 3. Testing Trajectories

  • 1 = “Recovered”; 0 = “Not Recovered”
  • A random selection of 40 patients shows testing trajectories over the course of 10 study days.
  • A substantial number of patients test positive throughout the study, with no recovery.
  • A handful of patients show alternating test results, with intermittent “negative” tests followed by subsequent “positive” tests.

## Boxplots by vaccination status and age

p3_agenumvax <- dta %>%
  group_by(id) %>%
  slice_max(diary.day) %>%
  mutate(sum.vax = fct_rev(fct_infreq(sum.vax))) %>%
  ggplot() +
  theme_classic() +
  aes(x=age_at_enrollment, y=sum.vax) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(height = 0.2, alpha=0.3, color="steelblue") +
  labs(x = "Age",
       y = "",
       title = "Distribution of age by number of vaccines")

[Figure: boxplots, "Distribution of age by vaccination status". x-axis: Age (0 to 80). Groups: Past, Current, Unvaccinated.]

Figure 5. Age by Vaccination Status

  • Unvaccinated individuals appear to have lower median ages than participants who reported vaccination in the past (>6 months) and current vaccination (within 6 months).

[Figure: boxplots, "Distribution of age by number of vaccines". x-axis: Age (0 to 80). Groups: 4+, 3, 2, 1, 0.]

Figure 6. Age by Vaccine Number

  • Younger participants in our cohort had received fewer vaccinations than older participants. This is likely due, at least in part, to differences in vaccination and booster guidelines across age groups over the past three years.

Figure 7. Variable Correlation

  • A correlation plot shows a strong correlation between vaccination status and number of vaccines received. With this in mind, separate models will be constructed with each as the primary exposure variable to avoid problems with multicollinearity.
  • The plot confirms a moderate correlation between subject age, number of vaccinations received and vaccination status.

5.6 Results

## Cox PH model for vaccination group
fit1 <- coxph(Surv(time, result) ~ ns(age_at_enrollment, df=3) + sex + race.eth + sxstat + comorb.resp + comorb.other + factor(vax.group) + strata(site), data = dta)

[Figure: forest plot of recovery hazard ratios (log scale, 0.01 to 100) for Surv(time, result). Terms: Unvax, Prior vax, Comorb other, Comorb resp, Pauci sx, Asymptom, race/ethnicity (unk, AI or AN, NH PI, Asian, Black or Af Am, Hispanic), Female, Age spline 1 to 3.]

Figure 8. Estimated coefficients and CIs for vaccination status

In our analysis of time to last positive test and vaccination status, the following trends are notable:

  • Participants reporting Asian and Native Hawaiian/Pacific Islander race display a higher “hazard” of negative test conversion by the end of the study when compared to participants reporting white race.

  • Participants who are asymptomatic at baseline show higher rates of recovery when compared to those who have acute respiratory infections (ARI) at baseline.

  • Counterintuitively, we also observe a higher rate of negative test conversion for patients reporting non-respiratory comorbidities compared to those reporting none.

  • Finally, we observe a lower hazard of negative test conversion in participants reporting past vaccinations (outside 6 months) compared to those with current vaccinations.

## Cox PH model for number of vaccinations
fit2 <- coxph(Surv(time, result) ~ ns(age_at_enrollment, df=3) + sex + race.eth + sxstat + comorb.resp + comorb.other + factor(sum.vax) + strata(site), data = dta)

[Figure: forest plot of recovery hazard ratios (log scale, 0.01 to 100) for Surv(time, result). Terms: Vax 4+, Vax 3, Vax 2, Vax 1, Comorb other, Comorb resp, Pauci sx, Asymptom, race/ethnicity (unk, AI or AN, NH PI, Asian, Black or Af Am, Hispanic), Female, Age spline 1 to 3.]

Figure 9. Estimated coefficients and CIs for number of vaccines

In our analysis of time to last positive test and number of vaccinations, the following trends are notable:

  • Participants reporting Asian and Native Hawaiian/Pacific Islander race again display a higher “hazard” of negative test conversion by the end of the study when compared to participants reporting white race, as was the case in model 1.

  • Similarly, participants who are asymptomatic at baseline show higher rates of recovery when compared to those who have acute respiratory infections (ARI) at baseline.

  • We again observe a higher rate of negative test conversion for patients reporting non-respiratory comorbidities compared to those reporting none.

  • Surprisingly, we also observe a lower rate of infection alleviation in participants receiving the most vaccines when compared to unvaccinated individuals.

5.7 Limitations

  • Derivation of time to recovery may be improved by a more conservative definition of recovery. For example, several patients showed alternating test results over the course of the study, and some are considered “recovered” with a single negative test on the final day of follow up.

  • Model diagnostics (Appendix 3) were suggestive of violations of the proportional hazards assumption.

  • Results are highly counter-intuitive and should be interpreted cautiously and validated with potentially more appropriate models and/or alternative relationships between outcomes and predictors.

  • Although this is outside the scope of the study, future work in this area may employ additional statistical tools in choosing covariates. For example, a causal inference approach may consider potential effects of collider bias on interpretation of model results.
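To make the first limitation concrete, one possible stricter rule would count a participant as recovered only if follow-up ends with a run of consecutive negative tests. The sketch below is offered purely as an illustration: the helper recovered_strict and the default of two consecutive negatives are assumptions, not a definition used or endorsed in this analysis.

```r
# Hypothetical helper: TRUE if the test series ends with at least `k`
# consecutive negative results (0 = negative, 1 = positive).
# Missing tests are dropped before the runs are counted.
recovered_strict <- function(result, k = 2) {
  result <- result[!is.na(result)]
  if (length(result) == 0) return(FALSE)
  runs <- rle(result)              # run-length encoding of the test series
  n <- length(runs$values)
  runs$values[n] == 0 && runs$lengths[n] >= k
}

recovered_strict(c(1, 1, 0, 0, 0))  # sustained negatives at end of follow-up
recovered_strict(c(1, 0, 1, 1, 0))  # single negative on the final day
recovered_strict(c(1, 0, 0, 1, 1))  # negatives followed by a positive
```

Under this rule only the first series counts as recovered; the second would have been classified as recovered under the definition in Section 5.1.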

6 References

  1. Grijalva C, Talbot HK, et al., RVTN – Sentinel Respiratory Virus Transmission Network Protocol (2021)
  2. Harrell F, Statistical Thinking - Hmisc Examples (2023)

7 Appendix 1

7.1 RVTN Eligibility Criteria

7.1.1 Exclusion Criteria

7.1.1.1 Index cases

An eligible index case will be excluded from the study if s/he:

  • Does not live in a household (e.g., lives in a correctional facility, skilled nursing facility, long-term care facility, boarding school, hostel, or in a dormitory) OR

  • Indicates that at least one other person in the household had an acute respiratory illness/COVID-like symptoms, or tested positive for SARS-CoV-2 or influenza in the 7 days before or on the date of the index case illness onset or testing date

7.1.1.2 Household contacts

  • None

7.1.1.3 Households

An eligible household will be excluded from the study if:

  • Seven (7) or more days have elapsed between illness onset in the index case and the enrollment encounter. The enrollment encounter must occur within 0-6 days after the index case’s illness onset or the date the index case was tested if asymptomatic OR

  • More than a third (33%) of household contacts do not intend to enroll (for example, in a 2–3 member household all household contacts must participate; in households with 4–6 members, only 1–2 contacts may decline to participate to remain eligible; and in households with 7–9 members, only 2–3 contacts may decline to participate to remain eligible)

7.1.2 Inclusion Criteria

7.1.2.1 Index cases

An index case is eligible for inclusion if s/he:

  • Has laboratory or home/test-confirmed SARS-CoV-2 or influenza virus infection by either rapid diagnostic assay or RT-PCR AND

  • Has acute respiratory illness/COVID-like symptoms with onset no more than 5 days prior to presentation at the recruiting/testing clinic/site or reports being asymptomatic on the testing date AND

  • Lives, and has plans to live in his/her household for the follow-up period AND

  • Is not hospitalized and has not been hospitalized since the date of illness onset

7.1.2.2 Household contacts

A household contact is eligible for inclusion if s/he:

  • Routinely sleeps (slept in the household about half the nights in the last month) in the same household as the index case AND

  • Slept in the household for at least 1 night in the period from 1 day prior to illness onset or positive test date in the index case through the current date AND

  • Lives and has plans to live in the household for the follow-up period

7.1.2.3 Households

A household is eligible for inclusion if:

  • At least 1 eligible household member, other than the index case, enrolls in the study AND

  • There is at least 1 eligible household member who was not ill on the day of illness onset in the index case (or the date the index case was tested, in the instance of an asymptomatic index case)

8 Appendix 2

8.1 Diagnostics for Project 1

[Figure: proportion of participants in each symptom state (y: ARI, paucisymptomatic, asymptomatic) by Day (1 to 10), in two panels: "Assuming Proportional Odds" and "Observed". y-axis: Proportion (0.00 to 1.00).]

9 Appendix 3

9.1 Diagnostics for Project 2

## Model 1
fit1 <- coxph(Surv(time, result) ~ ns(age_at_enrollment, df=3) + sex + race.eth + sxstat + comorb.resp + comorb.other + factor(vax.group) + strata(site), data = dta)

ph.test1 <- cox.zph(fit1)
# small p-value = evidence against the PH assumption (test based on scaled Schoenfeld residuals)
p3_schoen1 <- ggcoxzph(ph.test1)
ph.test1

## Model 2
fit2 <- coxph(Surv(time, result) ~ ns(age_at_enrollment, df=3) + sex + race.eth + sxstat + comorb.resp + comorb.other + factor(sum.vax) + strata(site), data = dta)

ph.test2 <- cox.zph(fit2)

# small p-value = evidence against the PH assumption (test based on scaled Schoenfeld residuals)
p3_schoen2 <- ggcoxzph(ph.test2)
ph.test2