Chapter 16 Survival Analysis

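Survival analysis models the time until an event occurs. Whereas linear regression typically assumes normally distributed errors, event times are strictly positive and usually right-skewed, so parametric survival models use distributions with positive support, such as the Weibull. A second complication in survival analysis is censoring: for some subjects the event is not observed during the study, so only a lower bound on their event time is known.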
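To make censoring concrete, here is a minimal simulation sketch in base R. The sample size, Weibull parameters, and follow-up cutoff are arbitrary assumptions chosen for illustration; they are not taken from any data set in these notes.

```r
set.seed(16)                     # arbitrary seed, for reproducibility only
n <- 200                         # assumed sample size

# Hypothetical true event times drawn from a Weibull distribution
true_time <- rweibull(n, shape = 1.5, scale = 100)
follow_up <- 120                 # assumed end of study (administrative censoring)

# The observed time is whichever comes first: the event or the end of follow-up
obs_time <- pmin(true_time, follow_up)

# status = 1 if the event was observed, 0 if the subject was censored
status <- as.integer(true_time <= follow_up)

table(status)                    # number of censored vs. observed subjects
range(obs_time)                  # all observed times are positive and capped at follow_up
```

For censored subjects the recorded time is only a lower bound on the true event time, which is why the time and the status indicator must always be analyzed together.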

These notes rely on the Survival Analysis in R DataCamp course and Applied Survival Analysis Using R by Dirk Moore (Moore 2016).

Most survival analysis in R uses the survival package for modeling and the survminer package for plotting.

library(tidyverse)

library(survival)

library(survminer)

The examples in these notes will use the following data sets.

16.0.1 GBSG2

GBSG2 contains recurrence-free survival times, in days, for 686 breast cancer patients from the German Breast Cancer Study Group 2 trial.

data(GBSG2, package = "TH.data")

Column cens indicates whether the event was observed for a patient (1) or the observation was censored (0).

GBSG2 %>% count(cens)
## # A tibble: 2 x 2
##    cens     n
##   <int> <int>
## 1     0   387
## 2     1   299

You will typically structure your data in a Surv object. In the sampled data shown below, “+” indicates a censored observation.

sobj <- Surv(time = GBSG2$time, event = GBSG2$cens)

head(sobj, n = 10)
##  [1] 1814  2018   712  1807   772   448  2172+ 2161+  471  2014+
summary(sobj)
##       time          status    
##  Min.   :   8   Min.   :0.00  
##  1st Qu.: 568   1st Qu.:0.00  
##  Median :1084   Median :0.00  
##  Mean   :1124   Mean   :0.44  
##  3rd Qu.:1685   3rd Qu.:1.00  
##  Max.   :2659   Max.   :1.00
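As a rough sketch of what a Surv object holds, the snippet below rebuilds the first ten GBSG2 observations by hand, using the values printed above, so that it runs without the TH.data package. The variable names `time`, `status`, `s`, and `parts` are illustrative only.

```r
library(survival)

# First ten observations, copied from the printed GBSG2 output above
time   <- c(1814, 2018, 712, 1807, 772, 448, 2172, 2161, 471, 2014)
status <- c(1, 1, 1, 1, 1, 1, 0, 0, 1, 0)   # 0 = censored, 1 = event

s <- Surv(time = time, event = status)
is.Surv(s)                   # TRUE

# A right-censored Surv object is stored as a two-column matrix
parts <- unclass(s)
colnames(parts)              # column names of the underlying matrix
parts[, "time"]              # follow-up times
sum(parts[, "status"])       # number of observed events: 7
mean(parts[, "status"])      # share of observations with an observed event: 0.7
```

Keeping the time and the event indicator together in one object lets modeling functions in the survival package receive both through a single argument.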

16.0.2 UnempDur

UnempDur contains time to re-employment of 3,343 unemployed persons.

data(UnempDur, package = "Ecdat")

Column censor1 equals 1 if the unemployment spell ended in re-employment at a full-time job and 0 otherwise. Column spell is the length of the unemployment spell, measured in two-week intervals.

UnempDur %>% count(censor1)
## # A tibble: 2 x 2
##   censor1     n
##     <dbl> <int>
## 1       0  2270
## 2       1  1073
sobj <- Surv(time = UnempDur$spell, event = UnempDur$censor1)

head(sobj)
## [1]  5  13  21   3   9+ 11+
summary(sobj)
##       time          status    
##  Min.   : 1.0   Min.   :0.00  
##  1st Qu.: 2.0   1st Qu.:0.00  
##  Median : 5.0   Median :0.00  
##  Mean   : 6.2   Mean   :0.32  
##  3rd Qu.: 9.0   3rd Qu.:1.00  
##  Max.   :28.0   Max.   :1.00
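Because spell counts two-week intervals, durations can be expressed in weeks by doubling them. The sketch below uses the first six UnempDur observations as printed above, entered by hand so that it runs without the Ecdat package. The names `weeks` and `sobj_weeks` are illustrative.

```r
library(survival)

# First six observations, copied from the printed UnempDur output above
spell   <- c(5, 13, 21, 3, 9, 11)   # duration in two-week intervals
censor1 <- c(1, 1, 1, 1, 0, 0)      # 1 = re-employed full time, 0 = censored

weeks <- spell * 2                  # each interval spans two weeks
sobj_weeks <- Surv(time = weeks, event = censor1)
sobj_weeks                          # censored spells still carry the "+" marker
```

Rescaling the time axis changes only the units of the durations; which observations are censored stays the same.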

References

Moore, Dirk F. 2016. Applied Survival Analysis Using R. 1st ed. New York, NY: Springer. https://eohsi.rutgers.edu/eohsi-directory/name/dirk-moore/.