Chapter 16 Survival Analysis
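Survival analysis models time to event. Whereas linear regression assumes normally distributed outcomes, a time-to-event outcome takes only positive values, so survival analysis uses distributions with positive support, such as the Weibull. Another complication in survival analysis is censoring: for some subjects the event has not occurred by the end of observation, so only a lower bound on their event time is known.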
These notes rely on the Survival Analysis in R DataCamp course and Applied Survival Analysis Using R by Dirk Moore (Moore 2016).
Most survival analysis in R uses the survival and survminer packages.
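To make the censoring idea concrete, the sketch below simulates positive event times from a Weibull distribution and censors them at a fixed end of follow-up. The shape, scale, sample size, and cutoff are illustrative assumptions, not values from any data set in these notes, and only base R is used.

```r
set.seed(16)  # arbitrary seed, for reproducibility only

n <- 200
# Hypothetical event times; shape and scale are illustrative choices.
event_time <- rweibull(n, shape = 1.5, scale = 1000)

# Assume follow-up ends at day 1500; later events are never observed.
follow_up <- 1500
obs_time <- pmin(event_time, follow_up)          # time actually observed
status   <- as.integer(event_time <= follow_up)  # 1 = event, 0 = censored

range(obs_time)  # observed times are positive and capped at follow_up
table(status)    # number of censored subjects vs. observed events
```

The pair (obs_time, status) has the same structure as the time and censoring columns in the data sets described next.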
The examples in these notes will use the following data sets.
16.0.1 GBSG2
GBSG2 contains time to death of 686 breast cancer patients. Column cens indicates whether a person in the study has died (0 = censored, 1 = event).
## # A tibble: 2 x 2
##    cens     n
##   <int> <int>
## 1     0   387
## 2     1   299
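The counts above could be reproduced along the following lines. This is a minimal sketch: it assumes GBSG2 is loaded from the TH.data package (the notes do not say where the data come from), and it uses base R's table() rather than whatever code produced the tibble output.

```r
# Assumption: GBSG2 ships with the TH.data package; skip quietly if it is absent.
if (requireNamespace("TH.data", quietly = TRUE)) {
  data(GBSG2, package = "TH.data")

  # Tabulate the censoring indicator: 0 = censored, 1 = event.
  cens_counts <- table(GBSG2$cens)
  print(cens_counts)
}
```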
You will typically store the event time and the event indicator together in a Surv object. In the sampled data shown below, "+" indicates a censored observation.
## [1] 1814 2018 712 1807 772 448 2172+ 2161+ 471 2014+
##       time          status
##  Min.   :   8   Min.   :0.00
##  1st Qu.: 568   1st Qu.:0.00
##  Median :1084   Median :0.00
##  Mean   :1124   Mean   :0.44
##  3rd Qu.:1685   3rd Qu.:1.00
##  Max.   :2659   Max.   :1.00
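As a small illustration of how such an object is built, the sketch below recreates the ten sampled observations shown above with Surv() from the survival package. The vector names time, status, and sobj are chosen here for illustration, and the status values are read off the "+" markers in the printed sample.

```r
library(survival)

# The ten sampled follow-up times shown above; status 0 marks the "+" entries.
time   <- c(1814, 2018, 712, 1807, 772, 448, 2172, 2161, 471, 2014)
status <- c(1, 1, 1, 1, 1, 1, 0, 0, 1, 0)

# Surv() pairs each time with its event indicator.
sobj <- Surv(time = time, event = status)

print(sobj)  # censored times print with a trailing "+"
```

With a full data frame the same call would take the time and censoring columns directly, for example Surv(GBSG2$time, GBSG2$cens).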
16.0.2 UnempDur
UnempDur contains time to re-employment of 3,343 unemployed persons. Column censor1 indicates re-employment in a full-time job (0 = censored, 1 = event). Column spell gives the length of the unemployment spell in two-week intervals.
## # A tibble: 2 x 2
##   censor1     n
##     <dbl> <int>
## 1       0  2270
## 2       1  1073
## [1] 5 13 21 3 9+ 11+
##       time          status
##  Min.   : 1.0   Min.   :0.00
##  1st Qu.: 2.0   1st Qu.:0.00
##  Median : 5.0   Median :0.00
##  Mean   : 6.2   Mean   :0.32
##  3rd Qu.: 9.0   3rd Qu.:1.00
##  Max.   :28.0   Max.   :1.00
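The output above could be produced along the following lines. This is a sketch under the assumption that UnempDur is loaded from the Ecdat package (the notes do not name the package); the object name unemp_surv is chosen here for illustration.

```r
library(survival)

# Assumption: UnempDur ships with the Ecdat package; skip quietly if it is absent.
if (requireNamespace("Ecdat", quietly = TRUE)) {
  data(UnempDur, package = "Ecdat")

  # Count censored (0) and re-employed (1) spells.
  print(table(UnempDur$censor1))

  # Pair spell length (two-week intervals) with the full-time re-employment event.
  unemp_surv <- Surv(time = UnempDur$spell, event = UnempDur$censor1)
  print(unemp_surv[1:6])  # first six spells; "+" marks censored ones
}
```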
References
Moore, Dirk F. 2016. Applied Survival Analysis Using R. 1st ed. New York, NY: Springer. https://eohsi.rutgers.edu/eohsi-directory/name/dirk-moore/.