A short course on Survival Analysis applied to the Financial Industry

3.1 The semiparametric model

A parametric survival model is one in which the survival time (the outcome) is assumed to follow a known distribution. Distributions commonly used for survival time include the Weibull, the exponential (a special case of the Weibull), the log-logistic and the log-normal.
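To make the "special case" remark concrete, the following minimal sketch evaluates the Weibull survival and hazard functions and checks that a shape parameter of 1 reproduces the exponential distribution. The parameterisation \(S(t) = \exp(-(t/\text{scale})^{\text{shape}})\) and the numeric values are assumptions chosen for illustration; they do not come from the course data.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape), for t >= 0."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def exponential_survival(t, rate):
    """S(t) = exp(-rate * t), for t >= 0."""
    return math.exp(-rate * t)

# Hypothetical scale (e.g. months); shape = 1 gives the exponential case,
# whose hazard is constant and equal to 1 / scale.
scale = 24.0
for t in (6.0, 12.0, 36.0):
    s_weibull = weibull_survival(t, shape=1.0, scale=scale)
    s_expo = exponential_survival(t, rate=1.0 / scale)
    print(t, s_weibull, s_expo, weibull_hazard(t, shape=1.0, scale=scale))
```

With any other shape value the hazard changes with \(t\): it increases when the shape is above 1 and decreases when it is below 1.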

The Cox proportional hazards model, by contrast, is not a fully parametric model. Rather, it is a semi-parametric model: even if the regression parameters (the betas) are known, the distribution of the outcome remains unknown, because the baseline survival (or hazard) function is not specified in a Cox model (we do not assume any shape or form).

As before, let \(T\) denote the time to some event. Our data, based on a sample of size \(n\), consist of the triples \((\widetilde{T}_i, \Delta_i, \textbf{X}_i)\), \(i = 1,\ldots,n\), where \(\widetilde{T}_i\) is the time on study for the \(i\)-th individual, \(\Delta_i\) is the event indicator for the \(i\)-th individual (\(\Delta_i=1\) if the event has occurred and \(\Delta_i=0\) if the lifetime is right-censored) and \(\textbf{X}_i= (X_{i1},\ldots, X_{ip})^t\) is the vector of covariates or risk factors for the \(i\)-th individual which may affect the survival distribution of \(T\).
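As an illustration of this data layout, here is a minimal sketch that builds the observed triples from a latent event time and a censoring time. It assumes the usual right-censoring convention \(\widetilde{T}_i = \min(T_i, C_i)\), where \(C_i\) is a censoring time (a symbol not used in the text above). The loans, covariates and numbers are hypothetical.

```python
from typing import List, NamedTuple

class Observation(NamedTuple):
    time: float              # observed time on study, min(T_i, C_i)
    event: int               # Delta_i: 1 if the event was observed, 0 if right-censored
    covariates: List[float]  # X_i = (X_i1, ..., X_ip)

def observe(event_time, censoring_time, covariates):
    """Turn a latent event time and a censoring time into an observed triple."""
    observed = min(event_time, censoring_time)
    delta = 1 if event_time <= censoring_time else 0
    return Observation(observed, delta, list(covariates))

# Hypothetical loans: (months to default, months of follow-up,
# [loan-to-value ratio, prior-delinquency indicator]).
raw = [
    (14.0, 36.0, [0.80, 1.0]),  # default observed at month 14
    (50.0, 36.0, [0.60, 0.0]),  # still performing at month 36: right-censored
    (36.0, 36.0, [0.95, 1.0]),  # default exactly at the end of follow-up
]

sample = [observe(t, c, x) for t, c, x in raw]
for obs in sample:
    print(obs)
```

In real data only the triple is available: for a censored individual the latent event time is never seen, which is why the second loan contributes only the information that its lifetime exceeds 36 months.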

Note that the covariates \(X_{ij}\), with \(j = 1, \ldots, p\), may be time-dependent, that is, \(\textbf X_i(t)=(X_{i1}(t),\ldots,X_{ip}(t))^t\), with values that change over time. This situation must be analyzed using the extended Cox model. However, for ease of presentation, we shall consider the fixed-covariate case.

The Cox PH regression model (Cox 1972) is usually written in terms of the hazard function as follows:

\[ h(t, \textbf X) = h_0(t) e^{\sum_{j=1}^p \beta_j X_j}. \]

This model gives an expression for the hazard at time \(t\) for an individual with a given specification of a set of explanatory variables denoted by the bold \(\textbf X\).

Based on this model we can say that the hazard at time \(t\) is the product of two quantities:

The first of these, \(h_0(t)\), is called the baseline hazard function: the hazard for a reference individual whose covariate values are all 0.
The second quantity, \(e^{\sum_{j=1}^p \beta_j X_j}\), is the parametric component: the exponential of a linear function of the \(p\) explanatory variables. It is the relative risk associated with covariate values \(\textbf X\), measured against the reference individual.
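The two-factor structure can be sketched in a few lines. The Cox model leaves \(h_0(t)\) unspecified, so the Weibull-shaped baseline below is purely an assumption made so that the example can be evaluated. The coefficients and covariate values are also hypothetical.

```python
import math

def baseline_hazard(t):
    """Hypothetical h_0(t): a Weibull hazard with shape 1.5 and scale 24 (t > 0)."""
    shape, scale = 1.5, 24.0
    return (shape / scale) * (t / scale) ** (shape - 1)

def relative_risk(beta, x):
    """Parametric component: exp(sum_j beta_j * x_j)."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))

def cox_hazard(t, beta, x):
    """h(t, X) = h_0(t) * exp(sum_j beta_j * X_j)."""
    return baseline_hazard(t) * relative_risk(beta, x)

beta = [0.7, -0.3]        # hypothetical regression coefficients
x_reference = [0.0, 0.0]  # reference individual: all covariates equal to 0
x = [1.0, 2.0]            # hypothetical covariate values

t = 12.0
print(cox_hazard(t, beta, x_reference), baseline_hazard(t))  # the two coincide
print(cox_hazard(t, beta, x) / baseline_hazard(t), relative_risk(beta, x))
```

For the reference individual the exponential factor equals 1, so the hazard reduces to the baseline. For any other individual the baseline is scaled by a factor that does not depend on \(t\).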

Note that an important feature of this model, which concerns the proportional hazards (PH) assumption, is that the baseline hazard is a function of \(t\) but does not involve the covariates. By contrast, the exponential expression involves the \(X\)’s but not the time. The covariates here have a multiplicative effect on the hazard and are called time-independent.³

Note that the model is assuming proportional hazards (the hazard for any individual \(i\) is a fixed proportion of the hazard for any other individual \(j\)), that is:

\[ \frac{h_i(t|\textbf X_i)}{h_j(t|\textbf X_j)} = \exp(\boldsymbol \beta^t(\textbf X_i - \textbf X_j)) \]

or, equivalently,

\[ h_i(t|\textbf X_i) = \exp( \boldsymbol \beta^t(\textbf X_i - \textbf X_j)) h_j(t|\textbf X_j) \]

so the hazard ratio is constant over time: the baseline hazard \(h_0(t)\) cancels in the ratio. As a consequence, the hazard functions of any two individuals, plotted on a logarithmic scale, should be strictly parallel.
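A quick numerical check of this property is sketched below. It evaluates the ratio of two individuals' hazards at several times and under two different baselines. The Weibull-shaped baselines, the coefficients and the covariate values are all hypothetical, chosen only to show that the ratio depends on neither \(t\) nor the baseline.

```python
import math

def weibull_baseline(shape, scale):
    """Return a hypothetical baseline hazard h_0(t) with Weibull form (t > 0)."""
    def h0(t):
        return (shape / scale) * (t / scale) ** (shape - 1)
    return h0

def cox_hazard(t, h0, beta, x):
    """h(t | X) = h_0(t) * exp(sum_j beta_j * X_j)."""
    return h0(t) * math.exp(sum(b * xj for b, xj in zip(beta, x)))

def hazard_ratio(beta, x_i, x_j):
    """exp(beta^t (X_i - X_j)): no time and no baseline involved."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_i, x_j)))

beta = [0.7, -0.3]  # hypothetical coefficients
x_i = [1.0, 2.0]    # hypothetical covariates of individual i
x_j = [0.0, 1.0]    # hypothetical covariates of individual j
times = [1.0, 6.0, 12.0, 48.0]

expected = hazard_ratio(beta, x_i, x_j)
for h0 in (weibull_baseline(0.8, 24.0), weibull_baseline(1.5, 24.0)):
    ratios = [cox_hazard(t, h0, beta, x_i) / cox_hazard(t, h0, beta, x_j)
              for t in times]
    # Gap between the two log-hazard curves at each time point.
    log_gaps = [math.log(cox_hazard(t, h0, beta, x_i))
                - math.log(cox_hazard(t, h0, beta, x_j)) for t in times]
    print(expected, ratios, log_gaps)
```

Every entry of `ratios` agrees with `expected` up to floating-point error, and every entry of `log_gaps` equals \(\boldsymbol \beta^t(\textbf X_i - \textbf X_j)\), which is the constant vertical distance between the two log-hazard curves.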

References

Cox, D. R. 1972. “Regression Models and Life-Tables (with Discussion).” Journal of the Royal Statistical Society, Series B: Methodological 34: 187–220.

³ It is possible, nevertheless, to consider covariates which do involve time. Such covariates are called time-dependent variables. When we consider these time-dependent covariates, the model is called the extended Cox model and in this case it no longer satisfies the proportional hazards assumption.