## 3.1 The semiparametric model

A parametric survival model is one in which survival time (the outcome) is assumed to follow a known distribution. Distributions commonly used for survival time include the Weibull, the exponential (a special case of the Weibull), the log-logistic, and the log-normal.

The Cox proportional hazards model, by contrast, is not a fully parametric model. Rather, it is a semiparametric model: even if the regression parameters (the betas) are known, the distribution of the outcome remains unknown, because the baseline survival (or hazard) function is not specified in a Cox model (we do not assume any shape or form for it).

As before, let $$T$$ denote the time to some event. Our data, based on a sample of size $$n$$, consist of the triples $$(\widetilde{T}_i, \Delta_i, \textbf{X}_i)$$, $$i = 1,\ldots,n$$, where $$\widetilde{T}_i$$ is the time on study for the $$i$$-th patient, $$\Delta_i$$ is the event indicator for the $$i$$-th patient ($$\Delta_i=1$$ if the event has occurred and $$\Delta_i=0$$ if the lifetime is right-censored), and $$\textbf{X}_i= (X_{i1},\ldots, X_{ip})^t$$ is the vector of covariates or risk factors for the $$i$$-th individual, which may affect the survival distribution of $$T$$.

Note that the covariates $$X_{ij}$$, with $$j = 1, \ldots, p$$, may be time-dependent, in which case the covariate vector is written $$\textbf X_i(t)=(X_{i1}(t),\ldots,X_{ip}(t))^t$$ and its value changes over time. This situation must be analyzed using the extended Cox model. However, for ease of presentation, we shall consider the fixed-covariate case.
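The data layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the latent event and censoring times, their exponential distributions, the two covariates, and the names `event_time`, `censor_time`, `T_obs`, `delta` and `X` are hypothetical and not taken from the text. It uses the usual right-censoring construction, in which the observed time is the smaller of the event time and the censoring time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # sample size (hypothetical)

# Hypothetical latent event times T and censoring times C.
event_time = rng.exponential(scale=10.0, size=n)
censor_time = rng.exponential(scale=15.0, size=n)

# Observed time on study and event indicator:
# delta = 1 if the event was observed, 0 if the lifetime is right-censored.
T_obs = np.minimum(event_time, censor_time)
delta = (event_time <= censor_time).astype(int)

# Fixed (time-independent) covariates, p = 2: a binary factor and an age-like variable.
X = np.column_stack([
    rng.integers(0, 2, size=n),
    rng.normal(loc=60.0, scale=10.0, size=n),
])

# Each row i corresponds to one triple (T_obs[i], delta[i], X[i]).
for i in range(n):
    print(f"i={i + 1}  time={T_obs[i]:.2f}  delta={delta[i]}  X={X[i]}")
```

Only `T_obs`, `delta` and `X` would be available to the analyst; the latent event times of censored individuals are never observed.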

The Cox PH regression model (Cox 1972) is usually written in terms of the hazard function as follows:

$h(t, \textbf X) = h_0(t) e^{\sum_{j=1}^p \beta_j X_j}.$

This model gives an expression for the hazard at time $$t$$ for an individual with a given specification of a set of explanatory variables denoted by the bold $$\textbf X$$.

Based on this model we can say that the hazard at time $$t$$ is the product of two quantities:

• The first of these, $$h_0(t)$$, is called the baseline hazard function. It is the hazard for a reference individual whose covariate values are all 0.

• The second quantity, $$e^{\sum_{j=1}^p \beta_j X_j}$$, is a parametric component: the exponential of a linear function of the $$p$$ explanatory $$X$$ variables. It is the relative risk of an individual with covariate values $$X$$ compared with the reference individual.
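The two-factor structure can be made concrete with a small numerical sketch. The Cox model leaves $$h_0(t)$$ unspecified, so the Weibull-shaped `baseline_hazard` below, the coefficient values in `beta`, and the function names are purely hypothetical choices made so that something can be computed.

```python
import numpy as np

def baseline_hazard(t, shape=1.5, scale=10.0):
    """Hypothetical Weibull-type baseline hazard h0(t).

    The Cox model does not assume this (or any) form; it is used here
    only to have concrete numbers.
    """
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def cox_hazard(t, x, beta):
    """h(t, X) = h0(t) * exp(sum_j beta_j * X_j)."""
    return baseline_hazard(t) * np.exp(np.dot(x, beta))

beta = np.array([0.7, 0.02])   # hypothetical regression coefficients
x_ref = np.zeros(2)            # reference individual: all covariates equal to 0
x = np.array([1.0, 10.0])      # another individual

t = np.array([1.0, 5.0, 10.0])
h_ref = cox_hazard(t, x_ref, beta)  # equals the baseline hazard
h_x = cox_hazard(t, x, beta)

print("baseline hazard:", h_ref)
print("hazard for x:   ", h_x)
print("relative risk:  ", h_x / h_ref)  # exp(0.7*1 + 0.02*10) at every t
```

For the reference individual the exponential factor equals 1, so the hazard reduces to the baseline; for any other individual the baseline is rescaled by the same factor at every time point.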

Note that an important feature of this model, which concerns the proportional hazards (PH) assumption, is that the baseline hazard is a function of $$t$$ but does not involve the covariates. By contrast, the exponential expression involves the $$X$$’s but not time. The covariates therefore have a multiplicative effect on the hazard and are called time-independent (see footnote 1).

Note that the model assumes proportional hazards: the hazard for any individual $$i$$ is a fixed proportion of the hazard for any other individual $$j$$. Because the baseline hazard $$h_0(t)$$ cancels in the ratio,

$\frac{h_i(t|\textbf X_i)}{h_j(t|\textbf X_j)} = \exp(\boldsymbol \beta^t(\textbf X_i - \textbf X_j))$

or

$h_i(t|\textbf X_i) = \exp( \boldsymbol \beta^t(\textbf X_i - \textbf X_j)) h_j(t|\textbf X_j),$

so the hazard ratio is constant over time and the log-hazard functions of any two individuals are strictly parallel.
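The constancy of the hazard ratio can be checked numerically. The sketch below is illustrative only: the coefficients, the covariate vectors `x_i` and `x_j`, and the two baseline hazards are hypothetical. It evaluates the ratio on a grid of times under each baseline and compares it with the exponential of the coefficient-weighted covariate difference.

```python
import numpy as np

beta = np.array([0.7, 0.02])   # hypothetical regression coefficients
x_i = np.array([1.0, 65.0])    # covariates of individual i (hypothetical)
x_j = np.array([0.0, 50.0])    # covariates of individual j (hypothetical)

def hazard(t, x, h0):
    """Cox-type hazard: baseline h0(t) times exp(beta' x)."""
    return h0(t) * np.exp(x @ beta)

# Two arbitrary baseline hazards; the Cox model does not specify either.
baselines = {
    "constant": lambda t: np.full_like(t, 0.1),
    "weibull-like": lambda t: 0.15 * (t / 10.0) ** 0.5,
}

t = np.linspace(0.5, 20.0, 5)
expected = np.exp(beta @ (x_i - x_j))  # exp(0.7*1 + 0.02*15) = exp(1.0)

ratios = {}
for name, h0 in baselines.items():
    ratios[name] = hazard(t, x_i, h0) / hazard(t, x_j, h0)
    print(name, ratios[name])

# On the log scale the two hazard curves differ by a constant vertical gap.
h0 = baselines["weibull-like"]
log_gap = np.log(hazard(t, x_i, h0)) - np.log(hazard(t, x_j, h0))
print("expected ratio:", expected, " log-hazard gap:", log_gap)
```

The ratio is the same at every time point and under both baselines, which is why it can be interpreted without knowing $$h_0(t)$$.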

### References

Cox, D. R. 1972. “Regression Models and Life-Tables (with Discussion).” Journal of the Royal Statistical Society, Series B: Methodological 34: 187–220.

1. It is possible, nevertheless, to consider covariates which do involve time. Such covariates are called time-dependent variables. When we consider these time-dependent covariates, the model is called the extended Cox model and in this case it no longer satisfies the proportional hazards assumption.