## 3.1 The semiparametric model

A parametric survival model is one in which survival time (the outcome) is assumed to follow a known distribution. Distributions commonly used for survival time include the **Weibull**, the **exponential** (a special case of the Weibull), the **log-logistic**, and the **log-normal**.

The Cox proportional hazards model, by contrast, is not a fully parametric model. Rather, it is a **semi-parametric model**: even if the regression parameters (the betas) are known, the distribution of the outcome remains unknown, because the baseline survival (or hazard) function is left unspecified (we do not assume any shape or form for it).

As before, let \(T\) denote the time to some event. Our data, based on a sample of size \(n\), consists of the triples \((\widetilde{T}_i, \Delta_i, \textbf{X}_i)\), \(i = 1,\ldots,n\), where \(\widetilde{T}_i\) is the time on study for the \(i\)-th patient, \(\Delta_i\) is the event indicator for the \(i\)-th patient (\(\Delta_i=1\) if the event has occurred and \(\Delta_i=0\) if the lifetime is right-censored), and \(\textbf{X}_i= (X_{i1},\ldots, X_{ip})^t\) is the vector of covariates or risk factors for the \(i\)-th individual, which may affect the survival distribution of \(T\).
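The structure of one observation can be sketched in a few lines of Python. This is only an illustration: the class `Observation`, the helper `observe`, and the numeric values are hypothetical, and the sketch assumes the usual right-censoring setup in which the time on study is the smaller of a latent event time and a censoring time.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Observation:
    """One triple (T~_i, Delta_i, X_i); the class name is hypothetical."""
    time_on_study: float            # T~_i
    event: int                      # Delta_i: 1 = event observed, 0 = right-censored
    covariates: Tuple[float, ...]   # X_i = (X_i1, ..., X_ip)


def observe(event_time: float, censor_time: float,
            covariates: Sequence[float]) -> Observation:
    """Build a triple from a latent event time and a censoring time.

    Assumption: the time on study is min(event_time, censor_time) and the
    event indicator is 1 when the event happens no later than censoring.
    """
    time_on_study = min(event_time, censor_time)
    event = 1 if event_time <= censor_time else 0
    return Observation(time_on_study, event, tuple(covariates))


# Made-up patients with p = 2 covariates each.
sample = [
    observe(5.0, 8.0, (1.0, 63.0)),   # event seen before censoring
    observe(9.5, 6.0, (0.0, 58.0)),   # censored before the event
]

for obs in sample:
    print(obs)
```

In practice only the triple is recorded; the latent event and censoring times are used here just to show where \(\widetilde{T}_i\) and \(\Delta_i\) come from.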

The covariates may in general depend on time, which leads to the **Extended Cox PH model**. However, for ease of presentation, we shall consider the fixed-covariate case.

The Cox PH regression model (Cox 1972) is usually written in terms of the hazard function as follows:

\[ h(t, \textbf X) = h_0(t) e^{\sum_{j=1}^p \beta_j X_j}. \]

This model gives an expression for the hazard at time \(t\) for an individual with a given specification of a set of explanatory variables denoted by the bold \(\textbf X\).
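As a rough illustration of the formula, the following Python sketch evaluates \(h(t, \textbf X)\) for given coefficients. The Cox model itself leaves \(h_0(t)\) unspecified, so the Weibull-shaped baseline used here, the function names, and all numeric values are assumptions made only so that the code can run.

```python
import math
from typing import Callable, Sequence


def cox_hazard(t: float, x: Sequence[float], beta: Sequence[float],
               baseline_hazard: Callable[[float], float]) -> float:
    """Evaluate h(t, X) = h_0(t) * exp(sum_j beta_j * X_j)."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    linear_predictor = sum(b * xj for b, xj in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)


def weibull_baseline(t: float, shape: float = 1.5, scale: float = 10.0) -> float:
    """Hypothetical baseline hazard, chosen only for illustration."""
    return (shape / scale) * (t / scale) ** (shape - 1)


beta = (0.7, -0.02)        # hypothetical regression coefficients
x_reference = (0.0, 0.0)   # reference individual: all covariates equal to 0
x_patient = (1.0, 10.0)    # hypothetical covariate values

h_ref = cox_hazard(2.0, x_reference, beta, weibull_baseline)
h_pat = cox_hazard(2.0, x_patient, beta, weibull_baseline)

print(h_ref)   # equals the baseline hazard at t = 2
print(h_pat)   # baseline hazard scaled by exp(0.7 * 1 - 0.02 * 10)
```

For the reference individual the exponential term equals 1, so the hazard reduces to the baseline hazard.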

Based on this model we can say that the hazard at time \(t\) is the product of two quantities:

- The first of these, \(h_0(t)\), is called the **baseline hazard function**, or the hazard for a reference individual with covariate values 0.
- The second quantity is a **parametric component**, which is a linear function of a set of \(p\) explanatory \(X\) variables that is exponentiated (it will be the *relative risk* associated with covariate values \(X\)).

Note that an important feature of this model, which concerns the **proportional hazards (PH) assumption**, is that the baseline hazard is a function of \(t\) but does not involve the covariates. By contrast, the exponential expression involves the \(X\)’s but not the time. The covariates here have a multiplicative effect on the hazard and are called **time-independent**.^{3}
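To make the multiplicative effect concrete, consider two individuals whose covariates are identical except that the \(k\)-th covariate is one unit larger for the second. The index \(k\) and the symbol \(\textbf X^{*}\) for the second covariate vector are notation introduced only for this illustration:

\[ \frac{h(t, \textbf X^{*})}{h(t, \textbf X)} = \frac{h_0(t)\, e^{\beta_k + \sum_{j=1}^p \beta_j X_j}}{h_0(t)\, e^{\sum_{j=1}^p \beta_j X_j}} = e^{\beta_k}. \]

The baseline hazard cancels, so a one-unit increase in \(X_k\) multiplies the hazard by \(e^{\beta_k}\) at every time \(t\). For instance, a hypothetical coefficient \(\beta_k = 0.5\) would correspond to a hazard about 1.65 times larger.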

Note that **the model is assuming proportional hazards** (the hazard for any individual \(i\) is a fixed proportion of the hazard for any other individual \(j\)), that is:

\[ \frac{h_i(t|\textbf X_i)}{h_j(t|\textbf X_j)} = \exp\left(\boldsymbol \beta^t(\textbf X_i - \textbf X_j)\right) \]

or

\[ h_i(t|\textbf X_i) = \exp\left(\boldsymbol \beta^t(\textbf X_i - \textbf X_j)\right) h_j(t|\textbf X_j), \]

so the hazard functions of any two individuals are proportional (on the log scale they are strictly parallel) and the hazard ratio is constant over time.
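The constancy of the hazard ratio can be checked numerically. The sketch below is a minimal illustration under assumed inputs: the linear baseline hazard, the coefficients, and the two covariate vectors are all hypothetical, and any positive baseline hazard would give the same conclusion.

```python
import math


def baseline_hazard(t: float) -> float:
    """Hypothetical baseline hazard; it changes with t on purpose."""
    return 0.1 + 0.05 * t


def hazard(t: float, x, beta) -> float:
    """h(t | X) = h_0(t) * exp(beta' X)."""
    return baseline_hazard(t) * math.exp(sum(b * xj for b, xj in zip(beta, x)))


beta = (0.4, 0.03)   # hypothetical coefficients
x_i = (1.0, 60.0)    # hypothetical covariates of individual i
x_j = (0.0, 50.0)    # hypothetical covariates of individual j

# Ratio predicted by the PH formula: exp(beta' (X_i - X_j)).
expected_ratio = math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_i, x_j)))

# Ratio of the two hazards evaluated at several time points.
times = (0.5, 1.0, 5.0, 20.0)
ratios = [hazard(t, x_i, beta) / hazard(t, x_j, beta) for t in times]

# Difference of the log-hazards, which should not depend on t either.
log_gaps = [math.log(hazard(t, x_i, beta)) - math.log(hazard(t, x_j, beta))
            for t in times]

for t, r in zip(times, ratios):
    print(t, r, expected_ratio)
```

Although both hazards change with \(t\), their ratio matches \(\exp(\boldsymbol \beta^t(\textbf X_i - \textbf X_j))\) at every time point, up to floating-point rounding.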

### References

Cox, D. R. 1972. “Regression Models and Life-Tables (with Discussion).” *Journal of the Royal Statistical Society, Series B: Methodological* 34: 187–220.

^{3} It is possible, nevertheless, to consider covariates which do involve time. Such covariates are called **time-dependent** variables. When we consider these time-dependent covariates, the model is called the **extended Cox model**, and in this case it no longer satisfies the proportional hazards assumption.