3.2 Estimation

The model is estimated by maximum likelihood, specifically by maximizing the “partial” likelihood function rather than a (complete) likelihood function. The term “partial” likelihood is used because the likelihood formula considers probabilities only for those subjects who fail, and does not explicitly consider probabilities for those subjects who are censored. The “partial” likelihood is given by:

\[ L(\boldsymbol \beta) = \prod_{i:\Delta_i = 1} \frac{\exp\bigg[ \sum_{j=1}^{p}\beta_j X_{(i)j} \bigg]}{\sum_{k \in R(t_i)} \exp \bigg[ \sum_{j=1}^{p}\beta_j X_{(k)j} \bigg]} \]

where \(t_1 < t_2 < \ldots < t_D\) are the ordered event times, \(X_{(i)j}\) is the \(j\)-th covariate of the individual whose failure time is \(t_i\), and \(R(t_i)\) is the risk set at time \(t_i\), that is, the set of all individuals who are still under study at a time just prior to \(t_i\).

Note that the numerator of the likelihood depends only on information from the individual who experiences the event, whereas the denominator uses information about all individuals who have not yet experienced the event (including some individuals who will be censored later).
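To make the formula concrete, the following is a minimal sketch in base R that evaluates the log of the partial likelihood directly from its definition. It assumes there are no tied event times, and the function name `partial_loglik` and the `toy` data set are hypothetical, introduced here only for illustration.

```r
# Log partial likelihood of a Cox model, assuming no tied event times.
# X holds the covariates (one row per subject); status is 1 for an event
# and 0 for a censored observation.
partial_loglik <- function(beta, time, status, X) {
  X <- as.matrix(X)
  eta <- drop(X %*% beta)               # linear predictor sum_j beta_j * X_j
  event_idx <- which(status == 1)       # only subjects who fail contribute
  contrib <- vapply(event_idx, function(i) {
    at_risk <- time >= time[i]          # risk set R(t_i)
    eta[i] - log(sum(exp(eta[at_risk])))
  }, numeric(1))
  sum(contrib)
}

# Hypothetical toy data: five subjects, three events, one covariate.
toy <- data.frame(
  time   = c(1, 2, 3, 4, 5),
  status = c(1, 0, 1, 1, 0),
  x      = c(0.5, 1.2, -0.3, 0.8, -1.0)
)

ll_zero <- partial_loglik(0, toy$time, toy$status, toy$x)
```

With \(\boldsymbol \beta = 0\) every subject in a risk set is equally likely to fail, so each event contributes one over the size of its risk set. Here the risk sets have sizes 5, 3 and 2, and `ll_zero` equals `-log(5 * 3 * 2)`. Censored subjects never appear in the numerator, but they still count in the denominator while they remain at risk.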

The (partial) maximum likelihood estimates are found by maximizing \(\ln L(\boldsymbol \beta)\), that is, by taking the partial derivatives of \(\ln L(\boldsymbol \beta)\) with respect to each parameter in the model, setting them equal to zero, and solving the resulting system of equations. Because this system has no closed-form solution in general, iterative algorithms such as Newton–Raphson (Ypma 1995) or Expectation-Maximization (Dempster, Laird, and Rubin 1977) are used (see note 1 on ties).
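As a worked version of this step, and assuming no tied event times, the log partial likelihood and the system of equations it leads to can be written out as follows. The index \(h\) is introduced here only for illustration.

\[ \ln L(\boldsymbol \beta) = \sum_{i:\Delta_i = 1} \bigg[ \sum_{j=1}^{p}\beta_j X_{(i)j} - \ln \sum_{k \in R(t_i)} \exp \bigg( \sum_{j=1}^{p}\beta_j X_{(k)j} \bigg) \bigg] \]

\[ \frac{\partial \ln L(\boldsymbol \beta)}{\partial \beta_h} = \sum_{i:\Delta_i = 1} \bigg[ X_{(i)h} - \frac{\sum_{k \in R(t_i)} X_{(k)h} \exp \big( \sum_{j=1}^{p}\beta_j X_{(k)j} \big)}{\sum_{k \in R(t_i)} \exp \big( \sum_{j=1}^{p}\beta_j X_{(k)j} \big)} \bigg] = 0, \quad h = 1, \ldots, p \]

Each term compares the covariate value of the individual who fails at \(t_i\) with a weighted average of that covariate over the risk set \(R(t_i)\), with weights proportional to \(\exp \big( \sum_{j=1}^{p}\beta_j X_{(k)j} \big)\).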

In R, we can estimate this model using the coxph function of the survival package.

library(survival)  # provides coxph() and Surv()

# Express the loan amount in units of 10,000
loan_filtered$LoanOriginalAmount2 <- loan_filtered$LoanOriginalAmount / 10000

model <- coxph(Surv(time, status) ~ LoanOriginalAmount2 + IsBorrowerHomeowner +
                 IncomeVerifiable, data = loan_filtered)
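Once a model is fitted, the estimates can be pulled out of the returned object. The sketch below is self-contained and therefore uses the `lung` data set shipped with the survival package rather than the loan data. The object name `fit` and the covariates `age` and `sex` are illustrative choices, not part of the analysis in this chapter.

```r
library(survival)

# Illustrative fit on survival::lung (not the loan data used in this chapter).
fit <- coxph(Surv(time, status) ~ age + sex, data = survival::lung)

coef(fit)        # estimated coefficients (log hazard ratios)
exp(coef(fit))   # hazard ratios
confint(fit)     # confidence intervals for the coefficients
fit$loglik       # log partial likelihood at the initial and final estimates
```

The same calls should apply unchanged to the `model` object fitted on the loan data.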

To account for tied event times, we can use the method argument (an alternate name for the ties argument of coxph):

coxph(Surv(time, status) ~ LoanOriginalAmount2 + IsBorrowerHomeowner +
        IncomeVerifiable, data = loan_filtered, method = "efron") 

coxph(Surv(time, status) ~ LoanOriginalAmount2 + IsBorrowerHomeowner +
        IncomeVerifiable, data = loan_filtered, method = "breslow") 

coxph(Surv(time, status) ~ LoanOriginalAmount2 + IsBorrowerHomeowner +
        IncomeVerifiable, data = loan_filtered, method = "exact") 
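To see how much the choice of tie handling matters in practice, the three fits can be compared side by side. The sketch below again uses the `lung` data set from the survival package as a stand-in for the loan data; the names `tie_methods` and `ties_coefs` are hypothetical.

```r
library(survival)

# Fit the same model under each tie-handling method and collect the coefficients.
tie_methods <- c("efron", "breslow", "exact")
ties_coefs <- sapply(tie_methods, function(m) {
  coef(coxph(Surv(time, status) ~ age + sex, data = survival::lung, ties = m))
})

# One row per covariate, one column per method.
round(ties_coefs, 4)
```

When there are few ties relative to the number of subjects at risk, the three columns are typically very close. The "exact" method can become slow when many events share the same time.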

References

Ypma, Tjalling J. 1995. “Historical Development of the Newton–Raphson Method.” SIAM Review 37 (4): 531–51. doi:10.1137/1037125.

Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 1–38. http://www.jstor.org/stable/2984875.


  1. In the presence of ties, the Breslow (1975) or Efron (1977) approximations to the log-likelihood can be used.