## 4.2 Estimation of the Joint Model

In this section we are going to present the joint modelling framework motivated by the **time-to-event point of view**, that is, we want to add a time-dependent covariate measured with error in a survival model.

Let \(T_i\) denote the observed failure time for the \(i\)-th subject \((i = 1,...,n)\), which is taken as the minimum of the true event time \(T_i\) and the censoring time \(C_i\), i.e., \(\widetilde T_i = \min(T_i,C_i)\). Furthermore, we define the event indicator as \(\Delta_i = I(T_i \le C_i)\), where \(I\) is the indicator function that takes the value 1 if the condition \(T_i \le C_i\) is satisfied, and 0 otherwise. So, the observed data for the time-to-event outcome consist of the pairs \(\{(\widetilde T_i, \Delta_i), i = 1, . . . , n\}\).

For the longitudinal responses, let \(y_i(t)\) denote the value of the longitudinal outcome at time point \(t\) for the \(i\)-th subject. Note that we do not actually observe \(y_i(t)\) at all time points, but only at the very specific occasions \(t_{ij}\) at which measurements were taken. Thus, the observed longitudinal data consist of the measurements \(y_{ij} = \{yi(t_{ij}),j = 1,...,n_i\}\).

The objective is to associate the **true** and **unobserved** value of the longitudinal outcome at time \(t\), denoted by \(m_i(t)\), with the event outcome \(\widetilde T_i\). Note that \(m_i(t)\) is different from \(y_i(t)\) because this last is contaminated with measurement error value of the longitudinal outcome at time \(t\).

In order to quantify the effect of \(m_i(t)\) on the risk of an event, we can use a relative risk model of the form:

\[\begin{equation} h_i(t|\mathcal{M}_i(t),w_i) = h_0(t) \exp \{ \gamma^t w_i + \alpha m_i(t) \} \tag{4.1} \end{equation}\]where \(\mathcal{M}_i(t)=\{ m_i(s), 0 \le s < t\}\) denotes the denotes the history of the true unobserved longitudinal process up to time point \(t\), **\(h_0(\cdot)\) denotes the baseline risk function**, and \(w_i\) is a vector of baseline covariates (such as a treatment indicator, history of diseases, etc.) with a corresponding vector of regression coefficients \(\gamma\). Similarly, parameter \(\alpha\) **quantifies the effect of the underlying longitudinal outcome** to the risk for an event.

**interpretation of \(\gamma\) and \(\alpha\) is exactly the same**as we have seen in Chapter 3. In particular, \(exp(\gamma_j)\) denotes the ratio of hazards for one unit change in \(w_{ij}\) at any time \(t\), whereas \(exp(\alpha)\) denotes the relative increase in the risk for an event at time \(t\) that results from one unit increase in \(m_i(t)\) at the same time point.

In order to complete the specification of the above model, we need to think about **the choice for the baseline risk function** \(h_0(\cdot)\). In standard survival analysis it is customary to leave \(h_0(\cdot)\) completely unspecified in order to avoid the impact of misspecifying the distribution of survival times. However, within the joint modeling framework, it turns out that following such a route may lead to an **underestimation** of the standard errors of the parameter estimates (Hsieh, Tseng, and Wang 2006). To avoid such problems we will need to explicitly define \(h_0(\cdot)\), for example with a **known parametric** distribution or alternatively, and even more preferably, we can opt for a **parametric but flexible specification** of the baseline risk function. Several approaches are implemented in the `JM`

package under the argument `method`

.

#### The longitudinal submodel

In the model above we use \(m_i(t)\) to denote the **true value of the underlying longitudinal covariate** at time point \(t\). However, and as mentioned earlier, longitudinal information is actually collected intermittently and with error at a set of a few time points \(t_{ij}\) for each subject. So, to messure the effect of the longitudinal variable on the risk dor an event, we need to estimate \(m_i(t)\). To do this, we are going to use the linear mixed models of the form

\[ y_i(t) = m_i(t) + \varepsilon_i(t), \]

\[ m_i(t) = x_i^T(t)\beta + z_i^T(t)b_i + \varepsilon_i(t), \]

\[ b_i \sim N(0, D), \quad \varepsilon_i(t) \sim N(0, \sigma^2), \] where \(\beta\) denotes the vector of the unknown fixed effects parameters, \(b_i\) denotes a vector of random effects, \(x_i(t)\) and \(z_i(t)\) denote row vectors of the design matrices for the fixed and random effects, respectively, and \(\varepsilon_i(t)\) is the measument error term, which is assumed independent of \(b_i\).

The main estimation methods for joint models are based on **(semiparametric) maximum likelihood** and **Bayes using MCMM techniques**. The `JM`

package that we are going to use is based on maximum likelihood. The idea is the maximization of the log-likelihood corresponding to the joint distribution of the time-to-event and longitudinal out-comes \(\{\widetilde T_i,\Delta_i,y_i\}\). Standard numerical integration techniques such as Gaussian quadrature and Monte Carlo have been successfully applied in the joint modelling framework. See Section 4.3 of Rizopoulos (2012) for details.

### References

Hsieh, Fushing, Yi-Kuan Tseng, and Jane-Ling Wang. 2006. “Joint Modeling of Survival and Longitudinal Data: Likelihood Approach Revisited.” *Biometrics* 62 (4). Blackwell Publishing Inc: 1037–43. doi:10.1111/j.1541-0420.2006.00570.x.

Rizopoulos, Dimitris. 2012. *Joint Models for Longitudinal and Time-to-Event Data: With Applications in R*. 1st ed. Chapman & Hall/Crc Biostatistics Series. Chapman; Hall/CRC. http://gen.lib.rus.ec/book/index.php?md5=99C084286F029FD36E10FE18AA587A7C.