## 3.4 Hypothesis testing

To test the significance of a variable or an interaction term in the model, we can use two procedures:

- the **Wald test** (typically used with Maximum Likelihood estimates)
- the **Likelihood Ratio test (LRT)** (which uses the log-likelihood to compare two nested models)

The null hypothesis of the **Wald test** states that the coefficient \(\beta_j\) is equal to 0. The test statistic is

\[ Z = \frac{\hat \beta_j - 0}{\operatorname{SE}(\hat \beta_j)} \sim N(0,1) \]

```
summary(m1)$coef
##                               coef exp(coef)   se(coef)         z
## LoanOriginalAmount2     -0.1217675 0.8853542 0.06661063 -1.828049
## IsBorrowerHomeownerTrue -0.2481456 0.7802463 0.06231124 -3.982357
## IncomeVerifiableTrue     0.2926323 1.3399500 0.30286111  0.966226
##                             Pr(>|z|)
## LoanOriginalAmount2     6.754227e-02
## IsBorrowerHomeownerTrue 6.823526e-05
## IncomeVerifiableTrue    3.339311e-01
# by hand... for IncomeVerifiable
z <- summary(m1)$coef[3, 1]/summary(m1)$coef[3, 3] # coef / se(coef)
pvalue <- 2 * pnorm(abs(z), lower.tail = FALSE) # two-sided p-value
pvalue
## [1] 0.3339311
```
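The same computation can be vectorized over all coefficients at once; as a short sketch, this reuses the `z` column that the `summary` coefficient matrix already provides:

```
# two-sided Wald p-values for every coefficient in one call
2 * pnorm(abs(summary(m1)$coef[, "z"]), lower.tail = FALSE)
```

These values should match the `Pr(>|z|)` column of the summary table above.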

According to the p-value of the test, the null hypothesis is not rejected for the `IncomeVerifiable` variable. Thus, the model should not include this variable.

The other approach is the **Likelihood Ratio test**. In this case, we compare the log-likelihood of the *reduced model*, which does not contain the variable we want to test, with the log-likelihood of the *full model*, which does contain it. In general, the LRT statistic can be written as

\[ LRT = -2 \ln \frac{L_R}{L_F} = 2 \ln(L_F) - 2 \ln(L_R) \sim \chi^2_p \] where \(L_R\) denotes the likelihood of the reduced model with \(k\) parameters, \(L_F\) denotes the likelihood of the full model with \(k + p\) parameters, and \(\chi^2_p\) is a chi-square distribution with \(p\) degrees of freedom, \(p\) being the number of predictors being assessed.

Computing the **LRT** in R:

```
m_red <- coxph(Surv(time, status) ~ LoanOriginalAmount2 + IsBorrowerHomeowner,
data = loan_filtered)
anova(m_red, m1) # first the reduced model, then the full
## Analysis of Deviance Table
## Cox model: response is Surv(time, status)
## Model 1: ~ LoanOriginalAmount2 + IsBorrowerHomeowner
## Model 2: ~ LoanOriginalAmount2 + IsBorrowerHomeowner + IncomeVerifiable
##     loglik  Chisq Df P(>|Chi|)
## 1 -10837
## 2 -10836 1.0297  1    0.3102
# by hand... for IncomeVerifiable variable
m1$loglik # first element: null model (no predictors); second: fitted model
## [1] -10848.75 -10836.52
# we need the second element
chi <- 2 * m1$loglik[2] - 2 * m_red$loglik[2]
pvalue <- 1 - pchisq(chi, df = 1) # df = 3 - 2
pvalue
## [1] 0.310227
```

In this case, using \(\alpha = 0.05\), the LRT also fails to reject the null hypothesis for the `IncomeVerifiable` variable, so we must remove it from the model.
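For a single coefficient, the Wald and LRT statistics are asymptotically equivalent: the squared Wald \(z\) should be close to the LRT chi-square statistic. A quick check with the fitted objects above (a sketch, reusing `m1` and `m_red` from the previous code blocks):

```
z <- summary(m1)$coef[3, 4] # Wald z for IncomeVerifiableTrue (0.966226)
z^2 # approximately 0.934
2 * (m1$loglik[2] - m_red$loglik[2]) # LRT statistic, approximately 1.030
```

The two statistics are close but not identical here; they agree exactly only in the limit, so the two tests can occasionally lead to different conclusions near the significance threshold.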