8.4 Information Criteria
- Information criteria account for both the likelihood and the number of parameters when comparing models.
8.4.1 Akaike’s Information Criterion (AIC)
Derived as an estimator of the expected Kullback-Leibler discrepancy between the true model and a fitted candidate model.
\[ AIC = -2l(\hat{\theta}, \hat{\beta}) + 2q \]
where
- \(l(\hat{\theta}, \hat{\beta})\) is the log-likelihood
- \(q\) is the effective number of parameters: the fixed effects plus the variance/covariance parameters associated with the random effects, excluding any that are estimated to lie on a boundary constraint
Note:
- When comparing models that differ in their random effects, this method is not advised because the effective number of parameters cannot be determined correctly.
- We prefer smaller AIC values.
- If your program reports \(l-q\) instead (this is rare), then larger values are preferred.
- AIC can be used for mixed model selection (e.g., selection of the covariance structure), but the sample size must be very large for the comparison to be adequate.
- AIC can have a large negative bias (e.g., when the sample size is small but the number of parameters is large) because the penalty term cannot adequately approximate the bias adjustment.
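As a minimal sketch of the AIC formula in this subsection (not tied to any particular package), the function below computes AIC from a maximized log-likelihood and a parameter count. The function name `aic` and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def aic(loglik: float, q: int) -> float:
    """AIC = -2 * log-likelihood + 2 * q (smaller is preferred)."""
    return -2.0 * loglik + 2.0 * q


# Hypothetical fits: model B adds two parameters and gains
# 1.5 units of log-likelihood over model A.
aic_a = aic(loglik=-120.0, q=4)
aic_b = aic(loglik=-118.5, q=6)

# The model with the smaller AIC is preferred.
preferred = "A" if aic_a < aic_b else "B"
print(aic_a, aic_b, preferred)
```

In this made-up comparison the gain in log-likelihood does not offset the penalty for the two extra parameters, so model A is preferred.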
8.4.2 Corrected AIC (AICC)
- developed by (Hurvich and Tsai 1989)
- applies a small-sample adjustment to AIC
- the form of the adjustment depends on the candidate model class
- AICC is justified only for a fixed covariance structure, not for a general covariance structure
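For reference, the form of AICC most often quoted is the one derived for normal linear regression, written here in the notation of this section with \(n\) the number of observations. Applying it with this \(q\) and \(n\) to a mixed model is an assumption, and software may define the sample size differently.

\[ AICC = -2l(\hat{\theta}, \hat{\beta}) + \frac{2qn}{n-q-1} = AIC + \frac{2q(q+1)}{n-q-1} \]

The extra term vanishes as \(n \to \infty\) with \(q\) fixed, so AICC and AIC agree in large samples.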
8.4.3 Bayesian Information Criterion (BIC)
\[ BIC = -2l(\hat{\theta}, \hat{\beta}) + q \log n \]
where n = number of observations.
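- We prefer smaller BIC values.
- BIC and AIC can be used with both REML and MLE when the candidate models have the same mean structure. Otherwise, in general, we should prefer MLE, because REML likelihoods are not comparable across models with different fixed effects.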
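A minimal sketch of the BIC formula, using only the standard library. The function name `bic`, the log-likelihoods, and the sample size are hypothetical values for illustration, not results from the example in this chapter.

```python
import math


def bic(loglik: float, q: int, n: int) -> float:
    """BIC = -2 * log-likelihood + q * log(n) (smaller is preferred)."""
    return -2.0 * loglik + q * math.log(n)


# Hypothetical fits on n = 100 observations.
n = 100
bic_a = bic(loglik=-120.0, q=4, n=n)
bic_b = bic(loglik=-118.5, q=6, n=n)

# Each parameter costs log(n) under BIC versus 2 under AIC,
# so BIC penalizes more heavily once n >= 8 (log(8) > 2).
bic_penalty_is_larger = math.log(n) > 2.0
print(round(bic_a, 3), round(bic_b, 3), bic_penalty_is_larger)
```

Because the per-parameter penalty grows with \(n\), BIC tends to favor smaller models than AIC in all but very small samples.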
With our example presented at the beginning of Linear Mixed Models,
\[ Y_{ij}= \begin{cases} \beta_0 + b_{1i} + (\beta_1 + b_{2i})t_{ij} + \epsilon_{ij} & L \\ \beta_0 + b_{1i} + (\beta_2 + b_{2i})t_{ij} + \epsilon_{ij} & H\\ \beta_0 + b_{1i} + (\beta_3 + b_{2i})t_{ij} + \epsilon_{ij} & C \end{cases} \]
where
- \(i = 1,..,N\)
- \(j = 1,..,n_i\) (measures at time \(t_{ij}\))
Note:
- we have subject-specific intercepts:
\[ \begin{aligned} \mathbf{Y}_i |b_i &\sim N(\mathbf{X}_i \beta + \mathbf{1} b_i, \sigma^2 \mathbf{I}) \\ b_i &\sim N(0,d_{11}) \end{aligned} \]
Here, \(\mathbf{1}\) is a vector of ones, and we want to estimate \(\beta, \sigma^2, d_{11}\) and predict \(b_i\).
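Integrating \(b_i\) out of the random-intercept model gives the marginal distribution \(\mathbf{Y}_i \sim N(\mathbf{X}_i \beta, \sigma^2 \mathbf{I} + d_{11} \mathbf{1}\mathbf{1}')\), a compound-symmetry covariance whose determinant and inverse have closed forms. The sketch below evaluates that marginal log-likelihood subject by subject and plugs the total into AIC and BIC. The function name, the toy residuals, and the parameter values are all hypothetical; in practice the estimates would come from a mixed-model routine.

```python
import math


def marginal_loglik(resid, sigma2, d11):
    """Log-density of one subject's residuals r_i = y_i - X_i beta under
    N(0, sigma2 * I + d11 * 1 1'), using compound-symmetry closed forms."""
    n_i = len(resid)
    s1 = sum(resid)                  # 1' r
    s2 = sum(r * r for r in resid)   # r' r
    total = sigma2 + n_i * d11
    # det(V) = sigma2^(n_i - 1) * (sigma2 + n_i * d11)
    log_det = (n_i - 1) * math.log(sigma2) + math.log(total)
    # r' V^{-1} r = (r'r - d11 / (sigma2 + n_i * d11) * (1'r)^2) / sigma2
    quad = (s2 - d11 / total * s1 * s1) / sigma2
    return -0.5 * (n_i * math.log(2 * math.pi) + log_det + quad)


# Hypothetical residuals for three subjects with unequal n_i.
residuals = [[0.5, 0.7, 0.4], [-0.6, -0.2], [0.1, -0.1, 0.3, 0.0]]
sigma2, d11 = 0.04, 0.25   # assumed values of sigma^2 and d_11

loglik = sum(marginal_loglik(r, sigma2, d11) for r in residuals)
n = sum(len(r) for r in residuals)   # total number of observations
q = 4 + 2   # beta_0..beta_3 plus sigma^2 and d_11

aic_value = -2.0 * loglik + 2.0 * q
bic_value = -2.0 * loglik + q * math.log(n)
print(aic_value, bic_value)
```

Setting `d11 = 0` reduces the function to the log-likelihood of independent normal errors, which is a quick way to sanity-check it.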