7.1 Relative Model-Data Fit at Test Level
In this section, our goal is to compare multiple models and determine which one provides the better fit to the data at the test level.
- The primary idea is to employ fit-evaluation statistics to identify the true model as the best-fitting model (Chen et al., 2013); however, we do not actually know whether the true model is included in the pool of competing models.
Different relative fit statistics are available for comparing model-data fit among two or more models. In most cases, we use:
- Information criterion indices.
- Likelihood ratio (LR) test statistics.
7.1.1 Information Criteria as a Function of the Maximum Likelihood (ML) of CDMs (Chen et al., 2013)
Information criteria are calculated as a function of the ML of a CDM, which is based on the ML estimates of the item parameters; the individual attribute profiles are integrated out.
Let us assume that
- \(N\) is the sample size, i.e., the number of examinees taking the test,
- \(\mathbf{Y}_i\) is the response vector of examinee \(i\),
- \(\hat{\beta}\) represents the item parameter estimates,
- \(\boldsymbol{\alpha}_l\) is the \(l\)-th attribute vector,
- \(p(\boldsymbol{\alpha}_l)\) is the prior probability of \(\boldsymbol{\alpha}_l\).
Let \(L(\mathbf{Y})\) be the likelihood of observing the item response vectors of the \(N\) examinees.
\[ L(\mathbf{Y})=\mathrm{ML}=\prod_{i=1}^N \sum_{l=1}^L L\left(\mathbf{Y}_i \mid \hat{\beta}, \boldsymbol{\alpha}_l\right) p\left(\boldsymbol{\alpha}_l\right) \]
Theoretically, a model with a larger \(L(\mathbf{Y})\), or equivalently a smaller \(-2\log L(\mathbf{Y})\), is preferred because it makes the observed data more likely.
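To make the marginalization over attribute profiles concrete, here is a minimal numerical sketch of \(L(\mathbf{Y})\) in Python. It assumes a DINA-type item response function, and the Q-matrix, guessing and slip parameters, prior \(p(\boldsymbol{\alpha}_l)\), and response matrix are all hypothetical placeholders, not values from any real analysis.

```python
import numpy as np
from itertools import product

def dina_item_prob(alphas, q, guess, slip):
    """P(correct) for one item under DINA: mastery of all required attributes."""
    eta = np.all(alphas >= q, axis=-1).astype(float)
    return (1 - slip) * eta + guess * (1 - eta)

def log_marginal_likelihood(Y, Q, guess, slip, prior):
    """log L(Y) = sum_i log( sum_l P(Y_i | beta_hat, alpha_l) p(alpha_l) )."""
    K = Q.shape[1]
    alphas = np.array(list(product([0, 1], repeat=K)))            # all 2^K profiles alpha_l
    p_correct = np.stack(
        [dina_item_prob(alphas, Q[j], guess[j], slip[j]) for j in range(Q.shape[0])],
        axis=1,                                                    # shape (L, J)
    )
    log_lik = 0.0
    for y in Y:                                                    # loop over examinees
        # P(Y_i | alpha_l) assuming local independence of items
        lik_given_alpha = np.prod(p_correct**y * (1 - p_correct)**(1 - y), axis=1)
        log_lik += np.log(np.sum(lik_given_alpha * prior))         # marginalize over alpha_l
    return log_lik                                                 # log scale avoids underflow

# Toy example with made-up values: 2 attributes, 3 items, 4 examinees
Q = np.array([[1, 0], [0, 1], [1, 1]])
guess = np.array([0.20, 0.15, 0.10])
slip = np.array([0.10, 0.20, 0.15])
prior = np.full(4, 0.25)                                           # uniform p(alpha_l)
Y = np.array([[1, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 0]])
print(log_marginal_likelihood(Y, Q, guess, slip, prior))
```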
However, under this rule a model with more parameters is usually "preferred", which leads to overfitting.
To avoid overfitting, we add a penalty term that penalizes models with too many parameters.
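As an illustration of how such a penalty works, the sketch below adds a complexity term to the deviance \(-2\log L(\mathbf{Y})\). The AIC- and BIC-style penalties shown are standard textbook forms, and the log-likelihoods and parameter counts in the example are made-up numbers rather than results from fitted models.

```python
import math

def penalized_deviance(log_lik, n_params, n_examinees, penalty="aic"):
    """-2 log L(Y) plus a complexity penalty; smaller values indicate better relative fit."""
    deviance = -2.0 * log_lik                       # rewards fit but ignores model complexity
    if penalty == "aic":
        return deviance + 2 * n_params              # AIC-style penalty: 2 per parameter
    if penalty == "bic":
        return deviance + n_params * math.log(n_examinees)  # BIC-style: log(N) per parameter
    raise ValueError("unknown penalty")

# Compare two hypothetical fitted CDMs on the same data (illustrative numbers only)
print(penalized_deviance(log_lik=-5230.4, n_params=36, n_examinees=1000, penalty="bic"))
print(penalized_deviance(log_lik=-5221.7, n_params=60, n_examinees=1000, penalty="bic"))
```

In this toy comparison, the more complex model has the higher log-likelihood, but once the penalty is applied the simpler model can still win, which is exactly the behavior the penalty is designed to produce.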