8.1 Item-Level Absolute Fit Measures

The rationale behind item-level absolute fit measures is not much different from that behind test-level fit measures. Sorrel et al. (2017) suggested that absolute item-level fit can be assessed by comparing the observed item performance in different groups of examinees with the performance level predicted by the CDM under consideration.

These measures focus on whether each item, on its own, is consistent with the model’s predictions, without comparing the fitted model to other models. For this reason, we consider them absolute fit measures.

  • Many absolute fit measures have been proposed in the literature. For now, let us focus on a few specific fit measures at the item level.

  • However, the basic idea behind these fit measures is more or less the same.

Chen et al. (2013) proposed three measures for assessing item-level absolute fit.

To perform these analyses:

  • Fit a model to a set of data

  • Based on the estimated model parameters, simulate item responses for a large number of students (a minimal simulation sketch is given after this list)

  • Denote the responses of students to item \(j\) in the original sample by \(\mathbf{Y}_j\) and in the simulated sample by \(\mathbf{\tilde{Y}}_j\)

  • Calculate the item fit statistics described below
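
Here is a minimal sketch of the simulation step under a DINA model, written in Python. Everything in it (the Q-matrix `Q`, the guessing and slip parameters, and the attribute-profile probabilities) is an illustrative assumption standing in for the estimates obtained from your fitted model, not output from any particular software package.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Illustrative (assumed) estimates from a fitted DINA model ---------------
Q = np.array([[1, 0],                   # Q-matrix: attributes required by each item
              [0, 1],
              [1, 1]])
guess = np.array([0.15, 0.20, 0.10])    # guessing parameters g_j
slip  = np.array([0.10, 0.15, 0.20])    # slip parameters s_j
profiles     = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # the 2^K attribute profiles
profile_prob = np.array([0.25, 0.25, 0.25, 0.25])          # their estimated probabilities

def simulate_dina(n_persons):
    """Simulate dichotomous item responses for n_persons examinees under the DINA model."""
    idx   = rng.choice(len(profiles), size=n_persons, p=profile_prob)
    alpha = profiles[idx]                                   # n_persons x K attribute patterns
    # eta_ij = 1 if examinee i masters every attribute required by item j
    eta = (alpha @ Q.T == Q.sum(axis=1)).astype(int)        # n_persons x J
    # P(correct) is 1 - slip when eta = 1 and guess when eta = 0
    p_correct = eta * (1 - slip) + (1 - eta) * guess
    return rng.binomial(1, p_correct)                       # n_persons x J response matrix

Y_sim = simulate_dina(100_000)      # large simulated sample of model-implied responses
print(Y_sim.mean(axis=0))           # model-implied proportion correct for each item
```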

proportion correct \(p_j\):

  • In the context of cognitive diagnosis models (CDMs), the proportion correct is a common item-level fit measure used to assess how well an individual test item performs relative to the responses expected under the model. In essence, we compare the proportion of correct responses to a specific item in the observed data with the proportion in the model-predicted item responses.

  • It compares the observed responses to the expected responses based on the model’s assumptions about examinees’ latent attribute mastery profiles.

This measure compares the observed and model-predicted proportions correct for item \(j\).

\[p_{j}=\Bigg\lvert P(\mathbf{Y}_j=1)-{P}(\mathbf{\tilde{Y}}_j=1)\Bigg\rvert \]

  • In this case, we first calculate the observed proportion correct, \(P(\mathbf{Y}_j=1)\), and then the proportion correct in the model-predicted responses, \(P(\mathbf{\tilde{Y}}_j=1)\).

  • If the difference is large, the model is not predicting the responses to that item well.

  • By comparing the observed proportion correct to the expected (model-implied) proportion correct, we can assess how well the model fits the data for each item. For a well-fitting item, the observed proportion is close to the model-implied proportion, indicating that the model’s predictions align with the actual responses.

  • If there’s a large deviation between the observed proportion correct and expected proportion correct, it might indicate that the model is not accurately capturing the behavior of examinees on that particular item. This could be due to model misspecification, item functioning issues, or latent attribute misclassification.
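
To make the definition concrete, here is a small sketch of \(p_j\) in Python, continuing the illustrative simulation above. In practice `Y` would be your observed response matrix; here it is simply another simulated data set standing in for the observed data.

```python
import numpy as np   # Y_sim and simulate_dina() come from the sketch above

def proportion_correct_stat(Y, Y_sim):
    """p_j = |P(Y_j = 1) - P(Y~_j = 1)| for every item j."""
    return np.abs(Y.mean(axis=0) - Y_sim.mean(axis=0))

Y = simulate_dina(1_000)             # stand-in for the observed response matrix
p_j = proportion_correct_stat(Y, Y_sim)
print(np.round(p_j, 3))              # one value per item; values near 0 indicate good fit
```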

transformed correlation \(r_{jj'}\):

\[r_{jj'}=\Bigg\lvert Z[Corr(\mathbf{Y}_j,\mathbf{Y}_{j'})]-Z[Corr(\mathbf{\tilde{Y}}_j,\mathbf{\tilde{Y}}_{j'})]\Bigg\rvert \]

  • To calculate \(r_{jj'}\), first we calculate Pearson’s correlation between the responses to the item pair, \(\mathbf{Y}_j\) and \(\mathbf{Y}_{j'}\).

  • Second, we apply Fisher’s transformation to the correlation for both the observed and the model-implied item responses.

Fisher’s transformation: \(Z(r) = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)\)

  • The statistic \(r_{jj'}\) is the absolute difference between the two transformed correlations.

  • A small \(r_{jj'}\) value suggests good fit of the model.
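
A small sketch of \(r_{jj'}\) in Python, using the same illustrative `Y` and `Y_sim` matrices as in the sketches above:

```python
import numpy as np   # Y and Y_sim come from the sketches above

def fisher_z(r):
    """Fisher's transformation Z(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * np.log((1 + r) / (1 - r))

def transformed_corr_stat(Y, Y_sim, j, jp):
    """r_jj' = |Z(observed correlation) - Z(model-implied correlation)| for items j and j'."""
    r_obs = np.corrcoef(Y[:, j], Y[:, jp])[0, 1]
    r_sim = np.corrcoef(Y_sim[:, j], Y_sim[:, jp])[0, 1]
    return abs(fisher_z(r_obs) - fisher_z(r_sim))

print(round(transformed_corr_stat(Y, Y_sim, 0, 1), 3))   # item pair (1, 2)
```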

log odds ratio \(l_{jj'}\):

  • The log odds ratio is calculated for each item pair, based on the observed and the model-predicted response patterns.

  • It is the absolute difference between the observed log odds ratio and the expected (model-implied) log odds ratio.

\[ l_{jj'}=\Bigg\lvert \log \frac{N_{11}N_{00}}{N_{01}N_{10}}-\log\frac{\tilde{N}_{11}\tilde{N}_{00}}{\tilde{N}_{01}\tilde{N}_{10}}\Bigg\rvert\]

  • Here \(N_{ab}\) is the number of examinees with response \(a\) to item \(j\) and response \(b\) to item \(j'\) in the observed data, and \(\tilde{N}_{ab}\) is the corresponding count in the simulated data.

  • The absolute difference between the two log odds ratios provides a measure of how much the observed response pattern deviates from what the model predicts. A higher value of \(l_{jj'}\) indicates a greater discrepancy, suggesting a potential misfit for the items in question.
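
A small sketch of \(l_{jj'}\) in Python, again using the illustrative `Y` and `Y_sim` matrices from above (if any cell count is zero, a continuity correction such as adding 0.5 to every cell may be needed):

```python
import numpy as np   # Y and Y_sim come from the sketches above

def log_odds_ratio(Ymat, j, jp):
    """log(N11 * N00 / (N01 * N10)) for the 2 x 2 table of items j and j'."""
    yj, yjp = Ymat[:, j], Ymat[:, jp]
    n11 = np.sum((yj == 1) & (yjp == 1))
    n00 = np.sum((yj == 0) & (yjp == 0))
    n10 = np.sum((yj == 1) & (yjp == 0))
    n01 = np.sum((yj == 0) & (yjp == 1))
    return np.log((n11 * n00) / (n10 * n01))

def log_odds_stat(Y, Y_sim, j, jp):
    """l_jj' = |observed log odds ratio - model-implied log odds ratio|."""
    return abs(log_odds_ratio(Y, j, jp) - log_odds_ratio(Y_sim, j, jp))

print(round(log_odds_stat(Y, Y_sim, 0, 1), 3))
```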

8.1.1 Performing hypothesis tests

For each of these statistics, we can estimate a standard error, and a z-score can be obtained by dividing the statistic by its corresponding standard error.

\[z\big[p_j\big]=\frac{p_j}{SE[p_j]}\sim N(0,1)\]

\[z\big[r_{jj'}\big]=\frac{r_{jj'}}{SE[r_{jj'}]}\sim N(0,1)\]

\[z\big[l_{jj'}\big]=\frac{l_{jj'}}{SE[l_{jj'}]}\sim N(0,1)\]
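
As an illustration of how these z-scores can be computed, the sketch below approximates \(SE[p_j]\) with a simple parametric bootstrap (repeatedly simulating data sets of the observed sample size from the fitted model and taking the standard deviation of the item proportions). This is only one rough way of obtaining a standard error, used here as an assumption for illustration; it is not necessarily the estimator derived by Chen et al. (2013).

```python
import numpy as np   # Y, p_j and simulate_dina() come from the sketches above

n_obs, n_boot = Y.shape[0], 200
# Bootstrap distribution of the item proportions under the fitted model
boot_props = np.array([simulate_dina(n_obs).mean(axis=0) for _ in range(n_boot)])
se_pj = boot_props.std(axis=0, ddof=1)      # approximate SE[p_j]

z_pj = p_j / se_pj                          # z[p_j] = p_j / SE[p_j]
flagged = np.abs(z_pj) > 1.96               # items flagged as misfitting at alpha = 0.05
print(np.round(z_pj, 2), flagged)
```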

Exercise

If a test has 10 items, how many \(z\big[p_j\big]\), \(z\big[r_{jj'}\big]\) and \(z\big[l_{jj'}\big]\) do we have?

References

Chen, J., Torre, J. de la, & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50(2), 123–140.
Sorrel, M. A., Abad, F. J., Olea, J., Torre, J. de la, & Barrada, J. R. (2017). Inferential item-fit evaluation in cognitive diagnosis modeling. Applied Psychological Measurement, 41(8), 614–631.