7.14 Absolute Fit Measures in the Bayesian Approach to CDM Estimation

7.14.1 Posterior Predictive Model Checking

The contents of this section are mostly summarized from Han & Johnson (2019) and Zhan et al. (2019b).

Before evaluating model fit under Bayesian CDM estimation, we first need to ensure that the MCMC chains for all model parameters have converged adequately.

In the Bayesian paradigm, posterior predictive model checking (PPMC) is a popular method that is widely accepted for its strong theoretical foundation. The primary idea behind PPMC is to compare replicated data, simulated from the posterior predictive distribution, with the observed data. Formally, the posterior predictive distribution is

\[p\left(\boldsymbol{x}^{r e p} \mid \boldsymbol{x}\right)=\int p\left(\boldsymbol{x}^{r e p} \mid \boldsymbol{\theta}\right) p(\boldsymbol{\theta} \mid \boldsymbol{x}) d \boldsymbol{\theta},\] where

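  • \(\boldsymbol{x}\) and \(\boldsymbol{x}^{rep}\) are the observed and replicated (generated) data, respectively

  • \(\boldsymbol{\theta}\) contains the model parameters, which are assigned prior distributions

  • \(p(\boldsymbol{x}^{rep} \mid \boldsymbol{\theta})\) is the joint likelihood function of the replicated data

  • \(p(\boldsymbol{\theta} \mid \boldsymbol{x})\) is the posterior distribution given the observed data.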
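
The integral above is rarely evaluated analytically. A minimal sketch of how it is usually approximated by simulation is given below, assuming a toy Bernoulli model with a Beta(1, 1) prior rather than a CDM. The data vector `x` and the number of draws `M` are hypothetical values chosen only for illustration.

```r
set.seed(123)

# Hypothetical observed data: 20 binary responses (illustration only)
x <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1)
n <- length(x)
M <- 2000  # number of posterior draws (assumed)

# Step 1: draw theta from the posterior p(theta | x).
# With a Beta(1, 1) prior the posterior is Beta(1 + sum(x), 1 + n - sum(x)).
theta_draws <- rbeta(M, 1 + sum(x), 1 + n - sum(x))

# Step 2: for each posterior draw, simulate one replicated data set
# from p(x_rep | theta). The result is an n x M matrix.
x_rep <- sapply(theta_draws, function(th) rbinom(n, size = 1, prob = th))

# Compare a summary of the replicated data with the observed data
c(observed = mean(x), replicated = mean(colMeans(x_rep)))
```

Each column of `x_rep` is one draw from the posterior predictive distribution. In an MCMC setting, the first step is replaced by the sampler's own draws of the parameters.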

In practice, we can assess the adequacy of a model using the posterior predictive p-value (PPP; Gelman et al., 1996):

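\[ppp=\int_{\boldsymbol{\theta}} \int_{\boldsymbol{x}^{rep}} I_{\left[D(\boldsymbol{x}, \boldsymbol{\theta}) \leq D\left(\boldsymbol{x}^{rep}, \boldsymbol{\theta}\right)\right]} p\left(\boldsymbol{x}^{rep} \mid \boldsymbol{\theta}\right) p(\boldsymbol{\theta} \mid \boldsymbol{x}) d \boldsymbol{x}^{rep} d \boldsymbol{\theta},\] where \(D(\boldsymbol{x},\boldsymbol{\theta})\) is a discrepancy measure and \(I[\cdot]\) is an indicator function. The PPP is therefore the posterior probability that the discrepancy of the replicated data is at least as large as that of the observed data. Values near 0.5 indicate that the observed data look like typical data generated by the model, whereas values close to 0 or 1 suggest misfit.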
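
With posterior draws available, the double integral reduces to the proportion of draws for which the indicator equals 1. The following is a minimal sketch under stated assumptions: a toy Bernoulli model with a conjugate Beta(1, 1) prior stands in for the CDM, and the function `discrepancy()`, the simulated data, and the number of draws are hypothetical.

```r
set.seed(2024)

# Discrepancy D(x, theta): sum of squared Pearson residuals (toy model)
discrepancy <- function(x, theta) {
  sum((x - theta)^2 / (theta * (1 - theta)))
}

# Hypothetical observed data and conjugate posterior draws
x <- rbinom(50, size = 1, prob = 0.6)
theta_draws <- rbeta(4000, 1 + sum(x), 1 + length(x) - sum(x))

# For each draw: simulate x_rep, then evaluate I[D(x, theta) <= D(x_rep, theta)]
indicator <- vapply(theta_draws, function(th) {
  x_rep <- rbinom(length(x), size = 1, prob = th)
  as.numeric(discrepancy(x, th) <= discrepancy(x_rep, th))
}, numeric(1))

# Monte Carlo estimate of the PPP
ppp <- mean(indicator)
ppp
```

The same logic applies inside an MCMC sampler: the indicator is recorded at every iteration and its posterior mean is reported as the PPP.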

To implement PPMC in practice, the PPMC measures should be computed within the MCMC steps, that is, at every iteration of the sampler. For CDMs estimated using the Bayesian approach, we can use the approach proposed by Yan et al. (2003) and demonstrated by Han & Johnson (2019).

To assess the absolute fit of the model to the response data, we may use the sum of the squared Pearson residuals over all persons \(n\) and items \(j\) as the discrepancy measure:

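\[D\left(Y_{n j} ; \alpha_n\right)=\sum_{n=1}^N \sum_{j=1}^J\left(\frac{Y_{n j}-p_{n j}}{\sqrt{p_{n j}\left(1-p_{n j}\right)}}\right)^2,\] where \(p_{nj} = P(Y_{nj}=1 \mid \alpha_n)\) is the model-implied probability that person \(n\) with attribute profile \(\alpha_n\) answers item \(j\) correctly. Other discrepancy measures are also available. To obtain the PPP, this discrepancy is evaluated for both the observed and the replicated data at each iteration of the MCMC estimation process.

Sample code for calculating the PPP for the DINA model (the code indexes items by i, which corresponds to \(j\) above):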
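
Before turning to the JAGS implementation, the discrepancy measure itself can be sketched in plain R. The helper `pearson_discrepancy()` and the small matrices below are hypothetical and only illustrate the formula. In a real analysis, `p` would hold the model-implied probabilities at a given MCMC iteration.

```r
# Sum of squared Pearson residuals over persons (rows) and items (columns).
# Y: N x J matrix of 0/1 responses; p: N x J matrix of success probabilities.
pearson_discrepancy <- function(Y, p) {
  stopifnot(all(dim(Y) == dim(p)))
  sum((Y - p)^2 / (p * (1 - p)))
}

# Toy example: 3 persons, 2 items, all success probabilities equal to 0.5
Y <- matrix(c(1, 0, 1, 0, 1, 1), nrow = 3)
p <- matrix(0.5, nrow = 3, ncol = 2)

# Every squared residual is 0.25 / 0.25 = 1, so the sum is 6
pearson_discrepancy(Y, p)
```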

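library(GDINA)  # provides the sim10GDINA example data and attributepattern()

Y <- sim10GDINA$simdat  # N x I matrix of binary item responses
Q <- sim10GDINA$simQ    # I x K Q-matrix
all.patterns <- GDINA::attributepattern(ncol(Q))  # all 2^K attribute profiles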


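jags.dina <- function() {

  for (n in 1:N) {
    for (i in 1:I) {
      # eta = 1 if person n has mastered every attribute required by item i
      eta[n, i] <- 1 * (sum(alpha[n, 1:K] * Q[i, 1:K]) >= sum(Q[i, 1:K]))
      # DINA item response function with slipping (s) and guessing (g)
      p[n, i] <- g[i] + (1 - s[i] - g[i]) * eta[n, i]
      Y[n, i] ~ dbern(p[n, i])
    }
    for (k in 1:K) {
      alpha[n, k] <- all.patterns[latent.group.index[n], k]
    }
    latent.group.index[n] ~ dcat(pi[1:C])
  }

  pi[1:C] ~ ddirch(delta[1:C])

  for (i in 1:I) {
    s[i] ~ dbeta(1, 1)
    # truncation keeps g[i] below 1 - s[i]
    g[i] ~ dbeta(1, 1) %_% T(0, 1 - s[i])
  }

  # Posterior predictive model checking
  for (n in 1:N) {
    for (i in 1:I) {
      # squared Pearson residual of the observed response
      teststat[n, i] <- pow(Y[n, i] - p[n, i], 2) / (p[n, i] * (1 - p[n, i]))
      # replicated response drawn from the posterior predictive distribution
      Y_rep[n, i] ~ dbern(p[n, i])
      # squared Pearson residual of the replicated response
      teststat_rep[n, i] <- pow(Y_rep[n, i] - p[n, i], 2) / (p[n, i] * (1 - p[n, i]))
    }
  }

  # discrepancy measures for the observed and replicated data
  teststatsum <- sum(teststat[1:N, 1:I])
  teststatsum_rep <- sum(teststat_rep[1:N, 1:I])
  # step(x) is 1 when x >= 0, i.e. when D(observed) <= D(replicated)
  ppp <- step(teststatsum_rep - teststatsum)

}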

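library(R2jags)
N <- nrow(Y)             # number of persons
I <- nrow(Q)             # number of items
K <- ncol(Q)             # number of attributes
C <- nrow(all.patterns)  # number of attribute profiles
delta <- rep(1, C)       # Dirichlet prior hyperparameters
jags.data <- list("N", "I", "K", "Y", "Q", "C", "all.patterns", "delta")
jags.parameters <- c("s", "g", "latent.group.index", "pi", "ppp")
jags.inits <- NULL       # let JAGS generate the initial values
# n.iter and n.burnin are kept very small so that the example runs quickly;
# use much longer chains in practice and check convergence first
jags.dina.mcmc <- jags(data = jags.data, inits = jags.inits,
                       parameters.to.save = jags.parameters,
                       model.file = jags.dina, n.chains = 2, n.iter = 50,
                       n.burnin = 25, n.thin = 1, DIC = TRUE)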

library(MCMCvis)
# ppp is a 0/1 indicator at each iteration; its posterior mean is the estimated PPP
MCMCvis::MCMCsummary(jags.dina.mcmc, params = c("ppp"), Rhat = TRUE, round = 2)
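
The PPP can also be computed directly from the saved draws of the indicator. The sketch below uses a hypothetical helper, `summarize_ppp()`, together with a made-up vector of draws so that it runs without JAGS. The cut-offs of 0.05 and 0.95 for flagging extreme values are a common convention assumed here, not a rule taken from the sources cited in this section.

```r
# Summarize 0/1 indicator draws into a PPP estimate and flag extreme values.
# lower and upper are assumed conventional cut-offs.
summarize_ppp <- function(ppp_draws, lower = 0.05, upper = 0.95) {
  ppp_hat <- mean(ppp_draws)
  list(ppp = ppp_hat, extreme = ppp_hat < lower || ppp_hat > upper)
}

# Made-up indicator draws, for illustration only.
# With a fitted R2jags object the draws would instead be taken from
# jags.dina.mcmc$BUGSoutput$sims.list$ppp
toy_draws <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)

res <- summarize_ppp(toy_draws)
res$ppp      # proportion of draws with D(observed) <= D(replicated)
res$extreme  # TRUE would point to possible misfit
```

Because the estimate is a simple proportion of MCMC draws, its precision depends on the number of retained iterations, which is one more reason to run longer chains than in the short example above.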

References

Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 733–760.
Han, Z., & Johnson, M. S. (2019). Global- and item-level model fit indices. Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages, 265–285.
Yan, D., Mislevy, R. J., & Almond, R. G. (2003). Design and analysis in a cognitive assessment. ETS Research Report Series, 2003(2), i–47.
Zhan, P., Jiao, H., Man, K., & Wang, L. (2019b). Using JAGS for Bayesian cognitive diagnosis modeling: A tutorial. Journal of Educational and Behavioral Statistics, 44(4), 473–503.