Chapter 3 Model-data fit analyses

3.1 R-Lab: Model-data fit analysis

For the previous lab, we used the R package “TAM” (Robitzsch et al., 2018) to run the Dichotomous Rasch analyses. In this lab, we will focus on another R Rasch package, which is the “eRm” package (Mair & Hatzinger, 2007). Please note that when we used “TAM” package with the MML estimation method, the item-person map looks a bit different compared with that produced by the “eRm” package which applies CML estimation.

3.1.1 Get Data Prepared

We are going to practice evaluating model-data fit with the transitive reasoning data from the previous lab.

# Load the R-packages that you're going to use
library("eRm") # For running the Dichotomous Rasch Model
library("readr") # To import the data
# Import the data
transreas <- read.csv("transreas.csv") 
summary(transreas)
##     Student        Grade          task_01          task_02      
##  Min.   :  1   Min.   :2.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:107   1st Qu.:3.000   1st Qu.:1.0000   1st Qu.:1.0000  
##  Median :213   Median :4.000   Median :1.0000   Median :1.0000  
##  Mean   :213   Mean   :4.005   Mean   :0.9412   Mean   :0.8094  
##  3rd Qu.:319   3rd Qu.:5.000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :425   Max.   :6.000   Max.   :1.0000   Max.   :1.0000  
##     task_03          task_04          task_05          task_06      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:1.0000  
##  Median :1.0000   Median :1.0000   Median :1.0000   Median :1.0000  
##  Mean   :0.8847   Mean   :0.7835   Mean   :0.8024   Mean   :0.9741  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##     task_07          task_08          task_09          task_10    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00  
##  1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.00  
##  Median :1.0000   Median :1.0000   Median :0.0000   Median :1.00  
##  Mean   :0.8447   Mean   :0.9671   Mean   :0.3012   Mean   :0.52  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.00  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00

3.1.2 Trim the data

Similar to the “TAM” package, we only need the responses to run the Dichotomous Rasch model with the “eRm” package. To get started, we need to remove the first two columns from the dataframe.

# Trim the data
Di_Rasch_data <- transreas[ ,c(-1,-2)]
head(Di_Rasch_data,10) # Take a look
##    task_01 task_02 task_03 task_04 task_05 task_06 task_07 task_08 task_09
## 1        1       1       1       1       1       1       1       1       0
## 2        0       1       1       1       1       1       0       1       0
## 3        0       0       1       0       0       0       0       1       1
## 4        1       1       1       1       1       1       1       1       1
## 5        1       1       1       1       1       1       0       1       0
## 6        1       1       1       1       1       1       1       1       0
## 7        1       1       1       1       1       1       1       1       1
## 8        1       1       0       1       1       1       1       1       0
## 9        1       1       1       1       1       1       1       1       0
## 10       1       1       1       0       1       1       1       1       0
##    task_10
## 1        0
## 2        0
## 3        1
## 4        1
## 5        0
## 6        1
## 7        1
## 8        0
## 9        1
## 10       0

3.1.3 Running Dichotomous Rasch Model with “eRm” package

We will use the “RM” function to run the Rasch dichotomous model. This function computes the parameter estimates of a Rasch model for binary item responses by using CML estimation.

# Running the Dichotomous Rasch Model
Di_Rasch_model <- RM(Di_Rasch_data)
# Check the Overall model summary
summary(Di_Rasch_model)
## 
## Results of RM estimation: 
## 
## Call:  RM(X = Di_Rasch_data) 
## 
## Conditional log-likelihood: -921.3465 
## Number of iterations: 18 
## Number of parameters: 9 
## 
## Item (Category) Difficulty Parameters (eta): with 0.95 CI:
##         Estimate Std. Error lower CI upper CI
## task_02    0.258      0.133   -0.003    0.518
## task_03   -0.416      0.157   -0.723   -0.109
## task_04    0.441      0.128    0.190    0.692
## task_05    0.309      0.131    0.052    0.567
## task_06   -2.175      0.292   -2.747   -1.604
## task_07   -0.025      0.141   -0.302    0.252
## task_08   -1.909      0.262   -2.423   -1.395
## task_09    2.923      0.130    2.668    3.179
## task_10    1.836      0.115    1.610    2.062
## 
## Item Easiness Parameters (beta) with 0.95 CI:
##              Estimate Std. Error lower CI upper CI
## beta task_01    1.243      0.204    0.842    1.643
## beta task_02   -0.258      0.133   -0.518    0.003
## beta task_03    0.416      0.157    0.109    0.723
## beta task_04   -0.441      0.128   -0.692   -0.190
## beta task_05   -0.309      0.131   -0.567   -0.052
## beta task_06    2.175      0.292    1.604    2.747
## beta task_07    0.025      0.141   -0.252    0.302
## beta task_08    1.909      0.262    1.395    2.423
## beta task_09   -2.923      0.130   -3.179   -2.668
## beta task_10   -1.836      0.115   -2.062   -1.610
# To achieve the Item difficulty
item.diff <- Di_Rasch_model$betapar * -1
item.diff
## beta task_01 beta task_02 beta task_03 beta task_04 beta task_05 beta task_06 
##  -1.24256308   0.25775001  -0.41581383   0.44093780   0.30941151  -2.17531697 
## beta task_07 beta task_08 beta task_09 beta task_10 
##  -0.02481129  -1.90884852   2.92343871   1.83581566
# Use the summary() function to get a quick numeric summary of the item parameters
summary(item.diff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.1753 -1.0359  0.1165  0.0000  0.4081  2.9234
# We can see that the average item difficulty value is 0.00 logits, and item difficulties range from -2.18 to 1.04 logits.
# We can also look at the standard errors for the item locations:
item.se <- Di_Rasch_model$se.beta
item.se
##  [1] 0.2043488 0.1327967 0.1566696 0.1282289 0.1314316 0.2916380 0.1413877
##  [8] 0.2622114 0.1302767 0.1152173
summary(item.se)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1152  0.1306  0.1371  0.1694  0.1924  0.2916

Note the Estimate in this output indicates the easiness of the Item. This is exactly the item difficulty, but in the opposite direction. The higher the value of this parameter, the easier the item is compared to the other items. You can multiply this value by -1 to get item difficulty.

3.2 Model-data fit Analysis in R

The Model-data fit in the context of Rasch is different from other IRT models. Other IRT approach is focus on finding the best model to fit the data. However, the Rasch approach focuses on diagnosing departures from model expectations. Within the Rasch framework, the model is viewed as an “ideal type”. It is a theoretical mathematical description of what measurement looks like. Its fit statistics summarize discrepancies between observations and expectations to help researchers improve their measurement procedures.

3.2.1 Reliability Indices in Rasch Measurement

Definition of reliability in the 2014 Test Standards

The general notion of reliabilty/precision is defined in terms of consistency over replications of the testing procedure. Reliability/precision is high if the scores for each person are consistent over replications of the testing procedure and is low if the scores are not consistent over replications. (p. 35, emphasis added)

From a Rasch perspective, the focus for reliability analyses is on ordering and separation on the logit scale. There are two major indices calculated for items and persons: one is the reliability of separation, and the other is the Chi-Square separation statistic.

The Rasch reliability of separation is calculated for each facet in the model (e.g., items & persons), and it is an estimate of how well we can differentiate individual items, persons, or other elements on the latent variable. It is conceptually related to Cronbach’s alpha coefficient. The interpretation is the same when data fit the model. The statistic ranges from 0 to 1.

3.2.1.1 Reliability of Person Separation

Calculated using a ratio of true (adjusted) variance to observed variance for persons: \[Rel_{p}=\left(SA_{P}^{2}\right) /\left(SD_{P}^{2}\right)\] Where: SA2P : Adjusted person variability; Calculated by subtracting error variance for persons from total person variance: \[ SA_{P}^{2} = SD_{P}^{2} - SE_{P}^{2}\] SD2P : Total person variance

3.2.1.2 Calculate the Reliability of Person Separation

The “eRm” package provides function “SepRel” to calculate the person separation reliability. This function calculates the proportion of person variance that is not due to error. The concept of person separation reliability is very similar to reliability indices such as Cronbach’s α.

# Get the person parameter first by using the "person.parameter" function
person_pa <- person.parameter(Di_Rasch_model)
# Calculate the Reliability of Person Separation
summary(SepRel(person_pa))
##        Separation Reliability: 0.4474
## 
##             Observed Variance: 1.6929 (Squared Standard Deviation)
## Mean Square Measurement Error: 0.9355 (Model Error Variance)

3.2.1.3 Reliability of Item Separation

Calculated using a ratio of true (adjusted) variance to observed variance for Items: \[Rel_{I}=\left(SA_{I}^{2}\right) /\left(SD_{I}^{2}\right)\] Where: SA2I : Adjusted item variability; Calculated by subtracting error variance for Item from total Item variance: \[ SA_{I}^{2} = SD_{I}^{2} - SE_{I}^{2}\] SD2I : Total Item variance

3.2.2 Item Information Curve (IIC)

Many IRT analyses also look at item information as evidence for precision. This is a statistical summary of the variance of item responses about a certain range on the latent variable. We can use this information to find out if and where items are providing information about person locations. IIC is not a major component of Rasch analyses, because the information is the same for all items in the dichotomous model (same shape). This is because the item slope parameter is fixed to 1 for all items, so all of the items discriminate among students the same way.

# Use "plotINFO" function for visualizing the item information
plotINFO(Di_Rasch_model)

3.2.3 Summaries of residuals: Infit & Outfit

The most popular Rasch fit statistics for practical purposes are based on summed squared residuals. There are two major categories of residual summary statistics: Unweighted (Outfit) and Weighted (Infit) mean square error (MSE) statistics. Unstandardized (χ2) & standardized versions (Z) are available in most Rasch software programs. In this analysis, we will use the Unstandardized (χ2) version.

3.2.3.1 Outfit Mean Square Error (MSE)

Outfit is the “Unweighted fit” statistic. For items, it is the sum of squared residuals for an item divided by the number of persons who responded to the item. For persons, it is sum of squared residuals for a person divided by the number of items encountered by the person.

The outfit is sensitive to extreme departures from model expectations. Examples: A high-achieving student provides an incorrect response to a very easy item; A low-achieving student provides a correct response to a very difficult item.

3.2.3.2 Infit Mean Square Error (MSE)

Infit is “Information-weighted fit”, where “information” means variance, such as larger variance for well-targeted observations, or smaller variance for extreme observations.

For items, it is the sum of squared standardized item residuals, weighted by variance, divided by the number of persons who responded to the item. For persons, it is the sum of squared standardized person residuals, weighted by variance, divided by the number of items the person encountered.

Infit is sensitive to less-extreme unexpected responses compared to outfit. Examples of less-extreme unexpected responses include: A person provides an incorrect response to an item that is just below their achievement level, or a person provides a correct response to an item that is just above their achievement level.

3.2.3.3 Expected values for MSE Fit Statistics

Note that there is much disagreement among measurement scholars about how to classify an infit our outfit statistic as “fitting” or “misfitting.” We will talk about this in class.

However, you should be aware of commonly accepted rule-of-thumb values among Rasch researchers:

  • Expected value is about 1.00 when data fit the model

  • Less than 1.00: Responses are too predictable; they resemble a Guttman-like (deterministic) pattern (“muted”)

  • More than 1.00: Responses are too haphazard (“noisy”); there is too much variation to suggest that the estimate is a good representation of the response pattern

  • Some variation is expected, but noisy responses are usually more cause for concern than muted responses

Frequently Used Critical Values for Mean Square Fit Statistics (Bond & Fox, p. 273, Table 12.7)

Type of Instrument “Acceptable Range”
Multiple-choice test (high-stakes) 0.80 – 1.20
Multiple-choice test (not high-stakes) 0.70 – 1.30
Rating scale 0.60 – 1.40
Clinical observation 0.50 – 1.70
Judgment (when agreement is encouraged) 0.40 – 1.20

*Note: These critical values are a very contentious topic in Rasch measurement!!!

3.2.3.4 Calculate Infit & Outfit for the transitive reasoning data

# Calculate the Item fit statistics using "itemfit" function on your person parameter object
Di_itemfit <- itemfit(person_pa)
Di_itemfit
## 
## Itemfit Statistics: 
##           Chisq  df p-value Outfit MSQ Infit MSQ Outfit t Infit t Discrim
## task_01 245.550 373   1.000      0.657     0.761   -1.327  -1.622   0.455
## task_02 421.904 373   0.041      1.128     1.087    1.071   1.187   0.054
## task_03 218.155 373   1.000      0.583     0.745   -2.742  -2.688   0.583
## task_04 478.894 373   0.000      1.280     1.158    2.442   2.268  -0.014
## task_05 346.015 373   0.839      0.925     0.991   -0.625  -0.109   0.184
## task_06  65.613 373   1.000      0.175     0.611   -2.789  -1.781   0.620
## task_07 244.494 373   1.000      0.654     0.804   -2.768  -2.475   0.493
## task_08 125.478 373   1.000      0.336     0.687   -2.205  -1.569   0.563
## task_09 394.207 373   0.216      1.054     0.904    0.462  -1.727   0.022
## task_10 326.534 373   0.960      0.873     0.898   -1.822  -2.408   0.226

This table will give us information about the infit and outfit statistics for each item. Please review our lecture materials for details about the interpretation of these values, noting that we generally expect these statistics to be around 1.00.

# Calculate the Person fit statistics
person_fit <- personfit(person_pa)
# Since person_fit is a long list, we can summarize it to get the aggregated value.
summary(person_fit$p.infitMSQ)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4060  0.4910  0.7487  0.9102  1.1740  2.2483
summary(person_fit$p.outfitMSQ)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1723  0.2399  0.5468  0.7665  0.9592  7.3060
  • We can see that there is some variability in person fit, with infit MSE statistics ranging from 0.41 to 2.25, and outfit MSE statistics ranging from 0.17 to 7.30:

  • From this fit analysis, we can see that the mean of the infit MSE and outfit MSE statistics are close to 1.0.

# Also, you can use the "personMisfit" function to find the person who misfit 
misfit_person <- PersonMisfit(person_pa) 
# This function counts the number of persons who do not fit the Rasch model. More specifically, it returns the proportion and frequency of persons - or more generally cases - who exceed a Chi-square based Z-value of 1.96 (suggesting a statistically significant deviation from the predicted response pattern).
misfit_person
## 
## Percentage of Misfitting Persons: 1.6043 %
# About 1.6043% persons are misfitting
misfit_person$count.misfit.Z
## [1] 6
# The detailed number for misfitting is 6
misfit_person$total.persons
## [1] 374
# This is the number of persons for whom a fit value was estimated

3.2.4 Item/Person Map

The “eRm” package provides plotting function to show the location of item/person on both logit scale and their t stastistics.

plotPWmap(Di_Rasch_model,imap=TRUE) # You can plot the Item Map

plotPWmap(Di_Rasch_model,pmap=TRUE) # You can even put the person and item inside one map

Also, we can construct a plot that shows item and person locations in the same graphical display (a Person-Item Map).

#To do this, use the following code:
plotPImap(Di_Rasch_model, sorted = TRUE)

In this plot, we should consider the degree to which there is evidence of overlap between item and person locations (targeting).

We can also examine the individual items’ ordering on the logit scale with reference to our theory about the expected ordering.

3.2.5 Item Characteristic Curves (ICC)

The IRFs/ICCs that we have been looking at are based on model-expected response probabilities.

plotICC(Di_Rasch_model,mplot=TRUE,ask = FALSE)

Note that the R package did not plot the observed probability.

3.2.6 Plots of standardized residuals

Let’s use some graphical displays to examine item fit in more detail. Please review our lecture materials for details about these displays. These plots show the standardized residual for the difference between the observed and expected response for each person on the item of interest.

# Collect the residual values from the Itemfit results
stresid <- Di_itemfit$st.res
# before constructing the plots, find the max & min residuals:
max.resid <- ceiling(max(stresid))
min.resid <- ceiling(min(stresid))
# The code below will produce standardized residual plots for each of the items in our example dataset in the “Plots” window on the bottom right of your R Studio window:
for(item.number in 1:ncol(stresid)){
  
  plot(stresid[, item.number], ylim = c(min.resid, max.resid),
       main = paste("Standardized Residuals for Item ", item.number, sep = ""),
       ylab = "Standardized Residual", xlab = "Person Index")
  abline(h = 0, col = "blue")
  abline(h=2, lty = 2, col = "red")
  abline(h=-2, lty = 2, col = "red")
  
  legend("topright", c("Std. Residual", "Observed = Expected", "+/- 2 SD"), pch = c(1, NA, NA), 
         lty = c(NA, 1, 2),
         col = c("black", "blue", "red"), cex = .8)
  
}

Then, we are going to plot the item characteristic curves (item response functions; IRFs) with the observed responses overlaid on them. These are sometimes called empirical IRFs.

for(item.number in 1:ncol(stresid)){
  plotICC(Di_Rasch_model, item.subset = item.number, empICC = list("raw"), empCI = list())
}