Week 4 CTT Item Analysis

In this chapter, students will learn to perform a CTT (classical test theory) item analysis in R and to write a function that carries it out.

4.1 Response data

Let’s start by importing a data set using the read.table function.

resp <- read.table('https://raw.githubusercontent.com/sunbeomk/PSYC490/main/resp.txt',
                   header = FALSE,
                   sep = "\t")

The first argument is the file name, including its location. If you have a .txt file on your local computer, you can set your working directory to the folder that contains the file and import it by its file name.
The second argument header is a logical value indicating whether the file contains the names of the variables as its first line.
The argument sep specifies how the values on each line of the file are separated. For example, sep = "," means that the separator is a comma, and sep = "\t" means that the separator is a tab.
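
To see how header and sep work together, the following minimal sketch writes a small tab-separated file to a temporary location and reads it back. The file contents and the object names toy_file and toy are made up for illustration and are not part of the course data.

```r
# Hypothetical toy data: 3 subjects (rows) by 4 items (columns), no header line
toy_file <- tempfile(fileext = ".txt")
writeLines(c("1\t0\t1\t1",
             "0\t0\t1\t0",
             "1\t1\t1\t0"),
           toy_file)

# header = FALSE: the first line of the file is data, not variable names
# sep = "\t":     the values on each line are separated by tabs
toy <- read.table(toy_file, header = FALSE, sep = "\t")

dim(toy)    # 3 rows and 4 columns
names(toy)  # default names V1, V2, V3, V4, because the file has no header
```

If the file did contain variable names on its first line, header = TRUE would be the appropriate setting.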

The imported data set resp contains 100 subjects’ dichotomous responses to 40 GRE questions. Each row corresponds to a subject, and each column corresponds to an item. A response is coded as 1 if correct and 0 otherwise. Let’s check some properties of the imported data set.

head(resp)
class(resp)
nrow(resp)
ncol(resp)
dim(resp)

4.2 CTT Item Analysis

4.2.1 Total score

The total score of each subject can be obtained by the row sums.

total_score <- rowSums(resp)
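
As a small check of what rowSums does here, the sketch below applies it to a made-up response table. The objects toy_resp and toy_total are hypothetical and only stand in for resp and total_score.

```r
# Hypothetical responses: 3 subjects (rows) by 4 items (columns)
toy_resp <- data.frame(V1 = c(1, 0, 1),
                       V2 = c(0, 0, 1),
                       V3 = c(1, 1, 1),
                       V4 = c(1, 0, 0))

# Total score = number of correct (1) responses in each row
toy_total <- rowSums(toy_resp)
toy_total  # one total score per subject
```

Because correct answers are coded as 1 and incorrect answers as 0, each row sum is simply the number of items that subject answered correctly.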

4.2.2 Item difficulty

The item difficulty in CTT is the proportion of subjects who answered the item correctly. For item \(j\), with \(n\) subjects and responses \(X_{ij}\):

\[p_j = \frac{\sum_{i=1}^{n}X_{ij}}{n}\]

Since correct answers are coded as 1, the column means give us the proportion correct, \(p_j\), which is the CTT item difficulty of the \(j\)-th item.

item_diff <- colMeans(resp)
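
The sketch below checks, on a made-up response table, that colMeans agrees with the formula for \(p_j\). The objects toy_resp, p_formula, and p_colmeans are hypothetical names used only for this illustration.

```r
# Hypothetical responses: 3 subjects (rows) by 4 items (columns)
toy_resp <- data.frame(V1 = c(1, 0, 1),
                       V2 = c(0, 0, 1),
                       V3 = c(1, 1, 1),
                       V4 = c(1, 0, 0))

n <- nrow(toy_resp)  # number of subjects

# Formula: sum of the responses to each item, divided by the number of subjects
p_formula <- colSums(toy_resp) / n

# Shortcut used in the text
p_colmeans <- colMeans(toy_resp)

all.equal(p_formula, p_colmeans)  # the two computations agree
```

Note that a larger \(p_j\) means that more subjects answered the item correctly, so a higher value indicates an easier item.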

4.2.3 Item discrimination

The item discrimination in CTT can be obtained as the point-biserial correlation between the item response and the total score.

\[r_{j,pbis} = \left[ \frac{\bar{X}_1 - \bar{X}_0}{S_X} \right] \sqrt{\frac{n_1 n_0}{n(n-1)}}\]

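Here \(\bar{X}_1\) and \(\bar{X}_0\) are the mean total scores of the subjects who answered item \(j\) correctly and incorrectly, \(n_1\) and \(n_0\) are the numbers of subjects in those two groups, \(n = n_1 + n_0\), and \(S_X\) is the sample standard deviation of the total scores. Because the item response is coded 0/1 and the total score is numeric, \(r_{j, pbis}\) is equal to the Pearson correlation between the item response and the total score. Let’s obtain the item discrimination of the resp data set.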

n_items <- ncol(resp)        # number of items
total_score <- rowSums(resp) # total score

item_disc <- numeric(n_items)                  # output vector
for (j in 1:n_items) {                          # sequence
  item_disc[j] <- cor(total_score, resp[, j])    # body
}

First, we saved the number of items and the total score as objects n_items and total_score, respectively.
Before running the for loop, we created an output object item_disc, which is a zero vector of length n_items.
Inside the for loop, we replace the \(j\)-th element of the output vector item_disc with the Pearson correlation between the \(j\)-th item (resp[, j]) and total_score. The \(j\)-th element of item_disc is the item discrimination of the \(j\)-th item.
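
To illustrate why cor can be used for the point-biserial correlation, the sketch below computes the discrimination on made-up data in two ways: directly from the point-biserial formula and with cor. The function pbis_formula and the objects toy_resp, toy_total, disc_formula, and disc_cor are hypothetical names introduced for this check; sapply is used here in place of the for loop only to keep the example short.

```r
# Hypothetical responses: 6 subjects (rows) by 3 items (columns)
toy_resp <- data.frame(V1 = c(1, 1, 0, 1, 0, 0),
                       V2 = c(1, 0, 0, 1, 1, 0),
                       V3 = c(1, 1, 1, 1, 0, 0))

toy_total <- rowSums(toy_resp)  # total score of each subject

# Point-biserial correlation computed directly from the formula
# x: 0/1 item responses, y: total scores
pbis_formula <- function(x, y) {
  n  <- length(x)
  n1 <- sum(x == 1)            # number of subjects who answered correctly
  n0 <- sum(x == 0)            # number of subjects who answered incorrectly
  mean1 <- mean(y[x == 1])     # mean total score of the correct group
  mean0 <- mean(y[x == 0])     # mean total score of the incorrect group
  (mean1 - mean0) / sd(y) * sqrt(n1 * n0 / (n * (n - 1)))
}

# Apply both computations to every item (column)
disc_formula <- sapply(toy_resp, pbis_formula, y = toy_total)
disc_cor     <- sapply(toy_resp, cor, y = toy_total)

all.equal(disc_formula, disc_cor)  # the two computations agree
```

The formula version assumes that every item has at least one correct and one incorrect response; otherwise one of the group means is undefined.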