Week 4 CTT Item Analysis

In this chapter, students will learn how to perform classical test theory (CTT) item analysis in R and how to write a function for it.

4.1 Response data

Let’s start by importing a data set using the read.table() function.

resp <- read.table('https://raw.githubusercontent.com/sunbeomk/PSYC490/main/resp.txt',
                   header = FALSE,
                   sep = "\t")
  • The first argument is the file name together with its location; here it is a URL. If the .txt file is on your local computer, you can set your working directory to the folder that contains it (for example, with setwd()) and then pass just the file name.

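  • The second argument, header, is a logical value indicating whether the first line of the file contains the variable names. With header = FALSE, the first line is read as data and R assigns the default column names V1, V2, and so on.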

  • The argument sep specifies the character that separates the values on each line of the file. For example, sep = "," means that the separator is a comma, and sep = "\t" means that the separator is a tab.
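To see how header and sep work without downloading anything, read.table() can also parse a string through its text argument. The sketch below uses two hypothetical subjects' responses to three items; the object names txt, toy, and toy_csv are illustrative only.

```r
# Two hypothetical subjects x three items, tab-separated, no header line.
# The text= argument makes read.table() parse a string instead of a file.
txt <- "1\t0\t1\n0\t1\t1"
toy <- read.table(text = txt, header = FALSE, sep = "\t")
dim(toy)    # 2 rows (subjects), 3 columns (items)
names(toy)  # "V1" "V2" "V3": default names because header = FALSE

# The same values separated by commas need sep = "," instead.
toy_csv <- read.table(text = "1,0,1\n0,1,1", header = FALSE, sep = ",")
all.equal(toy, toy_csv)  # TRUE: only the separator differed
```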

The imported data set resp contains 100 subjects’ dichotomous responses to 40 GRE questions. Each row corresponds to a subject, and each column corresponds to an item. A response is coded 1 if correct and 0 if incorrect. Let’s check some properties of the imported data set.
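A few quick checks might look like the following sketch. So that it runs offline, it uses a small hypothetical 0/1 data frame, toy, in place of resp; the same calls apply to resp directly, where dim(resp) should report 100 rows and 40 columns, matching the description above.

```r
# Hypothetical 4-subject x 3-item 0/1 data frame standing in for resp.
toy <- data.frame(V1 = c(1, 0, 1, 1),
                  V2 = c(0, 0, 1, 1),
                  V3 = c(1, 1, 1, 0))

dim(toy)            # number of subjects (rows) and items (columns)
str(toy)            # column types: all should be numeric
table(unlist(toy))  # frequency of each value: only 0 and 1 should appear
anyNA(toy)          # FALSE means there are no missing responses
```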


4.2 CTT Item Analysis

4.2.1 Total score

The total score of each subject is the sum of that subject’s item responses, so it can be obtained with the row sums.

total_score <- rowSums(resp)
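On a small hypothetical response matrix, rowSums() adds up each subject’s 0/1 responses, which is the same as applying sum() to every row:

```r
# 4 hypothetical subjects (rows) x 3 items (columns), coded 1 = correct.
toy <- matrix(c(1, 0, 1,
                0, 0, 1,
                1, 1, 1,
                1, 1, 0),
              nrow = 4, byrow = TRUE)

rowSums(toy)        # 2 1 3 2: number of correct answers per subject
apply(toy, 1, sum)  # same result, computed row by row
```

rowSums() is the idiomatic choice here because it is vectorized and faster than apply() on large matrices.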

4.2.2 Item difficulty

The item difficulty in CTT is the proportion of subjects who answer the item correctly:

\[p_j = \frac{\sum_{i=1}^{n}X_{ij}}{n}\]

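Here \(X_{ij}\) is the response of subject \(i\) to item \(j\), and \(n\) is the number of subjects. Because correct answers are coded 1 and incorrect answers 0, the mean of each column is the proportion correct, \(p_j\), which is the CTT item difficulty of the \(j\)-th item. Note that a larger \(p_j\) indicates an easier item.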

item_diff <- colMeans(resp)
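As a small check on hypothetical data, dividing the column sums by the number of subjects, exactly as in the formula for \(p_j\), gives the same values as colMeans():

```r
# 4 hypothetical subjects (rows) x 3 items (columns), coded 1 = correct.
toy <- matrix(c(1, 0, 1,
                0, 0, 1,
                1, 1, 1,
                1, 1, 0),
              nrow = 4, byrow = TRUE)

n <- nrow(toy)                    # number of subjects
p_formula  <- colSums(toy) / n    # sum over subjects of X_ij, divided by n
p_colmeans <- colMeans(toy)

p_colmeans                        # 0.75 0.50 0.75
all.equal(p_formula, p_colmeans)  # TRUE
```

In this toy example, item 2 (\(p = 0.50\)) is the hardest of the three items.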

4.2.3 Item discrimination

The item discrimination in CTT can be obtained as the point-biserial correlation between the item response and the total score.

\[r_{j,pbis} = \left[ \frac{\bar{X}_1 - \bar{X}_0}{S_X} \right] \sqrt{\frac{n_1 n_0}{n(n-1)}}\]

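Here \(\bar{X}_1\) and \(\bar{X}_0\) are the mean total scores of the subjects who answered item \(j\) correctly and incorrectly, \(n_1\) and \(n_0\) are the numbers of those subjects (\(n = n_1 + n_0\)), and \(S_X\) is the standard deviation of the total scores, computed with \(n - 1\) in the denominator. The point-biserial correlation is simply the Pearson correlation between a dichotomous (0/1) variable and a continuous one, so \(r_{j,pbis}\) equals the Pearson correlation between the item response and the total score, which cor() computes directly. Let’s obtain the item discrimination of the resp data set.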
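Before applying this to resp, the equivalence can be checked numerically. The sketch below codes the point-biserial formula term by term in a hypothetical helper, pbis(), and compares it with cor() on a small hypothetical response matrix in which every item has both correct and incorrect answers (otherwise \(n_0\) or \(n_1\) is 0 and the correlation is undefined).

```r
# Hypothetical 6-subject x 4-item 0/1 matrix; every item has both 0s and 1s.
toy <- matrix(c(1, 0, 1, 1,
                0, 0, 1, 0,
                1, 1, 1, 1,
                1, 1, 0, 0,
                0, 1, 0, 0,
                1, 0, 1, 1),
              nrow = 6, byrow = TRUE)
total <- rowSums(toy)

# Hypothetical helper: point-biserial correlation via the formula above.
pbis <- function(item, total) {
  n  <- length(item)
  n1 <- sum(item == 1)              # subjects answering correctly
  n0 <- sum(item == 0)              # subjects answering incorrectly
  m1 <- mean(total[item == 1])      # mean total score of the n1 group
  m0 <- mean(total[item == 0])      # mean total score of the n0 group
  s  <- sd(total)                   # sample SD of total scores (n - 1)
  ((m1 - m0) / s) * sqrt(n1 * n0 / (n * (n - 1)))
}

by_formula <- sapply(seq_len(ncol(toy)), function(j) pbis(toy[, j], total))
by_cor     <- as.vector(cor(toy, total))  # Pearson correlation per item
all.equal(by_formula, by_cor)             # TRUE
```

The \(n - 1\) inside the square root is what makes the formula agree with sd(), which also uses \(n - 1\).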

n_items <- ncol(resp)        # number of items
total_score <- rowSums(resp) # total score

item_disc <- numeric(n_items)                  # output vector
for (j in 1:n_items) {                         # sequence
  item_disc[j] <- cor(total_score, resp[, j])  # body
}

  • First, we saved the number of items and the total score as objects n_items and total_score, respectively.

  • Before running the for loop, we created an output object item_disc, which is a zero vector of length n_items.

  • Inside the for loop, we replace the \(j\)-th element of the output vector item_disc with the Pearson correlation between the \(j\)-th item (resp[, j]) and total_score. The \(j\)-th element of item_disc is therefore the item discrimination of the \(j\)-th item.
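The steps of this section can be collected into one function, as the chapter introduction suggests. The sketch below is one possible version under the assumption that the input is a subjects-by-items 0/1 matrix or data frame with no missing values; the name ctt_analysis() and the output column names are hypothetical. It reuses the same for loop as above and, for comparison, shows that cor() applied to the whole matrix gives the same discriminations in one call.

```r
# Hypothetical helper: CTT item analysis for a subjects x items 0/1 matrix
# or data frame (assumes no missing responses).
ctt_analysis <- function(resp) {
  resp <- as.matrix(resp)
  n_items <- ncol(resp)
  total_score <- rowSums(resp)               # total score per subject
  item_diff <- colMeans(resp)                # proportion correct per item
  item_disc <- numeric(n_items)              # output vector
  for (j in 1:n_items) {
    item_disc[j] <- cor(total_score, resp[, j])  # item-total correlation
  }
  data.frame(item = 1:n_items,
             difficulty = unname(item_diff),
             discrimination = item_disc)
}

# Usage on a hypothetical 6-subject x 4-item matrix.
toy <- matrix(c(1, 0, 1, 1,
                0, 0, 1, 0,
                1, 1, 1, 1,
                1, 1, 0, 0,
                0, 1, 0, 0,
                1, 0, 1, 1),
              nrow = 6, byrow = TRUE)
out <- ctt_analysis(toy)
out

# Loop-free alternative for the discriminations.
as.vector(cor(toy, rowSums(toy)))
```

Calling ctt_analysis(resp) on the GRE data would return one row per item, 40 rows in total.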