Reliability

  • In this chapter, three methods for estimating internal reliability (\(\rho_{XX'}\)) are covered: coefficient \(\alpha\), KR20, and KR21.

  • The standard error of measurement estimates of a test can be obtained based on the reliability estimates.

  • In addition, the standard error of measurement estimates can be used for obtaining the confidence intervals for true scores of the subjects.

4.3 Coefficient \(\alpha\)

The coefficient \(\alpha\) provides the lower bound of the reliability of a test.

\[\rho_{XX'} \geq \alpha = \frac{N}{N-1} \left[{\frac{\sigma^2_X - \sum_{j=1}^N \sigma^2_{Y_j}}{\sigma^2_X}}\right]\] where,

  • N: the number of subtests (usually, items).

  • \(\sigma^2_X\): the total score variance of the whole test.

  • \(\sigma^2_{Y_j}\): the variance of the observed scores on the \(j\)-th subtest.

In practice, we often treat each item as a subtest, so \(N\) is the total number of items, and \(\sigma^2_{Y_j}\) are the score variances of the individual items.

Let’s create a function that calculates the coefficient \(\alpha\):

coeff_alpha <- function(responses) {
  # Get number of items (N) and individuals
  n_items <- ncol(responses)
  n_persons <- nrow(responses)
  # Get individual total scores
  x <- rowSums(responses)
  # Get observed-score variance of whole test (X)
  var_x <- var(x) * (n_persons - 1) / n_persons
  # Get observed-score variance of each item (Y_j)
  var_y <- numeric(n_items)
  for (j in 1:n_items) {
    var_y[j] <- var(responses[, j]) * (n_persons - 1) / n_persons
  }
  # Apply the alpha formula
  alpha <- (n_items / (n_items - 1)) * (1 - sum(var_y) / var_x)
  
  alpha
}

coeff_alpha(resp)
## [1] 0.7435825

The obtained coefficient \(\alpha\) from resp data set is 0.7436.

4.4 KR20

The Kuder-Richardson formula-20 (KR20) is a special case of the coefficient \(\alpha\). If each item is dichotomous, coefficient \(\alpha\) becomes the KR20.

\[KR20 = \frac{N}{N-1} \left[{\frac{\sigma^2_X - \sum_{j=1}^N p_j(1-p_j)}{\sigma^2_X}}\right]\]

where, \(p_j\) is the proportion of correct answers (\(Y_j = 1\)) for item \(j\).

  • Note: If \(Y_j\) is dichotomous, then \(\sigma^2_{Y_j} = p_j (1 - p_j)\).

The below function returns the KR20:

KR20 <- function(responses) {
  # Get number of items (N) and individuals
  n_items <- ncol(responses)
  n_persons <- nrow(responses)
  # get p_j for each item
  p <- colMeans(responses)
  # Get total scores (X)
  x <- rowSums(responses)
  # observed score variance
  var_x <- var(x) * (n_persons - 1) / n_persons
  # Apply KR-20 formula
  rel <- (n_items / (n_items - 1)) * (1 - sum(p * (1 - p)) / var_x)
  
  rel
}

KR20(resp)
## [1] 0.7435825

Note that the sample variances are calculated by \(S^2_X = \frac{\sum_{i = 1}^{N} (X_i - \bar{X})^2}{n}\), instead of \(S^2_X = \frac{\sum_{i = 1}^{N} (X_i - \bar{X})^2}{n - 1}\).

4.5 KR21

The Kuder-Richardson formula-21 (KR21) is given by:

\[KR21 = \frac{N}{N-1} \left[ {\frac{\sigma_X^2 - N\bar{p}(1-\bar{p})}{\sigma^2_X}} \right]\]

where, \(\bar{p}\) is the average of item difficulties.

In general, KR20 \(\geq\) KR21. When the item difficulties are all equal, then KR20 = KR21.

The function below returns the KR21:

KR21 <- function(responses) {
  # Get number of items (N) and individuals
  n <- ncol(responses)
  n_persons <- nrow(responses)
  # Get overall item difficulty  - pbar
  p_bar <- mean(colMeans(responses))
  # Observed score variance
  x <- rowSums(responses)
  var_x <- var(x) * (n_persons - 1) / n_persons
  # Apply KR-21 formula
  rel <- (n / (n-1)) * (1 - (n * p_bar * (1 - p_bar)) / var_x)
  
  rel
}
KR21(resp)
## [1] 0.721086

4.6 Standard error of measurement

Based on the reliability estimates, we can also estimate the standard error of measurement of the test, given by

\[\sigma_E = \sigma_X \sqrt{1 - \rho_{XX'}}\] where, \(\sigma_X\) is the standard deviation of the total score, and \(\rho_{XX'}\) is the reliability.

The below function returns the standard error of measurement estimate with the reliability estimate of the user’s choice.

se_m <- function(responses, rel_est = "alpha") {
  # The number of subjects
  n_persons <- nrow(responses)
  # Get the total score and its standard deviation
  total <- rowSums(responses)
  sd_x <- sqrt(var(total) * (n_persons - 1) / n_persons)
  
  if (rel_est == "alpha") {
    rel <- coeff_alpha(responses)
  } else if (rel_est == "KR20") {
    rel <- KR20(responses)
  } else if (rel_est == "KR21") {
    rel <- KR21(responses)
  } else {
    error('rel.est must be either alpha, KR20, or KR21')
  }
  
  se <- sd_x * sqrt(1 - rel)
  
  se
}

se_m(responses = resp, rel_est = "KR20")
## [1] 2.928571
se_m(responses = resp, rel_est = "KR21")
## [1] 3.054337

4.6.1 Confidence interval for true score

The standard error of measurement estimate can be used to construct the confidence interval of true score based on observed score as:

\[X \pm z_{\alpha / 2} \sigma_E = X \pm z_{\alpha / 2} \sigma_X \sqrt{1 - \rho_{XX'}}\]

4.7 Your turn

Suppose you want to obtain the 95% confidence intervals for true scores of the examinees using the reliability estimate based on coefficient \(\alpha\). Complete the below code.

sem_alpha <- 

total_score <- 

z <- 

ci_lower <- ci_upper <- numeric(100)

for (i in 1:100) {
  ci_lower[i]
  ci_upper[i]
}

hints:

  • Use se_m() function to obtain \(\sigma_E\).

  • Use rowSums() function to obtain total scores \(X\) of all examinees.

  • The z-score corresponding to the 95% confidence interval, \(z_{\alpha / 2}\), can be obtained by:

alpha <- 0.05
z <- qnorm(1 - (alpha / 2))
  • At each iteration in the for loop, replace the \(i\)-th elements of ci_lower and ci_upper by \(X_i - z_{\alpha / 2} \sigma_E\) and \(X_i + z_{\alpha / 2} \sigma_E\).