# Week 7 Decision Table

In this chapter, students will learn to create a $$2 \times 2$$ decision table or a contingency table based on a test score and a criterion score. From the contingency table, students will be able to understand and obtain the hit rate, sensitivity, specificity, and base rate in R.

Test scores are often used for making screening decisions. For example, a company can decide whether to hire an applicant or not based on the job screening test score $$X$$ and a cut score. In this case, the test score $$X$$ is used to predict a future criterion score $$Y$$ such as the productivity of workers.

A future criterion score $$Y$$ predicted by $$X$$ can be dichotomized into an observed criterion outcome with two categories, successful/unsuccessful. For example, we can classify workers who produce more than 18 units/hour as successful.
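The 18 units/hour rule can be sketched with a small vector of productivity scores (the values here are made up purely for illustration):

```r
# hypothetical productivity scores (units/hour) for six workers
productivity <- c(21, 15, 19, 12, 25, 17)
# dichotomize: TRUE if the worker produces more than 18 units/hour
successful <- (productivity > 18)
successful
## [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
```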

Based on the screening decision (e.g., hire or do not hire) and the observed criterion outcome (e.g., successful or unsuccessful), we can construct the $$2 \times 2$$ decision table.

## 7.1 Decision Table

Let’s construct a $$2 \times 2$$ decision table using the same two data sets. First, import the two data sets and obtain the total scores.

# import data
test1 <- read.table("test1.txt")
test2 <- read.table("test2.txt")
# total scores
X <- rowSums(test1)
Y <- rowSums(test2)

Again, $$X$$ is the test score used to predict the future criterion score $$Y$$. Let’s assume that we predict examinees with $$X$$ greater than or equal to 13 to be successful in the future. $$Y$$ is the actual outcome score, and we categorize examinees with $$Y$$ greater than or equal to 13 as successful.

• Note that the cut scores for the two tests are both 13 in this example. However, the cut scores need not be the same for the two tests.

We can create logical vectors predicted and actual from $$X$$ and $$Y$$ using the cut score 13.

predicted <- (X >= 13)
actual <- (Y >= 13)
head(predicted)
## [1] FALSE  TRUE FALSE FALSE
## [5] FALSE FALSE
head(actual)
## [1] FALSE FALSE  TRUE FALSE
## [5] FALSE FALSE

The object predicted is a logical vector indicating whether each examinee is predicted to be successful in the future, and the object actual is a logical vector indicating whether each examinee is actually successful.

Applying the sum() function to a logical vector returns the number of TRUE values in the vector. The following code returns the number of examinees that are predicted to be successful.

sum(predicted)
## [1] 44

And the following returns the number of examinees that are actually successful.

sum(actual)
## [1] 33

Applying the mean() function to a logical vector returns the proportion of TRUE values. Therefore, the proportions of predicted success and actual success can be obtained by:

mean(predicted)
## [1] 0.44
mean(actual)
## [1] 0.33

We can combine the two vectors side by side to check whether the predicted outcome matches the actual outcome.

match <- cbind(predicted, actual)
head(match, 10)
##       predicted actual
##  [1,]     FALSE  FALSE
##  [2,]      TRUE  FALSE
##  [3,]     FALSE   TRUE
##  [4,]     FALSE  FALSE
##  [5,]     FALSE  FALSE
##  [6,]     FALSE  FALSE
##  [7,]      TRUE   TRUE
##  [8,]     FALSE   TRUE
##  [9,]     FALSE  FALSE
## [10,]     FALSE  FALSE
• The first examinee was predicted to be unsuccessful, and the true outcome was also unsuccessful.

• The second examinee was predicted to be successful, but the actual outcome was unsuccessful.

• The 8th examinee was predicted to be unsuccessful, but the actual outcome was successful.

Now, we can construct a $$2 \times 2$$ decision table using the table() function.

decision <- table(actual, predicted)
decision
##        predicted
## actual  FALSE TRUE
##   FALSE    47   20
##   TRUE      9   24

There are four combinations:

##        predicted
## actual  FALSE              TRUE
##   FALSE Correct rejection  False alarm
##   TRUE  Miss               Hit
• Hit: 24 examinees were predicted to be successful, and the true outcome was also successful.

• Correct rejection: 47 examinees were predicted to be unsuccessful, and the true outcome was also unsuccessful.

• False alarm: 20 examinees were predicted to be successful, but the true outcome was unsuccessful.

• Miss: 9 examinees were predicted to be unsuccessful, but the true outcome was successful.
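The four cells can also be counted directly from the logical vectors with `&` and `!`, which is a handy cross-check on table(). A minimal sketch with small hypothetical vectors standing in for predicted and actual:

```r
# small hypothetical stand-ins for the predicted and actual vectors
pred <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
act  <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
sum(pred & act)    # Hit: predicted successful, actually successful
## [1] 2
sum(!pred & !act)  # Correct rejection: both unsuccessful
## [1] 1
sum(pred & !act)   # False alarm: predicted successful, actually unsuccessful
## [1] 1
sum(!pred & act)   # Miss: predicted unsuccessful, actually successful
## [1] 1
```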

## 7.2 Hit Rate

The hit rate is the proportion of correct decisions. In other words, the hit rate is the proportion of predicting the successful outcomes as successful, and the unsuccessful outcomes as unsuccessful.

If our prediction or decision is correct, then the predicted outcome and the actual outcome should be equal. We can obtain the proportion of matches between the two vectors, predicted and actual.

mean(predicted == actual)
## [1] 0.71

The hit rate is 0.71.

Equivalently, the hit rate is the proportion of Hit + Correct rejection in the decision table. The decision table is a $$2 \times 2$$ matrix, and we can subset each element of the matrix with []. The following code chunk calculates the hit rate.

decision[1, 1] # Correct rejection
## [1] 47
decision[2, 2] # Hit
## [1] 24
sum(decision) # Number of examinees
## [1] 100
(decision[1, 1] + decision[2, 2]) / sum(decision) # hit rate
## [1] 0.71
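Because the correct decisions sit on the diagonal of the table, the hit rate can also be written with diag(). The table is reconstructed from its counts here so the chunk runs on its own:

```r
# rebuild the decision table from the counts above
decision <- as.table(matrix(c(47, 9, 20, 24), nrow = 2,
                            dimnames = list(actual = c("FALSE", "TRUE"),
                                            predicted = c("FALSE", "TRUE"))))
# correct rejection and hit lie on the diagonal
sum(diag(decision)) / sum(decision) # hit rate
## [1] 0.71
```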

## 7.3 Sensitivity and Specificity

The sensitivity is the proportion of truly successful outcomes that are correctly identified, and the specificity is the proportion of truly unsuccessful outcomes that are correctly identified. Therefore,

• Sensitivity: $$\frac{\text{Hit}}{\text{Hit+Miss}}$$

• Specificity: $$\frac{\text{Correct rejection}}{\text{False alarm + Correct rejection}}$$

The following code obtains the sensitivity and the specificity from the decision table.

decision[2, 2] # hit
## [1] 24
decision[2, 1] # miss
## [1] 9
decision[2, 2] / (decision[2, 2] + decision[2, 1]) # sensitivity
## [1] 0.7272727
decision[1, 1] # correct rejection
## [1] 47
decision[1, 2] # false alarm
## [1] 20
decision[1, 1] / (decision[1, 1] + decision[1, 2]) # specificity
## [1] 0.7014925

The sensitivity is 0.7273 and the specificity is 0.7015.

We can also use logical subsetting to calculate the sensitivity and the specificity.

# sensitivity
mean(predicted[actual == TRUE])
## [1] 0.7272727
mean(predicted[actual])
## [1] 0.7272727
# mean(predicted[actual == TRUE] == TRUE)
# specificity
mean(predicted[actual == FALSE] == FALSE)
## [1] 0.7014925
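Another option is prop.table() with margin = 1, which converts each row of the decision table to proportions; the diagonal then holds the specificity (row FALSE) and the sensitivity (row TRUE). The table is rebuilt from its counts so the chunk is self-contained:

```r
decision <- as.table(matrix(c(47, 9, 20, 24), nrow = 2,
                            dimnames = list(actual = c("FALSE", "TRUE"),
                                            predicted = c("FALSE", "TRUE"))))
# divide each row by its row total
props <- prop.table(decision, margin = 1)
props["TRUE", "TRUE"]   # sensitivity
## [1] 0.7272727
props["FALSE", "FALSE"] # specificity
## [1] 0.7014925
```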

## 7.4 Base Rate

The base rate is the proportion of actual successful outcomes. Equivalently, it is the hit rate we would obtain if we predicted all the examinees to be successful. The following code chunks all return the base rate.

mean(actual)
## [1] 0.33
(decision[2, 1] + decision[2, 2]) / sum(decision)
## [1] 0.33
sum(decision[2, ]) / sum(decision)
## [1] 0.33
rowSums(decision)[2] / sum(decision)
## TRUE
## 0.33
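The equivalence noted above can be checked directly: if everyone is predicted to be successful, every actual success is a hit, so the hit rate collapses to the base rate. A small hypothetical outcome vector illustrates it:

```r
# hypothetical actual outcomes for six examinees
act <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
all_success <- rep(TRUE, length(act)) # predict everyone successful
mean(all_success == act) # hit rate under this prediction
## [1] 0.3333333
mean(act)                # base rate
## [1] 0.3333333
```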