Week 7 Decision Table
In this chapter, students will learn to create a \(2 \times 2\) decision table (a contingency table) from a test score and a criterion score. From the decision table, students will learn to compute and interpret the hit rate, sensitivity, specificity, and base rate in R.
Test scores are often used for making screening decisions. For example, a company can decide whether to hire an applicant or not based on the job screening test score \(X\) and a cut score. In this case, the test score \(X\) is used to predict a future criterion score \(Y\) such as the productivity of workers.
A future criterion score \(Y\) predicted by \(X\) can be dichotomized into an observed criterion outcome with two categories: successful and unsuccessful. For example, we can classify workers who produce more than 18 units/hour as successful.
Based on the screening decision (e.g., hire or do not hire) and the observed criterion outcome (e.g., successful or unsuccessful), we can construct the \(2 \times 2\) decision table.
7.1 Decision Table
Let’s construct a \(2 \times 2\) decision table using the same two data sets. First, import the two data sets and obtain the total scores.
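The import step is not shown here. As a minimal sketch, the chunk below uses small made-up item-response data frames in place of the two actual data sets (the names `test_items` and `crit_items`, the item counts, and all values are hypothetical); `rowSums()` then gives the total scores:

```r
# Hypothetical 0/1 item responses standing in for the two imported data sets;
# names, item counts, and values are made up for illustration.
test_items <- data.frame(i1 = c(1, 1, 0), i2 = c(0, 1, 1), i3 = c(1, 1, 0))
crit_items <- data.frame(j1 = c(0, 1, 1), j2 = c(1, 1, 0))

X <- rowSums(test_items)   # total test score for each examinee
Y <- rowSums(crit_items)   # total criterion score for each examinee
```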
Again, \(X\) is the test score used to predict the future criterion score \(Y\). Let’s assume that we predicted examinees with \(X\) greater than or equal to 13 to be successful in the future. \(Y\) is the actual outcome score and we can categorize examinees who got \(Y\) greater than or equal to 13 as successful.
- Note that the cut scores for the two tests are both 13 in this example. However, the cut scores need not always be the same for the two tests.
We can create logical vectors `predicted` and `actual` from \(X\) and \(Y\) using the cut score of 13.
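A runnable sketch of this step, using hypothetical total scores for six examinees (the values below are made up, chosen to match the first six results shown):

```r
# Hypothetical total scores for six examinees (illustration only)
X <- c(10, 15, 9, 12, 11, 8)   # test scores
Y <- c(11, 9, 16, 10, 12, 7)   # criterion scores

predicted <- X >= 13   # TRUE if predicted to be successful
actual    <- Y >= 13   # TRUE if actually successful
predicted
actual
```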
## [1] FALSE TRUE FALSE FALSE
## [5] FALSE FALSE
## [1] FALSE FALSE TRUE FALSE
## [5] FALSE FALSE
The object `predicted` is a logical vector indicating whether each examinee is predicted to be successful in the future, and the object `actual` is a logical vector indicating whether each examinee is actually successful.
Applying the `sum()` function to a logical vector returns the number of `TRUE` values in the vector. The following code returns the number of examinees that are predicted to be successful.
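A self-contained sketch; since the full data are not reproduced here, `predicted` is reconstructed as a stand-in from the cell counts of the decision table shown later in this chapter (47, 20, 9, 24):

```r
# Stand-in for the real data: rebuild `predicted` from the decision-table
# cell counts (correct rejections, false alarms, misses, hits)
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), c(47, 20, 9, 24))
sum(predicted)   # number of examinees predicted to be successful
```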
## [1] 44
And the following returns the number of examinees that are actually successful.
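Similarly, as a self-contained sketch, `actual` is reconstructed as stand-in data from the 67 unsuccessful and 33 successful outcomes shown later:

```r
# Stand-in for the real data: 67 unsuccessful, then 33 successful examinees
actual <- rep(c(FALSE, TRUE), c(67, 33))
sum(actual)   # number of examinees actually successful
```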
## [1] 33
Applying the `mean()` function to a logical vector returns the proportion of `TRUE` values. Therefore, the proportion of predicted successes and the proportion of actual successes can be obtained by:
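A self-contained sketch, again using stand-in vectors reconstructed from the decision-table cell counts (47, 20, 9, 24):

```r
# Stand-in data rebuilt from the decision-table cell counts
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), c(47, 20, 9, 24))
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), c(47, 20, 9, 24))

mean(predicted)   # proportion predicted to be successful
mean(actual)      # proportion actually successful
```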
## [1] 0.44
## [1] 0.33
We can combine the two vectors side by side to check whether the predicted outcome matches the actual outcome.
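A sketch of this step with `cbind()`, using the first ten examinees' values hard-coded from the printed output (a stand-in for the full vectors):

```r
# First ten examinees' values (copied from the printed output), for illustration
predicted <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
actual    <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

cbind(predicted, actual)   # combine the two logical vectors column-wise
```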
## predicted actual
## [1,] FALSE FALSE
## [2,] TRUE FALSE
## [3,] FALSE TRUE
## [4,] FALSE FALSE
## [5,] FALSE FALSE
## [6,] FALSE FALSE
## [7,] TRUE TRUE
## [8,] FALSE TRUE
## [9,] FALSE FALSE
## [10,] FALSE FALSE
The first examinee was predicted to be unsuccessful, and the true outcome was also unsuccessful.
The second examinee was predicted to be successful, but the actual outcome was unsuccessful.
The eighth examinee was predicted to be unsuccessful, but the actual outcome was successful.
Now, we can construct a \(2 \times 2\) decision table using the `table()` function.
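A self-contained sketch of the `table()` call, with the vectors reconstructed as stand-in data from the cell counts below:

```r
# Stand-in data rebuilt from the decision-table cell counts
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), c(47, 20, 9, 24))
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), c(47, 20, 9, 24))

table(actual, predicted)   # rows: actual outcome, columns: predicted outcome
```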
## predicted
## actual FALSE TRUE
## FALSE 47 20
## TRUE 9 24
There are four combinations:
##        predicted
## actual  FALSE              TRUE
##   FALSE Correct rejection  False alarm
##   TRUE  Miss               Hit
Hit: 24 examinees were predicted to be successful, and the true outcome was also successful.
Correct rejection: 47 examinees were predicted to be unsuccessful, and the true outcome was also unsuccessful.
False alarm: 20 examinees were predicted to be successful, but the true outcome was unsuccessful.
Miss: 9 examinees were predicted to be unsuccessful, but the true outcome was successful.
7.2 Hit Rate
The hit rate is the proportion of correct decisions. In other words, the hit rate is the proportion of predicting the successful outcomes as successful, and the unsuccessful outcomes as unsuccessful.
If our prediction or decision is correct, then the predicted outcome and the actual outcome should be equal. We can obtain the proportion of matching elements between the two vectors, `predicted` and `actual`.
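A self-contained sketch, with the vectors reconstructed as stand-in data from the decision-table cell counts:

```r
# Stand-in data rebuilt from the decision-table cell counts
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), c(47, 20, 9, 24))
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), c(47, 20, 9, 24))

mean(predicted == actual)   # proportion of correct decisions (hit rate)
```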
## [1] 0.71
The hit rate is 0.71.
Equivalently, the hit rate is the proportion of `Hit` + `Correct rejection` decisions in the decision table. The decision table is a \(2 \times 2\) matrix, and we can subset each element of the matrix with `[ ]`. The following code chunk calculates the hit rate.
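A self-contained sketch of the subsetting approach, rebuilding the table from stand-in data with the cell counts above:

```r
# Stand-in data rebuilt from the decision-table cell counts
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), c(47, 20, 9, 24))
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), c(47, 20, 9, 24))
tab <- table(actual, predicted)

tab["FALSE", "FALSE"]   # correct rejections
tab["TRUE", "TRUE"]     # hits
sum(tab)                # total number of examinees
(tab["FALSE", "FALSE"] + tab["TRUE", "TRUE"]) / sum(tab)   # hit rate
```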
## [1] 47
## [1] 24
## [1] 100
## [1] 0.71
7.3 Sensitivity and Specificity
The sensitivity is the proportion of truly successful outcomes that are correctly identified, and the specificity is the proportion of truly unsuccessful outcomes that are correctly identified. Therefore,
Sensitivity: \(\frac{\text{Hit}}{\text{Hit+Miss}}\)
Specificity: \(\frac{\text{Correct rejection}}{\text{False alarm + Correct rejection}}\)
The code below obtains the sensitivity and the specificity from the decision table.
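A self-contained sketch, with the table rebuilt from stand-in data using the cell counts shown above:

```r
# Stand-in data rebuilt from the decision-table cell counts
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), c(47, 20, 9, 24))
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), c(47, 20, 9, 24))
tab <- table(actual, predicted)

hit <- tab["TRUE", "TRUE"]
hit                        # number of hits
miss <- tab["TRUE", "FALSE"]
miss                       # number of misses
hit / (hit + miss)         # sensitivity

cr <- tab["FALSE", "FALSE"]
cr                         # number of correct rejections
fa <- tab["FALSE", "TRUE"]
fa                         # number of false alarms
cr / (fa + cr)             # specificity
```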
## [1] 24
## [1] 9
## [1] 0.7272727
## [1] 47
## [1] 20
## [1] 0.7014925
The sensitivity is 0.7273 and the specificity is 0.7015.
We can also calculate the sensitivity and the specificity directly with logical operators on `predicted` and `actual`, without constructing the decision table.
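One plausible set of equivalent expressions (a sketch; the chapter's exact code may differ), again using stand-in vectors rebuilt from the cell counts:

```r
# Stand-in data rebuilt from the decision-table cell counts
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), c(47, 20, 9, 24))
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), c(47, 20, 9, 24))

sum(predicted & actual) / sum(actual)      # sensitivity: hits / actual successes
mean(predicted[actual])                    # sensitivity again, via logical subsetting
sum(!predicted & !actual) / sum(!actual)   # specificity: correct rejections / actual failures
```

The second expression subsets `predicted` to the actually successful examinees, so its mean is the proportion of them who were also predicted to be successful.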
## [1] 0.7272727
## [1] 0.7272727
## [1] 0.7014925