25.1 Inter-rater reliability methods
These methods quantify the degree of agreement among the choices made by two or more independent judges (raters).
The examples below use the irr package. Other packages are:

- vcd for visualization
- DescTools
25.1.1 Percent Agreement
\[ \text{Percent agreement} = \frac{\text{number of agreements}}{\text{total number of cases}} \times 100 \]
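The output below uses the diagnoses data that ships with the irr package (30 psychiatric patients classified by 6 raters). A minimal sketch of the calls that would produce it, assuming irr is installed:

# load the irr package (also loads its lpSolve dependency)
library(irr)
# psychiatric diagnoses of 30 patients rated by 6 raters
data(diagnoses)
# inspect the first 10 subjects
head(diagnoses, 10)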
## Loading required package: lpSolve
## rater1 rater2 rater3
## 1 4. Neurosis 4. Neurosis 4. Neurosis
## 2 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 3 2. Personality Disorder 3. Schizophrenia 3. Schizophrenia
## 4 5. Other 5. Other 5. Other
## 5 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 6 1. Depression 1. Depression 3. Schizophrenia
## 7 3. Schizophrenia 3. Schizophrenia 3. Schizophrenia
## 8 1. Depression 1. Depression 3. Schizophrenia
## 9 1. Depression 1. Depression 4. Neurosis
## 10 5. Other 5. Other 5. Other
## rater4 rater5 rater6
## 1 4. Neurosis 4. Neurosis 4. Neurosis
## 2 5. Other 5. Other 5. Other
## 3 3. Schizophrenia 3. Schizophrenia 5. Other
## 4 5. Other 5. Other 5. Other
## 5 4. Neurosis 4. Neurosis 4. Neurosis
## 6 3. Schizophrenia 3. Schizophrenia 3. Schizophrenia
## 7 3. Schizophrenia 5. Other 5. Other
## 8 3. Schizophrenia 3. Schizophrenia 4. Neurosis
## 9 4. Neurosis 4. Neurosis 4. Neurosis
## 10 5. Other 5. Other 5. Other
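The percent agreement itself can be computed with agree() from irr, which by default counts only exact matches (tolerance = 0), consistent with the output below:

# percent agreement across all 6 raters, exact matches only
agree(diagnoses)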
## Percentage agreement (Tolerance=0)
##
## Subjects = 30
## Raters = 6
## %-agree = 16.7
25.1.2 Cohen’s Kappa
\[ \kappa = \frac{p_o - p_e}{1 - p_e} = 1 - \frac{1 - p_o}{1 - p_e} \]
where

- \(p_o\) = relative observed agreement among raters
- \(p_e\) = hypothetical probability of chance agreement

Cohen's kappa (Cohen 1960) measures strict agreement between raters (disagreements receive no partial credit) and is appropriate for two nominal or two ordinal variables.
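To make the formula concrete, the sketch below computes \(p_o\) and \(p_e\) by hand for rater1 and rater2 of the diagnoses data (assumed loaded from irr) and plugs them into the formula above; the result should match the unweighted kappa2() output further below.

# cross-tabulate the two raters (both factors share the same 5 levels)
tab <- table(diagnoses$rater1, diagnoses$rater2)
# observed agreement: proportion of subjects on the diagonal
p_o <- sum(diag(tab)) / sum(tab)
# chance agreement: sum of products of the marginal proportions
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
# Cohen's kappa
(p_o - p_e) / (1 - p_e)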
Based on the guidelines of Landis and Koch (1977), kappa can be interpreted as follows:
| Kappa | Interpretation |
|---|---|
| 0.01 – 0.20 | slight agreement |
| 0.21 – 0.40 | fair agreement |
| 0.41 – 0.60 | moderate agreement |
| 0.61 – 0.80 | substantial agreement |
| 0.81 – 1.00 | almost perfect or perfect agreement |
# Unweighted kappa for 2 nominal or 2 ordinal categorical variables
kappa2(diagnoses[, c("rater1", "rater2")], weight = "unweighted") # strict agreement only; no partial credit for near-misses
## Cohen's Kappa for 2 Raters (Weights: unweighted)
##
## Subjects = 30
## Raters = 2
## Kappa = 0.651
##
## z = 7
## p-value = 2.63e-12
# Weighted kappa for ordinal scales (allows partial agreement)
kappa2(diagnoses[, c("rater1", "rater2")], weight = "equal") # linear weights of the differences
## Cohen's Kappa for 2 Raters (Weights: equal)
##
## Subjects = 30
## Raters = 2
## Kappa = 0.633
##
## z = 5.43
## p-value = 5.52e-08
kappa2(diagnoses[, c("rater1", "rater2")], weight = "squared") # squared weights of the differences
## Cohen's Kappa for 2 Raters (Weights: squared)
##
## Subjects = 30
## Raters = 2
## Kappa = 0.655
##
## z = 3.91
## p-value = 9.37e-05
A p-value less than 0.05 means that the raters agree more than would be expected by chance.
References
Cohen, Jacob. 1960. “A Coefficient of Agreement for Nominal Scales.” Educational and Psychological Measurement 20 (1): 37–46. https://doi.org/10.1177/001316446002000104.
Landis, J. Richard, and Gary G. Koch. 1977. “The Measurement of Observer Agreement for Categorical Data.” Biometrics 33 (1): 159. https://doi.org/10.2307/2529310.