19.3 Qualitative prediction tasks

Examples:

  • Screening test: Healthy tissue or signs of cancer?
  • Will some market or individual stock go up or down today/this month/year?
  • A classic task: Who survived the Titanic disaster?

Strategy: Use a 2x2 matrix as an analytic device. Beyond predicting category membership, we also want to evaluate the quality of the resulting predictions:

  • qualitatively: Predict membership in a binary category.
  • quantitatively: Describe the result and measure its success (as accuracy or some other metric).

Start with a dataset:

  1. Binary variables
library(tidyverse)  # provides %>%, group_by(), summarise(), pivot_wider()

# As contingency df:
t_df <- as.data.frame(Titanic)

# with(t_df, table(Survived, Sex))  # counts rows (8 per cell), not people

t2 <- t_df %>%
  group_by(Sex, Survived) %>%
  summarise(n = n(),
            freq = sum(Freq))
t2
#> # A tibble: 4 x 4
#> # Groups:   Sex [2]
#>   Sex    Survived     n  freq
#>   <fct>  <fct>    <int> <dbl>
#> 1 Male   No           8  1364
#> 2 Male   Yes          8   367
#> 3 Female No           8   126
#> 4 Female Yes          8   344

# Frame a 2x2 matrix: ------ 

# (a) Pivot summary into 2x2 matrix: 
t2 %>% 
  ungroup() %>%
  select(-n) %>%  # drop row counts before pivoting
  pivot_wider(names_from = Sex, values_from = freq)
#> # A tibble: 2 x 3
#>   Survived  Male Female
#>   <fct>    <dbl>  <dbl>
#> 1 No        1364    126
#> 2 Yes        367    344

# (b) From contingency df: 
xtabs(Freq ~ Survived + Sex, data = t_df)
#>         Sex
#> Survived Male Female
#>      No  1364    126
#>      Yes  367    344

# (c) From raw data cases:
t_raw <- i2ds::expand_freq_table(t_df)
table(t_raw$Survived, t_raw$Sex)
#>      
#>       Male Female
#>   No  1364    126
#>   Yes  367    344

Interpreting such a table is more complex than it may seem, because we can compute different measures from it and adopt different perspectives on them. In terms of measures, we need to distinguish between frequency counts, proportions, and different kinds of probabilities:

  • Frequencies: The absolute numbers show many more males (1731) than females (470).
  • Proportions of survivors by gender: A majority of females survived (344 of 470, about 73%), whereas a majority of males died (1364 of 1731, about 79%).
  • Probabilities can be joint, marginal, or conditional (depending on which counts enter their numerator and denominator).
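
To make the three kinds of probabilities concrete, the following minimal sketch computes each of them from the 2x2 table of survival by sex. It uses only base R and the built-in Titanic data; the object names (m, p_joint, etc.) are chosen here for illustration.

```r
# 2x2 matrix of survival by sex (from R's built-in Titanic table):
m <- xtabs(Freq ~ Survived + Sex, data = as.data.frame(Titanic))
N <- sum(m)  # total number of people

# Joint probabilities: each cell relative to the total (all 4 cells sum to 1)
p_joint <- m / N

# Marginal probabilities: row or column sums relative to the total
p_survived <- rowSums(m) / N  # P(Survived = No), P(Survived = Yes)
p_sex      <- colSums(m) / N  # P(Sex = Male), P(Sex = Female)

# Conditional probabilities: each cell relative to its column or row sum
p_surv_given_sex <- prop.table(m, margin = 2)  # columns sum to 1
p_sex_given_surv <- prop.table(m, margin = 1)  # rows sum to 1

round(p_joint["Yes", "Female"], 3)           # P(survived and female)
round(p_surv_given_sex["Yes", "Female"], 3)  # P(survived | female)
round(p_sex_given_surv["Yes", "Female"], 3)  # P(female | survived)
```

All three values share the same numerator (344 female survivors) but differ in their denominator: 2201 people overall (about 0.156), 470 females (about 0.732), or 711 survivors (about 0.484).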


19.3.1 ToDo 1

Steps of the matrix lens model (see the MLM package):

  • Choose a predictor variable and a criterion variable.

  • Frame a 2x2 matrix:

Matrix transformations:

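# Assumption: m_1 holds survival by sex (as framed above);
# m_2 and m_3 stand for further 2x2 matrices framed in the same way.
m_1 <- xtabs(Freq ~ Survived + Sex, data = t_df)

m_1
rowSums(m_1)  # row totals (by Survived)
colSums(m_1)  # column totals (by Sex)
sum(m_1)      # total N

# Get four basic values (row-wise: a, b, c, d):
(abcd <- c(m_1[1, 1], m_1[1, 2], m_1[2, 1], m_1[2, 2]))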

# Joint and conditional percentages: 
prop.table(m_2) * 100              # joint (all cells sum to 100)
prop.table(m_2, margin = 1) * 100  # conditional on rows (rows sum to 100)
prop.table(m_2, margin = 2) * 100  # conditional on cols (cols sum to 100)
# ToDo: Diagonal (margin = 3)

# Test:
chisq.test(m_1)
chisq.test(m_2)
chisq.test(m_3)

# Visualization:
mosaicplot(t(m_2), color = c("skyblue1", "grey75"))
mosaicplot(t(m_3), color = c("skyblue1", "grey75"))
  • Focusing: Compute various metrics

  • When the predictor variable is continuous (and the criterion is binary): Determine an optimal cut-off point that maximizes some performance metric (e.g., accuracy).
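
The last two steps can be sketched as follows. The helper functions get_metrics() and frame_matrix() are hypothetical illustrations (they are not taken from the MLM package), and the continuous predictor is simulated, so the resulting numbers carry no empirical meaning. The sketch assumes that higher predictor values signal the positive category and uses accuracy as the metric to maximize; other metrics could be substituted.

```r
# Hypothetical helper: metrics from a 2x2 matrix with
# rows = prediction (positive, negative), columns = truth (positive, negative)
get_metrics <- function(m) {
  hi <- m[1, 1]  # hits (true positives)
  fa <- m[1, 2]  # false alarms (false positives)
  mi <- m[2, 1]  # misses (false negatives)
  cr <- m[2, 2]  # correct rejections (true negatives)
  c(accuracy    = (hi + cr) / sum(m),
    sensitivity = hi / (hi + mi),
    specificity = cr / (fa + cr),
    ppv         = hi / (hi + fa),
    npv         = cr / (mi + cr))
}

# Simulated data (an assumption for illustration only):
set.seed(42)
n <- 200
truth <- rep(c(TRUE, FALSE), each = n / 2)     # binary criterion
score <- rnorm(n, mean = ifelse(truth, 1, 0))  # continuous predictor

# Frame a 2x2 matrix for a given cut-off (predict positive if score > cutoff):
frame_matrix <- function(cutoff) {
  pred <- score > cutoff
  table(prediction = factor(pred,  levels = c(TRUE, FALSE)),
        truth      = factor(truth, levels = c(TRUE, FALSE)))
}

# Evaluate accuracy on a grid of candidate cut-offs and pick the best one:
cutoffs <- seq(min(score), max(score), length.out = 101)
acc <- sapply(cutoffs, function(cu) get_metrics(frame_matrix(cu))["accuracy"])
best_cutoff <- cutoffs[which.max(acc)]

best_cutoff
round(get_metrics(frame_matrix(best_cutoff)), 3)
```

The same helper applies to a binary predictor. For the rule "predict survival if female", get_metrics(matrix(c(344, 126, 367, 1364), nrow = 2, byrow = TRUE)) yields an accuracy of 1708/2201 (about 0.776), but a sensitivity of only 344/711 (about 0.484), which shows why a single metric rarely tells the whole story.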

19.3.2 Trees

library(FFTrees)
library(tidyverse)

t_df <- FFTrees::titanic
t_tb <- as_tibble(t_df)
t_tb

## variables as factors:
# t_tb$survived = factor(t_tb$survived, levels = c(1, 0))
# t_tb$sex = factor(t_tb$sex, levels = c("female", "male"))

t_tb

t4 <- t_tb %>%
  group_by(class, age, sex, survived) %>%
  count()
t4

t3 <- t_tb %>%
  group_by(sex, survived) %>%
  count()
t3

# Cross-tabulate survival by sex (counts raw cases):
xtabs(~ survived + sex, data = t_tb)

# Pivot into 2x2 matrix: 
t3 %>% 
  pivot_wider(names_from = sex, values_from = n)
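
A natural next step with these data is to fit a fast-and-frugal tree. The following minimal sketch assumes that the FFTrees package is installed and that the criterion survived is coded as a logical (or 0/1) variable. The object name titanic_fft is chosen here for illustration, and no output is shown because results may depend on the package version and its default settings.

```r
library(FFTrees)

# Fit fast-and-frugal trees that predict survival from all other variables:
titanic_fft <- FFTrees(formula = survived ~ .,
                       data = FFTrees::titanic)

titanic_fft        # print a summary of the selected tree
plot(titanic_fft)  # visualize the tree and its classification outcomes
```

The plot combines the tree with a 2x2 matrix of its classification outcomes, which links the tree back to the matrix-based metrics discussed above.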

Resources

Note existing resources for cross-tabulations:

References

Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making, 12(4), 344–368. http://journal.sjdm.org/17/17217/jdm17217.html