Examples:

• Screening test: Healthy tissue or signs of cancer?
• Will some market or individual stock go up or down today/this month/year?
• A classic task: Who survived the Titanic disaster?

Strategy: Use a 2x2 matrix as an analytic device. Beyond predicting category membership, we also want to evaluate the quality of the resulting predictions.

• qualitatively: Predict membership in a binary category.
• quantitatively: Describe the result and measure success (as accuracy or some other metric).
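The two steps can be sketched in base R with the built-in `Titanic` data. This is a minimal illustration, not a recommended model: the decision rule (predict survival for every female passenger) and the object names `t_freq`, `t_cases`, and `m_pred` are assumptions made for this example.

```r
# Expand the built-in Titanic frequency table to one row per person (N = 2201):
t_freq  <- as.data.frame(Titanic)
t_cases <- t_freq[rep(seq_len(nrow(t_freq)), times = t_freq$Freq), ]

# Qualitative step: a hypothetical rule predicts survival for every female passenger.
predicted <- t_cases$Sex == "Female"
actual    <- t_cases$Survived == "Yes"

# Quantitative step: cross-tabulate predictions and outcomes, then score them.
m_pred   <- table(predicted, actual)
accuracy <- mean(predicted == actual)  # proportion of correct predictions

m_pred
round(accuracy, 3)
#> [1] 0.776
```

The 2x2 matrix `m_pred` contains everything needed to evaluate the rule: accuracy is the sum of its two "correct" cells divided by the total number of cases.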

1. Binary variables
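# Load packages for data wrangling (dplyr, tidyr, and the %>% pipe):
library(tidyverse)

# As contingency df:
t_df <- as.data.frame(Titanic)

# with(t_df, table(Survived, Sex))  # only number of cases

# Count rows (n) and sum frequencies (freq) per combination of Sex and Survived:
t2 <- t_df %>%
  group_by(Sex, Survived) %>%
  summarise(n = n(),
            freq = sum(Freq))
t2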
#> # A tibble: 4 x 4
#> # Groups:   Sex [2]
#>   Sex    Survived     n  freq
#>   <fct>  <fct>    <int> <dbl>
#> 1 Male   No           8  1364
#> 2 Male   Yes          8   367
#> 3 Female No           8   126
#> 4 Female Yes          8   344

# Frame a 2x2 matrix: ------

# (a) Pivot summary into 2x2 matrix:
t2 %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = Sex, values_from = freq)
#> # A tibble: 2 x 3
#>   Survived  Male Female
#>   <fct>    <dbl>  <dbl>
#> 1 No        1364    126
#> 2 Yes        367    344

# (b) From contingency df:
xtabs(Freq ~ Survived + Sex, data = t_df)
#>         Sex
#> Survived Male Female
#>      No  1364    126
#>      Yes  367    344

# (c) From raw data cases:
t_raw <- i2ds::expand_freq_table(t_df)
table(t_raw$Survived, t_raw$Sex)
#>
#>       Male Female
#>   No  1364    126
#>   Yes  367    344

Interpreting even a simple 2x2 table is not trivial, because the same counts support different measures and can be viewed from different perspectives. In terms of measures, we distinguish between frequency counts, proportions, and different kinds of probabilities:

• Frequencies: The absolute numbers show many more males (1731) than females (470).
• Proportions of survivors by gender: A majority of females survived (73%), whereas a majority of males died (79%).
• Probabilities can be joint, marginal, or conditional (depending on which counts enter their numerator and denominator).
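The distinction between the three kinds of probabilities can be made concrete with base R's `prop.table()` and `margin.table()`. The following is a sketch on the Survived-by-Sex table; the object names (`m_ss`, `p_joint`, and so on) are chosen for this example only.

```r
# 2x2 matrix of counts: Survived (rows) by Sex (columns)
m_ss <- xtabs(Freq ~ Survived + Sex, data = as.data.frame(Titanic))

# Joint probabilities: each cell divided by the grand total (all four sum to 1)
p_joint <- prop.table(m_ss)

# Marginal probabilities: row or column sums divided by the grand total
p_survived <- margin.table(m_ss, margin = 1) / sum(m_ss)  # P(Survived)
p_sex      <- margin.table(m_ss, margin = 2) / sum(m_ss)  # P(Sex)

# Conditional probabilities: each cell divided by its column (or row) sum
p_surv_given_sex <- prop.table(m_ss, margin = 2)  # P(Survived | Sex): columns sum to 1
p_sex_given_surv <- prop.table(m_ss, margin = 1)  # P(Sex | Survived): rows sum to 1

round(p_surv_given_sex, 3)
```

All three use the same numerators (cell counts or their sums) but differ in the denominator. Conditioning on Sex shows that about 73% of females but only 21% of males survived, whereas conditioning on Survived answers a different question (for instance, what share of the survivors were female).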


### 19.3.1 ToDo 1

Steps of the matrix lens model (see the MLM package):

• Determine a pair of a predictor and a criterion variable.

• Frame a 2x2 matrix:

Matrix transformations:

# m_1, m_2, and m_3 are assumed to be 2x2 matrices of counts framed in the previous step:
m_1
rowSums(m_1)
colSums(m_1)
sum(m_1)

# Get four basic values:
(abcd <- c(m_1[1, 1], m_1[1, 2], m_1[2, 1], m_1[2, 2]))

# Joint and conditional proportions (in percent):
prop.table(m_2) * 100              # joint (by grand total)
prop.table(m_2, margin = 1) * 100  # by rows
prop.table(m_2, margin = 2) * 100  # by cols
# ToDo: Diagonal (margin = 3)

# Test:
chisq.test(m_1)
chisq.test(m_2)
chisq.test(m_3)

# Visualization:
mosaicplot(t(m_2), color = c("skyblue1", "grey75"))
mosaicplot(t(m_3), color = c("skyblue1", "grey75"))
• Focusing: Compute various metrics
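Which metrics are meant here is not specified, so the following is only a sketch of some common ones. It assumes a particular layout (rows = prediction, columns = criterion, positive category first) and a hypothetical rule in which being female predicts survival. The names `m_pc`, `hi`, `fa`, `mi`, and `cr` are illustrative.

```r
t_df <- as.data.frame(Titanic)

# Assumed layout: rows = prediction (positive first), columns = criterion (positive first).
# Hypothetical rule: "Female" predicts survival, "Male" predicts death.
m_pc <- xtabs(Freq ~ Sex + Survived, data = t_df)[c("Female", "Male"), c("Yes", "No")]

hi <- m_pc[1, 1]  # hits: predicted survival, survived
fa <- m_pc[1, 2]  # false alarms: predicted survival, died
mi <- m_pc[2, 1]  # misses: predicted death, survived
cr <- m_pc[2, 2]  # correct rejections: predicted death, died

metrics <- c(
  sensitivity = hi / (hi + mi),        # P(predicted positive | actually positive)
  specificity = cr / (fa + cr),        # P(predicted negative | actually negative)
  ppv         = hi / (hi + fa),        # P(actually positive | predicted positive)
  npv         = cr / (mi + cr),        # P(actually negative | predicted negative)
  accuracy    = (hi + cr) / sum(m_pc)  # proportion of correct predictions
)
round(metrics, 3)
```

Each metric focuses on a different part of the same four counts: sensitivity and specificity condition on the criterion (columns), the predictive values condition on the prediction (rows), and accuracy uses the grand total.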

• When the predictor variable is continuous (and the criterion variable is binary): Determine an optimal cut-off point that maximizes some performance metric.
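A cut-off search can be sketched as a simple grid search over the observed predictor values. Because the Titanic table contains no continuous variable, this example assumes a different built-in data set (`mtcars`, predicting manual transmission from miles per gallon) and uses accuracy as the metric to maximize. The function `accuracy_at()` is a hypothetical helper defined here.

```r
# Built-in data: does a car have a manual transmission (am == 1)?
criterion <- mtcars$am == 1
predictor <- mtcars$mpg  # continuous predictor (miles per gallon)

# Candidate cut-offs: every observed predictor value
cutoffs <- sort(unique(predictor))

# Accuracy of the rule "predict TRUE if predictor >= cut-off" for one cut-off:
accuracy_at <- function(cut) {
  predicted <- predictor >= cut
  mean(predicted == criterion)
}

acc  <- vapply(cutoffs, accuracy_at, numeric(1))
best <- cutoffs[which.max(acc)]  # first cut-off that reaches the maximal accuracy

# 2x2 matrix at the selected cut-off:
table(predicted = predictor >= best, actual = criterion)
```

Each candidate cut-off turns the continuous predictor into a binary one and thereby defines its own 2x2 matrix. Accuracy is only one possible target: `accuracy_at()` could return any other metric computed from that matrix. A cut-off selected and evaluated on the same data will tend to look better than it would on new data.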

### 19.3.2 Trees

• Goal: Illustrate cases of binary prediction with the FFTrees package (Phillips et al., 2017).
library(FFTrees)
library(tidyverse)

t_df <- FFTrees::titanic
t_tb <- as_tibble(t_df)
t_tb

## variables as factors:
# t_tb$survived = factor(t_tb$survived, levels = c(1, 0))
# t_tb$sex = factor(t_tb$sex, levels = c("female", "male"))

t_tb

# Count cases for all combinations of the four variables:
t4 <- t_tb %>%
  group_by(class, age, sex, survived) %>%
  count()
t4

# Count cases by sex and survival only:
t3 <- t_tb %>%
  group_by(sex, survived) %>%
  count()
t3

# Cross-tabulate raw cases (one row per person, so no count variable on the left-hand side):
xtabs(~ survived + sex, data = t_tb)

# Pivot into 2x2 matrix:
t3 %>%
  ungroup() %>%
  pivot_wider(names_from = sex, values_from = n)
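The counts above only describe the data. To actually predict survival, a fast-and-frugal tree can be fitted with the package's main function `FFTrees()`. The following is a minimal sketch that assumes the package is installed and that the criterion `survived` in `FFTrees::titanic` is coded in a form the function accepts (logical or 0/1). The title and decision labels are optional and chosen for this example.

```r
library(FFTrees)

# Fit fast-and-frugal trees that predict survival from all other variables.
# Assumption: `survived` in FFTrees::titanic is coded as logical or 0/1.
titanic_fft <- FFTrees(formula = survived ~ .,
                       data = FFTrees::titanic,
                       main = "Surviving the Titanic",
                       decision.labels = c("Died", "Survived"))

titanic_fft        # print a summary of the selected tree and its performance
plot(titanic_fft)  # visualize the tree together with its performance statistics
```

As in the rest of this section, the resulting tree makes a binary prediction for every case, so its performance is again summarized by the four cells of a 2x2 matrix.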

#### Resources

Note existing resources for cross-tabulations:

### References

Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making, 12(4), 344–368. http://journal.sjdm.org/17/17217/jdm17217.html