# Chapter 4 Multiple Correspondence Analysis

Data table: MCA is used to analyze one table when data is a combination of qualitative and quantitative variables, for instance to analyze the relationship between several categorical variables in a data table.

Goal: Helps to identify the observations which have a similar profile and also identifies assosiations between the variable categories, i.e assosiations within the levels.

Key ideas

1. MCA is technically a CA done on indicator matrix of a data table. Even quantitative variables can be analyzed by binning them, once binned these are again converted to binary values using disjunctive coding(one hot encoding).

2. MCA can also be seen as a generalization of principal component analysis when the variables to be analyzed are categorical instead of quantitative.

Interpretation

``````    1. One variable represents as many points as levels.

2. Levels   of variables close to each other are chosen together.

3. Variance of the levels   of a variable = importance of   the variable.

4. Row Factor scores are the coordinates of the row observations.
They are interpreted by the distances between them, and heir
distance from the origin. ``````

## 4.1 Dataset : Survey of autobiographical memory

Note: Dataset similar to PCA

Participants were asked to rate the extent to which a particular item applied to their memory in general, using a 5-point Likert scale

There are 216 obseravtions(rows) which represents the participants who answer to 26(Columns) questions that comprised of 8 Episodic memory based questions 6 Semantic memory questions, 6 Spatial memory based questions and 6 Prospective memory related questions.

The subjects include both men and women with their ages in the range of 18-84 years which are also mentioned as age and sex variable. A survey based measure of AM is also used to caatogorize the participants into two groups- High memory, Normal Memory, Low Memory

## 4.2 Looking data Pattern

### 4.2.1 Histograms

Plots the distribution of data across the variables. SAM questionaire has differen question types assesing different types of memory ( Episodic, Semantic, Spatial ,Future). We look at the histogram for each type of question.

NOTE: For SAM dataset, decided to keep all the five levels because conceptually it did not make sense to bin the neutral responses with either agree or disagree.

``````E<-d_MCA[ ,c(1:8)]
S<-d_MCA[ ,c(9:14)]
P<-d_MCA[ ,c(15:20)]
Fu<-d_MCA[ ,c(21:26)]

list <-lapply(1:ncol(E),
function(col) ggplot2::qplot(E[[col]],xlab = "scale", ylab =" freq",
main = names(E[col]),
geom = "histogram", fill=I("#5f9ea0"),
col=I("black"), binwidth = 1))

cowplot::plot_grid(plotlist = list)`````` Similarly the plotting histograms for Semantic, Spatial and Future memory.   Once you look into the data pattern in the historgram, you can decide how to go about binning them.

NOTE: For SAM dataset, decided to keep all the five levels because conceptually it did not make sense to bin the neutral responses with either agree or disagree.

Conversion to Disjunctive matrix

``````##   E1.1 E1.2 E1.3 E1.4 E1.5 E2.1 E2.2 E2.3 E2.4 E2.5 E3.1 E3.2 E3.3 E3.4
## 1    0    0    0    0    1    0    0    0    0    1    0    0    0    0
## 2    0    0    0    1    0    0    0    0    1    0    0    0    0    1
##   E3.5 E4.1 E4.2 E4.3 E4.4 E4.5 E5.1 E5.2 E5.3 E5.4 E5.5 E6.1 E6.2 E6.3
## 1    1    0    0    0    0    1    0    0    0    0    1    0    0    0
## 2    0    0    0    1    0    0    0    0    0    1    0    0    0    0
##   E6.4 E6.5 E7.1 E7.2 E7.3 E7.4 E7.5 E8.1 E8.2 E8.3 E8.4 E8.5 F1.1 F1.2
## 1    0    1    0    0    0    0    1    0    0    0    0    1    0    0
## 2    1    0    0    0    0    1    0    0    0    1    0    0    0    0
##   F1.3 F1.4 F1.5 F2.1 F2.2 F2.3 F2.4 F2.5 F3.1 F3.2 F3.3 F3.4 F3.5 F4.1
## 1    0    1    0    0    0    1    0    0    0    0    1    0    0    0
## 2    0    1    0    0    0    0    1    0    0    0    0    1    0    0
##   F4.2 F4.3 F4.4 F4.5 F5.1 F5.2 F5.3 F5.4 F5.5 F6.1 F6.2 F6.3 F6.4 F6.5
## 1    0    0    1    0    0    0    0    1    0    0    0    0    0    1
## 2    0    0    1    0    0    0    0    1    0    0    0    0    1    0
##   P1.1 P1.2 P1.3 P1.4 P1.5 P2.1 P2.2 P2.3 P2.4 P2.5 P3.1 P3.2 P3.3 P3.4
## 1    0    0    0    1    0    0    0    0    1    0    0    1    0    0
## 2    0    0    0    1    0    0    0    0    0    1    0    0    1    0
##   P3.5 P4.1 P4.2 P4.3 P4.4 P4.5 P5.1 P5.2 P5.3 P5.4 P5.5 P6.1 P6.2 P6.3
## 1    0    0    0    0    0    1    0    0    1    0    0    0    0    0
## 2    0    0    0    0    0    1    0    0    0    1    0    0    0    0
##   P6.4 P6.5 S1.1 S1.2 S1.3 S1.4 S1.5 S2.1 S2.2 S2.3 S2.4 S2.5 S3.1 S3.2
## 1    0    1    0    0    0    0    1    0    0    0    0    1    0    0
## 2    1    0    0    0    0    1    0    0    0    0    1    0    0    0
##   S3.3 S3.4 S3.5 S4.1 S4.2 S4.3 S4.4 S4.5 S5.1 S5.2 S5.3 S5.4 S5.5 S6.1
## 1    0    1    0    0    0    0    0    1    0    0    0    0    1    0
## 2    0    1    0    0    0    0    1    0    0    0    0    1    0    0
##   S6.2 S6.3 S6.4 S6.5
## 1    0    0    0    1
## 2    0    1    0    0``````

### 4.2.2 Correlation Plot

Burt Matrix : the matrix of all two-way cross-tabulations of the categorical variables. It is the inner product of design indicator matrix.

Note : MCA is applied to the Burt matrix ## 4.3 MCA Analysis

``````resMCA <- epMCA(DATA = d_Num.dis,
make_data_nominal = FALSE,
DESIGN = d_use\$memoryGroups,
graphs = FALSE # TRUE first pass only
)

# looking at contributions
ctrK <- ctr4Variables(resMCA\$ExPosition.Data\$cj)
rownames(col4Var) <- rownames(ctrK)

# running inference for MCA
resMCA.inf <- InPosition::epMCA.inference.battery(DATA = d_Num.dis,
DESIGN = d_use\$memoryGroups,
make_data_nominal = FALSE,
graphs =  FALSE)``````
``````##  "It is estimated that your iterations will take 0.08 minutes."
##  "R is not in interactive() mode. Resample-based tests will be conducted. Please take note of the progress bar."
## ===========================================================================``````

### 4.3.1 Scree Plot

Note : MCA – overestimates variance explained – underestimates eigenvalues

The so-called ‘percentage of inertia problem’ can be improved by using adjusted inertias procedure or eigenvalue correction.

For MCA: Stick with the kaiser plot and look into first 3 dimensions

``````# Scree ----
screePlot <- PlotScree(ev = resMCA\$ExPosition.Data\$eigs,
p.ev= resMCA.inf\$Inference.Data\$components\$p.vals,
plotKaiser = TRUE,
)`````` ### 4.3.2 Factor Scores of the rows

#### 4.3.2.1 Looking at dimension 1 and 2

MCA analysis is able to capture the non linear relationship between the participants due to the non linear relationship between the levels of the variables, despite the original relationship being more linear ( from Low memory to Normal memory to High memory )

The effect called a guttman effect Plotting Confidence Intervals

``````# Depend on the size of your data, this might take a while
fi.boot <- Boot4Mean(resMCA.inf\$Fixed.Data\$ExPosition.Data\$fi,
design = d_use\$memoryGroups,
niter = 1000)

bootCI4mean <- MakeCIEllipses(fi.boot\$BootCube[,c(1:2),], # get the first two components
col = col4Means)

# Check other parameters you can change for this function
bootCI4mean <- MakeCIEllipses(fi.boot\$BootCube[,c(1:2),], # get the first two components
col = col4Means)

fi.WithMeanCI <- Sam.Imap\$zeMap + label4Map +
fi.mean.plot\$zeMap_dots +
fi.mean.plot\$zeMap_text +
bootCI4mean

a003.Map.I <- fi.WithMeanCI

a003.Map.I`````` #### 4.3.2.2 Looking at dimension 2 and 3

Dimension 2 separates the Normal memory group from the Low memory group.

``````Sam.Imap23<- PTCA4CATA::createFactorMap(title = 'MCA: SAM Data Set',
axis1 = 2,
axis2 = 3,
resMCA\$ExPosition.Data\$fi,
col.points = col4row,
display.labels = FALSE,
alpha.points = .5
)

fi.mean.plot23 <- createFactorMap(group.mean,
axis1 = 2,
axis2 = 3,
alpha.points = 1,
display.labels = TRUE,
col.points = col4Means,
col.labels = col4Means,
pch = 17,
cex = 3,
text.cex = 3
)

# make labels for dimension 2 and 3 ----------------------------------------------
label4Map23 <- createxyLabels.gen(2,3,
lambda = resMCA\$ExPosition.Data\$eigs,
tau = resMCA\$ExPosition.Data\$t)

# generating the maps-------------------------------------------------------------

a002.Map.I23 <- Sam.Imap23\$zeMap + label4Map23 +
fi.mean.plot23\$zeMap_dots +
fi.mean.plot23\$zeMap_text
a002.Map.I23`````` Plotting Confidence Intervals

``````# Depend on the size of your data, this might take a while
fi.boot <- Boot4Mean(resMCA.inf\$Fixed.Data\$ExPosition.Data\$fi,
design = d_use\$memoryGroups,
niter = 1000)

axis1=2
axis2=3
bootCI4mean23 <- MakeCIEllipses(fi.boot\$BootCube[,c(2:3),], # get the first two components
names.of.factors = paste0("Dimension ", c(2,3)),
col = col4Means,)

fi.WithMeanCI23 <- Sam.Imap23\$zeMap + label4Map23 +
fi.mean.plot23\$zeMap_dots +
fi.mean.plot23\$zeMap_text +
bootCI4mean23

fi.WithMeanCI23`````` ### 4.3.3 Column Factor scores

#### 4.3.3.1 Variable Contributions map #### Important Contributions Only

``````# Variable contribution plot with important variables only Dim 1 and 2
var12 <- data4PCCAR::getImportantCtr(ctr = ctrK,
eig = resMCA\$ExPosition.Data\$eigs,
axis1 = 1,
axis2 = 2
)
importantVar <- var12\$importantCtr.1or2
col4ImportantVar <- col4Var
col4NS <- 'gray90'
col4ImportantVar[!importantVar] <- col4NS
ctrV12.imp <- PTCA4CATA::createFactorMap(X = ctrK,
title = "Important Variables: Contributions",
col.points = col4ImportantVar,
col.labels = col4ImportantVar,
alpha.points = 0.5,
cex = 2.5,
alpha.labels = 1,
text.cex = 4,
font.face = "plain",
font.family = "sans")
a0008.Var.ctr12.imp <- ctrV12.imp\$zeMap + ctr.labels
a0008.Var.ctr12.imp`````` ``````#Variable contribution map with Dimensions 2 & 3

var32 <- data4PCCAR::getImportantCtr(ctr = ctrK,
eig = resMCA\$ExPosition.Data\$eigs,
axis1 = 2,
axis2 = 3
)
importantVar32 <- var32\$importantCtr.1or2
col4ImportantVar32 <- col4Var
col4NS <- 'gray90'
col4ImportantVar32[!importantVar32] <- col4NS

ctrV32.imp <- PTCA4CATA::createFactorMap(X = ctrK,
axis1 = 2, axis2 = 3,
title = "Important Variables: Contributions 2 * 3",
col.points = col4ImportantVar32,
col.labels = col4ImportantVar32,
alpha.points = 0.5,
cex = 2.5,
alpha.labels = 1,
text.cex = 4,
font.face = "plain",
font.family = "sans")
ctr.labels32 <- createxyLabels.gen(
2,3, lambda = resMCA\$ExPosition.Data\$eigs,
tau = resMCA\$ExPosition.Data\$t
)
a0009.Var.ctr32.imp <- ctrV32.imp\$zeMap + ctr.labels32
a0009.Var.ctr32.imp`````` Dim 1 and 2 both seem to be explained by Episodic and Future memory factor scores.

#### 4.3.3.2 Factor scores with levels of important Variables  ### 4.3.4 Contrubution Bar plots

Dim 1: Explained by Episodic Memory and Future Memory

Dim 2: Explained mostly by Episodic and Future

``````varCtr1 <- ctrK[,1]
names(varCtr1) <- rownames(ctrK)
a0005.Var.ctr1 <- PrettyBarPlot2(varCtr1,
main = 'Variable Contributions: Dimension 1',
ylim = c(-.05, 1.2*max(varCtr1)),
font.size = 5,
threshold = 1 / nrow(ctrK),
color4bar = gplots::col2hex(col4Var)
)

a0005.Var.ctr1`````` ``````## contribution for dimension 2

varCtr2 <- ctrK[,2]
names(varCtr2) <- rownames(ctrK)
a0005.Var.ctr2 <- PrettyBarPlot2(varCtr2,
main = 'Variable Contributions: Dimension 2',
ylim = c(-.05, 1.2*max(varCtr2)),
font.size = 5,
threshold = 1 / nrow(ctrK),
color4bar = gplots::col2hex(col4Var)
)

a0005.Var.ctr2 `````` ``````##contribution for dimension 3
varCtr3 <- ctrK[,3]
names(varCtr3) <- rownames(ctrK)
a0005.Var.ctr3 <- PrettyBarPlot2(varCtr3,
main = 'Variable Contributions: Dimension 3',
ylim = c(-.05, 1.2*max(varCtr3)),
font.size = 5,
threshold = 1 / nrow(ctrK),
color4bar = gplots::col2hex(col4Var)
)
a0005.Var.ctr3`````` #### 4.3.4.1 Bootstrap Ratio Barplots

These bootstrap ratios are displayed with the levels. Observe how Levels 1 and 5 of episodic and Fiture memory contributions are significantly higher than the neutral level like 3.

``````## dimension 1
c0001.Levels.BR <- PrettyBarPlot2(
resMCA.inf\$Inference.Data\$fj.boots\$tests\$boot.ratios[,1], # BR
main = 'Bootstrap Ratios for Columns : Dimension 1',
threshold = 2,
color4bar = gplots::col2hex(col4Labels)
)

c0001.Levels.BR #<- recordPlot()`````` ``````## dimension 2

c0002.Levels.BR <- PrettyBarPlot2(
resMCA.inf\$Inference.Data\$fj.boots\$tests\$boot.ratios[,2], # BR
main = 'Bootstrap Ratios for Columns : Dimension 2',
threshold = 2,
color4bar = gplots::col2hex(col4Labels)
)

c0002.Levels.BR #<- recordPlot()`````` ``````c0003.Levels.BR <- PrettyBarPlot2(
resMCA.inf\$Inference.Data\$fj.boots\$tests\$boot.ratios[,3], # BR
main = 'Bootstrap Ratios for Columns : Dimension 3',
threshold = 2,
color4bar = gplots::col2hex(col4Labels)
)

c0003.Levels.BR`````` ## 4.4 Conclusion

Dimension 1 and 2

Row: Non linear / parabolic curve from Autobiographical Memory to High
Autobiographical Memory.

Col: Questions related to Episodic memory and questions related to memory for future are significantly contributing (they lie diagonal to both the dimension- contribute almost equally).

There is a non linear relationship within the levels of Episodic memory and Future memory groups.

Dimension 2 and 3

Row: Separates Normal Memory group from the extreme memory groups i.e. both low and high memory groups

Col: Questions related to memory for Future, Spatial memory seem to be significantly contributing towards dimension 3.