# Chapter 5 Discriminant Correspondence Analysis (DiCA)

DiCA is an extension of discriminant analysis (DA) and correspondence analysis (CA). Like discriminant analysis, the goal of DiCA is to categorize observations in pre-defined groups, and like correspondence analysis, it is used with nominal variables.

Data table: DiCA is used to analyse a two-table dataset: a data table of nominal variables and a design table that assigns each observation to a group.

Goal:

• To represent each group by the sum of its observations.

• The original observations are then projected as supplementary elements and each observation is assigned to the closest group. The comparison between the original and the predicted groups is used to assess the quality of the discrimination. Note: A similar procedure can be used to assign new observations to categories.

• The reliability of the analysis is evaluated using cross-validation techniques such as jackknifing or bootstrapping.

Key ideas

1. Separate the groups first and then perform an MCA.
2. Dimensionality of the space = number of non-zero eigenvalues = number of groups - 1.
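The goal and key ideas above can be sketched in base R. This is a minimal illustration on made-up data, not the `tepDICA` implementation: the objects `toy`, `grp`, `q1`, and `q2` are hypothetical, and the analysis is written as a plain CA of the group-by-level table of sums, with the observations projected afterwards as supplementary rows.

```r
# Toy data (hypothetical): 9 observations, 2 nominal variables, 3 groups
toy <- data.frame(
  q1 = factor(c("a", "a", "b", "b", "b", "c", "c", "c", "a")),
  q2 = factor(c("x", "x", "x", "y", "y", "y", "y", "x", "y"))
)
grp <- factor(rep(c("High", "Normal", "Low"), each = 3))

# Disjunctive (0/1) coding: one column per level of each variable
Z <- cbind(model.matrix(~ q1 - 1, toy), model.matrix(~ q2 - 1, toy))

# Step 1: represent each group by the sum of its observations
S <- rowsum(Z, grp)                      # groups x levels table

# Step 2: CA of the group table via the SVD of the standardized residuals
P  <- S / sum(S)
rw <- rowSums(P)                         # row (group) masses
cw <- colSums(P)                         # column (level) masses
R  <- diag(1 / sqrt(rw)) %*% (P - rw %o% cw) %*% diag(1 / sqrt(cw))
sv <- svd(R)
eigs <- sv$d^2

# Number of non-zero eigenvalues: here, number of groups - 1
n_dims <- sum(eigs > 1e-12)
keep   <- seq_len(n_dims)

# Group factor scores
Fg <- (diag(1 / sqrt(rw)) %*% sv$u %*% diag(sv$d))[, keep, drop = FALSE]
rownames(Fg) <- rownames(S)

# Step 3: project the observations as supplementary rows (row profiles)
prof <- Z / rowSums(Z)
Fobs <- (prof %*% diag(1 / sqrt(cw)) %*% sv$v)[, keep, drop = FALSE]

# Step 4: assign each observation to the closest group (squared distances)
d2 <- sapply(rownames(Fg), function(g) rowSums(sweep(Fobs, 2, Fg[g, ])^2))
pred <- factor(colnames(d2)[max.col(-d2, ties.method = "first")],
               levels = levels(grp))

# Compare original and predicted groups
table(actual = grp, predicted = pred)
```

Because every observation answers the same number of questions, each group's factor scores are the average of the supplementary scores of its observations, which is why the nearest-group rule is a natural way to classify.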

Interpretation

1. Row factor scores are the coordinates of the row observations. They are interpreted by the distances between them and by their distance from the origin.

2. One variable is represented by as many points as it has levels (compare with PCA, where each variable is a single point). Levels of variables that are close to each other tend to be chosen together. The variance of the levels of a variable reflects the importance of the variable.

## 5.1 Dataset: Survey of Autobiographical Memory

Note: The dataset is similar to the one used for PCA and MCA.

Participants were asked to rate the extent to which a particular item applied to their memory in general, using a 5-point Likert scale.

There are 216 observations (rows), one per participant, and 26 questions (columns): 8 episodic memory questions, 6 semantic memory questions, 6 spatial memory questions, and 6 prospective memory questions.

The participants include both men and women aged 18 to 84 years; these are recorded in the age and sex variables. A survey-based measure of autobiographical memory (AM) is also used to categorize the participants into three groups: High Memory, Normal Memory, and Low Memory.

## 5.2 Looking at the Data Pattern

### 5.2.1 Histograms

The histograms show the frequency distribution of the scores for each question (variable). These plots help us decide how to bin the data.

The SAM questionnaire has different question types that assess different types of memory (Episodic, Semantic, Spatial, Future). We look at the histograms for each type of question.

Note (as for MCA): For the SAM dataset, I decided to keep all five levels because conceptually it did not make sense to bin the neutral responses with either the agree or the disagree responses.
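The binning decision can be checked numerically as well as visually. The sketch below uses simulated responses (the data frame `likert` and the question names `E1`, `S1`, `P1` are hypothetical, not the SAM data) and an assumed 5% rule of thumb for flagging sparsely used levels.

```r
# Hypothetical responses: 216 participants x 3 questions on a 1-5 scale
set.seed(42)
likert <- as.data.frame(replicate(
  3, sample(1:5, 216, replace = TRUE, prob = c(.10, .20, .30, .25, .15))
))
names(likert) <- c("E1", "S1", "P1")

# Count how often each level is used, question by question
freq <- sapply(likert, function(x) table(factor(x, levels = 1:5)))
freq

# Flag levels holding fewer than 5% of the responses (assumed cut-off);
# such levels are the usual candidates for merging with a neighbour
sparse <- freq / nrow(likert) < 0.05
sparse
```

Calling `barplot(freq[, "E1"])` on one column gives the histogram-style view for a single question.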

### 5.2.2 Correlation Plot

Burt matrix: the matrix of all two-way cross-tabulations of the categorical variables.

DiCA also uses the Burt matrix, as MCA does, since MCA can be applied to the Burt matrix.
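As a small illustration of the definition, the Burt matrix can be obtained from the disjunctive (0/1) coding of the variables. The data frame `toy` and its variables are made up for this sketch.

```r
# Hypothetical data: 6 observations, 2 nominal variables
toy <- data.frame(
  q1 = factor(c(1, 2, 2, 3, 3, 3)),
  q2 = factor(c("lo", "lo", "hi", "hi", "hi", "lo"))
)

# Disjunctive coding: one 0/1 column per level
Z <- cbind(model.matrix(~ q1 - 1, toy), model.matrix(~ q2 - 1, toy))

# Burt matrix: all two-way cross-tabulations stacked in one symmetric table
B <- crossprod(Z)   # same as t(Z) %*% Z
B

# The off-diagonal block is the ordinary contingency table of q1 by q2
table(toy$q1, toy$q2)
```

The diagonal blocks of `B` hold the level counts of each variable, and the off-diagonal blocks hold the cross-tabulations between pairs of variables.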

## 5.3 DiCA analysis

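``````
library(TExPosition)  # tepDICA
library(TInPosition)  # tepDICA.inference.battery

# Fixed-effect DiCA; d_DiCA.dis is passed as is (make_data_nominal = FALSE)
resDiCA <- tepDICA(DATA = d_DiCA.dis,
                   make_data_nominal = FALSE,
                   DESIGN = d_active$memoryGroups,
                   graphs = FALSE)

# Resampling-based inference for the same analysis
resDiCA.inference <- tepDICA.inference.battery(DATA = d_DiCA.dis,
                                               DESIGN = d_active$memoryGroups,
                                               make_data_nominal = FALSE,
                                               graphs = FALSE)
``````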
``````
## [1] "It is estimated that your iterations will take 0.12 minutes."
## [1] "R is not in interactive() mode. Resample-based tests will be conducted. Please take note of the progress bar."
## ===========================================================================
``````

### 5.3.1 DiCA Scree Plot

Again, the total number of dimensions = number of groups - 1. With three memory groups, there are at most two dimensions.

``````
# Scree plot of the eigenvalues, with the p-values from the inference battery
scree.plot <- PlotScree(ev = resDiCA$TExPosition.Data$eigs,
                        p.ev = resDiCA.inference$Inference.Data$components$p.vals,
                        plotKaiser = TRUE)
``````
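The scree plot shows each eigenvalue as a share of the total inertia. The arithmetic is sketched below with made-up eigenvalues (not the values from this analysis), and under the assumption that the Kaiser line marks the average eigenvalue.

```r
# Hypothetical eigenvalues for a three-group analysis (two dimensions)
eigs <- c(0.012, 0.004)

# Percentage of inertia explained by each dimension
tau <- 100 * eigs / sum(eigs)
tau

# Kaiser-style cut-off, assumed here to be the average eigenvalue
kaiser <- mean(eigs)
above_kaiser <- eigs > kaiser
above_kaiser
```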

### DiCA Row Factor Scores

#### 5.3.1.1 1. With Means

The group means for the High, Normal, and Low autobiographical memory groups are plotted.

#### 5.3.1.2 2. With Means and Confidence Interval

The means of the three memory groups are significantly different from each other.

#### 3. With Means and Tolerance Interval

Although the means are separate, the row factor scores of the memory groups overlap considerably.
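The contrast between the two kinds of interval can be illustrated in one dimension. The sketch below is a simplified analogue of the ellipses on the maps: the scores are simulated (hypothetical, not the DiCA factor scores), the confidence interval is a percentile bootstrap of the group mean, and the tolerance interval is taken as the central 95% of the observations.

```r
set.seed(7)
# Hypothetical factor scores on one dimension for two groups of 70 observations
scores <- list(High = rnorm(70, mean = 0.3, sd = 1),
               Low  = rnorm(70, mean = -0.3, sd = 1))

# Bootstrap the group mean: resample the observations with replacement
boot_mean_ci <- function(x, n_boot = 1000) {
  means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  quantile(means, c(.025, .975))
}

ci  <- sapply(scores, boot_mean_ci)              # where the group *mean* lies
tol <- sapply(scores, quantile, c(.025, .975))   # where the *observations* lie

ci
tol
```

The confidence interval shrinks as the group size grows, whereas the tolerance interval reflects the spread of the individuals, so groups can have clearly separated means and still overlap heavily.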

### 5.3.2 DiCA: Column Factor scores

#### 5.3.2.1 Column Factor Scores: Important Variables

``````
# looking at contributions

# Contributions of the variables, computed from the contributions of their levels
ctrK <- ctr4Variables(resDiCA$TExPosition.Data$cj)
# col4Var (one color per variable) is assumed to be defined earlier
rownames(col4Var) <- rownames(ctrK)

col4Levels <- coloringLevels(rownames(resDiCA$TExPosition.Data$fj), col4Var)
col4Labels <- col4Levels$color4Levels

#-------------------------------------------------------------------------
axis1 <- 1
axis2 <- 2
Fj <- resDiCA$TExPosition.Data$fj

BaseMap.Fj <- createFactorMap(Fj,
                              axis1 = axis1,
                              axis2 = axis2,
                              title = 'DiCA. Variables',
                              col.points = col4Levels$color4Levels,
                              cex = 1,
                              col.labels = col4Levels$color4Levels,
                              text.cex = 2.5,
                              force = 2)

# make the J-maps ----
# label4Map (the axis labels) is assumed to be defined earlier
b001.BaseMap.Fj <- BaseMap.Fj$zeMap + label4Map
b002.BaseMapNoDot.Fj <- BaseMap.Fj$zeMap_background +
  BaseMap.Fj$zeMap_text + label4Map

# Map of the variable contributions on dimensions 1 and 2
ctrV12 <- PTCA4CATA::createFactorMap(X = ctrK,
                                     title = "Variable Contributions",
                                     col.points = col4Var,
                                     col.labels = col4Var,
                                     alpha.points = 0.5,
                                     cex = 2.5,
                                     alpha.labels = 1,
                                     text.cex = 4,
                                     font.face = "plain",
                                     font.family = "sans")
ctr.labels <- createxyLabels.gen(
  1, 2, lambda = resDiCA$TExPosition.Data$eigs,
  tau = resDiCA$TExPosition.Data$t
)
a0007.Var.ctr12 <- ctrV12$zeMap + ctr.labels
a0007.Var.ctr12
``````

``````
# Variable contribution plot with important variables only, Dim 1 and 2
var12 <- data4PCCAR::getImportantCtr(ctr = ctrK,
                                     eig = resDiCA$TExPosition.Data$eigs,
                                     axis1 = 1,
                                     axis2 = 2)
importantVar <- var12$importantCtr.1or2

# Gray out the variables that are not important
col4ImportantVar <- col4Var
col4NS <- 'gray90'
col4ImportantVar[!importantVar] <- col4NS

ctrV12.imp <- PTCA4CATA::createFactorMap(X = ctrK,
                                         title = "Important Variables: Contributions",
                                         col.points = col4ImportantVar,
                                         col.labels = col4ImportantVar,
                                         alpha.points = 0.5,
                                         cex = 2.5,
                                         alpha.labels = 1,
                                         text.cex = 4,
                                         font.face = "plain",
                                         font.family = "sans")
a0008.Var.ctr12.imp <- ctrV12.imp$zeMap + ctr.labels
a0008.Var.ctr12.imp
``````

#### 5.3.2.2 Column Factor Scores with Levels

``````
# Factor scores with levels of important variables

col4Levels.imp <- data4PCCAR::coloringLevels(rownames(Fj),
                                             col4ImportantVar)

BaseMap.Fj.imp <- createFactorMap(X = Fj, # resMCA$ExPosition.Data$fj,
                                  axis1 = axis1, axis2 = axis2,
                                  title = 'DiCA Important Variables',
                                  col.points = col4Levels.imp$color4Levels,
                                  cex = 1,
                                  col.labels = col4Levels.imp$color4Levels,
                                  text.cex = 2.5,
                                  force = 2)

b0010.BaseMap.Fj <- BaseMap.Fj.imp$zeMap + label4Map
b0010.BaseMap.Fj #<- recordPlot()
``````

``````
# Lines joining the levels of each variable
lines4J <- addLines4MCA(Fj,
                        col4Var = col4Levels.imp$color4Variables,
                        size = .7)

b0020.BaseMap.Fj <- b0010.BaseMap.Fj + lines4J
b0020.BaseMap.Fj
``````

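``````
zeNames <- getVarNames(rownames(Fj))
importantsLabels <- zeNames$stripedNames %in% zeNames$variableNames[importantVar]

Fj.imp <- Fj[importantsLabels, ]

# Lines joining the levels of the important variables only.
# Assumption: the call is addLines4MCA() on Fj.imp, mirroring lines4J.
lines4J.imp <- addLines4MCA(Fj.imp,
                            col4Var = col4Levels$color4Variables[which(importantVar)],
                            size = .9,
                            linetype = 3,
                            alpha = .5)

b0021.BaseMap.Fj <- b0010.BaseMap.Fj + lines4J.imp
b0021.BaseMap.Fj
``````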

### 5.3.3 Contribution Barplot

Dim 1: Episodic and Future questions contribute significantly.

Dim 2: Episodic and Semantic questions contribute significantly.

``````
# dimension 1

varCtr1 <- ctrK[, 1]
names(varCtr1) <- rownames(ctrK)
a0005.Var.ctr1 <- PrettyBarPlot2(varCtr1,
                                 main = 'Variable Contributions: Dimension 1',
                                 ylim = c(-.05, 1.2 * max(varCtr1)),
                                 font.size = 5,
                                 threshold = 1 / nrow(ctrK),
                                 color4bar = gplots::col2hex(col4Var))

a0005.Var.ctr1
``````

``````
# dimension 2

varCtr2 <- ctrK[, 2]
names(varCtr2) <- rownames(ctrK)
a0005.Var.ctr2 <- PrettyBarPlot2(varCtr2,
                                 main = 'Variable Contributions: Dimension 2',
                                 ylim = c(-.05, 1.2 * max(varCtr2)),
                                 font.size = 5,
                                 threshold = 1 / nrow(ctrK),
                                 color4bar = gplots::col2hex(col4Var))

a0005.Var.ctr2
``````
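The barplots use a threshold of `1 / nrow(ctrK)`, the contribution each variable would have if all variables contributed equally. The sketch below shows how such contributions arise on one dimension. The scores and masses are made up, the level names (`E1.1`, `S1.5`, and so on) are hypothetical, and summing the level contributions per variable is assumed to match what `ctr4Variables` does.

```r
# Hypothetical column factor scores (one dimension) and column masses
f <- c(E1.1 = 0.30, E1.5 = -0.25, S1.1 = 0.05, S1.5 = -0.04)
m <- c(E1.1 = 0.20, E1.5 = 0.30, S1.1 = 0.25, S1.5 = 0.25)

# Eigenvalue of the dimension = sum of mass * squared factor score
lambda <- sum(m * f^2)

# Contribution of each level; the contributions sum to 1
ctr_levels <- m * f^2 / lambda

# Contribution of a variable = sum of the contributions of its levels
variable <- sub("\\..*$", "", names(f))
ctr_vars <- tapply(ctr_levels, variable, sum)
ctr_vars

# A variable is flagged when it exceeds the average contribution
ctr_vars > 1 / length(ctr_vars)
```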

### 5.3.4 Bootstrap Ratio Barplots

These bootstrap ratios are displayed for the levels. Observe how Levels 1 and 5 of the Episodic and Future memory questions contribute significantly more than the neutral level (Level 3).

``````
## dimension 1
c0001.Levels.BR <- PrettyBarPlot2(
  resDiCA.inference$Inference.Data$boot.data$fj.boot.data$tests$boot.ratios[, 1],
  main = 'Bootstrap Ratios for Columns : Dimension 1',
  threshold = 2,
  color4bar = gplots::col2hex(col4Labels)
)

c0001.Levels.BR #<- recordPlot()
``````

``````
## dimension 2

c0002.Levels.BR <- PrettyBarPlot2(
  resDiCA.inference$Inference.Data$boot.data$fj.boot.data$tests$boot.ratios[, 2],
  main = 'Bootstrap Ratios for Columns : Dimension 2',
  threshold = 2,
  color4bar = gplots::col2hex(col4Labels)
)

c0002.Levels.BR #<- recordPlot()
``````
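A bootstrap ratio is the mean of the bootstrapped factor scores of a level divided by their standard deviation, so it reads like a t statistic, and the threshold of 2 used in the barplots roughly corresponds to p < .05. The sketch below computes it on simulated bootstrap samples; `boot_fj` and its level names are hypothetical, not the output of `tepDICA.inference.battery`.

```r
set.seed(11)
# Hypothetical bootstrapped factor scores: 1000 resamples x 3 levels
boot_fj <- cbind(E1.1 = rnorm(1000, mean = 0.30, sd = 0.05),
                 E1.3 = rnorm(1000, mean = 0.01, sd = 0.05),
                 E1.5 = rnorm(1000, mean = -0.25, sd = 0.05))

# Bootstrap ratio: mean over resamples divided by their standard deviation
boot_ratio <- colMeans(boot_fj) / apply(boot_fj, 2, sd)
boot_ratio

# Levels whose ratio exceeds the threshold in absolute value are stable
abs(boot_ratio) > 2
```

In this toy example the extreme levels are far from the origin relative to their bootstrap variability, while the neutral level is not, which mirrors the pattern described above.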

## 5.4 Conclusion

Dim 1: There is a non-linear relationship among the groups of participants from high memory to low memory. The Episodic and Future questions also express a non-linear relationship, which follows a similar pattern from Level 1 to Level 5.

Dim 2: Separates Normal memory from extreme memory (both High and Low memory). A similar pattern separates the neutral responses to the questions (Likert scale 3) from the extreme responses (Likert scale 1 and 5).

DiCA also does a good job of predicting the groups in the fixed-effect model as well as in the random-effect model when the jackknife procedure is used. Even though there is a bias because the same sample is used, the accuracy for the random-effect model is 71%, which suggests that the model could generalize.
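The difference between fixed-effect and random-effect (jackknife) accuracy can be sketched with a nearest-group-mean classifier. Everything here is an assumption made for illustration: the scores are simulated, the helpers `nearest` and `group_means` are hypothetical, and only the group means are recomputed when an observation is left out, whereas a full jackknife would rerun the whole analysis each time.

```r
set.seed(3)
# Hypothetical 2-D scores for 3 groups of 20 observations
grp <- factor(rep(c("High", "Normal", "Low"), each = 20))
centers <- rbind(High = c(1, 0), Normal = c(0, 1), Low = c(-1, 0))
X <- centers[as.character(grp), ] + matrix(rnorm(120, sd = 0.8), ncol = 2)

# Assign one observation to the group with the nearest mean
nearest <- function(x, means) {
  rownames(means)[which.min(colSums((t(means) - x)^2))]
}
group_means <- function(X, grp) rowsum(X, grp) / as.vector(table(grp))

# Fixed effect: means computed from all observations, then used to classify them
m_all <- group_means(X, grp)
pred_fixed <- apply(X, 1, nearest, means = m_all)

# Random effect (leave-one-out): each observation is classified
# by group means computed without it
pred_loo <- sapply(seq_len(nrow(X)), function(i) {
  nearest(X[i, ], group_means(X[-i, , drop = FALSE], grp[-i]))
})

# Confusion matrices and accuracies
table(actual = grp, predicted = pred_fixed)
table(actual = grp, predicted = pred_loo)
acc_fixed <- mean(pred_fixed == grp)
acc_loo   <- mean(pred_loo == grp)
c(fixed = acc_fixed, loo = acc_loo)
```

The leave-one-out accuracy is the more honest estimate of how the classifier would perform on new observations, because no observation helps define the group it is assigned to.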