Chapter 4 Multiple Correspondence Analysis

Data table: MCA analyzes a single data table of categorical (qualitative) variables. Quantitative variables can be included after binning them into categories. A typical use is analyzing the relationships among several categorical variables in a data table.

Goal: MCA helps identify observations that have similar profiles. It also reveals associations between the variable categories, i.e., associations among the levels.

Key ideas

  1. MCA is technically a correspondence analysis (CA) of the indicator matrix of a data table. Quantitative variables can also be analyzed by binning them. Once binned, they are converted to binary columns using disjunctive coding (one-hot encoding).

  2. MCA can also be seen as a generalization of principal component analysis when the variables to be analyzed are categorical instead of quantitative.
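The following Python sketch illustrates key idea 1. It bins a quantitative variable into ordered categories and then applies disjunctive (one-hot) coding, giving one binary column per bin. The ages and cut points are hypothetical and are not taken from the SAM data. `bin_values` and `disjunctive` are illustrative helper names, not functions from any MCA package.

```python
from bisect import bisect_right

def bin_values(values, cut_points):
    """Map each value to a bin index: 0 below cut_points[0], 1 up to cut_points[1], etc."""
    return [bisect_right(cut_points, v) for v in values]

def disjunctive(codes, n_levels):
    """Disjunctive (one-hot) coding: one binary column per level, exactly one 1 per row."""
    return [[1 if code == level else 0 for level in range(n_levels)] for code in codes]

# Hypothetical ages and cut points (illustration only).
ages = [19, 34, 47, 62, 80]
cuts = [30, 50, 70]                # 4 bins: <30, 30-49, 50-69, >=70
codes = bin_values(ages, cuts)
table = disjunctive(codes, len(cuts) + 1)
for age, row in zip(ages, table):
    print(age, row)
```

The choice of cut points is a modelling decision. Each bin becomes one level, and therefore one column of the indicator matrix that MCA analyzes.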

Interpretation

    1. Each variable is represented by as many points as it has levels.

    2. Levels of variables that lie close to each other tend to be chosen together.

    3. The variance (spread) of the levels of a variable reflects the importance of that variable.

    4. Row factor scores are the coordinates of the row observations.
       They are interpreted through the distances between them and
       their distance from the origin.
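Point 4 can be made concrete with a short sketch. It computes Euclidean distances between rows in the factor space and from the origin, which represents the average profile. Rows that lie close together have similar response profiles. Rows far from the origin deviate most from the average profile. The scores below are hypothetical. Distances computed on only the first two dimensions only approximate the distances in the full space.

```python
import math

# Hypothetical row factor scores on the first two dimensions (illustration only).
row_scores = {
    "obs1": (0.8, -0.2),
    "obs2": (0.7, -0.1),
    "obs3": (-0.9, 0.4),
}

def distance_to_origin(scores):
    """Distance of a row from the origin (the average profile)."""
    return math.hypot(*scores)

def distance_between(a, b):
    """Euclidean distance between two rows in the factor space."""
    return math.dist(a, b)

for name, s in row_scores.items():
    print(name, round(distance_to_origin(s), 3))
print("obs1-obs2:", round(distance_between(row_scores["obs1"], row_scores["obs2"]), 3))
print("obs1-obs3:", round(distance_between(row_scores["obs1"], row_scores["obs3"]), 3))
```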

4.1 Dataset : Survey of autobiographical memory

Note: The dataset is similar to the one used for PCA.

Participants were asked to rate the extent to which each item applied to their memory in general, using a 5-point Likert scale.

There are 216 observations (rows), one per participant, and 26 columns, one per question. The questions comprise 8 episodic memory questions, 6 semantic memory questions, 6 spatial memory questions, and 6 prospective (future) memory questions.

The participants include both men and women aged 18 to 84 years, recorded as the age and sex variables. A survey-based measure of autobiographical memory (AM) is also used to categorize the participants into three groups: High Memory, Normal Memory, and Low Memory.

4.2 Looking at Data Patterns

4.2.1 Histograms

Histograms plot the distribution of the data for each variable. The SAM questionnaire has different question types assessing different types of memory (episodic, semantic, spatial, and future). We look at the histograms for each type of question.


The histograms for the semantic, spatial, and future memory questions are plotted in the same way.

Once you have looked at the data patterns in the histograms, you can decide how to bin the variables.

NOTE: For the SAM dataset, we decided to keep all five levels, because conceptually it did not make sense to bin the neutral responses with either the agree or the disagree responses.
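The histograms can be backed by a frequency table when deciding whether to keep or merge levels. Very infrequent levels tend to sit far from the origin and can strongly influence an MCA dimension. The Python sketch below tabulates the relative frequency of each Likert level for one question and flags rare levels. The answers are hypothetical. The 10% threshold is an arbitrary assumption for illustration, not a rule used in this analysis.

```python
from collections import Counter

def level_frequencies(responses, levels=(1, 2, 3, 4, 5)):
    """Relative frequency of each Likert level for one question (0 if never chosen)."""
    counts = Counter(responses)
    n = len(responses)
    return {level: counts.get(level, 0) / n for level in levels}

def rare_levels(freqs, threshold=0.10):
    """Levels chosen less often than the (hypothetical) threshold."""
    return [level for level, f in freqs.items() if f < threshold]

# Hypothetical answers of 20 participants to one question (not the SAM data).
answers = [4, 5, 4, 3, 4, 5, 2, 4, 5, 4, 3, 4, 5, 4, 4, 5, 3, 4, 5, 4]
freqs = level_frequencies(answers)
for level, f in freqs.items():
    print(level, f"{f:5.2f}", "#" * round(f * 40))   # text histogram
print("rare levels:", rare_levels(freqs))
```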

Conversion to Disjunctive matrix

##   E1.1 E1.2 E1.3 E1.4 E1.5 E2.1 E2.2 E2.3 E2.4 E2.5 E3.1 E3.2 E3.3 E3.4
## 1    0    0    0    0    1    0    0    0    0    1    0    0    0    0
## 2    0    0    0    1    0    0    0    0    1    0    0    0    0    1
##   E3.5 E4.1 E4.2 E4.3 E4.4 E4.5 E5.1 E5.2 E5.3 E5.4 E5.5 E6.1 E6.2 E6.3
## 1    1    0    0    0    0    1    0    0    0    0    1    0    0    0
## 2    0    0    0    1    0    0    0    0    0    1    0    0    0    0
##   E6.4 E6.5 E7.1 E7.2 E7.3 E7.4 E7.5 E8.1 E8.2 E8.3 E8.4 E8.5 F1.1 F1.2
## 1    0    1    0    0    0    0    1    0    0    0    0    1    0    0
## 2    1    0    0    0    0    1    0    0    0    1    0    0    0    0
##   F1.3 F1.4 F1.5 F2.1 F2.2 F2.3 F2.4 F2.5 F3.1 F3.2 F3.3 F3.4 F3.5 F4.1
## 1    0    1    0    0    0    1    0    0    0    0    1    0    0    0
## 2    0    1    0    0    0    0    1    0    0    0    0    1    0    0
##   F4.2 F4.3 F4.4 F4.5 F5.1 F5.2 F5.3 F5.4 F5.5 F6.1 F6.2 F6.3 F6.4 F6.5
## 1    0    0    1    0    0    0    0    1    0    0    0    0    0    1
## 2    0    0    1    0    0    0    0    1    0    0    0    0    1    0
##   P1.1 P1.2 P1.3 P1.4 P1.5 P2.1 P2.2 P2.3 P2.4 P2.5 P3.1 P3.2 P3.3 P3.4
## 1    0    0    0    1    0    0    0    0    1    0    0    1    0    0
## 2    0    0    0    1    0    0    0    0    0    1    0    0    1    0
##   P3.5 P4.1 P4.2 P4.3 P4.4 P4.5 P5.1 P5.2 P5.3 P5.4 P5.5 P6.1 P6.2 P6.3
## 1    0    0    0    0    0    1    0    0    1    0    0    0    0    0
## 2    0    0    0    0    0    1    0    0    0    1    0    0    0    0
##   P6.4 P6.5 S1.1 S1.2 S1.3 S1.4 S1.5 S2.1 S2.2 S2.3 S2.4 S2.5 S3.1 S3.2
## 1    0    1    0    0    0    0    1    0    0    0    0    1    0    0
## 2    1    0    0    0    0    1    0    0    0    0    1    0    0    0
##   S3.3 S3.4 S3.5 S4.1 S4.2 S4.3 S4.4 S4.5 S5.1 S5.2 S5.3 S5.4 S5.5 S6.1
## 1    0    1    0    0    0    0    0    1    0    0    0    0    1    0
## 2    0    1    0    0    0    0    1    0    0    0    0    1    0    0
##   S6.2 S6.3 S6.4 S6.5
## 1    0    0    0    1
## 2    0    1    0    0
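The printed table shows the disjunctive coding. Each question becomes five binary columns named `question.level`. Each row has exactly one 1 per question, so every row sums to the number of questions J (26 for SAM, assuming no missing answers). The Python sketch below reproduces this coding for the first three questions of the two rows printed above. `disjunctive_table` is an illustrative helper, not the function used to produce the output.

```python
def disjunctive_table(data, questions, levels=(1, 2, 3, 4, 5)):
    """Build a disjunctive (indicator) table with one column per question.level."""
    columns = [f"{q}.{lev}" for q in questions for lev in levels]
    rows = []
    for answers in data:
        row = []
        for q in questions:
            # Exactly one of the level columns gets a 1 for each question.
            row.extend(1 if answers[q] == lev else 0 for lev in levels)
        rows.append(row)
    return columns, rows

# Answers read off questions E1-E3 of the two rows printed above.
data = [
    {"E1": 5, "E2": 5, "E3": 5},
    {"E1": 4, "E2": 4, "E3": 4},
]
columns, rows = disjunctive_table(data, ["E1", "E2", "E3"])
print(columns)
for r in rows:
    print(r, "row sum =", sum(r))
```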

4.2.2 Correlation Plot

Burt Matrix: the matrix of all two-way cross-tabulations of the categorical variables. It is the inner product of the indicator (design) matrix Z with itself, B = Z^T Z.

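Note: MCA can be applied to the Burt matrix. A CA of the Burt matrix gives the same column standard coordinates as a CA of the indicator matrix, but its eigenvalues are the squares of the indicator-matrix eigenvalues.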
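As a minimal Python sketch, the Burt matrix can be computed directly as B = Z^T Z. The diagonal blocks are diagonal matrices holding the frequency of each level. The off-diagonal blocks are the two-way contingency tables between pairs of variables. The small indicator matrix below is hypothetical, with two variables A and B of two levels each.

```python
def burt_matrix(Z):
    """Burt matrix B = Z^T Z for an indicator matrix Z given as a list of rows."""
    n_cols = len(Z[0])
    return [[sum(row[a] * row[b] for row in Z) for b in range(n_cols)]
            for a in range(n_cols)]

# Hypothetical indicator matrix: 4 observations, columns A.1, A.2, B.1, B.2.
Z = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
]
B = burt_matrix(Z)
for row in B:
    print(row)
```

In the result, the top-left 2x2 block holds the counts of A.1 and A.2 on its diagonal. The top-right block is the A-by-B cross-tabulation. The grand total equals n·J² (here 4·2² = 16).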

4.3 MCA Analysis

## [1] "It is estimated that your iterations will take 0.08 minutes."
## [1] "R is not in interactive() mode. Resample-based tests will be conducted. Please take note of the progress bar."

4.3.1 Scree Plot

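Note: In MCA, disjunctive coding creates artificial extra dimensions that inflate the total inertia. As a result, the percentage of inertia (variance) explained by each dimension is underestimated, and the raw eigenvalues give a pessimistic picture.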

The so-called ‘percentage of inertia problem’ can be mitigated by using Greenacre's adjusted inertias or Benzécri's eigenvalue correction.
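Benzécri's correction keeps only the eigenvalues λ larger than 1/J, where J is the number of variables. Each kept eigenvalue is replaced by (J/(J−1)·(λ − 1/J))², and percentages are then computed from the corrected values. The Python sketch below implements this rule. The eigenvalues are hypothetical, not the SAM results. Greenacre's adjusted inertias divide by a different total and give less optimistic percentages; they are not implemented here.

```python
def benzecri_correction(eigenvalues, n_variables):
    """Benzécri-corrected eigenvalues for MCA of the indicator matrix.

    Eigenvalues <= 1/J are set to 0; the others become (J/(J-1) * (lam - 1/J))**2.
    """
    J = n_variables
    return [(J / (J - 1) * (lam - 1 / J)) ** 2 if lam > 1 / J else 0.0
            for lam in eigenvalues]

def percentages(values):
    """Percentage of the total carried by each value."""
    total = sum(values)
    return [100 * v / total for v in values]

# Hypothetical raw eigenvalues for J = 10 variables (illustration only).
raw = [0.5, 0.3, 0.2, 0.08, 0.05]
corrected = benzecri_correction(raw, n_variables=10)
print("corrected eigenvalues:", [round(c, 4) for c in corrected])
print("corrected %:", [round(p, 1) for p in percentages(corrected)])
```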

For MCA, we stick with the Kaiser plot and look at the first three dimensions.
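A Kaiser-type criterion for MCA keeps the dimensions whose eigenvalue exceeds the average eigenvalue. For MCA of an indicator matrix with J variables and K levels in total, the total inertia is (K − J)/J, spread over K − J non-trivial dimensions, so the average eigenvalue is 1/J. The sketch below uses the SAM layout shown earlier, with J = 26 questions and K = 130 columns. The eigenvalues passed to `kaiser_keep` are hypothetical and only show how the rule works.

```python
def total_inertia(n_levels, n_variables):
    """Total inertia of MCA on the indicator matrix: (K - J) / J."""
    return (n_levels - n_variables) / n_variables

def average_eigenvalue(n_levels, n_variables):
    """Total inertia divided by the K - J non-trivial dimensions, i.e. 1/J."""
    return total_inertia(n_levels, n_variables) / (n_levels - n_variables)

def kaiser_keep(eigenvalues, n_variables):
    """1-based indices of the dimensions whose eigenvalue exceeds 1/J."""
    return [i + 1 for i, lam in enumerate(eigenvalues) if lam > 1 / n_variables]

# SAM layout: J = 26 questions, K = 26 * 5 = 130 levels.
print(total_inertia(130, 26))                  # 4.0
print(round(average_eigenvalue(130, 26), 4))   # 0.0385
# Hypothetical eigenvalues, only to illustrate the rule.
print(kaiser_keep([0.21, 0.09, 0.05, 0.03], 26))   # [1, 2, 3]
```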

4.3.2 Factor Scores of the rows

4.3.3 Column Factor scores

4.3.3.1 Variable Contributions map

Important Contributions Only

Dimensions 1 and 2 both appear to be explained mainly by the Episodic and Future memory questions.
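The Python sketch below shows one way to decide which contributions count as "important". The contribution of a level k to a dimension is m_k·f_k²/λ, where m_k is the level's mass, f_k its factor score, and λ = Σ m_k·f_k² the eigenvalue, so contributions sum to 1. A variable's contribution is the sum over its levels. It is flagged as important when it exceeds the average, 1/J. That threshold is a common convention and is assumed here rather than confirmed for the map above. The masses and scores are hypothetical, and the helper names are illustrative.

```python
from collections import defaultdict

def level_contributions(masses, scores):
    """Contribution of each level to one dimension: m_k * f_k**2 / lambda."""
    weighted = {k: masses[k] * scores[k] ** 2 for k in masses}
    eig = sum(weighted.values())           # lambda for this dimension
    return {k: w / eig for k, w in weighted.items()}

def variable_contributions(level_ctr):
    """Sum level contributions per variable, using the 'question.level' names."""
    totals = defaultdict(float)
    for name, ctr in level_ctr.items():
        totals[name.split(".")[0]] += ctr
    return dict(totals)

# Hypothetical masses and factor scores: 2 questions x 2 levels, one dimension.
masses = {"E1.4": 0.3, "E1.5": 0.2, "S1.4": 0.25, "S1.5": 0.25}
scores = {"E1.4": -0.6, "E1.5": 0.9, "S1.4": 0.1, "S1.5": -0.1}
ctr = variable_contributions(level_contributions(masses, scores))
J = len(ctr)
important = [v for v, c in ctr.items() if c > 1 / J]
print({v: round(c, 3) for v, c in ctr.items()}, important)
```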

4.3.3.2 Factor scores with levels of important Variables

4.3.4 Contribution Bar Plots

Dim 1: Explained by Episodic Memory and Future Memory

Dim 2: Explained mostly by Episodic and Future

4.4 Conclusion

Dimension 1 and 2

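Row: A non-linear, parabolic curve runs from Low Autobiographical Memory to High
Autobiographical Memory. This resembles the horseshoe (Guttman) effect often seen in MCA when the variables are ordinal.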

Col: Questions related to episodic memory and to memory for the future contribute significantly. They lie along the diagonal between the two dimensions, so they contribute almost equally to both.

There is a non-linear relationship among the levels of the Episodic memory and Future memory questions.

Dimension 2 and 3

Row: Separates the Normal Memory group from the extreme memory groups, i.e., both the Low and High Memory groups.

Col: Questions related to memory for the future and to spatial memory appear to contribute significantly to dimension 3.