Chapter 3 Barycentric Discriminant Analysis

A barycenter is the mean of the observations from a group (category). Barycentric discriminant analysis (BADA) is a robust version of discriminant analysis that attempts to assign observations to pre-defined groups.

Data table: BADA is used to analyse a two-table dataset (a design table that codes group membership, and a data table).

Goal:

  • Maximize the variance between the means of the groups.

  • Predict group membership from the variables (best prediction = best separation between groups).

  • Combine the measurements (the predictors) to create new variables, the discriminant variables, that best separate the categories.

Note: These discriminant variables are also used to assign the original observations or “new” observations to the a-priori defined categories.

Key ideas

  1. Compute the barycenter (mean) of each group first, then perform a PCA on these barycenters.
  2. Dimensionality of the space = number of non-zero eigenvalues = number of groups - 1 (at most).
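
The two key ideas can be illustrated with a minimal sketch in plain Python. The data, group labels and helper name `column_means` are hypothetical and are not taken from the SAM dataset. The sketch computes one barycenter per group and then checks why the barycenters span at most (number of groups - 1) dimensions: once centered on the grand mean, their size-weighted sum is the zero vector.

```python
# Hypothetical toy data: 6 observations, 3 variables, 3 groups.
data = [
    [5.0, 4.0, 4.0],
    [4.0, 5.0, 3.0],
    [3.0, 3.0, 3.0],
    [3.0, 2.0, 4.0],
    [1.0, 2.0, 2.0],
    [2.0, 1.0, 3.0],
]
groups = ["High", "High", "Norm", "Norm", "Low", "Low"]


def column_means(rows):
    """Mean of each column of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Step 1: one barycenter (mean vector) per group.
rows_by_group = {}
for row, g in zip(data, groups):
    rows_by_group.setdefault(g, []).append(row)
barycenters = {g: column_means(rows) for g, rows in rows_by_group.items()}

# Step 2: center the barycenters on the grand mean, as a PCA would.
grand_mean = column_means(data)
centered = {
    g: [b - m for b, m in zip(bary, grand_mean)]
    for g, bary in barycenters.items()
}

# The size-weighted centered barycenters sum to (numerically) zero, so one of
# them is a linear combination of the others: at most (groups - 1) dimensions.
weighted_sum = [
    sum(len(rows_by_group[g]) * centered[g][j] for g in centered)
    for j in range(len(grand_mean))
]
max_dimensions = len(barycenters) - 1

print(barycenters)
print(weighted_sum)
print(max_dimensions)
```

With three groups, as in this chapter, `max_dimensions` is 2.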

Interpretation

1.  Row factor scores are the coordinates of the row observations.
    They are interpreted by the distances between them, and their distance
    from the origin.

2. Column loadings are interpreted by the angle between them, and their
   distance from the origin.
   **NOTE** These loadings differ from PCA loadings: here we try to maximize
   the ratio F = (SSb/SSw) * (dfw/dfb), not the variance.

3. The distance from the origin is important in both maps, because squared
   distance from the mean is inertia.
   Because of the Pythagorean Theorem, the total information contributed by 
   a data point (its squared distance to the origin) is also equal to the sum
   of its squared factor scores.
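
The ratio F = (SSb/SSw) * (dfw/dfb) mentioned in the note of item 2 can be computed by hand for a single variable. The sketch below is only an illustration on hypothetical scores; the function name `f_ratio` is an assumption and is not part of any BADA package.

```python
def f_ratio(scores, groups):
    """F = (SSb / SSw) * (dfw / dfb) for one variable and one grouping."""
    by_group = {}
    for x, g in zip(scores, groups):
        by_group.setdefault(g, []).append(x)

    grand_mean = sum(scores) / len(scores)
    means = {g: sum(xs) / len(xs) for g, xs in by_group.items()}

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(xs) * (means[g] - grand_mean) ** 2 for g, xs in by_group.items()
    )
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum(
        (x - means[g]) ** 2 for g, xs in by_group.items() for x in xs
    )

    df_between = len(by_group) - 1
    df_within = len(scores) - len(by_group)
    return (ss_between / ss_within) * (df_within / df_between)


# Hypothetical scores for 6 observations in 3 groups.
scores = [4.0, 5.0, 3.0, 3.0, 1.0, 2.0]
groups = ["High", "High", "Norm", "Norm", "Low", "Low"]

f_value = f_ratio(scores, groups)
print(f_value)  # 13.5
```

Here SSb = 9 and SSw = 1 with dfb = 2 and dfw = 3, so a large F means the group means are far apart relative to the spread inside each group.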

3.1 Dataset: Survey of Autobiographical Memory

Note: The dataset is similar to the one used for PCA; the only addition is the Low AM group of observations.

Participants were asked to rate the extent to which each item applied to their memory in general, using a 5-point Likert scale.

There are 153 observations (rows), one per participant, and 26 questions (columns): 8 episodic memory questions, 6 semantic memory questions, 6 spatial memory questions and 6 prospective memory questions.

The participants include both men and women aged 18 to 84 years; age and sex are also recorded as variables. A survey-based measure of AM is used to categorize the participants into three groups: High memory, Normal memory and Low memory.

3.2 Looking at the Data Pattern

3.2.1 Heat Map

This heatmap shows the correlation of the data with the design groups of observations.

The High memory group gives higher responses for almost all types of questions. The Low memory group, however, scores higher than the Normal memory group on some of the spatial questions.

3.3 BADA Analysis

## [1] "It is estimated that your iterations will take 0.14 minutes."
## [1] "R is not in interactive() mode. Resample-based tests will be conducted. Please take note of the progress bar."
## ===========================================================================

3.3.1 Scree Plot

We can see that the dimensionality of the space = number of non-zero eigenvalues = number of groups - 1. Since we have 3 groups (high, normal, low), there are 2 eigenvalues.

3.3.3 BADA: Column loadings

All the memory types are positively correlated with each other (observe that all the arrows point towards one side of the origin).

  • Dimension 1: Mostly represents episodic memory.

  • Dimension 2: Segregates memory for the future from spatial memory.

3.3.4 Contribution plots

  • Dimension 1: Episodic memory contributes the most.

  • Dimension 2: Segregates memory for the future from spatial memory.
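
A contribution can be sketched as follows, assuming the usual definition: a variable's contribution to a dimension is its squared loading divided by the sum of squared loadings on that dimension. The loadings below are hypothetical and are not the values from this analysis; the 1/(number of variables) cut-off is a common rule of thumb, not something stated in this chapter.

```python
# Hypothetical loadings of four variables on one dimension.
loadings = {"Episodic": 0.8, "Semantic": 0.4, "Spatial": 0.3, "Future": 0.33}


def contributions(loadings):
    """Squared loading of each variable as a share of the dimension's total."""
    total = sum(q ** 2 for q in loadings.values())
    return {name: q ** 2 / total for name, q in loadings.items()}


ctr = contributions(loadings)

# Variable with the largest contribution to this dimension.
top_variable = max(ctr, key=ctr.get)

# Rule of thumb (assumption): keep variables contributing more than average.
threshold = 1 / len(ctr)
important = [name for name, c in ctr.items() if c > threshold]

print(top_variable)
print(important)
```

Contributions on one dimension sum to 1, so they can be read as shares of that dimension.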

3.4 Fixed and Random effect

Fixed effect: predicting within the sample. Check the accuracy of the prediction. High accuracy indicates that the analysis is good at predicting within the sample. However, if the random-effect accuracy is very poor, this implies that the analysis overfits the data.

Random effect: predicting outside the sample. If the random-effect accuracy is high, this implies that the analysis is robust and could generalize to other datasets. However, keep in mind that in BADA we use the leave-one-out (jackknife) method to check for stability, which is prone to same-sample bias.
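
The difference between the two accuracies can be sketched with a simplified stand-in for BADA: each observation is assigned to the group with the nearest barycenter in the original variable space, without the PCA step. All names and data below are hypothetical. The fixed-effect accuracy uses barycenters computed from every observation; the leave-one-out accuracy recomputes them without the observation being classified.

```python
def barycenters(data, groups):
    """Mean vector of each group."""
    by_group = {}
    for row, g in zip(data, groups):
        by_group.setdefault(g, []).append(row)
    return {
        g: [sum(col) / len(rows) for col in zip(*rows)]
        for g, rows in by_group.items()
    }


def nearest_group(row, bary):
    """Group whose barycenter has the smallest squared Euclidean distance."""
    return min(
        bary, key=lambda g: sum((a - b) ** 2 for a, b in zip(row, bary[g]))
    )


def fixed_accuracy(data, groups):
    """Classify each observation with barycenters that include it."""
    bary = barycenters(data, groups)
    hits = sum(nearest_group(r, bary) == g for r, g in zip(data, groups))
    return hits / len(data)


def loo_accuracy(data, groups):
    """Classify each observation with barycenters computed without it."""
    hits = 0
    for i, (row, g) in enumerate(zip(data, groups)):
        bary = barycenters(data[:i] + data[i + 1:], groups[:i] + groups[i + 1:])
        hits += nearest_group(row, bary) == g
    return hits / len(data)


# Hypothetical one-variable data with two observations close to the boundary.
data = [[0.0], [2.4], [2.6], [5.0]]
groups = ["Low", "Low", "High", "High"]

fixed = fixed_accuracy(data, groups)
loo = loo_accuracy(data, groups)
print(fixed, loo)  # 1.0 0.5
```

On this toy data every observation is classified correctly within the sample, but the two borderline observations are misclassified once they are left out, which is the overfitting pattern described above.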

Confusion Matrix

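BADA fails to separate the groups: in the confusion matrices below, every observation is assigned to the Normal group, so the accuracy is 0.33, which is chance level for three equally sized groups.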

##       .High .Norm .Low
## .High     0     0    0
## .Norm    72    72   72
## .Low      0     0    0
## [1] 0.3333333
##                 .High.actual .Norm.actual .Low.actual
## .High.predicted            0            0           0
## .Norm.predicted           72           72          72
## .Low.predicted             0            0           0
## [1] 0.3333333
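
The accuracy printed under each matrix is the number of correctly classified observations (the diagonal) divided by the total. The sketch below redoes that arithmetic in plain Python with the values shown above, assuming rows are predicted groups and columns are actual groups, as labelled in the second matrix.

```python
labels = [".High", ".Norm", ".Low"]

# Rows = predicted group, columns = actual group (values from the output above).
confusion = [
    [0, 0, 0],
    [72, 72, 72],
    [0, 0, 0],
]

# Overall accuracy: correct predictions (diagonal) over all observations.
correct = sum(confusion[i][i] for i in range(len(labels)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

# Per-group accuracy: diagonal cell over the column (actual group) total.
per_group = {
    labels[j]: confusion[j][j] / sum(row[j] for row in confusion)
    for j in range(len(labels))
}

print(round(accuracy, 7))  # 0.3333333
print(per_group)
```

The per-group values make the failure explicit: the Normal group is always recovered, while the High and Low groups are never predicted.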

3.5 Conclusion

Note: These interpretations are the same as those for PCA.

  • Component 1: Participants who were grouped as high Autobiographical Memory have higher episodic memory.

  • Component 2: Tends to separate semantic memory from memory that requires imagination, such as future and spatial memory.

Note: Semantic memory is based on facts, meanings, concepts and knowledge about the external world that we have acquired and is independent of personal experience and of the spatial/temporal context in which it was acquired.

  • Component 3: Separates future memory from spatial memory.

Note: BADA does not do a good job of separating the groups, so it is not very useful for the SAM dataset.