# Chapter 3 Barycentric Discriminant Analysis

**Barycenter** refers to the mean of the observations from a group/category. Barycentric discriminant analysis (BADA) is a robust version of discriminant analysis that attempts to assign observations to groups.

**Data table:** BADA is used to analyze a dataset made of two tables (a design table coding the groups and a data table).

**Goal:**

- Maximize the variance between the means of the groups.
- Predict group membership from the variables (best prediction = best separation between groups).
- Combine the measurements (the predictors) to create new variables, the discriminant variables, that best separate the categories.

**Note:** These discriminant variables are also used to assign the original observations or “new” observations to the a-priori defined categories.

**Key ideas**

- Compute the barycenter (mean) of each group first, then perform a PCA on these barycenters.
- Dimensionality of the space = number of non-zero eigenvalues = number of groups - 1.
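The two key ideas can be illustrated with a minimal base R sketch. The data are simulated, the names `X`, `group`, and `barycenters` are hypothetical, and the PCA is done as a plain SVD of the centered barycenters. This assumes equal group sizes and ignores the masses and weights that `tepBADA()` may apply, so it is an illustration of the principle rather than a reimplementation.

```r
# Toy data (simulated): 3 groups of 10 observations on 5 variables.
set.seed(1)
group <- factor(rep(c("High", "Norm", "Low"), each = 10))
X <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)
X[group == "High", ] <- X[group == "High", ] + 1  # shift one group mean

# Step 1: barycenters = per-group column means (a 3 x 5 matrix).
barycenters <- rowsum(X, group) / as.vector(table(group))

# Step 2: PCA of the barycenters, here an SVD of the barycenters
# centered on their own mean.
centered <- scale(barycenters, center = TRUE, scale = FALSE)
eigenvalues <- svd(centered)$d^2

# Step 3: count the eigenvalues that are not numerically zero.
n_nonzero <- sum(eigenvalues > 1e-10 * max(eigenvalues))
n_nonzero
```

Centering three barycenters removes one dimension, which is why only two eigenvalues remain for three groups.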

**Interpretation**

1. Row factor scores are the coordinates of the row observations. They are interpreted by the distances between them and by their distance from the origin.
2. Column loadings are interpreted by the angle between them and by their distance from the origin.

   **NOTE:** These loadings are different from PCA loadings: here we try to maximize the ratio `F = (SSb / SSw) * (dfw / dfb)`, not the variance.
3. The distance from the origin is important in both maps, because the squared distance from the mean is inertia. Because of the Pythagorean theorem, the total information contributed by a data point (its squared distance to the origin) is also equal to the sum of its squared factor scores.
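The ratio mentioned in the note above, F = (SSb / SSw) * (dfw / dfb), can be computed by hand for a single variable. The sketch below uses simulated scores and hypothetical names (`score`, `ss_between`, `ss_within`), and checks the hand computation against the F value reported by base R's `anova()`.

```r
# Toy data (simulated): one variable measured in 3 groups of 8.
set.seed(2)
group <- factor(rep(c("High", "Norm", "Low"), each = 8))
score <- rnorm(24) + ifelse(group == "High", 1.5, 0)

grand_mean  <- mean(score)
group_means <- tapply(score, group, mean)
group_sizes <- tapply(score, group, length)

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((score - group_means[as.character(group)])^2)

# Degrees of freedom.
df_between <- nlevels(group) - 1
df_within  <- length(score) - nlevels(group)

# F = (SSb / SSw) * (dfw / dfb)
f_ratio <- (ss_between / ss_within) * (df_within / df_between)

# Cross-check against the one-way ANOVA computed by R.
f_anova <- anova(lm(score ~ group))[["F value"]][1]
all.equal(f_ratio, f_anova)
```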

## 3.1 Dataset: Survey of Autobiographical Memory

**Note: The dataset is similar to the one used for PCA; the only addition is the Low AM group of observations.**

Participants were asked to rate the extent to which a particular item applied to their memory in general, using a **5-point Likert scale**.

There are 153 observations (**rows**), each representing a participant who answered 26 questions (**columns**): 8 episodic memory questions, 6 semantic memory questions, 6 spatial memory questions, and 6 prospective memory questions.

The subjects include both men and women aged 18 to 84 years; age and sex are also recorded as variables. A survey-based measure of AM is also used to categorize the participants into three groups: **High Memory, Normal Memory, Low Memory**.

## 3.2 Looking at the Data Pattern

### 3.2.1 Heat Map

This heatmap shows how the data relate to the design groups of the observations.

The high memory group has higher responses for almost all types of questions. The low memory group, however, scores higher than the normal memory group on some of the spatial questions.

## 3.3 BADA Analysis

```
# Packages providing tepBADA() and tepBADA.inference.battery()
library(TExPosition)
library(TInPosition)

# Run BADA ----
resBADA <- tepBADA(d_BADA,
                   DESIGN = d_active$memoryGroups,
                   graphs = FALSE,
                   scale = FALSE,
                   center = TRUE)

# Inferences ----
# We have a problem with the inference part; it will be addressed soon.
# In the meantime we fix the seed of the random number generator.
set.seed(70301)
nIter <- 50  # only used if test.iters below is uncommented
resBADA.inf <- tepBADA.inference.battery(d_BADA,
                                         DESIGN = d_active$memoryGroups,
                                         # test.iters = nIter,
                                         center = TRUE,
                                         scale = FALSE,
                                         graphs = FALSE)
```

```
## [1] "It is estimated that your iterations will take 0.14 minutes."
## [1] "R is not in interactive() mode. Resample-based tests will be conducted. Please take note of the progress bar."
## ===========================================================================
```

### 3.3.1 Scree Plot

We can see that the dimensionality of the space = number of non-zero eigenvalues = number of groups - 1.
**Since we have 3 groups (high, normal, low), there are 2 eigenvalues.**

### 3.3.2 BADA: Row Factor Maps

#### 3.3.2.1 Row Factor Scores with means.

The group means for the High, Normal, and Low autobiographical memory groups are plotted.

```
library(PTCA4CATA)  # createFactorMap(), createxyLabels.gen(), getMeans()
library(dplyr)      # recode()

# Factor map of the observations (row factor scores)
Imap <- createFactorMap(resBADA$TExPosition.Data$fii,
                        col.points = col4row,
                        col.labels = col4row,
                        alpha.points = .4,
                        pch = 19,
                        display.labels = FALSE)
# Axis labels with eigenvalues and percentage of inertia
label4Map <- createxyLabels.gen(1, 2,
                                lambda = resBADA$TExPosition.Data$eigs,
                                tau = resBADA$TExPosition.Data$t)
# Group means (barycenters) of the factor scores
group_means <- PTCA4CATA::getMeans(resBADA$TExPosition.Data$fii,
                                   d_active$memoryGroups)
# One color per group
col4Means <- recode(rownames(group_means),
                    Low = 'orange',
                    High = 'darkred',
                    Norm = 'tomato2')
names(col4Means) <- rownames(group_means)
MapGroup <- createFactorMap(group_means,
                            # use the constraints from the main map
                            constraints = Imap$constraints,
                            col.points = col4Means,
                            col.labels = col4Means,
                            pch = 17,
                            cex = 4,  # size of the dot (bigger)
                            text.cex = 4,
                            alpha.points = 1,
                            alpha.labels = 1)
# The map with observations and group means
a003.bada <- Imap$zeMap +
  label4Map +
  MapGroup$zeMap_dots +
  MapGroup$zeMap_text
a003.bada
```

#### 3.3.2.2 Row factor scores with Tolerance Interval

Although the means are separated, the row factor scores of the three memory groups overlap a lot.

#### 3.3.2.3 Row Factor scores with Confidence Intervals

The means of the three memory groups are significantly different from each other.
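The contrast between the two kinds of interval can be illustrated with a small base R sketch. It uses simulated scores for a single hypothetical group on a single dimension and percentile intervals, which is an assumption and not necessarily how the ellipses in the maps are computed. A tolerance interval describes where the observations fall, whereas a confidence interval describes where the group mean falls, so the second is much narrower.

```r
# Toy data (simulated): factor scores of one group on one dimension.
set.seed(3)
scores <- rnorm(72, mean = 0.5, sd = 2)

# Tolerance-style interval: range covering 95% of the observations.
tolerance <- quantile(scores, c(0.025, 0.975))

# Confidence interval: range covering 95% of the bootstrapped group means.
boot_means <- replicate(1000, mean(sample(scores, replace = TRUE)))
confidence <- quantile(boot_means, c(0.025, 0.975))

# Width of each interval.
width_tolerance  <- unname(diff(tolerance))
width_confidence <- unname(diff(confidence))
c(tolerance = width_tolerance, confidence = width_confidence)
```

This is why group means can differ reliably even when the clouds of individual observations overlap.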

### 3.3.3 BADA: Column loadings

All the memory types are positively correlated with each other (observe that all the arrows point toward one side of the origin).

- Dimension 1: mostly represents episodic memory.
- Dimension 2: separates memory for the future from spatial memory.

### 3.3.4 Contribution plots

- Dimension 1: episodic memory contributes the most.
- Dimension 2: separates memory for the future from spatial memory.

### 3.3.5 Bootstrap Ratios bar plots

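This plot shows that the contributions observed in the previous plots are reliable and significant. Variables whose bootstrap ratio exceeds the threshold of 2 in absolute value are shown in color; the others are shown in gray.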

```
# Bootstrap ratios of the column loadings (one column per dimension)
BRj <- resBADA.inf$Inference.Data$boot.data$fj.boot.data$tests$boot.ratios
# Bar plot of the bootstrap ratios for dimension 1
d001.plotBRj.1 <- PrettyBarPlot2(
  bootratio = BRj[, 1],
  threshold = 2,
  ylim = NULL,
  color4bar = gplots::col2hex(col4Var),
  color4ns = "gray75",
  plotnames = TRUE,
  main = 'Bootstrap Ratios Variables. Dim 1.',
  ylab = "Bootstrap Ratios")
d001.plotBRj.1
```
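As a rough illustration of what a bootstrap ratio measures, the base R sketch below resamples simulated data and divides the mean of the bootstrapped estimates by their standard deviation. The column mean stands in for a loading, and the names `strong` and `weak` are hypothetical; the actual ratios above come from `tepBADA.inference.battery()`, not from this code.

```r
# Toy data (simulated): one variable with a clear effect, one with none.
set.seed(4)
n <- 60
weak_raw <- rnorm(n)
x <- data.frame(strong = rnorm(n, mean = 1),
                weak   = weak_raw - mean(weak_raw))  # sample mean exactly 0

# Bootstrap: resample the rows and recompute the statistic each time
# (here the column means, standing in for the loadings).
boot_stats <- t(replicate(1000, colMeans(x[sample(n, replace = TRUE), ])))

# Bootstrap ratio = mean of the bootstrapped estimates / their standard deviation.
boot_ratios <- colMeans(boot_stats) / apply(boot_stats, 2, sd)

# A variable is considered reliable when |ratio| exceeds the threshold of 2.
reliable <- abs(boot_ratios) > 2
reliable
```

A ratio is read much like a t statistic, which is why a threshold of 2 is used in the plot above.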

## 3.4 Fixed and Random effect

**Fixed effect:** predicting within the sample.
Check the accuracy of the prediction. High accuracy indicates that the model is good at predicting within the sample. **However, if the random effect accuracy is very poor, this would imply that the analysis overfits the data.**

**Random effect:** predicting outside the sample. If the random effect accuracy is high, this would imply that the analysis is robust and could generalize across datasets. **However, keep in mind that in the case of BADA we use the leave-one-out (jackknife) method to check stability, which is prone to same-sample bias.**
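The difference between the two accuracies can be sketched in base R with a nearest-barycenter classifier. This is an illustration under stated assumptions: the data are simulated, `nearest_barycenter()` is a hypothetical helper, and distances are taken in the raw variable space rather than in the BADA factor space.

```r
# Toy data (simulated): 3 groups of 20 observations on 4 variables.
set.seed(5)
group <- factor(rep(c("High", "Norm", "Low"), each = 20))
X <- matrix(rnorm(60 * 4), ncol = 4)
X[group == "High", 1] <- X[group == "High", 1] + 2

# Hypothetical helper: assign each row of `new` to the nearest group barycenter.
nearest_barycenter <- function(train, train_group, new) {
  bary <- rowsum(train, train_group) / as.vector(table(train_group))
  # Squared Euclidean distance from every new row to every barycenter.
  d2 <- apply(bary, 1, function(b) rowSums(sweep(new, 2, b)^2))
  d2 <- matrix(d2, nrow = nrow(new), dimnames = list(NULL, rownames(bary)))
  colnames(d2)[max.col(-d2, ties.method = "first")]
}

# Fixed effect: barycenters are computed from the observations being classified.
pred_fixed <- nearest_barycenter(X, group, X)
acc_fixed <- mean(pred_fixed == as.character(group))

# Random effect (leave-one-out): each observation is classified with
# barycenters computed without it.
pred_loo <- sapply(seq_len(nrow(X)), function(i) {
  nearest_barycenter(X[-i, , drop = FALSE], group[-i], X[i, , drop = FALSE])
})
acc_random <- mean(pred_loo == as.character(group))

c(fixed = acc_fixed, random = acc_random)
```

A large gap between the two values would be the overfitting signal described above.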

**Confusion Matrix**

BADA fails to separate the groups: every observation is assigned to the Norm group, so the accuracy (0.33) is at chance level for three groups of equal size.

```
##       .High .Norm .Low
## .High     0     0    0
## .Norm    72    72   72
## .Low      0     0    0
```

`## [1] 0.3333333`

```
##                 .High.actual .Norm.actual .Low.actual
## .High.predicted            0            0           0
## .Norm.predicted           72           72          72
## .Low.predicted             0            0           0
```

`## [1] 0.3333333`
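The accuracy printed above can be recovered from the confusion matrix as the sum of the diagonal (correct assignments) divided by the total count. A minimal base R sketch, with the matrix retyped by hand from the output above:

```r
# Confusion matrix retyped from the output above
# (rows = predicted group, columns = actual group).
confusion <- matrix(c(0, 72, 0,
                      0, 72, 0,
                      0, 72, 0),
                    nrow = 3,
                    dimnames = list(predicted = c(".High", ".Norm", ".Low"),
                                    actual    = c(".High", ".Norm", ".Low")))

# Accuracy = correctly classified observations / all observations.
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy  # 72 / 216 = 0.3333333
```

Only the 72 Norm observations fall on the diagonal, which is exactly what a rule that always answers "Norm" would achieve.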

## 3.5 Conclusion

Note: these interpretations are the same as those of the PCA.

Component 1: Participants who were grouped as high autobiographical memory have higher episodic memory.

Component 2: Tends to separate semantic memory from memory that requires imagination, such as future and spatial memory.

Note: Semantic memory is based on facts, meanings, concepts, and knowledge about the external world that we have acquired; it is independent of personal experience and of the spatial/temporal context in which it was acquired.

Component 3: Explains how future memory differs or separates from spatial memory.

**Note: BADA does not do a good job of separating the groups. It is not very useful for the SAM dataset.**