Chapter 2 Barycentric Discriminant Analysis

Barycentric discriminant analysis (BADA) is used to assign (classify) participants to a priori defined groups using quantitative data. It first performs a PCA on the barycenters and then projects the observations and variables into the same space. The barycenters are the group means of the observations.

In terms of predictive performance it is comparable to a perceptron. It also handles the “p >> n” problem, where the data have many more variables than observations.
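The two steps described above (group means first, then a PCA of those means with the observations projected into the same space) can be sketched in base R. This is a simplified illustration on simulated data, not the `tepBADA` implementation: it ignores masses and weights, and `toy_data` and `toy_group` are hypothetical names.

```r
# Minimal BADA-style sketch in base R (simulated data; not the tepBADA code).
set.seed(1)
toy_data  <- matrix(rnorm(30 * 4), nrow = 30, ncol = 4,
                    dimnames = list(NULL, paste0("item", 1:4)))
toy_group <- factor(rep(c("high", "normal", "low"), each = 10))

# Center the columns on the grand mean (no scaling, as with scale = FALSE).
centered <- scale(toy_data, center = TRUE, scale = FALSE)

# Barycenters: one row of column means per group.
barycenters <- rowsum(centered, toy_group) / as.vector(table(toy_group))

# PCA of the barycenters via the singular value decomposition.
decomp   <- svd(barycenters)
keep     <- decomp$d > 1e-10   # at most (number of groups - 1) dimensions
loadings <- decomp$v[, keep, drop = FALSE]

# Project barycenters and observations into the same space.
group_scores <- barycenters %*% loadings
obs_scores   <- centered %*% loadings

# Assign each observation to the closest barycenter (squared Euclidean distance).
dist_to_groups <- sapply(seq_len(nrow(group_scores)), function(g) {
  rowSums(sweep(obs_scores, 2, group_scores[g, ])^2)
})
predicted <- rownames(group_scores)[max.col(-dist_to_groups, ties.method = "first")]
table(predicted, actual = toy_group)
```

Because the toy data are pure noise, the resulting table should show many misclassifications; the point is only to show where the barycenters enter the computation.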

2.1 Data set: PHQ

The Patient Health Questionnaire (PHQ) is a survey that provides a preliminary measure of depression severity.

There are 9 questions (columns) measured on 225 participants (rows).

The descriptors of the participants are memory group, sex, and age. For this analysis I will use memory group, which is either high, normal, or low memory.

2.2 Analysis

resBADA <- tepBADA(PHQ, DESIGN = GroupingVaribles$memoryGroups, scale = FALSE,
                   graphs = FALSE)

# Inferences ----
# set.seed(70301) # There is a known problem with the inference part that
  # will be addressed soon. In the meantime, the seed can be fixed here so
  # that the random resampling is reproducible.
nIter <- 50 # number of resampling iterations
resBADA.inf <- tepBADA.inference.battery(PHQ, scale = FALSE,
                  DESIGN = GroupingVaribles$memoryGroups,
                  test.iters = nIter,
                  graphs = FALSE)

## [1] "It is estimated that your iterations will take 0.06 minutes."
## [1] "R is not in interactive() mode. Resample-based tests will be conducted. Please take note of the progress bar."
## ===========================================================================

2.2.1 Scree Plot

2.2.2 Permutation Tests for Significant Eigenvalues

The first dimension is reliable and above the Kaiser line.
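As a rough illustration of what such a test does, the sketch below shuffles the group labels, recomputes the eigenvalues of the barycenter PCA, and compares them with the observed ones. It is an assumption-laden sketch on simulated data, not the TInPosition procedure; the helper `barycenter_eigs` is hypothetical, and the Kaiser line is taken here to be the mean eigenvalue.

```r
# Permutation test for eigenvalues (simulated data; simplified sketch).
set.seed(2)
toy_data  <- matrix(rnorm(60 * 5), nrow = 60)
toy_group <- factor(rep(c("high", "normal", "low"), each = 20))

# Eigenvalues of the barycenter PCA for a given group assignment.
barycenter_eigs <- function(x, g) {
  centered <- scale(x, center = TRUE, scale = FALSE)
  means    <- rowsum(centered, g) / as.vector(table(g))
  # With k groups there are at most k - 1 non-trivial dimensions.
  svd(means)$d[seq_len(nlevels(g) - 1)]^2
}

observed <- barycenter_eigs(toy_data, toy_group)

# Null distribution: shuffle the group labels and recompute the eigenvalues.
n_iter   <- 200
permuted <- replicate(n_iter, barycenter_eigs(toy_data, sample(toy_group)))

# Permutation p-value for each dimension (rows of `permuted` are dimensions).
p_values <- (rowSums(permuted >= observed) + 1) / (n_iter + 1)

# Kaiser line, assumed here to be the average eigenvalue.
kaiser_line <- mean(observed)
above_line  <- observed > kaiser_line
```

A dimension would be retained under this reading when its p-value is small and its eigenvalue lies above `kaiser_line`.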

2.2.3 Barycenters with Observations

The correlation plot of the design variable shows that high memory is perfectly negatively correlated with low and normal memory, while low and normal memory are perfectly positively correlated with each other.

The bootstraps of the barycenters show that low (-) and high (+) memory are reliably separated, but the hulls overlap, meaning that there is a lot of variance in all three groups (which makes classification harder).

2.2.4 Column Loadings

The circle of correlation shows that the variance of speed and suicide is hardly captured in the first two dimensions. However, energy and sleep are almost completely accounted for in the first two dimensions.
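The coordinates on a circle of correlation are correlations between each variable and the factor scores, and the sum of their squares is the share of a variable's variance captured by the two dimensions. The sketch below shows the arithmetic on simulated data, using a plain PCA from `prcomp` as a stand-in for the BADA factor scores; the item names are hypothetical.

```r
# Circle-of-correlation coordinates (simulated data; plain PCA as a stand-in).
set.seed(4)
toy_data <- matrix(rnorm(50 * 4), nrow = 50,
                   dimnames = list(NULL, paste0("item", 1:4)))
toy_data[, 2] <- toy_data[, 1] + rnorm(50, sd = 0.3)  # item2 tracks item1

pca    <- prcomp(toy_data, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]                 # factor scores on dimensions 1 and 2

# Coordinates of each variable on the circle: correlation with each dimension.
circle_coords <- cor(toy_data, scores)

# Squared distance from the origin = proportion of variance captured.
captured <- rowSums(circle_coords^2)
round(captured, 2)
```

A variable whose `captured` value is close to 1 lies near the edge of the circle, whereas a value close to 0 places it near the center.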

2.2.5 Contributions

2.2.6 Bootstrap Ratios

The bootstrap ratios show that the second dimension is not reliable. However, the significant contributions in dimension 1 are reliable.
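A bootstrap ratio is a bootstrap mean divided by the bootstrap standard deviation, and is read much like a t statistic. The sketch below is one simplified way to obtain such ratios for the columns, on simulated data: observations are resampled within groups and the resampled barycenters are projected onto a fixed solution. It is not the TInPosition code, the helper names are hypothetical, and the cutoff of 2 is assumed as a conventional threshold.

```r
# Bootstrap ratios for column scores (simulated data; simplified sketch).
set.seed(3)
toy_data  <- matrix(rnorm(60 * 5), nrow = 60,
                    dimnames = list(NULL, paste0("item", 1:5)))
toy_data[1:20, 1] <- toy_data[1:20, 1] + 3  # item1 separates the first group
toy_group <- factor(rep(c("high", "normal", "low"), each = 20))

group_means <- function(x, g) rowsum(x, g) / as.vector(table(g))

centered <- scale(toy_data, center = TRUE, scale = FALSE)
fixed    <- svd(group_means(centered, toy_group))
u_fixed  <- fixed$u[, 1:2]   # group-side singular vectors, kept fixed

# One replicate: resample rows within each group, then project the
# resampled barycenters onto the fixed solution to get column scores.
boot_column_scores <- function() {
  idx <- unlist(lapply(split(seq_along(toy_group), toy_group),
                       function(i) sample(i, replace = TRUE)))
  t(group_means(centered[idx, ], toy_group[idx])) %*% u_fixed
}

n_boot <- 200
boots  <- replicate(n_boot, boot_column_scores())  # items x dims x n_boot

# Bootstrap ratio: bootstrap mean divided by bootstrap standard deviation.
boot_ratio <- apply(boots, c(1, 2), mean) / apply(boots, c(1, 2), sd)
reliable   <- abs(boot_ratio) > 2   # assumed cutoff
```

Projecting onto a fixed solution avoids the arbitrary sign flips that would occur if the decomposition were recomputed for every bootstrap sample.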

2.3 Accuracy

#Fixed Model
row.names(resBADA.inf$Inference.Data$loo.data$fixed.confuse) <- c("High", "Norm", "Low")
colnames(resBADA.inf$Inference.Data$loo.data$fixed.confuse) <- c("High", "Norm", "Low")
resBADA.inf$Inference.Data$loo.data$fixed.confuse

##      High Norm Low
## High   44   34  27
## Norm   15   15  14
## Low    13   23  31

resBADA.inf$Inference.Data$loo.data$fixed.acc

## [1] 0.4166667

#Random Model
row.names(resBADA.inf$Inference.Data$loo.data$loo.confuse) <- c("High", "Norm", "Low")
colnames(resBADA.inf$Inference.Data$loo.data$loo.confuse) <- c("High", "Norm", "Low")
resBADA.inf$Inference.Data$loo.data$loo.confuse

##      High Norm Low
## High   44   34  27
## Norm   15   14  18
## Low    13   24  27

resBADA.inf$Inference.Data$loo.data$loo.acc

## [1] 0.3935185

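The classification accuracy drops only slightly from the fixed-effect model (0.42) to the random-effect (leave-one-out) model (0.39), so the model appears stable. Both values are above the chance level of roughly 0.33 for three groups, which suggests that the model could generalize to new observations, although the margin over chance is modest.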
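The reported accuracies can be recomputed by hand from the confusion matrices printed above, as the proportion of cases on the diagonal. The sketch below copies those counts; the helper `accuracy` is hypothetical, and reading the columns as the actual groups is an assumption about the layout of the output.

```r
# Confusion matrices copied from the output above.
groups <- c("High", "Norm", "Low")
fixed_confuse <- matrix(c(44, 34, 27,
                          15, 15, 14,
                          13, 23, 31),
                        nrow = 3, byrow = TRUE, dimnames = list(groups, groups))
loo_confuse <- matrix(c(44, 34, 27,
                        15, 14, 18,
                        13, 24, 27),
                      nrow = 3, byrow = TRUE, dimnames = list(groups, groups))

# Accuracy: correctly classified cases (diagonal) over all classified cases.
accuracy <- function(confuse) sum(diag(confuse)) / sum(confuse)

accuracy(fixed_confuse)  # 0.4166667 (90 / 216)
accuracy(loo_confuse)    # 0.3935185 (85 / 216)

# If the columns are the actual groups, each group has 72 cases,
# so guessing at random would be correct about one time in three.
colSums(fixed_confuse)   # 72 72 72
1 / length(groups)
```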

2.4 Summary

When we interpret the factor scores and loadings together, the BADA revealed:

Component 1

Rows: Low (-) vs high (+) memory

Columns: Low memory scores higher on pleasure, energy, failure, and focus.

Component 2

Rows: Confidence intervals overlap, i.e., not reliable

Columns: Bootstrap ratios show that no contribution in this dimension is reliable