Chapter 9 DiSTATIS

Data Table: DiSTATIS is used when we have several distance matrices collected on the same set of observations.

Goal:

• DiSTATIS first evaluates the similarity between the distance matrices.

• Like PCA, it looks for a new variable that maximizes the variance, that is, a combination that is as similar as possible to all the data tables.

• First, an RV matrix is formed: each entry is the RV coefficient, a cosine between two vectorized tables that behaves like a squared correlation. An eigen-decomposition of this square RV matrix is then performed.

• The first eigenvector obtained from this decomposition, rescaled so that its elements sum to one, is used to weight the tables.

• From this analysis, a compromise matrix is computed as the weighted sum of the tables; it is the best aggregate of the original matrices. The original distance matrices are then projected onto the compromise space.
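The steps above can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the code of `DistatisR::distatis`: the toy distance matrices, the normalization of each cross-product matrix by its first eigenvalue, and all variable names (`D`, `S`, `C`, `alpha`, `S_plus`, `F_scores`) are choices made for this example.

```r
# Toy data (hypothetical): 3 "participants", each giving a squared-distance
# matrix on the same 5 items.
set.seed(42)
n_items  <- 5
n_tables <- 3
make_dist <- function() as.matrix(dist(matrix(rnorm(n_items * 2), n_items, 2)))^2
D <- replicate(n_tables, make_dist())        # n_items x n_items x n_tables cube

# Step 1: double-center each distance matrix to get a cross-product matrix,
# then normalize it by its first eigenvalue (an assumption of this sketch).
centering <- diag(n_items) - matrix(1 / n_items, n_items, n_items)
S <- array(0, dim = dim(D))
for (k in seq_len(n_tables)) {
  S_k <- -0.5 * centering %*% D[, , k] %*% centering
  S[, , k] <- S_k / eigen(S_k, symmetric = TRUE)$values[1]
}

# Step 2: RV matrix = cosines between the vectorized cross-product matrices.
X <- apply(S, 3, as.vector)                  # one column per table
X <- sweep(X, 2, sqrt(colSums(X^2)), "/")    # unit-norm columns
C <- crossprod(X)                            # n_tables x n_tables

# Step 3: weights = first eigenvector of C, rescaled to sum to one.
u1    <- abs(eigen(C, symmetric = TRUE)$vectors[, 1])
alpha <- u1 / sum(u1)

# Step 4: compromise = weighted sum of the cross-product matrices.
S_plus <- matrix(0, n_items, n_items)
for (k in seq_len(n_tables)) S_plus <- S_plus + alpha[k] * S[, , k]

# Step 5: factor scores from the eigen-decomposition of the compromise.
eig  <- eigen(S_plus, symmetric = TRUE)
keep <- eig$values > 1e-10
F_scores <- eig$vectors[, keep] %*% diag(sqrt(eig$values[keep]), nrow = sum(keep))
```

In this sketch, tables that agree with the majority receive larger weights in `alpha`, so an atypical table contributes less to the compromise.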

Interpretations

1. Global Factor Scores

The coordinates of the observations on the components are called factor scores. They can be used to plot maps in which the observations are represented as points, such that the distances in the map best reflect the similarities between the observations.

2. Partial Factor Scores

The positions of the observations "as seen by" each data set are called partial factor scores; they can also be represented as points in the compromise map. The weighted average of the partial factor scores of all the tables gives back the factor scores of the compromise.

Note: The means of the partial factor scores help in understanding the interpretations.

9.1 Dataset : Music Composers Dataset

Rows: [composer].[pianist].[ID] 36 different pieces of music categorized on the basis of composers and pianists.

Composers : Beethoven, Bach, Mozart

Pianist : Richter, Arrau, Pires, Baren

Design : color the music pieces, or compute mean factor scores, according to the composers or pianists

Columns: 37 participants with a wide range of musical experience, who listened to and sorted these pieces of music.

Design: based on music experience

Setting up colours

``````#color for row design variables
col4Music <- Design_row$Music
col4Music <- dplyr::recode(col4Music,
Bach = "skyblue",
Beet = "olivedrab1",
Mozart = "maroon2")

col4pianist <- Design_row$Pianist
col4pianist <- dplyr::recode(col4pianist,
Arrau = "violet",
Richt = "violetred",
Baren = "slateblue1",
Pires = "midnightblue")

#color for column design variables
novice<- Sorting_Data[c(1:17)]
medium<- Sorting_Data[c(18:30)]
expert<- Sorting_Data[c(31:37)]

low <- 'gold'
med <- 'tomato'
high <- 'darkred'
col1 <- rep(low,length(novice))
col2 <- rep(med,length(medium))
col3 <- rep(high, length(expert))

col4col <- as.matrix(c(col1,col2,col3))

#color for group means
col4means <- gplots::col2hex(c( 'darkred','tomato','orange' ))

col4Music.Means <- gplots::col2hex(c( "skyblue", "olivedrab1","maroon2"))

col4pianist.Means <- as.matrix( gplots::col2hex(c(
"violet","violetred","slateblue1", "midnightblue")))``````

9.2 DiSTATIS Analysis

Distance cube : Each participant's sorting gives one distance matrix between the music pieces; stacking these matrices, one per participant, forms a cube.

``````# Create distance matrices
DistanceCube <- DistatisR::DistanceFromSort(Sorting_Data)

# Run the Plain DiSTATIS analysis
resDistatis <- DistatisR::distatis(DistanceCube)``````
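To make the distance cube concrete, here is a minimal base-R sketch of how a sorting table can be turned into one 0/1 distance matrix per participant. It assumes the usual convention for sorting tasks (distance 0 when two pieces are placed in the same group, 1 otherwise); `toy_sort` and `sort_to_distance` are hypothetical names for this example and are not part of DistatisR.

```r
# Hypothetical toy sort: 4 pieces, 2 participants; entries are group labels.
toy_sort <- data.frame(
  p1 = c(1, 1, 2, 2),
  p2 = c(1, 2, 2, 3),
  row.names = c("piece_a", "piece_b", "piece_c", "piece_d")
)

# For one participant: distance 0 if two pieces share a group, 1 otherwise.
sort_to_distance <- function(groups) {
  d <- 1 * outer(groups, groups, FUN = "!=")
  dimnames(d) <- list(names(groups), names(groups))
  d
}

# Stack one distance matrix per participant into a 3-way array (the "cube").
toy_cube <- sapply(
  names(toy_sort),
  function(p) sort_to_distance(setNames(toy_sort[[p]], rownames(toy_sort))),
  simplify = "array"
)
dim(toy_cube)   # pieces x pieces x participants
```

Each slice `toy_cube[, , k]` is symmetric with a zero diagonal, which is the form of input a DiSTATIS analysis expects.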

Group Means in the Rv space

``````# Get the factor scores of the participants from the RV (Cmat) analysis
G <- resDistatis$res4Cmat$G
# Mean factor scores per experience group
participant.mean_temp <- aggregate(G, by = list(t(Design_column)), mean)
participant.mean <- participant.mean_temp[, 2:ncol(participant.mean_temp)]
rownames(participant.mean) <- participant.mean_temp[, 1]

# Get the bootstrap estimates of the group means
BootCube <- PTCA4CATA::Boot4Mean(G, design = t(Design_column),
niter = 100,
suppressProgressBar = TRUE)``````
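The idea behind bootstrapping group means can be sketched in base R as follows. This is a simplified illustration, assuming observations are resampled with replacement within each group; `PTCA4CATA::Boot4Mean` may differ in its details, and `scores`, `group`, `group_means` and `boot_cube` are hypothetical names for this example.

```r
set.seed(1)
# Hypothetical scores: 12 observations on 2 dimensions, in 3 groups.
scores <- matrix(rnorm(24), nrow = 12, dimnames = list(NULL, c("Dim1", "Dim2")))
group  <- rep(c("Expert", "Medium", "Novice"), each = 4)

# Group means: column sums per group divided by the group sizes.
group_means <- function(x, g) rowsum(x, g) / as.vector(table(g))

n_iter <- 100
boot_cube <- replicate(n_iter, {
  # Resample observations with replacement inside each group.
  idx <- unlist(lapply(split(seq_along(group), group),
                       function(i) i[sample.int(length(i), replace = TRUE)]))
  group_means(scores[idx, , drop = FALSE], group[idx])
})
dim(boot_cube)  # groups x dimensions x iterations
```

The spread of the bootstrapped means along the third dimension is what confidence ellipses around group means are typically drawn from.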

9.3 Looking at the data pattern

Glimpse of the sorting data.

The rows are named as : [composer].[pianist].[ID]

The last row gives each participant's years of musical experience.

``kable(tail(Raw_Data))``
Music Pianist bc005 bc010 bc012 bc018 bc020 bc027 bc028 bc037 bc002 bc008 bc001 bc006 bc007 bc009 bc011 bc026 bc030 bc022 bc024 bc029 bc003 bc014 bc016 bc021 bc025 bc031 bc004 bc013 bc032 bc035 bc015 bc036 bc034 bc017 bc033 bc023 bc019
Moza.Pires.32 Mozart Pires 1 2 3 3 3 3 3 1 1.0 2.0 2 3 2 3 1 2 1 2 1 1 2 3 2 3 3 1 3 2 1 3 2 3 1 3 1 2 2
Moza.Pires.33 Mozart Pires 1 3 2 1 2 2 1 3 1.0 2.0 2 2 3 3 1 3 2 3 3 3 1 1 2 2 3 2 3 2 1 3 3 3 1 2 1 3 2
Moza.Richt.34 Mozart Richt 2 3 2 2 1 1 3 1 1.0 2.0 3 2 2 3 2 1 2 3 2 3 1 2 3 2 2 2 1 1 3 1 3 1 2 2 2 1 1
Moza.Richt.35 Mozart Richt 1 2 1 2 2 3 2 1 1.0 3.0 3 3 3 2 3 2 3 2 3 2 2 3 1 1 1 3 3 3 3 1 2 2 1 3 3 3 3
Moza.Richt.36 Mozart Richt 1 3 1 2 1 2 3 3 3.0 1.0 2 2 2 1 1 3 1 3 1 3 1 3 1 1 1 1 1 1 1 2 1 1 2 2 2 2 2
Musical Experience 0 0 0 0 0 0 0 0 0.5 0.5 1 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 4 7 7 7 8 9 10 11 11 12 15

9.3.1 Heat Maps

1. Raw Data

This heatmap shows how mixed the participants' opinions are. However, the experts' part of the plot has darker shades, suggesting that they have stronger opinions (stronger similarity or stronger dissimilarity).

NOTE the participants are coloured from yellow to orange to red based on their years of music experience from novice to medium to expert.

``````data.corr <-cor(Sorting_Data)
corrplot(data.corr , method = "color",tl.col=col4col  ,tl.cex = .3, cl.pos='b',
col = colorRampPalette(c("darkred", "white","midnightblue"))(30)
)``````

2. Rv Matrix

The RV coefficient is a "cosine" between two "vectorized" matrices and is analogous to a squared correlation (R^2). For cross-product matrices it ranges between 0 and 1: 0 ≤ RV ≤ 1.

Here we have one table per participant, describing how that participant grouped the music pieces.

The RV matrix tells the same story as the previous heatmap.

The tables of the participants show little or no correlation with one another, implying a wide variety in the way they grouped the music pieces.

NOTE the participants are coloured from yellow to orange to red based on their years of music experience from novice to medium to expert.

``````rvmatrix <- resDistatis$res4Cmat$C

corrplot(rvmatrix, method = "color",tl.col=col4col, tl.cex = .3, cl.pos='b',
col = colorRampPalette(c("darkred", "white","midnightblue"))(20)
)``````
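As a small worked example of the RV coefficient, the sketch below computes it directly as the cosine between two vectorized cross-product matrices. The function name `rv_coefficient` and the toy matrices are assumptions made for illustration; the analysis above takes the RV matrix from `resDistatis` instead.

```r
# RV coefficient: inner product of the vectorized matrices divided by the
# product of their norms (a cosine).
rv_coefficient <- function(A, B) {
  sum(A * B) / sqrt(sum(A * A) * sum(B * B))
}

# Two hypothetical cross-product matrices built from small centered tables.
set.seed(7)
X1 <- scale(matrix(rnorm(12), 6, 2), scale = FALSE)
X2 <- scale(matrix(rnorm(18), 6, 3), scale = FALSE)
S1 <- tcrossprod(X1)
S2 <- tcrossprod(X2)

rv_12 <- rv_coefficient(S1, S2)   # similarity between the two tables
rv_11 <- rv_coefficient(S1, S1)   # a table compared with itself
```

Because `S1` and `S2` are positive semi-definite, `rv_12` cannot be negative, which is why the RV heatmap only uses one half of a correlation-style color scale.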

9.3.2 Scree Plots

RV scree plot: the more variance the first dimension explains, the more homogeneous the participants are.

``````#Scree Plot for Rv matrix
PlotScree(ev = resDistatis$res4Cmat$eigValues, plotKaiser = TRUE,
title = "RV-map: Explained Variance per Dimension")``````

Scree plot for the Compromise:

``````scree.comp <- PlotScree(ev = resDistatis$res4Splus$eigValues, plotKaiser = TRUE,
title = "Compromise: Explained Variance per Dimension")``````

``scree.comp``
``## [1] 0.0 0.1 0.2 0.3 0.4 0.5``
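The quantities shown on a scree plot can be computed by hand. The sketch below uses made-up eigenvalues (`ev` is hypothetical, not the output of the analysis) and assumes that the Kaiser line corresponds to the average eigenvalue.

```r
ev <- c(4, 2, 1, 0.5, 0.5)          # hypothetical eigenvalues

pct_explained <- 100 * ev / sum(ev) # percentage of variance per dimension
cum_explained <- cumsum(pct_explained)

kaiser_threshold <- mean(ev)        # average eigenvalue
dims_to_keep <- which(ev > kaiser_threshold)
```

With these values the first dimension explains half of the variance, and only the first two dimensions exceed the average eigenvalue.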

9.3.3 Column Factor Map (Participants)

The participants are grouped according to the years of music experience they possess.

The means of the Novice, Medium and Expert groups tend to overlap, without a clear separation.

This is in line with what the heat map showed.

``````##        Dimension 1 Dimension 2 Dimension 3 Dimension 4 Dimension 5
## Expert   0.4190777  0.04088622 -0.04650200  0.05515275  0.06541816
## Medium   0.3900973 -0.06946785  0.02485049 -0.04557256  0.02545989
## Novice   0.3489853 -0.02856409  0.02489506 -0.02559719 -0.01436683
##        Dimension 6 Dimension 7  Dimension 8 Dimension 9 Dimension 10
## Expert -0.02508290  0.01042670  0.007858174  0.08011813   0.02611465
## Medium -0.01749559  0.01842934 -0.004916735  0.02611966  -0.01301884
## Novice  0.06767128 -0.05818115  0.013828736 -0.03787142   0.04582106
##        Dimension 11 Dimension 12 Dimension 13 Dimension 14 Dimension 15
## Expert   0.09812612 -0.025210577 -0.032237317   0.03739449  -0.06689083
## Medium  -0.08100918  0.003787312  0.038567449  -0.00468344  -0.03074542
## Novice   0.03860886  0.026450333 -0.001809989  -0.01704577   0.05026087
##        Dimension 16 Dimension 17 Dimension 18 Dimension 19 Dimension 20
## Expert   0.07891217   0.06075875  0.002710560 -0.075613485   0.00938272
## Medium  -0.06607214  -0.05473816 -0.007747089  0.025220830   0.03876686
## Novice   0.02836442   0.01192971  0.002520974  0.006330022  -0.02991712
##        Dimension 21  Dimension 22 Dimension 23 Dimension 24 Dimension 25
## Expert -0.064570646 -4.787284e-02 -0.056935701 -0.076498470 -0.003501672
## Medium -0.009405621  7.990935e-05  0.030558662  0.042015250 -0.003458521
## Novice  0.040048541  2.244671e-02  0.008239501  0.002462888  0.011369348
##        Dimension 26 Dimension 27 Dimension 28 Dimension 29 Dimension 30
## Expert   0.05774858 -0.043357210  0.058505158  0.019615443  0.033652259
## Medium   0.04507269  0.026789622  0.007109147 -0.011754702 -0.017085191
## Novice  -0.05724637  0.005517884 -0.021977967  0.003183785  0.001681241
##        Dimension 31 Dimension 32 Dimension 33 Dimension 34 Dimension 35
## Expert -0.074705031 -0.035699149  -0.01632340 -0.059112280  -0.01204539
## Medium  0.008044996  0.012565320  -0.01459385  0.009489098   0.07374715
## Novice  0.027411983 -0.003839095   0.01800183  0.023198766  -0.04569065
##        Dimension 36 Dimension 37
## Expert -0.015190944 -0.003644783
## Medium -0.003905091 -0.016500191
## Novice  0.009638660  0.016979939``````

9.3.4 Partial Factor scores

Compare the partial factor scores between Novice, Medium, Expert

``````# Partial factor scores (one slice per participant) and participant weights
F_j <- resDistatis$res4Splus$PartialF
alpha_j <- resDistatis$res4Cmat$alpha

Group_Participant <- (Design_column)
code4Groups <- unique(Group_Participant)
nK <- length(code4Groups)

# initialize F_k and alpha_k (one slice / one weight per group)
F_k <- array(0, dim = c(dim(F_j)[[1]], dim(F_j)[[2]], nK))
dimnames(F_k) <- list(dimnames(F_j)[[1]],
dimnames(F_j)[[2]], code4Groups)
alpha_k <- rep(0, nK)
names(alpha_k) <- code4Groups
Fa_j <- F_j
# Weight each participant's partial factor scores by alpha_j
for (j in 1:dim(F_j)[[3]]){ Fa_j[,,j] <- F_j[,,j] * alpha_j[j] }
# For each group: weighted average of its participants' partial factor scores
for (k in 1:nK){
lindex <- Group_Participant == code4Groups[k]
alpha_k[k] <- sum(alpha_j[lindex])
F_k[,,k] <- (1/alpha_k[k])*apply(Fa_j[,,lindex],c(1,2),sum)
}

pFi <- F_k``````
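The group-level computation above can be checked on a toy example. The sketch below uses made-up partial factor scores and weights; `F_part`, `w`, `grp` and `weighted_group_scores` are hypothetical names, and the data do not come from the analysis.

```r
set.seed(3)
n_obs <- 4; n_dim <- 2; n_tab <- 5
# Hypothetical partial factor scores: observations x dimensions x tables.
F_part <- array(rnorm(n_obs * n_dim * n_tab), dim = c(n_obs, n_dim, n_tab))
w   <- c(0.1, 0.3, 0.2, 0.25, 0.15)   # table weights, summing to one
grp <- c("Novice", "Novice", "Medium", "Expert", "Expert")

# Weighted average of the partial factor scores within each group.
weighted_group_scores <- function(F_part, w, grp) {
  groups <- unique(grp)
  out <- array(0, dim = c(dim(F_part)[1:2], length(groups)),
               dimnames = list(NULL, NULL, groups))
  for (g in groups) {
    idx <- which(grp == g)
    for (j in idx) out[, , g] <- out[, , g] + w[j] * F_part[, , j]
    out[, , g] <- out[, , g] / sum(w[idx])
  }
  out
}

F_grp <- weighted_group_scores(F_part, w, grp)
```

Dividing by the summed weights of a group keeps each group's scores on the same scale as the individual partial factor scores, so a group with a single table simply reproduces that table's scores.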

9.3.5 Row Factor Map based on design : Music Pieces

The group means of the Bach, Mozart and Beethoven pieces are slightly separated. However, their confidence intervals tend to overlap.

9.3.6 Row Factor Map based on design : Pianists

The group mean of Baren is clearly different from those of Arrau, Richt and Pires, whose group means tend to overlap each other.

9.3.7 Partial Factor scores

The way the Expert and Novice groups sort the pieces appears to differ much more from the rest of the participants, because their partial factor scores lie farther from the compromise factor scores than those of the Medium group.

This is clearer when looking at the means of the partial factor scores.

``````col4means <- gplots::col2hex(c('orange' ,'tomato','darkred' ))

pFi <- F_k

map4PFS <- createPartialFactorScoresMap(
factorScores = Fi,
partialFactorScores = pFi,

axis1 = 1, axis2 = 2,
colors4Items = as.vector(col4Music),
colors4Blocks = as.vector(col4means),
#colors4Blocks = c("lightblue", "skyblue","midnightblue"),

names4Partial = dimnames(pFi)[[3]], #
font.labels = 'bold',
size.labels = 2)

plot.pFi1 <- Fi.plot$zeMap + map4PFS$mapColByItems
plot.pFi1``````

``````plot.pFi2 <- Fi.plot$zeMap + labels4S + map4PFS$mapColByBlocks

plot.pFi2``````

Partial factor scores of Novice Group

The partial factor scores of the Novice group lie close to the origin, indicating that novices do not separate the music pieces well by composer.

``````# Partial factor scores of the Novice group on dimensions 1 and 2
nov <- pFi[ , c(1:2), 1]
p.Novice <- PTCA4CATA::createFactorMap(nov,
col.points = col4Music,
col.labels = col4Music,
axis1 = 1,
axis2 = 2,
title = 'Partial factor scores of Novice',
alpha.points = 0.8,
display.labels = FALSE
)

plot.Novice <- Fi.plot$zeMap + labels4S + p.Novice$zeMap_dots
plot.Novice``````

Partial factor scores of the Medium Group

The partial factor scores of the Medium group are much more spread out than those of the Novice group.

``````# Partial factor scores of the Medium group on dimensions 1 and 2
med <- pFi[ , c(1:2), 2]
p.medium <- PTCA4CATA::createFactorMap(med,
col.points = col4Music,
col.labels = col4Music,
axis1 = 1,
axis2 = 2,
title = 'Partial factor scores of Medium',
alpha.points = 0.8,
display.labels = FALSE
)

plot.medium <- Fi.plot$zeMap + labels4S + p.medium$zeMap_dots

plot.medium``````

Partial Factor Scores of the Experts

These partial factor scores are even more spread out. In terms of spread: Novice < Medium < Expert.

``````# Partial factor scores of the Expert group on dimensions 1 and 2
exp <- pFi[ , c(1:2), 3]
p.Exp <- PTCA4CATA::createFactorMap(exp,
col.points = col4Music,
col.labels = col4Music,
axis1 = 1,
axis2 = 2,
title = 'Partial factor scores of Expert',
alpha.points = 0.8,
display.labels = FALSE
)

plot.exp <- Fi.plot$zeMap + labels4S + p.Exp$zeMap_dots

plot.exp``````

9.3.8 Partial factor scores with Means (Music pieces)

Experts differ from novices much more when grouping the Mozart and Bach pieces than when grouping the Beethoven pieces.

When grouping the Beethoven pieces, experts and medium-level participants perform almost equally.

``````# Mean partial factor scores per composer, for each dimension and each group
meanfk <-
apply(F_k, c(2,3), FUN = function(x){
aggregate(x, by = list(Design_row$Music), mean)$x
})

mean.plot <- createFactorMap(fi.mean,
constraints = minmaxHelper4Partial(fi.mean, meanfk, axis1 = 1 ,axis2 = 2) ,
alpha.points = 1,
display.labels = TRUE,
col.points = col4Music.Means,
col.labels = col4Music.Means,
pch = 17,
cex = 3,
text.cex = 4
)

Fi.meanonly.plot <- mean.plot$zeMap_background + mean.plot$zeMap_dots + mean.plot$zeMap_text + labels4S

Fi.meanonly.plot``````

``````pf.means <- createPartialFactorScoresMap(
factorScores = fi.mean,
partialFactorScores = meanfk,
axis1 = 1, axis2 = 2,
colors4Items = as.vector(col4Music.Means),
colors4Blocks = as.vector(col4means),
names4Partial = dimnames(meanfk)[[3]],
font.labels = 'bold',
size.labels = 4
)

# Same map, colored by items (composers) or by blocks (experience groups)
plot.pFi.mean <- Fi.meanonly.plot + labels4S + pf.means$mapColByItems

plot.pFi.mean2 <- Fi.meanonly.plot + labels4S + pf.means$mapColByBlocks
plot.pFi.mean2``````
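To see what the group means of the partial factor scores look like as an array, here is a small sketch on made-up data. It swaps in `tapply` for `aggregate`, since `tapply` keeps the group labels as names on each result; `F_k_toy`, `composer` and `mean_by_group` are hypothetical names for this example.

```r
set.seed(5)
# Hypothetical partial factor scores: 6 pieces x 2 dimensions x 3 groups.
F_k_toy <- array(rnorm(6 * 2 * 3), dim = c(6, 2, 3),
                 dimnames = list(paste0("piece", 1:6),
                                 c("Dim1", "Dim2"),
                                 c("Novice", "Medium", "Expert")))
composer <- rep(c("Bach", "Beet", "Mozart"), each = 2)

# For every (dimension, group) pair, average the pieces of each composer.
mean_by_group <- apply(F_k_toy, c(2, 3), function(x) tapply(x, composer, mean))
dim(mean_by_group)   # composers x dimensions x groups
```

The result has the same layout as a partial factor score array, with one row per composer instead of one row per piece, which is why it can be passed on as partial factor scores for the mean map.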

9.3.9 Partial factor scores with Means (Pianist)

Experts differ from novices by a large amount when grouping the Arrau and Pires pieces.

The Expert, Novice and Medium groups categorize the Richter pieces quite similarly.

Overall, the Medium group does not differ noticeably from the compromise mean factor scores when grouping the pieces of any of these pianists.

9.4 Conclusion

Music Pieces:

Experts differ from novices much more when grouping the Mozart and Bach pieces than when grouping the Beethoven pieces.

When grouping the Beethoven pieces, experts and medium-level participants perform almost equally.

Pianists:

Experts differ from novices by a large amount when grouping the Arrau and Pires pieces.

The Expert, Novice and Medium groups categorize the Richter pieces quite similarly.

Overall, the Medium group does not differ noticeably from the compromise mean factor scores when grouping the pieces of any of these pianists.