22.1 MANOVA

Multivariate Analysis of Variance

One-way MANOVA

Compare treatment means for \(h\) different populations

Population 1: \(\mathbf{y}_{11}, \mathbf{y}_{12}, \dots, \mathbf{y}_{1n_1} \sim iid \; N_p (\mathbf{\mu}_1, \mathbf{\Sigma})\)

\(\vdots\)

Population h: \(\mathbf{y}_{h1}, \mathbf{y}_{h2}, \dots, \mathbf{y}_{hn_h} \sim iid \; N_p (\mathbf{\mu}_h, \mathbf{\Sigma})\)

Assumptions

  1. Independent random samples from \(h\) different populations
  2. Common covariance matrices
  3. Each population is multivariate normal

Calculate the summary statistics \(\mathbf{\bar{y}}_i\), \(\mathbf{S}_i\) and the pooled estimate of the covariance matrix \(\mathbf{S}\)
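A minimal numerical sketch of these summary statistics, using simulated data (all object names below are illustrative, not from a package):

## Sketch: group means, group covariances, and the pooled covariance
set.seed(1)
h <- 3; p <- 2; n_i <- c(10, 12, 15)
grp <- factor(rep(1:h, times = n_i))
Y <- matrix(rnorm(sum(n_i) * p), ncol = p)
ybar_i <- lapply(split(as.data.frame(Y), grp), colMeans) # group mean vectors
S_i    <- lapply(split(as.data.frame(Y), grp), cov)      # group covariances
## pooled estimate: S = [(n_1 - 1)S_1 + ... + (n_h - 1)S_h] / (n - h)
S_pool <- Reduce(`+`, Map(function(S, k) (k - 1) * S, S_i, n_i)) /
    (sum(n_i) - h)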

Similar to the univariate one-way ANOVA, we can use the effects model formulation \(\mathbf{\mu}_i = \mathbf{\mu} + \mathbf{\tau}_i\), where

  • \(\mathbf{\mu}_i\) is the population mean for population i

  • \(\mathbf{\mu}\) is the overall mean effect

  • \(\mathbf{\tau}_i\) is the treatment effect of the i-th treatment.

For the one-way model: \(\mathbf{y}_{ij} = \mathbf{\mu} + \mathbf{\tau}_i + \mathbf{\epsilon}_{ij}\) for \(i = 1,\dots,h\); \(j = 1,\dots, n_i\) and \(\mathbf{\epsilon}_{ij} \sim N_p(\mathbf{0, \Sigma})\)

However, the above model is over-parameterized (i.e., there are infinitely many ways to define \(\mathbf{\mu}\) and the \(\mathbf{\tau}_i\)’s such that they add up to \(\mathbf{\mu}_i\)). Thus, we can constrain the model by imposing either

\[ \sum_{i=1}^h n_i \tau_i = 0 \]

or

\[ \mathbf{\tau}_h = 0 \]

The observational equivalent of the effects model is

\[ \begin{aligned} \mathbf{y}_{ij} &= \mathbf{\bar{y}} + (\mathbf{\bar{y}}_i - \mathbf{\bar{y}}) + (\mathbf{y}_{ij} - \mathbf{\bar{y}}_i) \\ &= \text{overall sample mean} + \text{estimated treatment effect} + \text{residual (as in univariate ANOVA)} \end{aligned} \]

After some manipulation, we obtain the SSCP decomposition

\[ \sum_{i = 1}^h \sum_{j = 1}^{n_i} (\mathbf{y}_{ij} - \mathbf{\bar{y}})(\mathbf{y}_{ij} - \mathbf{\bar{y}})' = \sum_{i = 1}^h n_i (\mathbf{\bar{y}}_i - \mathbf{\bar{y}})(\mathbf{\bar{y}}_i - \mathbf{\bar{y}})' + \sum_{i=1}^h \sum_{j = 1}^{n_i} (\mathbf{y}_{ij} - \mathbf{\bar{y}}_i)(\mathbf{y}_{ij} - \mathbf{\bar{y}}_i)' \]

LHS = Total corrected sums of squares and cross products (SSCP) matrix

RHS =

  • 1st term = treatment (or between-subjects) sums of squares and cross products matrix, denoted \(\mathbf{H}\) (or \(\mathbf{B}\))

  • 2nd term = residual (or within-subjects) SSCP matrix, denoted \(\mathbf{E}\) (or \(\mathbf{W}\))

Note:

\[ \mathbf{E} = (n_1 - 1)\mathbf{S}_1 + \dots + (n_h -1) \mathbf{S}_h = (n-h) \mathbf{S}, \quad \text{where } n = \sum_{i=1}^h n_i \]

MANOVA table

Source             SSCP                 df
Treatment          \(\mathbf{H}\)       \(h - 1\)
Residual (error)   \(\mathbf{E}\)       \(\sum_{i=1}^h n_i - h\)
Total (corrected)  \(\mathbf{H + E}\)   \(\sum_{i=1}^h n_i - 1\)

The null hypothesis of no treatment effect is

\[ H_0: \mathbf{\tau}_1 = \mathbf{\tau}_2 = \dots = \mathbf{\tau}_h = \mathbf{0} \]

To test \(H_0\), we consider the relative “sizes” of \(\mathbf{E}\) and \(\mathbf{H+E}\).

Wilks’ Lambda

Define Wilks’ Lambda as

\[ \Lambda^* = \frac{|\mathbf{E}|}{|\mathbf{H+E}|} \]

Properties:

  1. Wilks’ Lambda is equivalent to the F-statistic in the univariate case

  2. The exact distribution of \(\Lambda^*\) can be determined for special cases.

  3. For large sample sizes, reject \(H_0\) if

\[ -(\sum_{i=1}^h n_i - 1 - \frac{p+h}{2}) \log(\Lambda^*) > \chi^2_{(1-\alpha, p(h-1))} \]
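Continuing the simulated sketch above, this large-sample test can be computed by hand (ybar_i, S_pool, etc. carry over from that sketch):

## Sketch: H, E, Wilks' Lambda, and the chi-square approximation
n <- sum(n_i)
ybar <- colMeans(Y)                               # overall mean vector
H <- Reduce(`+`, Map(function(m, k) k * tcrossprod(m - ybar), ybar_i, n_i))
E <- (n - h) * S_pool                             # within SSCP: E = (n - h)S
Lambda <- det(E) / det(H + E)                     # Wilks' Lambda
chi_stat <- -(n - 1 - (p + h) / 2) * log(Lambda)
chi_stat > qchisq(0.95, df = p * (h - 1))         # TRUE => reject H0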

22.1.1 Testing General Hypotheses

We have \(h\) different treatments, where the \(i\)-th treatment is applied to \(n_i\) subjects, each observed for \(p\) repeated measures. We treat this as a \(p\)-dimensional observation on a random sample from each of the \(h\) different treatment populations.

\[ \mathbf{y}_{ij} = \mathbf{\mu} + \mathbf{\tau}_i + \mathbf{\epsilon}_{ij} \]

for \(i = 1,\dots,h\) and \(j = 1,\dots,n_i\)

Equivalently,

\[ \mathbf{Y} = \mathbf{XB} + \mathbf{\epsilon} \]

where \(n = \sum_{i = 1}^h n_i\) and with the restriction \(\mathbf{\tau}_h = \mathbf{0}\):

\[ \mathbf{Y}_{(n \times p)} = \left[ \begin{array} {c} \mathbf{y}_{11}' \\ \vdots \\ \mathbf{y}_{1n_1}' \\ \vdots \\ \mathbf{y}_{hn_h}' \end{array} \right], \mathbf{B}_{(h \times p)} = \left[ \begin{array} {c} \mathbf{\mu}' \\ \mathbf{\tau}_1' \\ \vdots \\ \mathbf{\tau}_{h-1}' \end{array} \right], \mathbf{\epsilon}_{(n \times p)} = \left[ \begin{array} {c} \epsilon_{11}' \\ \vdots \\ \epsilon_{1n_1}' \\ \vdots \\ \epsilon_{hn_h}' \end{array} \right] \]

\[ \mathbf{X}_{(n \times h)} = \left[ \begin{array} {ccccc} 1 & 1 & 0 & \ldots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 0 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ldots & \vdots \\ 1 & 0 & 0 & \ldots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \ldots & 0 \end{array} \right] \]

Estimation

\[ \mathbf{\hat{B}} = (\mathbf{X'X})^{-1} \mathbf{X'Y} \]

Rows of \(\mathbf{Y}\) are independent (i.e., \(var(\mathbf{Y}) = \mathbf{I}_n \otimes \mathbf{\Sigma}\) , an \(np \times np\) matrix, where \(\otimes\) is the Kronecker product).
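A sketch of this estimator with a hand-built design matrix, continuing the simulated example (under the restriction \(\mathbf{\tau}_h = \mathbf{0}\), the rows of \(\mathbf{\hat{B}}\) estimate \(\mathbf{\mu}', \mathbf{\tau}_1', \dots, \mathbf{\tau}_{h-1}'\)):

## Sketch: X as displayed above (intercept plus indicators for groups
## 1, ..., h-1; rows for group h are zero except for the intercept)
X <- cbind(1, sapply(1:(h - 1), function(k) as.numeric(grp == k)))
Bhat <- solve(t(X) %*% X, t(X) %*% Y)   # (X'X)^{-1} X'Y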

\[ \begin{aligned} &H_0: \mathbf{LBM} = 0 \\ &H_a: \mathbf{LBM} \neq 0 \end{aligned} \]

where

  • \(\mathbf{L}\) is a \(g \times h\) matrix of full row rank (\(g \le h\)) = comparisons across groups

  • \(\mathbf{M}\) is a \(p \times u\) matrix of full column rank (\(u \le p\)) = comparisons across traits

The general treatment-corrected sums of squares and cross products matrix is

\[ \mathbf{H} = \mathbf{M'Y'X(X'X)^{-1}L'[L(X'X)^{-1}L']^{-1}L(X'X)^{-1}X'YM} \]

or, for the null hypothesis \(H_0: \mathbf{LBM} = \mathbf{D}\),

\[ \mathbf{H} = (\mathbf{L\hat{B}M} - \mathbf{D})'[\mathbf{L(X'X)^{-1}L'}]^{-1}(\mathbf{L\hat{B}M} - \mathbf{D}) \]

The general matrix of residual sums of squares and cross products is

\[ \mathbf{E} = \mathbf{M'Y'[I-X(X'X)^{-1}X']YM} = \mathbf{M'[Y'Y - \hat{B}'(X'X)\hat{B}]M} \]
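Continuing the sketch, with \(\mathbf{L}\) testing \(\mathbf{\tau}_1 = \dots = \mathbf{\tau}_{h-1} = \mathbf{0}\) and \(\mathbf{M} = \mathbf{I}_p\), these general formulas reproduce the one-way \(\mathbf{H}\) and \(\mathbf{E}\) computed earlier (Lc, Mc, etc. are illustrative names):

## Sketch: general H and E via L, B-hat, and M
Lc <- cbind(0, diag(h - 1))   # g x h: picks out tau_1, ..., tau_{h-1}
Mc <- diag(p)                 # identity: no comparisons across traits
XtXi <- solve(t(X) %*% X)
Hg <- t(Lc %*% Bhat %*% Mc) %*% solve(Lc %*% XtXi %*% t(Lc)) %*%
    (Lc %*% Bhat %*% Mc)
Eg <- t(Mc) %*% t(Y) %*% (diag(n) - X %*% XtXi %*% t(X)) %*% Y %*% Mc
all.equal(unname(Hg), unname(H))   # matches the one-way treatment SSCP
all.equal(unname(Eg), unname(E))   # matches the one-way residual SSCP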

We can compute the following test statistics, all functions of the eigenvalues of \(\mathbf{HE}^{-1}\):

  • Wilks’ Criterion: \(\Lambda^* = \frac{|\mathbf{E}|}{|\mathbf{H} + \mathbf{E}|}\). The df depend on the rank of \(\mathbf{L}, \mathbf{M}, \mathbf{X}\)

  • Lawley-Hotelling Trace: \(U = tr(\mathbf{HE}^{-1})\)

  • Pillai Trace: \(V = tr(\mathbf{H}(\mathbf{H}+ \mathbf{E})^{-1})\)

  • Roy’s Maximum Root: largest eigenvalue of \(\mathbf{HE}^{-1}\)

If \(H_0\) is true and \(n\) is large, \(-(n-1- \frac{p+h}{2})\ln \Lambda^* \sim \chi^2_{p(h-1)}\) approximately. Some special values of \(p\) and \(h\) give an exact F-distribution under \(H_0\).
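All four criteria are simple functions of the eigenvalues of \(\mathbf{HE}^{-1}\); a sketch using the \(\mathbf{H}\) and \(\mathbf{E}\) from the simulated example above:

## Sketch: the four MANOVA criteria from the eigenvalues of H E^{-1}
lam <- Re(eigen(H %*% solve(E))$values)   # real for valid H (psd), E (pd)
c(Wilks           = prod(1 / (1 + lam)),  # = |E| / |H + E|
  LawleyHotelling = sum(lam),             # = tr(H E^{-1})
  Pillai          = sum(lam / (1 + lam)), # = tr(H (H + E)^{-1})
  Roy             = max(lam))             # largest eigenvalue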

# One-way MANOVA

library(car)
library(emmeans)
library(profileR)
library(tidyverse)

## Read in the data
gpagmat <- read.table("images/gpagmat.dat")

## Change the variable names
names(gpagmat) <- c("y1", "y2", "admit")

## Check the structure
str(gpagmat)
#> 'data.frame':    85 obs. of  3 variables:
#>  $ y1   : num  2.96 3.14 3.22 3.29 3.69 3.46 3.03 3.19 3.63 3.59 ...
#>  $ y2   : int  596 473 482 527 505 693 626 663 447 588 ...
#>  $ admit: int  1 1 1 1 1 1 1 1 1 1 ...


## Plot the data
gg <- ggplot(gpagmat, aes(x = y1, y = y2)) +
    geom_text(aes(label = admit, col = as.character(admit))) +
    scale_color_discrete(name = "Admission",
                         labels = c("Admit", "Do not admit", "Borderline")) +
    scale_x_continuous(name = "GPA") +
    scale_y_continuous(name = "GMAT")
gg  # display the plot

## Fit one-way MANOVA
## Note: admit is stored as an integer; modeling it as factor(admit)
## would test the 3 groups with 2 df rather than 1
oneway_fit <- manova(cbind(y1, y2) ~ admit, data = gpagmat)
summary(oneway_fit, test = "Wilks")
#>           Df  Wilks approx F num Df den Df    Pr(>F)    
#> admit      1 0.6126   25.927      2     82 1.881e-09 ***
#> Residuals 83                                            
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

reject the null of equal multivariate mean vectors between the three admission groups
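The other three criteria are available through the test argument of summary():

## Other multivariate test criteria for the same fit
summary(oneway_fit, test = "Pillai")
summary(oneway_fit, test = "Hotelling-Lawley")
summary(oneway_fit, test = "Roy")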

# Repeated Measures MANOVA


## Create data frame
stress <- data.frame(
    subject = 1:8,
    begin = c(3, 2, 5, 6, 1, 5, 1, 5),
    middle = c(3, 4, 3, 7, 4, 7, 1, 2),
    final = c(6, 7, 4, 7, 6, 7, 3, 5)
)
  • If we treat time as a single independent variable with 3 levels -> univariate repeated-measures ANOVA, which requires the sphericity assumption (i.e., that the variances of all pairwise differences are equal).

  • If we treat each level of time as a separate response variable -> MANOVA, which does not require sphericity.

## MANOVA
stress_mod <- lm(cbind(begin, middle, final) ~ 1, data = stress)
idata <-
    data.frame(time = factor(
        c("begin", "middle", "final"),
        levels = c("begin", "middle", "final")
    ))
repeat_fit <-
    Anova(
        stress_mod,
        idata = idata,
        idesign = ~ time,
        icontrasts = "contr.poly"
    )
summary(repeat_fit) 
#> 
#> Type III Repeated Measures MANOVA Tests:
#> 
#> ------------------------------------------
#>  
#> Term: (Intercept) 
#> 
#>  Response transformation matrix:
#>        (Intercept)
#> begin            1
#> middle           1
#> final            1
#> 
#> Sum of squares and products for the hypothesis:
#>             (Intercept)
#> (Intercept)        1352
#> 
#> Multivariate Tests: (Intercept)
#>                  Df test stat approx F num Df den Df     Pr(>F)    
#> Pillai            1  0.896552 60.66667      1      7 0.00010808 ***
#> Wilks             1  0.103448 60.66667      1      7 0.00010808 ***
#> Hotelling-Lawley  1  8.666667 60.66667      1      7 0.00010808 ***
#> Roy               1  8.666667 60.66667      1      7 0.00010808 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------
#>  
#> Term: time 
#> 
#>  Response transformation matrix:
#>               time.L     time.Q
#> begin  -7.071068e-01  0.4082483
#> middle -7.850462e-17 -0.8164966
#> final   7.071068e-01  0.4082483
#> 
#> Sum of squares and products for the hypothesis:
#>           time.L   time.Q
#> time.L 18.062500 6.747781
#> time.Q  6.747781 2.520833
#> 
#> Multivariate Tests: time
#>                  Df test stat approx F num Df den Df   Pr(>F)  
#> Pillai            1 0.7080717 7.276498      2      6 0.024879 *
#> Wilks             1 0.2919283 7.276498      2      6 0.024879 *
#> Hotelling-Lawley  1 2.4254992 7.276498      2      6 0.024879 *
#> Roy               1 2.4254992 7.276498      2      6 0.024879 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
#> 
#>             Sum Sq num Df Error SS den Df F value    Pr(>F)    
#> (Intercept) 450.67      1    52.00      7 60.6667 0.0001081 ***
#> time         20.58      2    24.75     14  5.8215 0.0144578 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Mauchly Tests for Sphericity
#> 
#>      Test statistic p-value
#> time         0.7085 0.35565
#> 
#> 
#> Greenhouse-Geisser and Huynh-Feldt Corrections
#>  for Departure from Sphericity
#> 
#>       GG eps Pr(>F[GG])  
#> time 0.77429    0.02439 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>         HF eps Pr(>F[HF])
#> time 0.9528433 0.01611634

We cannot reject the null hypothesis of sphericity (Mauchly test, p = 0.356), hence the univariate ANOVA is also appropriate. We also see a significant linear time effect, but no quadratic time effect (see the polynomial contrasts below).

## Polynomial contrasts
# What is the reference for the marginal means?
ref_grid(stress_mod, mult.name = "time")
#> 'emmGrid' object with variables:
#>     1 = 1
#>     time = multivariate response levels: begin, middle, final

# marginal means for the levels of time
contr_means <- emmeans(stress_mod, ~ time, mult.name = "time")
contrast(contr_means, method = "poly")
#>  contrast  estimate    SE df t.ratio p.value
#>  linear        2.12 0.766  7   2.773  0.0276
#>  quadratic     1.38 0.944  7   1.457  0.1885
# MANOVA


## Read in Data
heart <- read.table("images/heart.dat")
names(heart) <- c("drug", "y1", "y2", "y3", "y4")
## Create a subject ID nested within drug
heart <- heart %>%
    group_by(drug) %>%
    mutate(subject = row_number()) %>%
    ungroup()
str(heart)
#> tibble [24 × 6] (S3: tbl_df/tbl/data.frame)
#>  $ drug   : chr [1:24] "ax23" "ax23" "ax23" "ax23" ...
#>  $ y1     : int [1:24] 72 78 71 72 66 74 62 69 85 82 ...
#>  $ y2     : int [1:24] 86 83 82 83 79 83 73 75 86 86 ...
#>  $ y3     : int [1:24] 81 88 81 83 77 84 78 76 83 80 ...
#>  $ y4     : int [1:24] 77 82 75 69 66 77 70 70 80 84 ...
#>  $ subject: int [1:24] 1 2 3 4 5 6 7 8 1 2 ...

## Create means summary for profile plot,
# pivot longer for plotting with ggplot
heart_means <- heart %>%
    group_by(drug) %>%
    summarize_at(vars(starts_with("y")), mean) %>%
    ungroup() %>%
    pivot_longer(-drug, names_to = "time", values_to = "mean") %>%
    mutate(time = as.numeric(as.factor(time)))
gg_profile <- ggplot(heart_means, aes(x = time, y = mean)) +
    geom_line(aes(col = drug)) +
    geom_point(aes(col = drug)) +
    ggtitle("Profile Plot") +
    scale_y_continuous(name = "Response") +
    scale_x_continuous(name = "Time", breaks = 1:4)  # time is numeric here
gg_profile

## Fit model
heart_mod <- lm(cbind(y1, y2, y3, y4) ~ drug, data = heart)
man_fit <- car::Anova(heart_mod)
summary(man_fit)
#> 
#> Type II MANOVA Tests:
#> 
#> Sum of squares and products for error:
#>        y1      y2      y3     y4
#> y1 641.00 601.750 535.250 426.00
#> y2 601.75 823.875 615.500 534.25
#> y3 535.25 615.500 655.875 555.25
#> y4 426.00 534.250 555.250 674.50
#> 
#> ------------------------------------------
#>  
#> Term: drug 
#> 
#> Sum of squares and products for the hypothesis:
#>        y1       y2       y3    y4
#> y1 567.00 335.2500  42.7500 387.0
#> y2 335.25 569.0833 404.5417 367.5
#> y3  42.75 404.5417 391.0833 171.0
#> y4 387.00 367.5000 171.0000 316.0
#> 
#> Multivariate Tests: drug
#>                  Df test stat  approx F num Df den Df     Pr(>F)    
#> Pillai            2  1.283456  8.508082      8     38 1.5010e-06 ***
#> Wilks             2  0.079007 11.509581      8     36 6.3081e-08 ***
#> Hotelling-Lawley  2  7.069384 15.022441      8     34 3.9048e-09 ***
#> Roy               2  6.346509 30.145916      4     19 5.4493e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

reject the null hypothesis of no difference in means between treatments

## Contrasts
heart$drug <- factor(heart$drug)
L <- matrix(c( 0,  2,
               1, -1,
              -1, -1), nrow = 3, byrow = TRUE)
colnames(L) <- c("bww9:ctrl", "ax23:rest")
rownames(L) <- unique(heart$drug)
contrasts(heart$drug) <- L
contrasts(heart$drug)
#>      bww9:ctrl ax23:rest
#> ax23         0         2
#> bww9         1        -1
#> ctrl        -1        -1

# Note: avoid setting contrasts via contrasts()<- if you plan further
# analysis (e.g., Anova, lm); use a response transformation matrix M instead

M <- matrix(c(1, -1, 0, 0,
              0, 1, -1, 0,
              0, 0, 1, -1), nrow = 4)
# M is filled column-wise: each column is a successive-difference
# contrast across the four repeated measures
## Refit the model so the new contrasts on heart$drug take effect
heart_mod2 <- update(heart_mod)
coef(heart_mod2)
#>                  y1         y2        y3    y4
#> (Intercept)   75.00 78.9583333 77.041667 74.75
#> drugbww9:ctrl  4.50  5.8125000  3.562500  4.25
#> drugax23:rest -2.25  0.7708333  1.979167 -0.75

# Hypothesis test for bww9 vs control after transformation M
# same as linearHypothesis(heart_mod, hypothesis.matrix = c(0,1,-1), P = M)
bww9vctrl <-
    car::linearHypothesis(heart_mod2,
                     hypothesis.matrix = c(0, 1, 0),
                     P = M)
bww9vctrl
#> 
#>  Response transformation matrix:
#>    [,1] [,2] [,3]
#> y1    1    0    0
#> y2   -1    1    0
#> y3    0   -1    1
#> y4    0    0   -1
#> 
#> Sum of squares and products for the hypothesis:
#>          [,1]   [,2]     [,3]
#> [1,]  27.5625 -47.25  14.4375
#> [2,] -47.2500  81.00 -24.7500
#> [3,]  14.4375 -24.75   7.5625
#> 
#> Sum of squares and products for error:
#>          [,1]     [,2]    [,3]
#> [1,]  261.375 -141.875  28.000
#> [2,] -141.875  248.750 -19.375
#> [3,]   28.000  -19.375 219.875
#> 
#> Multivariate Tests: 
#>                  Df test stat approx F num Df den Df Pr(>F)
#> Pillai            1 0.2564306 2.184141      3     19 0.1233
#> Wilks             1 0.7435694 2.184141      3     19 0.1233
#> Hotelling-Lawley  1 0.3448644 2.184141      3     19 0.1233
#> Roy               1 0.3448644 2.184141      3     19 0.1233

bww9vctrl <-
    car::linearHypothesis(heart_mod,
                     hypothesis.matrix = c(0, 1, -1),
                     P = M)
bww9vctrl
#> 
#>  Response transformation matrix:
#>    [,1] [,2] [,3]
#> y1    1    0    0
#> y2   -1    1    0
#> y3    0   -1    1
#> y4    0    0   -1
#> 
#> Sum of squares and products for the hypothesis:
#>          [,1]   [,2]     [,3]
#> [1,]  27.5625 -47.25  14.4375
#> [2,] -47.2500  81.00 -24.7500
#> [3,]  14.4375 -24.75   7.5625
#> 
#> Sum of squares and products for error:
#>          [,1]     [,2]    [,3]
#> [1,]  261.375 -141.875  28.000
#> [2,] -141.875  248.750 -19.375
#> [3,]   28.000  -19.375 219.875
#> 
#> Multivariate Tests: 
#>                  Df test stat approx F num Df den Df Pr(>F)
#> Pillai            1 0.2564306 2.184141      3     19 0.1233
#> Wilks             1 0.7435694 2.184141      3     19 0.1233
#> Hotelling-Lawley  1 0.3448644 2.184141      3     19 0.1233
#> Roy               1 0.3448644 2.184141      3     19 0.1233

there is no significant difference in means between the control and bww9 drug

# Hypothesis test for ax23 vs rest after transformation M
axx23vrest <-
    car::linearHypothesis(heart_mod2,
                     hypothesis.matrix = c(0, 0, 1),
                     P = M)
axx23vrest
#> 
#>  Response transformation matrix:
#>    [,1] [,2] [,3]
#> y1    1    0    0
#> y2   -1    1    0
#> y3    0   -1    1
#> y4    0    0   -1
#> 
#> Sum of squares and products for the hypothesis:
#>           [,1]       [,2]      [,3]
#> [1,]  438.0208  175.20833 -395.7292
#> [2,]  175.2083   70.08333 -158.2917
#> [3,] -395.7292 -158.29167  357.5208
#> 
#> Sum of squares and products for error:
#>          [,1]     [,2]    [,3]
#> [1,]  261.375 -141.875  28.000
#> [2,] -141.875  248.750 -19.375
#> [3,]   28.000  -19.375 219.875
#> 
#> Multivariate Tests: 
#>                  Df test stat approx F num Df den Df     Pr(>F)    
#> Pillai            1  0.855364 37.45483      3     19 3.5484e-08 ***
#> Wilks             1  0.144636 37.45483      3     19 3.5484e-08 ***
#> Hotelling-Lawley  1  5.913921 37.45483      3     19 3.5484e-08 ***
#> Roy               1  5.913921 37.45483      3     19 3.5484e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

axx23vrest <-
    car::linearHypothesis(heart_mod,
                     hypothesis.matrix = c(2, -1, 1),
                     P = M)
axx23vrest
#> 
#>  Response transformation matrix:
#>    [,1] [,2] [,3]
#> y1    1    0    0
#> y2   -1    1    0
#> y3    0   -1    1
#> y4    0    0   -1
#> 
#> Sum of squares and products for the hypothesis:
#>           [,1]       [,2]      [,3]
#> [1,]  402.5208  127.41667 -390.9375
#> [2,]  127.4167   40.33333 -123.7500
#> [3,] -390.9375 -123.75000  379.6875
#> 
#> Sum of squares and products for error:
#>          [,1]     [,2]    [,3]
#> [1,]  261.375 -141.875  28.000
#> [2,] -141.875  248.750 -19.375
#> [3,]   28.000  -19.375 219.875
#> 
#> Multivariate Tests: 
#>                  Df test stat approx F num Df den Df     Pr(>F)    
#> Pillai            1  0.842450 33.86563      3     19 7.9422e-08 ***
#> Wilks             1  0.157550 33.86563      3     19 7.9422e-08 ***
#> Hotelling-Lawley  1  5.347205 33.86563      3     19 7.9422e-08 ***
#> Roy               1  5.347205 33.86563      3     19 7.9422e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

there is a significant difference in means between ax23 drug treatment and the rest of the treatments

22.1.2 Profile Analysis

Examine similarities between the treatment effects (between subjects), which is useful for longitudinal analysis. The null hypothesis is that all treatments have the same average effect.

\[ H_0: \mu_1 = \mu_2 = \dots = \mu_h \]

Equivalently,

\[ H_0: \tau_1 = \tau_2 = \dots = \tau_h \]

The exact nature of the similarities and differences between the treatments can be examined under this analysis.

Sequential steps in profile analysis:

  1. Are the profiles parallel? (i.e., is there no interaction between treatment and time)
  2. Are the profiles coincidental? (i.e., are the profiles identical?)
  3. Are the profiles horizontal? (i.e., are there no differences between any time points?)

If we reject the null hypothesis that the profiles are parallel, we can test

  • Are there differences among groups within some subset of the total time points?

  • Are there differences among time points in a particular group (or groups)?

  • Are there differences within some subset of the total time points in a particular group (or groups)?

Example

  • 4 times (p = 4)

  • 3 treatments (h=3)

22.1.2.1 Parallel Profile

Are the profiles for each population identical except for a mean shift?

\[ \begin{aligned} H_0: \mu_{11} - \mu_{21} = \mu_{12} - \mu_{22} = &\dots = \mu_{1p} - \mu_{2p} \\ \mu_{11} - \mu_{31} = \mu_{12} - \mu_{32} = &\dots = \mu_{1p} - \mu_{3p} \\ &\vdots \end{aligned} \]

for \(h-1\) equations

Equivalently,

\[ H_0: \mathbf{LBM = 0} \]

\[ \mathbf{LBM} = \left[ \begin{array} {ccc} 1 & -1 & 0 \\ 1 & 0 & -1 \end{array} \right] \left[ \begin{array} {ccc} \mu_{11} & \dots & \mu_{14} \\ \mu_{21} & \dots & \mu_{24} \\ \mu_{31} & \dots & \mu_{34} \end{array} \right] \left[ \begin{array} {ccc} 1 & 1 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{array} \right] = \mathbf{0} \]

where this is the cell means parameterization of \(\mathbf{B}\)

The multiplication of the first 2 matrices \(\mathbf{LB}\) is

\[ \left[ \begin{array} {cccc} \mu_{11} - \mu_{21} & \mu_{12} - \mu_{22} & \mu_{13} - \mu_{23} & \mu_{14} - \mu_{24}\\ \mu_{11} - \mu_{31} & \mu_{12} - \mu_{32} & \mu_{13} - \mu_{33} & \mu_{14} - \mu_{34} \end{array} \right] \]

which gives the differences in treatment means at each time point

Multiplying by \(\mathbf{M}\), we get the comparison across time

\[ \left[ \begin{array} {ccc} (\mu_{11} - \mu_{21}) - (\mu_{12} - \mu_{22}) & (\mu_{11} - \mu_{21}) -(\mu_{13} - \mu_{23}) & (\mu_{11} - \mu_{21}) - (\mu_{14} - \mu_{24}) \\ (\mu_{11} - \mu_{31}) - (\mu_{12} - \mu_{32}) & (\mu_{11} - \mu_{31}) - (\mu_{13} - \mu_{33}) & (\mu_{11} - \mu_{31}) -(\mu_{14} - \mu_{34}) \end{array} \right] \]

Alternatively, we can also use the effects parameterization

\[ \mathbf{LBM} = \left[ \begin{array} {cccc} 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{array} \right] \left[ \begin{array} {c} \mu' \\ \tau'_1 \\ \tau_2' \\ \tau_3' \end{array} \right] \left[ \begin{array} {ccc} 1 & 1 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{array} \right] = \mathbf{0} \]

In both parameterizations, \(rank(\mathbf{L}) = h-1\) and \(rank(\mathbf{M}) = p-1\)

We could also choose \(\mathbf{L}\) and \(\mathbf{M}\) in other forms

\[ \mathbf{L} = \left[ \begin{array} {cccc} 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{array} \right] \]

and

\[ \mathbf{M} = \left[ \begin{array} {ccc} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{array} \right] \]

and still obtain the same result.
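The heart example above has exactly this structure (\(h = 3\) drugs, \(p = 4\) time points), so the parallel-profile test can be sketched by hand with car::linearHypothesis, reusing heart_mod2 and the successive-difference matrix M defined there. This is a by-hand alternative to the profileR call shown later: both non-intercept coefficient rows must vanish on the differenced scale.

## Sketch: parallel profiles = no group effects on successive differences
car::linearHypothesis(heart_mod2,
                      hypothesis.matrix = rbind(c(0, 1, 0),
                                                c(0, 0, 1)),
                      P = M)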

22.1.2.2 Coincidental Profiles

After we find evidence that the profiles are parallel (i.e., we fail to reject the parallel-profile test), we can ask whether they are also identical.

Given that the profiles are parallel, if the sums of the components of \(\mathbf{\mu}_i\) are identical for all treatments, then the profiles are identical.

\[ H_0: \mathbf{1'}_p \mu_1 = \mathbf{1'}_p \mu_2 = \dots = \mathbf{1'}_p \mu_h \]

Equivalently,

\[ H_0: \mathbf{LBM} = \mathbf{0} \]

where for the cell means parameterization

\[ \mathbf{L} = \left[ \begin{array} {ccc} 1 & 0 & -1 \\ 0 & 1 & -1 \end{array} \right] \]

and

\[ \mathbf{M} = \left[ \begin{array} {cccc} 1 & 1 & 1 & 1 \end{array} \right]' \]

multiplication yields

\[ \left[ \begin{array} {c} (\mu_{11} + \mu_{12} + \mu_{13} + \mu_{14}) - (\mu_{31} + \mu_{32} + \mu_{33} + \mu_{34}) \\ (\mu_{21} + \mu_{22} + \mu_{23} + \mu_{24}) - (\mu_{31} + \mu_{32} + \mu_{33} + \mu_{34}) \end{array} \right] = \left[ \begin{array} {c} 0 \\ 0 \end{array} \right] \]

Different choices of \(\mathbf{L}\) and \(\mathbf{M}\) can yield the same result
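Given parallelism, the coincidence test can be sketched the same way on the heart fit, summing the responses across time (\(\mathbf{M} = \mathbf{1}_4\)) and comparing groups:

## Sketch: coincidental profiles = equal profile sums across groups
car::linearHypothesis(heart_mod2,
                      hypothesis.matrix = rbind(c(0, 1, 0),
                                                c(0, 0, 1)),
                      P = matrix(1, nrow = 4, ncol = 1))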

22.1.2.3 Horizontal Profiles

Given that we cannot reject the null hypothesis that all \(h\) profiles are the same, we can ask whether all of the elements of the common profile are equal (i.e., whether the profile is horizontal).

\[ H_0: \mathbf{LBM} = \mathbf{0} \]

\[ \mathbf{L} = \left[ \begin{array} {ccc} 1 & 0 & 0 \end{array} \right] \]

and

\[ \mathbf{M} = \left[ \begin{array} {ccc} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{array} \right] \]

hence,

\[ \left[ \begin{array} {ccc} (\mu_{11} - \mu_{12}) & (\mu_{12} - \mu_{13}) & (\mu_{13} - \mu_{14}) \end{array} \right] = \left[ \begin{array} {ccc} 0 & 0 & 0 \end{array} \right] \]
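Given coincidence, a sketch of the flatness test applies the successive-difference matrix M to the intercept row (the common, averaged profile) of the heart fit:

## Sketch: horizontal profile = equal successive elements of the
## common profile
car::linearHypothesis(heart_mod2,
                      hypothesis.matrix = c(1, 0, 0),
                      P = M)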

Note:

  • If we fail to reject all 3 hypotheses, then we fail to reject the null hypotheses of both no difference between treatments and no difference between traits.

Test                   Equivalent test for
Parallel profiles      interaction (treatment × time)
Coincidental profiles  main effect of the between-subjects factor
Horizontal profiles    main effect of the repeated-measures factor

We can run all three profile tests on the heart data with profileR::pbg:
profile_fit <-
    pbg(
        data = as.matrix(heart[, 2:5]),
        group = as.matrix(heart[, 1]),
        original.names = TRUE,
        profile.plot = FALSE
    )
summary(profile_fit)
#> Call:
#> pbg(data = as.matrix(heart[, 2:5]), group = as.matrix(heart[, 
#>     1]), original.names = TRUE, profile.plot = FALSE)
#> 
#> Hypothesis Tests:
#> $`Ho: Profiles are parallel`
#>   Multivariate.Test Statistic  Approx.F num.df den.df      p.value
#> 1             Wilks 0.1102861 12.737599      6     38 7.891497e-08
#> 2            Pillai 1.0891707  7.972007      6     40 1.092397e-05
#> 3  Hotelling-Lawley 6.2587852 18.776356      6     36 9.258571e-10
#> 4               Roy 5.9550887 39.700592      3     20 1.302458e-08
#> 
#> $`Ho: Profiles have equal levels`
#>             Df Sum Sq Mean Sq F value  Pr(>F)   
#> group        2  328.7  164.35   5.918 0.00915 **
#> Residuals   21  583.2   27.77                   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> $`Ho: Profiles are flat`
#>          F df1 df2      p-value
#> 1 14.30928   3  19 4.096803e-05
# reject null hypothesis of parallel profiles
# reject the null hypothesis of coincidental profiles
# reject the null hypothesis that the profiles are flat

22.1.3 Summary