Chapter 13 Factorial Analysis

13.1 Factorial Between-Groups ANOVA

The one-way between-groups ANOVA design handles situations where there is one independent variable – also known as a factor – that has three or more levels (when there is a single factor with only two levels, we typically use a t-test). Often, we are interested in the effects of more than one factor at a time.

For example, let’s imagine the case where a social psychologist is interested in how the independent variables mood and social situation – and the combination of mood and social situation – impact stress levels. To manipulate the mood factor, participants are randomly assigned to watch one of three brief video clips that have been shown (and validated) to induce positive, neutral, or negative moods, respectively.


Figure 13.1: Pictured L-R: Selected Participants in the Positive, Neutral, and Negative Induced-Mood Conditions

To manipulate the social situation factor, participants are randomly assigned to one of two conditions: a setting populated with actors who are trained to behave in a supportive manner, or a setting populated with actors trained to act antagonistically.


Figure 13.2: Pictured L-R: Selected Participants in the Supportive and in the Antagonistic Social-Situation Conditions

Participants are each shown one (and only one) of the mood-inducing videos and then brought to a room with actors performing either the supportive or the antagonistic roles (and not both); the participants interact with the actors for 15 minutes. The dependent variable for this experiment – stress – is measured by levels of cortisol in saliva samples taken following the social interaction.

In this example, mood and social situation are between-groups factors: as noted, each participant sees one and only one of the three videos and interacts with one and only one of the two lab-manufactured situations (if participants saw all of the videos, then mood would be a within-groups factor; similarly, if participants experienced both social situations, then social situation would be a within-groups factor). That means that there are six possible combinations of the factors – 3 levels of the mood variable times 2 levels of the social situation variable (see Table 13.1). To be a fully between-groups factorial experiment, this design therefore requires six entirely separate groups of participants – one for each combination of the two factors – with every participant experiencing one and only one combination.
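As a quick sketch (the factor labels below are illustrative, not data from the actual study), the six cells of this 3×2 design can be enumerated in R:

```r
# Enumerate the 3 x 2 = 6 cells of the mood-by-situation design.
design <- expand.grid(mood = c("positive", "neutral", "negative"),
                      situation = c("supportive", "antagonistic"))
design        # one row per cell of the factorial design
nrow(design)  # 6: a fully between-groups design needs a separate group per cell
```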

Any time there are multiple factors – even when one or more of the factors has only two levels – we use a factorial ANOVA (recall that the t-test is a special case of ANOVA anyway).

13.1.0.1 Main Effects and Interactions

Main effects are the observed effects of each factor by itself. Interactions are effects that emerge from combinations of treatments and that differ from the effects of those treatments applied on their own.

An effect is called a main effect only if it is significant (otherwise, there is no observed effect). In the example above, a main effect of mood would indicate that the mood factor causes significant changes in the dependent variable; a main effect of situation would indicate that the situation factor causes significant changes in the dependent variable. As with main effects, an interaction is technically only an interaction if it is statistically significant (but that rule is much softer, because we don’t have great ways to talk about non-significant factor combinations). Interactions indicate that different combinations of the two factors cause variance in the dependent variable that is independent of the influence of the two factors by themselves.
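A small numeric illustration (with made-up cell means, not data from the study): when the simple effect of one factor changes across the levels of the other, the factors interact.

```r
# Hypothetical cell means: rows are mood conditions, columns are situations.
means <- matrix(c(3, 4, 8,     # supportive column
                  3, 6, 14),   # antagonistic column
                nrow = 3,
                dimnames = list(mood = c("positive", "neutral", "negative"),
                                situation = c("supportive", "antagonistic")))
# The simple effect of situation at each mood level:
means[, "antagonistic"] - means[, "supportive"]  # 0, 2, 6: not constant, so the factors interact
```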

In an ANOVA analysis – by the strict definition of ANOVA, which requires that n be the same in all groups – factors and interactions are each sources of unique variance in the dependent variable; that is to say, they are orthogonal.

When n is not uniform across groups, then factors and interactions are not orthogonal and therefore do not contribute unique variance in the dependent variable: the variance associated with different sources overlaps. Those are cases in which we use general linear models (GLMs) to analyze the data, and different types of estimates for the sums of squares (types I, II, & III).

But that is a concern for another chapter.

The key reason to use factorial ANOVA is that we are interested in interactions. It is perfectly good and natural for a scientist to be interested in more than one thing at the same time, and sometimes those things might be measurable by the same quantity. For example: a neuroscientist studying the amygdala might be interested in how different stimuli, time of day, personal experiences, ambient temperature, age, gender identification, and all kinds of other factors influence amygdala activation. It might be tempting to test different factors in the same experiment, but researchers shouldn’t do so unless they are interested in how those factors interact with each other. That doesn’t mean that a factorial experiment is ill-advised if it does not result in a significant interaction – that, of course, is why we try things in science – but only that there needs to be interest in examining whether there is an interaction to justify the design.

The reason is that factorial designs require substantial resources to run. In a factorial design, one needs equal numbers of observations for each combination of factors. Thus, each added factor either multiplies the number of participants you need (for between-subjects factors) or divides the number of observations per condition per participant (for within-subjects factors). For example, a two-way independent-groups design with 3 levels per factor requires 9 total groups to cover the 3×3 combinations; if the interaction is not of interest, two separate one-way independent-groups experiments would require just 6 groups of participants. In addition to efficiency, interpreting and explaining interactions is complex enough when the interaction is meaningful; attempting to explain an observed interaction between factors for which an interaction is irrelevant would be considerably more difficult.
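The group-count arithmetic in the example above is simple to verify:

```r
# Groups required for two between-subjects factors with 3 levels each:
levels_per_factor <- c(3, 3)
prod(levels_per_factor)  # factorial design: 9 groups (one per combination)
sum(levels_per_factor)   # two separate one-way experiments: 6 groups
```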

13.1.1 Between-Groups 2-way Factorial ANOVA Model

$$y_{ijk} = \mu + \alpha_j + \beta_k + \alpha\beta_{jk} + \epsilon_{ijk}$$

$y_{ijk}$ is an observed value of the DV

$\mu$ is the population mean

$\alpha_j$ is the effect of factor A at each level $j$

$\beta_k$ is the effect of factor B at each level $k$

$\alpha\beta_{jk}$ is the interaction of factors A and B at each level combination $jk$

$\epsilon_{ijk}$ is the error (or residuals)
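The model statement can be made concrete by simulating from it. Everything below (the effect sizes, n, and error SD) is invented purely for illustration:

```r
# Simulate y_ijk = mu + alpha_j + beta_k + (alpha beta)_jk + epsilon_ijk
set.seed(1)
mu    <- 50
alpha <- c(-2, 0, 2)                     # Factor A effects (3 levels, sum to zero)
beta  <- c(-1, 1)                        # Factor B effects (2 levels, sum to zero)
ab    <- matrix(c(1, 0, -1,
                  -1, 0, 1), nrow = 3)   # interaction effects, one per cell
n     <- 10                              # observations per cell
d <- expand.grid(i = seq_len(n), A = 1:3, B = 1:2)
d$y <- mu + alpha[d$A] + beta[d$B] + ab[cbind(d$A, d$B)] + rnorm(nrow(d), sd = 2)
d$A <- factor(d$A); d$B <- factor(d$B)
summary(aov(y ~ A * B, data = d))        # fit the two-way model to the simulated data
```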

13.1.1.0.1 Hypotheses

We have three null and three alternative hypotheses for the factorial 2-way IG model:

$$H_0: \sigma^2_\alpha = 0; \quad H_1: \sigma^2_\alpha > 0$$

$$H_0: \sigma^2_\beta = 0; \quad H_1: \sigma^2_\beta > 0$$

$$H_0: \sigma^2_{\alpha\beta} = 0; \quad H_1: \sigma^2_{\alpha\beta} > 0$$

13.1.1.0.2 Factorial ANOVA Table

The F-ratios we use to evaluate the hypotheses for Factor A and Factor B depend on whether the factors are fixed or random (the AB interaction is tested with $F = MS_{AB}/MS_e$ regardless).

Note: An interaction is random if at least one of its constituent factors is random. Otherwise it’s fixed.

The expected mean squares serve as a guide to what the F-statistic denominator should be.

Two key terms that are new to us are $j/J$ and $k/K$.

The lower-case $j$ and $k$ refer, respectively, to the number of levels of Factor A and of Factor B that we have.

The upper-case $J$ and $K$ refer to the number of levels that exist.

If Factor A (for example) is fixed, then the number of levels we have is the same (or about the same) as the number of levels that exist, and therefore $j/J \approx 1$.

If Factor A is random, then the number of levels we have is much smaller than the number of levels that exist, and therefore $j/J \approx 0$.

The EMS for Factor B in the 2-Way IG Design is:

$$EMS_B = n\sigma^2_\beta + n\left(1 - \frac{j}{J}\right)\sigma^2_{\alpha\beta} + \sigma^2_\epsilon$$

If Factor A is fixed, and $\frac{j}{J} \approx 1$, then $1 - \frac{j}{J} \approx 0$ and:

$$EMS_B = n\sigma^2_\beta + \sigma^2_\epsilon$$

If Factor A is random, and $\frac{j}{J} \approx 0$, then $1 - \frac{j}{J} \approx 1$ and:

$$EMS_B = n\sigma^2_\beta + n\sigma^2_{\alpha\beta} + \sigma^2_\epsilon$$

(Yes: the EMS for Factor B depends on whether Factor A is fixed, and yes, that is wild.)

Therefore, if Factor A is fixed and $EMS_B = n\sigma^2_\beta + \sigma^2_\epsilon$, then $MS_e$ is the appropriate error term for the F-ratio.

If Factor A is random and $EMS_B = n\sigma^2_\beta + n\sigma^2_{\alpha\beta} + \sigma^2_\epsilon$, then $MS_{AB}$ is the appropriate error term for the F-ratio.
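As a sketch of that decision rule, using the mean squares from the worked example in the next section ($MS_B = 21.33$, $MS_{AB} = 28.08$, $MS_e = 3.50$):

```r
# The F-ratio for Factor B under each assumption about Factor A:
MS_B  <- 21.33
MS_AB <- 28.08
MS_e  <- 3.50
MS_B / MS_e    # Factor A fixed: the error term is MS_e
MS_B / MS_AB   # Factor A random: the error term is MS_AB
```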

13.1.1.0.3 Between-Groups 2-Way ANOVA Example

Three levels of Factor A

Two levels of Factor B

$n = 2$

NOTE: $n$ is now the number of observations per combination of A and B (a.k.a. the number per cell)

The aov() output gives us the appropriate F-ratios for fixed A and B.

summary(aov(obs_data~AFac*BFac, data = IG2way4aov))
##             Df Sum Sq Mean Sq F value Pr(>F)  
## AFac         2  28.50   14.25   4.071 0.0764 .
## BFac         1  21.33   21.33   6.095 0.0485 *
## AFac:BFac    2  56.17   28.08   8.024 0.0202 *
## Residuals    6  21.00    3.50                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If Factor A is random, then $F_B = MS_B/MS_{AB} = 21.33/28.08 = 0.76$, and $p > 0.05$:

pf(0.76, df1 = 1, df2 = 2, lower.tail = FALSE)
## [1] 0.4752502

If Factor B is random, then $F_A = MS_A/MS_{AB} = 14.25/28.08 = 0.51$, and $p > 0.05$:

pf(0.51, df1 = 2, df2 = 2, lower.tail = FALSE)
## [1] 0.6622517
13.1.1.0.4 Effect Size Estimates

Assuming Factors A and B are fixed, we found:

no main effect of Factor A

a main effect of Factor B

an interaction between A and B (a.k.a. an AB interaction)

We could calculate the overall effect size of the model, $\eta^2$, similarly to how we calculated $\eta^2$ for the one-way design:

$$\eta^2 = \frac{SS_{model}}{SS_{total}} = \frac{SS_{total} - SS_e}{SS_{total}} = \frac{127 - 21}{127} \approx 0.83$$

However, it is more meaningful to report effect-size estimates for each individual effect (main or interaction); these are called partial $\eta^2$ values.

13.1.1.0.5 Partial η2

Partial $\eta^2$ estimates are the SS for each observed effect as a proportion of the total SS:

$\eta^2_A$: no thank you (there was no main effect of A to report)

$$\eta^2_B = \frac{SS_B}{SS_{total}} = \frac{21.33}{127} = 0.17$$

$$\eta^2_{AB} = \frac{SS_{AB}}{SS_{total}} = \frac{56.17}{127} = 0.44$$
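These proportions can be checked directly from the SS column of the ANOVA table above:

```r
# SS values from the ANOVA table: A, B, the AB interaction, and the residuals
SS <- c(A = 28.50, B = 21.33, AB = 56.17, error = 21.00)
SS_total <- sum(SS)                    # 127
round(SS[c("B", "AB")] / SS_total, 2)  # 0.17 and 0.44
```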

13.1.1.0.6 Full and Partial ω2

Pretty much the same deal with $\omega^2$: we can calculate an $\omega^2$ for the whole model, which includes all the variance components (even the ones associated with non-significant effects) in the calculation:

$$\omega^2 = \frac{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta}}{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} + \hat\sigma^2_\epsilon}$$

But also – and more meaningfully – partial $\omega^2$ estimates:

$$\omega^2_\alpha = \frac{\hat\sigma^2_\alpha}{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} + \hat\sigma^2_\epsilon}, \quad \omega^2_\beta = \frac{\hat\sigma^2_\beta}{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} + \hat\sigma^2_\epsilon}, \quad \omega^2_{\alpha\beta} = \frac{\hat\sigma^2_{\alpha\beta}}{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} + \hat\sigma^2_\epsilon}$$
13.1.1.0.7 Estimating Pop. Variance Components

Still assuming all factors are fixed (so we will have to correct).

As in the one-way design, $MS_e$ estimates $\sigma^2_\epsilon$:

$$\hat\sigma^2_\epsilon = MS_e = 3.5$$

$MS_{AB}$ estimates $n\sigma^2_{\alpha\beta} + \sigma^2_\epsilon$:

$$\hat\sigma^2_{\alpha\beta} = \frac{MS_{AB} - MS_e}{n}\left(\frac{df_{AB}}{df_{AB}+1}\right) = 8.63$$

Note that we’re starting at the bottom of the EMS column and working our way up. We don’t have to do it that way, but it makes things marginally easier.

$MS_B$ estimates $nj\sigma^2_\beta + \sigma^2_\epsilon$:

$$\hat\sigma^2_\beta = \frac{MS_B - MS_e}{nj}\left(\frac{df_B}{df_B+1}\right) = 0.71$$

$MS_A$ estimates $nk\sigma^2_\alpha + \sigma^2_\epsilon$:

$$\hat\sigma^2_\alpha = \frac{MS_A - MS_e}{nk}\left(\frac{df_A}{df_A+1}\right) = 0.38$$

$$\omega^2 = \frac{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta}}{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} + \hat\sigma^2_\epsilon} = \frac{0.38 + 0.71 + 8.63}{0.38 + 0.71 + 8.63 + 3.5} = 0.74$$

$$\omega^2_\beta = \frac{\hat\sigma^2_\beta}{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} + \hat\sigma^2_\epsilon} = \frac{0.71}{0.38 + 0.71 + 8.63 + 3.5} = 0.05$$

$$\omega^2_{\alpha\beta} = \frac{\hat\sigma^2_{\alpha\beta}}{\hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} + \hat\sigma^2_\epsilon} = \frac{8.63}{0.38 + 0.71 + 8.63 + 3.5} = 0.65$$

  1. The numerator of a $\hat\sigma^2$ calculation always takes the form $MS_{EFFECT} - MS_{ERROR\ TERM}$.

  2. If any $\hat\sigma^2 < 0$, we just set it to zero.
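The two rules above can be wrapped into a small helper (shown here with the chapter’s component estimates as input):

```r
# Partial omega-squared from estimated variance components.
# Negative component estimates are clipped to zero (rule 2).
omega2_partial <- function(components, error) {
  components <- pmax(components, 0)
  components / (sum(components) + error)
}
comp <- c(A = 0.38, B = 0.71, AB = 8.63)      # estimates from above
round(omega2_partial(comp, error = 3.5), 2)   # B: 0.05, AB: 0.65
```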

13.1.1.0.8 Post Hoc Tests

Example: Tukey HSD

TukeyHSD(aov(obs_data~AFac*BFac, data = IG2way4aov))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = obs_data ~ AFac * BFac, data = IG2way4aov)
## 
## $AFac
##        diff       lwr       upr     p adj
## A2-A1  1.50 -2.558946 5.5589455 0.5301028
## A3-A1 -2.25 -6.308946 1.8089455 0.2798018
## A3-A2 -3.75 -7.808946 0.3089455 0.0668375
## 
## $BFac
##           diff    lwr      upr     p adj
## B2-B1 2.666667 0.0237 5.309633 0.0485331
## 
## $`AFac:BFac`
##             diff         lwr       upr     p adj
## A2:B1-A1:B1 -0.5  -7.9456115  6.945611 0.9997017
## A3:B1-A1:B1  1.0  -6.4456115  8.445611 0.9922312
## A1:B2-A1:B1  3.5  -3.9456115 10.945611 0.4924408
## A2:B2-A1:B1  7.0  -0.4456115 14.445611 0.0645128
## A3:B2-A1:B1 -2.0  -9.4456115  5.445611 0.8775797
## A3:B1-A2:B1  1.5  -5.9456115  8.945611 0.9569571
## A1:B2-A2:B1  4.0  -3.4456115 11.445611 0.3774592
## A2:B2-A2:B1  7.5   0.0543885 14.945611 0.0484867
## A3:B2-A2:B1 -1.5  -8.9456115  5.945611 0.9569571
## A1:B2-A3:B1  2.5  -4.9456115  9.945611 0.7596584
## A2:B2-A3:B1  6.0  -1.4456115 13.445611 0.1161995
## A3:B2-A3:B1 -3.0 -10.4456115  4.445611 0.6242112
## A2:B2-A1:B2  3.5  -3.9456115 10.945611 0.4924408
## A3:B2-A1:B2 -5.5 -12.9456115  1.945611 0.1567749
## A3:B2-A2:B2 -9.0 -16.4456115 -1.554389 0.0215070

Example: Tukey HSD (continued)

TukeyHSD(aov(obs_data~AFac*BFac, data = IG2way4aov))$`AFac:BFac`
##             diff         lwr       upr      p adj
## A2:B1-A1:B1 -0.5  -7.9456115  6.945611 0.99970168
## A3:B1-A1:B1  1.0  -6.4456115  8.445611 0.99223117
## A1:B2-A1:B1  3.5  -3.9456115 10.945611 0.49244079
## A2:B2-A1:B1  7.0  -0.4456115 14.445611 0.06451283
## A3:B2-A1:B1 -2.0  -9.4456115  5.445611 0.87757971
## A3:B1-A2:B1  1.5  -5.9456115  8.945611 0.95695711
## A1:B2-A2:B1  4.0  -3.4456115 11.445611 0.37745922
## A2:B2-A2:B1  7.5   0.0543885 14.945611 0.04848665
## A3:B2-A2:B1 -1.5  -8.9456115  5.945611 0.95695711
## A1:B2-A3:B1  2.5  -4.9456115  9.945611 0.75965842
## A2:B2-A3:B1  6.0  -1.4456115 13.445611 0.11619948
## A3:B2-A3:B1 -3.0 -10.4456115  4.445611 0.62421123
## A2:B2-A1:B2  3.5  -3.9456115 10.945611 0.49244079
## A3:B2-A1:B2 -5.5 -12.9456115  1.945611 0.15677488
## A3:B2-A2:B2 -9.0 -16.4456115 -1.554389 0.02150697
13.1.1.0.9 Nonparametric 2-Way ANOVA

Rank-based methods have recently been developed for the 2-way ANOVA; the raov() function used below comes from the Rfit package:

raov(obs_data~AFac*BFac, data = IG2way4aov)
## 
## Robust ANOVA Table
##           DF       RD Mean RD       F p-value
## AFac       2  8.14081 4.07040 3.68008 0.09058
## BFac       1  6.03023 6.03023 5.45198 0.05825
## AFac:BFac  2 14.77406 7.38703 6.67867 0.02978
13.1.1.0.10 Bayesian 2-Way ANOVA

The anovaBF() function (from the BayesFactor package) compares each candidate model against an intercept-only model:
anovaBF(obs_data~AFac*BFac, data = IG2way4aov, progress = FALSE)
## Bayes factor analysis
## --------------
## [1] AFac                    : 0.6548882 ±0.01%
## [2] BFac                    : 0.8578258 ±0%
## [3] AFac + BFac             : 0.6218483 ±0.67%
## [4] AFac + BFac + AFac:BFac : 2.068302  ±1.09%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS
13.1.1.0.11 IG 3-Way ANOVA

For reasons outlined above (cost, complexity, interpretability), the 3-way ANOVA is about as complex a design as you should use.

There are seven null and seven alternative hypotheses for the 3-Way IG design:

Three main effects: A ($\sigma^2_\alpha$), B ($\sigma^2_\beta$), and C ($\sigma^2_\gamma$)

Four interactions: AB ($\sigma^2_{\alpha\beta}$), BC ($\sigma^2_{\beta\gamma}$), AC ($\sigma^2_{\alpha\gamma}$), and ABC ($\sigma^2_{\alpha\beta\gamma}$)

All concepts and practices for the 2-way design apply to the 3-way design.

But there is one new thing that we need to review: the Quasi-F ratio.

13.1.1.0.12 Quasi-F ratios

A Quasi-F ratio is used when no single pair of mean squares forms an F-ratio that can test an $H_0$.

Below is the almost unbearably beautiful expected mean squares table for the IG 3-Way design, with sources listed by their effect name from the model statement for this design:

$$y_{ijkl} = \mu + \alpha_j + \beta_k + \gamma_l + \alpha\beta_{jk} + \beta\gamma_{kl} + \alpha\gamma_{jl} + \alpha\beta\gamma_{jkl} + \epsilon_{ijkl}$$
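In R’s formula notation, the full three-way model is specified compactly; the data below are simulated placeholders just to show how the formula expands:

```r
# aov(y ~ A * B * C) expands to all main effects plus all interactions:
#   y ~ A + B + C + A:B + A:C + B:C + A:B:C
set.seed(2)
d3 <- expand.grid(A = factor(1:3), B = factor(1:2), C = factor(1:2), rep = 1:2)
d3$y <- rnorm(nrow(d3))
fit <- aov(y ~ A * B * C, data = d3)
summary(fit)  # seven effect rows (three main effects, four interactions) + residuals
```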

Please note the EMS for Factor A, Factor B, and Factor C.

If they are random effects (and thus none of the terms drop out), then there are no EMS terms that could isolate $\sigma^2_\alpha$, $\sigma^2_\beta$, or $\sigma^2_\gamma$.

| Effect | EMS |
|--------|-----|
| $\alpha_j$ | $\sigma^2_\epsilon + n(1-\frac{k}{K})(1-\frac{l}{L})\sigma^2_{\alpha\beta\gamma} + nk(1-\frac{l}{L})\sigma^2_{\alpha\gamma} + n(1-\frac{k}{K})l\sigma^2_{\alpha\beta} + nkl\sigma^2_\alpha$ |
| $\beta_k$ | $\sigma^2_\epsilon + n(1-\frac{j}{J})(1-\frac{l}{L})\sigma^2_{\alpha\beta\gamma} + nj(1-\frac{l}{L})\sigma^2_{\beta\gamma} + n(1-\frac{j}{J})l\sigma^2_{\alpha\beta} + njl\sigma^2_\beta$ |
| $\gamma_l$ | $\sigma^2_\epsilon + n(1-\frac{j}{J})(1-\frac{k}{K})\sigma^2_{\alpha\beta\gamma} + nj(1-\frac{k}{K})\sigma^2_{\beta\gamma} + n(1-\frac{j}{J})k\sigma^2_{\alpha\gamma} + njk\sigma^2_\gamma$ |
| $(\alpha\beta)_{jk}$ | $\sigma^2_\epsilon + n(1-\frac{l}{L})\sigma^2_{\alpha\beta\gamma} + nl\sigma^2_{\alpha\beta}$ |
| $(\alpha\gamma)_{jl}$ | $\sigma^2_\epsilon + n(1-\frac{k}{K})\sigma^2_{\alpha\beta\gamma} + nk\sigma^2_{\alpha\gamma}$ |
| $(\beta\gamma)_{kl}$ | $\sigma^2_\epsilon + n(1-\frac{j}{J})\sigma^2_{\alpha\beta\gamma} + nj\sigma^2_{\beta\gamma}$ |
| $(\alpha\beta\gamma)_{jkl}$ | $\sigma^2_\epsilon + n\sigma^2_{\alpha\beta\gamma}$ |
| $\epsilon_{i(jkl)}$ | $\sigma^2_\epsilon$ |

If (at least) B and C are random, then the EMS for Factor A is:

$$EMS_A = \sigma^2_\epsilon + n\sigma^2_{\alpha\beta\gamma} + nk\sigma^2_{\alpha\gamma} + nl\sigma^2_{\alpha\beta} + nkl\sigma^2_\alpha$$

No other single EMS matches the remaining terms once we set aside $nkl\sigma^2_\alpha$ (the term we need to isolate).

We have to assemble a proper error term from bits and pieces of different EMS.

13.1.1.0.13 Quasi-F ratios: $F'$

$$EMS_A = \sigma^2_\epsilon + n\sigma^2_{\alpha\beta\gamma} + nk\sigma^2_{\alpha\gamma} + nl\sigma^2_{\alpha\beta} + nkl\sigma^2_\alpha$$

We can get the terms $n\sigma^2_{\alpha\beta\gamma}$ and $nl\sigma^2_{\alpha\beta}$ from $EMS_{AB}$.

We can get the term $nk\sigma^2_{\alpha\gamma}$ from $EMS_{AC}$:

$$EMS_{AB} + EMS_{AC} = \sigma^2_\epsilon + n\sigma^2_{\alpha\beta\gamma} + nl\sigma^2_{\alpha\beta} + \sigma^2_\epsilon + n\sigma^2_{\alpha\beta\gamma} + nk\sigma^2_{\alpha\gamma}$$

Almost there! But we have an extra $\sigma^2_\epsilon + n\sigma^2_{\alpha\beta\gamma}$ that we don’t need. Lucky for us, that happens to be exactly the EMS for the ABC interaction!

$$EMS_A - nkl\sigma^2_\alpha = EMS_{AB} + EMS_{AC} - EMS_{ABC}$$

The first Quasi-F ratio, $F'$, translates that EMS math into an F-statistic based on mean squares:

$$F' = \frac{MS_A}{MS_{AB} + MS_{AC} - MS_{ABC}}$$

Because we messed with the F-ratio denominator, we also have to mess with the denominator degrees of freedom:

$$df_{denom} = \frac{(MS_{AB} + MS_{AC} - MS_{ABC})^2}{\frac{MS^2_{AB}}{df_{AB}} + \frac{MS^2_{AC}}{df_{AC}} + \frac{MS^2_{ABC}}{df_{ABC}}}$$

$F'$ has a problem: the subtraction in the denominator means that it’s possible to get a negative denominator and therefore a negative F (which makes no sense).

Enter $F''$, which messes with both the numerator and the denominator to get an approximately correct F-ratio:

$$F'' = \frac{MS_A + MS_{ABC}}{MS_{AB} + MS_{AC}}$$

For $F''$, we adjust the df for both the numerator and the denominator:

$$df_{num} = \frac{(MS_A + MS_{ABC})^2}{\frac{MS^2_A}{df_A} + \frac{MS^2_{ABC}}{df_{ABC}}}$$

$$df_{denom} = \frac{(MS_{AB} + MS_{AC})^2}{\frac{MS^2_{AB}}{df_{AB}} + \frac{MS^2_{AC}}{df_{AC}}}$$
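These computations can be sketched as a small function. The mean squares and dfs in the demo call are placeholders, not from a real analysis:

```r
# Quasi-F'' for Factor A when B and C are random:
# F'' = (MS_A + MS_ABC) / (MS_AB + MS_AC), with adjusted numerator
# and denominator dfs as defined above.
quasi_F2 <- function(MS, df) {
  num   <- MS[["A"]]  + MS[["ABC"]]
  denom <- MS[["AB"]] + MS[["AC"]]
  df_num   <- num^2   / (MS[["A"]]^2  / df[["A"]]  + MS[["ABC"]]^2 / df[["ABC"]])
  df_denom <- denom^2 / (MS[["AB"]]^2 / df[["AB"]] + MS[["AC"]]^2  / df[["AC"]])
  c(F = num / denom, df_num = df_num, df_denom = df_denom)
}
# Placeholder mean squares and dfs:
quasi_F2(MS = c(A = 40, AB = 10, AC = 8, ABC = 6),
         df = c(A = 2,  AB = 4,  AC = 4, ABC = 8))
```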

13.2 Four-or-more-way Factorial ANOVA