22.4 Classical Experimental Designs
Experimental designs provide structured frameworks for conducting experiments, ensuring that results are statistically valid and practically applicable. The choice of design depends on the research question, the nature of the treatment, and the potential sources of variability. For a deeper statistical treatment of these designs, we revisit them in Analysis of Variance.
22.4.1 Completely Randomized Design
In a Completely Randomized Design (CRD), each experimental unit is randomly assigned to a treatment group. This is the simplest form of experimental design and is effective when no confounding factors are present.
Example: Email Marketing Experiment
A company tests three different email marketing strategies (A, B, and C) to measure their effect on customer engagement (click-through rate). Customers are randomly assigned to receive one of the three emails.
Mathematical Model
$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}$$
where:
- $Y_{ij}$ is the response variable (e.g., click-through rate) for unit $j$ under treatment $i$.
- $\mu$ is the overall mean response.
- $\tau_i$ is the effect of treatment $i$.
- $\epsilon_{ij}$ is the random error term, assumed to be normally distributed: $\epsilon_{ij} \sim N(0, \sigma^2)$.
set.seed(123)
# Simulated dataset for email marketing experiment
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  response = c(rnorm(10, mean = 50, sd = 5),
               rnorm(10, mean = 55, sd = 5),
               rnorm(10, mean = 60, sd = 5))
)
# ANOVA to test for differences among groups
anova_result <- aov(response ~ group, data = data)
summary(anova_result)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> group 2 306.1 153.04 6.435 0.00518 **
#> Residuals 27 642.1 23.78
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If the p-value in the ANOVA summary is below 0.05 (here, 0.00518), we reject the null hypothesis of equal group means and conclude that at least one email strategy affects engagement differently from the others.
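A significant F-test does not tell us which strategies differ from one another. As a follow-up, one might run a Tukey HSD post-hoc comparison; a sketch on the same simulated data:

```r
set.seed(123)
# Recreate the simulated email marketing experiment
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  response = c(rnorm(10, mean = 50, sd = 5),
               rnorm(10, mean = 55, sd = 5),
               rnorm(10, mean = 60, sd = 5))
)

anova_result <- aov(response ~ group, data = data)

# Pairwise comparisons with family-wise error rate control
TukeyHSD(anova_result)
```

Each row of the output gives a pairwise mean difference (B-A, C-A, C-B) with a simultaneous confidence interval and adjusted p-value, so we can see which specific strategies drive the overall ANOVA result.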
22.4.2 Randomized Block Design
A Randomized Block Design (RBD) is used when experimental units can be grouped into homogeneous blocks based on a known confounding factor. Blocking helps reduce unwanted variation, increasing the precision of estimated treatment effects.
Example: Store Layout Experiment
A retailer tests three store layouts (A, B, and C) on sales performance. Since store location (Urban, Suburban, Rural) might influence sales, we use blocking to control for this effect.
Mathematical Model
$$Y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}$$
where:
- $Y_{ij}$ is the sales outcome under layout $i$ in location $j$.
- $\mu$ is the overall mean sales.
- $\tau_i$ is the effect of layout $i$.
- $\beta_j$ is the block effect (location $j$).
- $\epsilon_{ij}$ is the random error.
set.seed(123)
# Simulated dataset for store layout experiment
data <- data.frame(
  block = rep(c("Urban", "Suburban", "Rural"), each = 6),
  layout = rep(c("A", "B", "C"), times = 6),
  sales = c(rnorm(6, mean = 200, sd = 20),
            rnorm(6, mean = 220, sd = 20),
            rnorm(6, mean = 210, sd = 20))
)
# ANOVA with blocking factor
anova_block <- aov(sales ~ layout + block, data = data)
summary(anova_block)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> layout 2 71 35.7 0.071 0.931
#> block 2 328 164.1 0.328 0.726
#> Residuals 13 6500 500.0
By including the block factor in the model, we account for location effects, leading to more precise comparisons of the layout treatments.
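One way to see what blocking buys us is to compare the residual variance with and without the block term: a smaller residual mean square under blocking means more precise treatment comparisons (at the cost of two residual degrees of freedom). A sketch on the same simulated data, where the simulated location effect happens to be small, so the gain is modest:

```r
set.seed(123)
# Recreate the simulated store layout experiment
data <- data.frame(
  block  = rep(c("Urban", "Suburban", "Rural"), each = 6),
  layout = rep(c("A", "B", "C"), times = 6),
  sales  = c(rnorm(6, mean = 200, sd = 20),
             rnorm(6, mean = 220, sd = 20),
             rnorm(6, mean = 210, sd = 20))
)

# Fit with and without the blocking factor
fit_block   <- aov(sales ~ layout + block, data = data)
fit_noblock <- aov(sales ~ layout, data = data)

# Residual mean squares: blocking moves location variation
# out of the error term
c(with_block    = anova(fit_block)["Residuals", "Mean Sq"],
  without_block = anova(fit_noblock)["Residuals", "Mean Sq"])
```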
22.4.3 Factorial Design
A Factorial Design evaluates the effects of two or more factors simultaneously, allowing for the study of interactions between variables.
Example: Pricing and Advertising Experiment
A company tests two pricing strategies (High, Low) and two advertising methods (TV, Social Media) on sales.
Mathematical Model
$$Y_{ijk} = \mu + \tau_i + \gamma_j + (\tau\gamma)_{ij} + \epsilon_{ijk}$$
where:
- $Y_{ijk}$ is the sales response for replicate $k$ under price level $i$ and advertising method $j$.
- $\mu$ is the overall mean sales.
- $\tau_i$ is the effect of price level $i$.
- $\gamma_j$ is the effect of advertising method $j$.
- $(\tau\gamma)_{ij}$ is the interaction effect between price and advertising.
- $\epsilon_{ijk}$ is the random error term.
set.seed(123)
# Simulated dataset
data <- expand.grid(
  Price = c("High", "Low"),
  Advertising = c("TV", "Social Media"),
  Replicate = 1:10
)
# Generate response variable (sales)
data$Sales <- with(data,
  100 +
    ifelse(Price == "Low", 10, 0) +
    ifelse(Advertising == "Social Media", 15, 0) +
    ifelse(Price == "Low" & Advertising == "Social Media", 5, 0) +
    rnorm(nrow(data), sd = 5))
# Two-way ANOVA
anova_factorial <- aov(Sales ~ Price * Advertising, data = data)
summary(anova_factorial)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> Price 1 1364 1364 66.60 1.05e-09 ***
#> Advertising 1 3640 3640 177.67 1.72e-15 ***
#> Price:Advertising 1 15 15 0.71 0.405
#> Residuals 36 738 20
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
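Because the interaction term is not significant here, the two factor effects are roughly additive. One way to check this is to inspect the cell means: if the profiles are close to parallel across factor levels, there is little interaction. A sketch using the same simulation:

```r
set.seed(123)
# Recreate the simulated pricing and advertising experiment
data <- expand.grid(
  Price       = c("High", "Low"),
  Advertising = c("TV", "Social Media"),
  Replicate   = 1:10
)
data$Sales <- with(data,
  100 +
    ifelse(Price == "Low", 10, 0) +
    ifelse(Advertising == "Social Media", 15, 0) +
    ifelse(Price == "Low" & Advertising == "Social Media", 5, 0) +
    rnorm(nrow(data), sd = 5))

# Cell means: roughly parallel rows indicate little interaction
cell_means <- tapply(data$Sales, list(data$Price, data$Advertising), mean)
round(cell_means, 1)

# Base-R interaction plot of the same means
interaction.plot(data$Advertising, data$Price, data$Sales,
                 xlab = "Advertising method", ylab = "Mean sales",
                 trace.label = "Price")
```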
22.4.4 Crossover Design
A Crossover Design is used when each subject receives multiple treatments in a sequential manner. This design controls for individual differences by using each subject as their own control.
Example: Drug Trial
Patients receive Drug A in the first period and Drug B in the second period, or vice versa.
Mathematical Model
$$Y_{ijk} = \mu + \tau_i + \pi_j + \beta_k + \epsilon_{ijk}$$
where:
- $Y_{ijk}$ is the response of subject $k$ to treatment $i$ in period $j$.
- $\mu$ is the overall mean response.
- $\tau_i$ is the treatment effect.
- $\pi_j$ is the period effect (e.g., learning effects).
- $\beta_k$ is the subject effect (individual baseline differences).
- $\epsilon_{ijk}$ is the random error term.
set.seed(123)
# Simulated dataset: each subject receives both drugs, in one of
# two sequences (AB or BA), so subjects serve as their own controls
data <- data.frame(
  Subject   = factor(rep(1:10, each = 2)),
  Period    = rep(c("Period 1", "Period 2"), times = 10),
  Treatment = rep(c("A", "B", "B", "A"), times = 5)
)
# Response: subject baseline + treatment effect (B adds ~5) + noise
data$Response <- rnorm(10, mean = 50, sd = 3)[data$Subject] +
  ifelse(data$Treatment == "B", 5, 0) +
  rnorm(20, sd = 2)
# Crossover ANOVA: Subject forms its own error stratum, so the
# treatment and period effects are tested within subjects
anova_crossover <-
  aov(Response ~ Treatment + Period + Error(Subject),
      data = data)
summary(anova_crossover)
22.4.5 Split-Plot Design
A Split-Plot Design is used when one factor is applied at the group (whole-plot) level and another at the individual (sub-plot) level. This design is particularly useful when some factors are harder or more expensive to randomize than others.
Example: Farming Experiment
A farm is testing two irrigation methods (Drip vs. Sprinkler) and two soil types (Clay vs. Sand) on crop yield. Since irrigation systems are installed at the farm level and are difficult to change, they are treated as the whole-plot factor. However, different soil types exist within each farm and can be tested more easily, making them the sub-plot factor.
Mathematical Model
The statistical model for a Split-Plot Design is:
$$Y_{ijk} = \mu + \alpha_i + B_k + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$$
where:
- $Y_{ijk}$ is the response (e.g., crop yield).
- $\mu$ is the overall mean.
- $\alpha_i$ is the whole-plot factor (irrigation method).
- $B_k$ is the random block effect (farm-level variation).
- $\beta_j$ is the sub-plot factor (soil type).
- $(\alpha\beta)_{ij}$ is the interaction effect between irrigation and soil type.
- $\epsilon_{ijk} \sim N(0, \sigma^2)$ is the random error term.
The key feature of the Split-Plot Design is that the whole-plot factor ($\alpha_i$) is tested against the farm-level variance ($B_k$), while the sub-plot factor ($\beta_j$) is tested against the individual-level variance ($\epsilon_{ijk}$).
We model the Split-Plot Design using a Mixed Effects Model, treating Farm as a random effect to account for variation at the whole-plot level.
set.seed(123)
# Simulated dataset for a split-plot experiment
data <- data.frame(
  Farm = rep(1:6, each = 4),  # 6 farms (whole plots)
  # Whole-plot factor
  Irrigation = rep(c("Drip", "Sprinkler"), each = 12),
  Soil = rep(c("Clay", "Sand"), times = 12),  # Sub-plot factor
  # Response variable
  Yield = c(rnorm(12, mean = 30, sd = 5), rnorm(12, mean = 35, sd = 5))
)
# Load mixed-effects model library
library(lme4)
# Mixed-effects model: Whole-plot factor (Irrigation) as a random effect
model_split <- lmer(Yield ~ Irrigation * Soil + (1 | Farm), data = data)
# Summary of the model
summary(model_split)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Yield ~ Irrigation * Soil + (1 | Farm)
#> Data: data
#>
#> REML criterion at convergence: 128.1
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.72562 -0.57572 -0.09767 0.60248 2.04346
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> Farm (Intercept) 0.00 0.000
#> Residual 24.79 4.979
#> Number of obs: 24, groups: Farm, 6
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 31.771 2.033 15.629
#> IrrigationSprinkler 2.354 2.875 0.819
#> SoilSand -1.601 2.875 -0.557
#> IrrigationSprinkler:SoilSand 1.235 4.066 0.304
#>
#> Correlation of Fixed Effects:
#> (Intr) IrrgtS SolSnd
#> IrrgtnSprnk -0.707
#> SoilSand -0.707 0.500
#> IrrgtnSp:SS 0.500 -0.707 -0.707
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')
In this model:
- Irrigation (the whole-plot factor) is tested against farm-level variance.
- Soil type (the sub-plot factor) is tested against residual variance.
- The Irrigation × Soil interaction is also evaluated.
Note that in this particular simulation no farm-level effect was built in, so the estimated Farm variance is zero and lme4 flags the fit as singular; with real data the farm variance would typically be positive.
This hierarchical structure accounts for the fact that farms are not independent, improving the precision of our estimates.
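An equivalent classical formulation uses aov() with an Error() stratum, which makes the two error levels explicit: Irrigation is tested in the between-farm stratum, while Soil and the interaction are tested in the within-farm stratum. A sketch on the same simulated data:

```r
set.seed(123)
# Recreate the simulated split-plot experiment
data <- data.frame(
  Farm       = factor(rep(1:6, each = 4)),
  Irrigation = rep(c("Drip", "Sprinkler"), each = 12),
  Soil       = rep(c("Clay", "Sand"), times = 12),
  Yield      = c(rnorm(12, mean = 30, sd = 5),
                 rnorm(12, mean = 35, sd = 5))
)

# Split-plot ANOVA: Farm defines the whole-plot error stratum
split_aov <- aov(Yield ~ Irrigation * Soil + Error(Farm), data = data)
summary(split_aov)
```

The summary prints two strata: "Error: Farm" (where Irrigation is tested against between-farm variation) and "Error: Within" (where Soil and Irrigation:Soil are tested against the residual), mirroring the mixed-effects fit above.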
22.4.6 Latin Square Design
When two potential confounding factors exist, Latin Square Designs provide a structured way to control for these variables. This design is common in scheduling, manufacturing, and supply chain experiments.
Example: Assembly Line Experiment
A manufacturer wants to test three assembly methods (A, B, C) while controlling for work shifts and workstations. Since both shifts and workstations may influence production time, a Latin Square Design ensures that each method is tested once per shift and once per workstation.
Mathematical Model
A Latin Square Design ensures that each treatment appears exactly once in each row and once in each column:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}$$
where:
- $Y_{ijk}$ is the outcome (e.g., assembly time).
- $\mu$ is the overall mean.
- $\alpha_i$ is the treatment effect (assembly method).
- $\beta_j$ is the row effect (work shift).
- $\gamma_k$ is the column effect (workstation).
- $\epsilon_{ijk}$ is the random error term.
This ensures that each treatment is equally balanced across both confounding factors.
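The balance property is easy to verify directly: crossing Method with Shift (and with Workstation) should give a table of all ones, meaning each method appears exactly once per shift and once per workstation. A quick check for the 3×3 layout used in this example:

```r
# The 3x3 Latin square layout used in this example
latin_square <- data.frame(
  Shift       = rep(1:3, each = 3),
  Workstation = rep(1:3, times = 3),
  Method      = c("A", "B", "C", "C", "A", "B", "B", "C", "A")
)

# Each method must appear exactly once per shift...
table(latin_square$Shift, latin_square$Method)
# ...and exactly once per workstation
table(latin_square$Workstation, latin_square$Method)
```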
We implement a Latin Square Design by treating Assembly Method as the primary factor, while controlling for Shifts and Workstations.
set.seed(123)
# Define the Latin Square layout
latin_square <- data.frame(
  Shift = rep(1:3, each = 3),        # Rows
  Workstation = rep(1:3, times = 3), # Columns
  # Treatments assigned in a balanced way
  Method = c("A", "B", "C", "C", "A", "B", "B", "C", "A"),
  Time = c(rnorm(3, mean = 30, sd = 3),
           rnorm(3, mean = 28, sd = 3),
           rnorm(3, mean = 32, sd = 3))  # Assembly time
)
# ANOVA for Latin Square Design
anova_latin <-
  aov(Time ~ factor(Shift) + factor(Workstation) + factor(Method),
      data = latin_square)
summary(anova_latin)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> factor(Shift) 2 1.148 0.574 0.079 0.927
#> factor(Workstation) 2 24.256 12.128 1.659 0.376
#> factor(Method) 2 14.086 7.043 0.964 0.509
#> Residuals 2 14.619 7.310
If the p-value for Method were significant, we would conclude that the assembly methods differ in production time; in this small example it is not (p = 0.509). Likewise, a significant Shift or Workstation effect would indicate systematic differences across those nuisance variables.