C.1 6. Implementation: Practical Guidelines

C.1.1 6.1 Constructing Clusters

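Build clusters from one or more of the following signals, so that units likely to interfere with each other fall in the same cluster: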

  • Geographic proximity.
  • Embedding techniques (e.g., Word2Vec on search sequences).
  • Co-view or co-search frequency.
  • Pricing similarity or estimated cross-price elasticities.
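As a minimal sketch of the co-view option, the block below clusters six hypothetical listings with base R's hierarchical clustering. The `co_view` counts, the average-linkage choice, and `k = 2` are illustrative assumptions, not values from this appendix.

```r
# Hypothetical symmetric co-view counts between 6 listings (toy data).
co_view <- matrix(c(
  0, 9, 8, 1, 0, 1,
  9, 0, 7, 0, 1, 0,
  8, 7, 0, 1, 0, 1,
  1, 0, 1, 0, 9, 8,
  0, 1, 0, 9, 0, 7,
  1, 0, 1, 8, 7, 0
), nrow = 6, byrow = TRUE)

# Turn similarity into dissimilarity: frequent co-views give a small distance.
# as.dist() reads only the lower triangle, so the diagonal is ignored.
d <- as.dist(max(co_view) - co_view)

# Average-linkage hierarchical clustering, cut into k = 2 clusters (assumed k).
hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 2)
clusters
```

The same pattern applies to the other signals: any pairwise similarity (geographic, embedding cosine, price) can be converted to a dissimilarity and passed to `hclust`.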

C.1.2 6.2 Evaluating Cluster Quality

Use a proxy interference matrix $P$ (e.g., PDP view co-occurrence), passed as the argument B below, and sum the weight that falls on pairs of units in different clusters:

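# Total proxy interference between units assigned to different clusters.
# B: n x n proxy interference matrix; clusters: length-n vector of cluster labels.
Q <- function(B, clusters) {
  sum(sapply(seq_len(nrow(B)), function(i) {
    sum(B[i, clusters != clusters[i]])
  }))
}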

Lower Q indicates better clustering: less of the proxy interference crosses cluster boundaries.
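To make the comparison concrete, the sketch below evaluates the same quantity on a toy proxy matrix for two candidate clusterings. The helper `cross_cluster_mass` is a hypothetical vectorized equivalent of Q, and the weights (1 within the true blocks, 0.1 across) are assumed for illustration.

```r
# Same quantity as Q in the text, written with a vectorized mask.
cross_cluster_mass <- function(B, clusters) {
  cross <- outer(clusters, clusters, "!=")  # TRUE where i and j are in different clusters
  sum(B[cross])
}

# Toy proxy matrix: weight 1 within true blocks {1,2,3} and {4,5,6}, 0.1 across.
truth <- c(1, 1, 1, 2, 2, 2)
P <- ifelse(outer(truth, truth, "=="), 1, 0.1)

good <- truth                 # clustering that matches the block structure
bad  <- c(1, 2, 1, 2, 1, 2)   # clustering that splits both blocks

q_good <- cross_cluster_mass(P, good)  # 18 cross pairs * 0.1 = 1.8
q_bad  <- cross_cluster_mass(P, bad)   # 8 pairs * 1 + 10 pairs * 0.1 = 9
c(good = q_good, bad = q_bad)
```

Because the raw sum scales with the size of the matrix and its weights, it is most useful for comparing alternative clusterings of the same units against the same proxy.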

C.1.3 6.3 Trade-offs

Factor                    Individual Randomization   Cluster Randomization
Bias (Interference)       High                       Low
Power                     High                       Lower
Complexity                Low                        Moderate
Implementation overhead   Minimal                    Requires new infrastructure
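
The power row can be given a rough magnitude with the standard design effect for equal-sized clusters, 1 + (m - 1) * rho. The sketch below is illustrative only: the intra-cluster correlation `rho = 0.05` is a hypothetical value, and `design_effect` is a helper name introduced here.

```r
# Kish design effect for equal-sized clusters of size m with
# intra-cluster correlation rho: variance inflation relative to
# randomizing the same number of units individually.
design_effect <- function(m, rho) 1 + (m - 1) * rho

n_units <- 1000
m <- 10            # units per cluster
rho <- 0.05        # hypothetical intra-cluster correlation

deff <- design_effect(m, rho)   # 1 + 9 * 0.05 = 1.45
n_eff <- n_units / deff         # effective sample size under clustering
c(deff = deff, n_eff = n_eff)
```

Larger clusters capture more interference but raise the design effect, which is the bias versus power trade-off summarized in the table.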
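
The following simulation compares an individual-level and a cluster-level estimate of the TATE on synthetic data:

set.seed(123)
n <- 1000
clusters <- rep(1:100, each = 10)        # 100 clusters of 10 units
Z <- rbinom(n, 1, 0.5)                   # unit-level Bernoulli(0.5) assignment
B <- matrix(runif(n^2, 0, 0.1), n, n)    # spillover weights between all unit pairs
diag(B) <- runif(n, 0.5, 1)              # direct effects on the diagonal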

# Simulate outcome
Y <- sapply(1:n, function(i) sum(B[i,] * Z)) + rnorm(n)

# Estimate TATE under individual randomization
t_ind <- mean(Y[Z==1]) - mean(Y[Z==0])

# Cluster-level contrast: Z was assigned per unit above, so clusters are
# split post hoc by whether more than half of their units are treated
cluster_Z <- tapply(Z, clusters, mean)
cluster_Y <- tapply(Y, clusters, mean)
t_cr <- mean(cluster_Y[cluster_Z > 0.5]) - mean(cluster_Y[cluster_Z <= 0.5])

t_ind
#> [1] 0.7931955
t_cr
#> [1] 0.3104478

Observe that t_ind is inflated relative to t_cr (0.79 versus 0.31 in this run).
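
The simulation above assigns Z per unit and groups clusters after the fact. As a hedged variant, the sketch below randomizes whole clusters and uses an interference matrix with spillovers only inside clusters, so both estimates can be compared with a known target. The block structure, the negative spillover of -0.05, and the reading of TATE as the all-treated versus none-treated contrast are assumptions made for illustration.

```r
set.seed(123)
n <- 1000
clusters <- rep(1:100, each = 10)

# Assumption: interference only within clusters, constant spillover of -0.05
# (e.g., treated units drawing demand away from their neighbours).
spill <- -0.05
B <- spill * outer(clusters, clusters, "==")
diag(B) <- runif(n, 0.5, 1)   # direct effects, as in the setup above

# Under the linear model Y = B Z + noise, all-treated minus none-treated
# equals the average row sum of B.
tate <- mean(rowSums(B))

noise <- rnorm(n)

# Design 1: individual randomization.
Z_ind <- rbinom(n, 1, 0.5)
Y_ind <- as.vector(B %*% Z_ind) + noise
t_ind <- mean(Y_ind[Z_ind == 1]) - mean(Y_ind[Z_ind == 0])

# Design 2: cluster randomization, exactly half of the clusters treated.
treated_clusters <- sample(unique(clusters), 50)
Z_cr <- as.integer(clusters %in% treated_clusters)
Y_cr <- as.vector(B %*% Z_cr) + noise
t_cr <- mean(Y_cr[Z_cr == 1]) - mean(Y_cr[Z_cr == 0])

c(tate = tate, t_ind = t_ind, t_cr = t_cr)
```

Under these assumptions t_ind mostly recovers the direct effect, while t_cr also absorbs the within-cluster spillovers and so sits closer to the target. The direction of the individual-level bias depends on the sign of the spillovers.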