## C.1 6. Implementation: Practical Guidelines

### C.1.1 6.1 Constructing Clusters
Use one or more of the following:
- Geographic proximity.
- Embedding techniques (e.g., Word2Vec on search sequences).
- Co-view or co-search frequency.
- Pricing similarity or estimated cross-price elasticities.
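One option, sketched here with simulated co-view counts (all values hypothetical, not from any real catalog), is hierarchical clustering on a dissimilarity derived from the co-view matrix: pairs viewed together often get small distances and end up in the same cluster.

```r
# Sketch: clusters from a co-view matrix via hierarchical clustering.
# `coview` is illustrative simulated data standing in for real logs.
set.seed(1)
n_items <- 20
coview <- matrix(rpois(n_items^2, 2), n_items, n_items)
coview <- coview + t(coview)        # symmetrize the pair counts
diag(coview) <- max(coview)         # an item co-occurs with itself most
d <- as.dist(max(coview) - coview)  # high co-view -> small distance
hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 4)       # cut the dendrogram into 4 clusters
table(clusters)                     # cluster sizes
```

The choice of `k` (and of the agglomeration method) is a tuning decision; the quality score $Q$ defined below can be used to compare candidate cuts.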
### C.1.2 6.2 Evaluating Cluster Quality
Use a proxy interference matrix $P$ (e.g., PDP view co-occurrence), then score a candidate clustering by the total proxy interference that crosses cluster boundaries:

```r
# Total cross-cluster interference: for each unit i, sum the proxy
# interference P[i, j] over all units j assigned to a different cluster.
Q <- function(P, clusters) {
  sum(sapply(1:nrow(P), function(i) {
    sum(P[i, clusters != clusters[i]])
  }))
}
```
Lower Q indicates better clustering.
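As a sanity check, consider a toy $4 \times 4$ proxy matrix (hypothetical values) whose interference concentrates within the blocks $\{1, 2\}$ and $\{3, 4\}$: a clustering aligned with those blocks scores lower than a misaligned one.

```r
# Cross-cluster interference score (as defined above).
Q <- function(P, clusters) {
  sum(sapply(1:nrow(P), function(i) sum(P[i, clusters != clusters[i]])))
}
# Toy proxy matrix: strong interference within {1,2} and within {3,4}.
P <- rbind(c(0, 5, 1, 1),
           c(5, 0, 1, 1),
           c(1, 1, 0, 5),
           c(1, 1, 5, 0))
Q(P, c(1, 1, 2, 2))  # aligned with the block structure
#> [1] 8
Q(P, c(1, 2, 1, 2))  # misaligned: heavy edges cross cluster boundaries
#> [1] 24
```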
### C.1.3 6.3 Trade-offs
| Factor | Individual Randomization | Cluster Randomization |
|---|---|---|
| Bias (interference) | High | Low |
| Power | High | Lower |
| Complexity | Low | Moderate |
| Implementation overhead | Minimal | Requires new infrastructure |
A small simulation illustrates the bias trade-off under interference:

```r
set.seed(123)
n <- 1000
clusters <- rep(1:100, each = 10)      # 100 clusters of 10 units
Z <- rbinom(n, 1, 0.5)                 # unit-level treatment assignment
B <- matrix(runif(n^2, 0, 0.1), n, n)  # weak spillovers between all pairs
diag(B) <- runif(n, 0.5, 1)            # stronger direct effects
# Simulate outcomes: each unit is affected by every treated unit via B
Y <- sapply(1:n, function(i) sum(B[i, ] * Z)) + rnorm(n)
# Estimate TATE under individual randomization
t_ind <- mean(Y[Z == 1]) - mean(Y[Z == 0])
# Estimate TATE under cluster randomization: contrast clusters with a
# treated majority against the rest, using cluster-mean outcomes
cluster_Z <- tapply(Z, clusters, mean)
cluster_Y <- tapply(Y, clusters, mean)
t_cr <- mean(cluster_Y[cluster_Z > 0.5]) - mean(cluster_Y[cluster_Z <= 0.5])
t_ind
#> [1] 0.7931955
t_cr
#> [1] 0.3104478
```
Observe that $t_{\text{ind}}$ is inflated relative to $t_{\text{cr}}$.
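The power row of the trade-off table can also be illustrated with a Monte Carlo sketch (hypothetical data-generating process with a cluster-level random effect and no interference, so both designs target the same effect): assigning treatment to whole clusters leaves fewer effective units, so the estimator's sampling variability is larger.

```r
# Sketch: estimator variability under individual vs. cluster assignment.
# 100 clusters of 10 units; the cluster-level random effect `u` induces
# intra-cluster correlation, which penalizes cluster randomization.
set.seed(42)
n <- 1000
clusters <- rep(1:100, each = 10)
one_rep <- function(cluster_level) {
  if (cluster_level) {
    Zc <- rbinom(100, 1, 0.5)
    Z <- Zc[clusters]           # whole clusters share an arm
  } else {
    Z <- rbinom(n, 1, 0.5)      # independent unit-level coin flips
  }
  u <- rnorm(100)[clusters]     # cluster-level random effect
  Y <- 1 * Z + u + rnorm(n)     # true effect of 1, no spillovers
  mean(Y[Z == 1]) - mean(Y[Z == 0])
}
sd_ind <- sd(replicate(500, one_rep(FALSE)))
sd_cr  <- sd(replicate(500, one_rep(TRUE)))
c(individual = sd_ind, cluster = sd_cr)  # cluster SD is markedly larger
```

Both estimates are centered near the true effect here; the difference is in their spread, which is the power cost tabulated above.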