22.3 Rerandomization

Since randomization only balances baseline covariates on average, imbalance in variables due to random chance can still happen.
When a randomization turns out to be “bad” (i.e., important baseline covariates are imbalanced), Morgan and Rubin (2012) propose rerandomization.
Rerandomization checks covariate balance during the randomization process (before the experiment is run) and discards bad allocations (i.e., those with unacceptable balance).
The greater the number of variables, the greater the likelihood that at least one covariate would be imbalanced across treatment groups.
- Example: For 10 independent covariates, the probability of a significant difference at $\alpha = .05$ for at least one covariate is $1 - (1-.05)^{10} \approx 0.40 = 40\%$
Rerandomization increases the precision of the treatment effect estimate if the covariates are correlated with the outcome.
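
The arithmetic in the example can be checked with a short sketch. It assumes the balance tests on the covariates are independent, which is what the formula $1 - (1-\alpha)^k$ implies; the function name `prob_any_imbalance` is made up for illustration.

```python
def prob_any_imbalance(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent balance tests rejects at level alpha."""
    return 1.0 - (1.0 - alpha) ** k

# The probability grows quickly with the number of covariates checked.
for k in (1, 5, 10, 20):
    print(f"k = {k:2d}: {prob_any_imbalance(k):.3f}")
```

With correlated covariates the tests are not independent, so the true probability would differ from this figure.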
- Improvement in precision for treatment effect estimate depends on (1) improvement in covariate balance and (2) correlation between covariates and the outcome.
You also need to account for rerandomization in your analysis when making inference.
In terms of precision, rerandomization acts like an increase in sample size.
Alternatives include
- Stratified randomization (Johansson and Schultzberg 2022)
- Matched randomization (Greevy et al. 2004; Kapelner and Krieger 2014)
- Minimization (Pocock and Simon 1975)

Rerandomization Criterion

Whether a randomization is acceptable is decided by a function of the covariate matrix $\mathbf{X}$ and the vector of treatment assignments $\mathbf{W}$, where

$W_i = \begin{cases} 1 & \text{if treated} \\ 0 & \text{if control} \end{cases}$

The Mahalanobis distance, $M$, can be used as the criterion for acceptable balance.

Let $M$ be the multivariate distance between group means:

$\begin{aligned} M &= (\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C)' \operatorname{cov}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C)^{-1} (\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C) \\ &= \left(\frac{1}{n_T}+ \frac{1}{n_C}\right)^{-1}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C)' \operatorname{cov}(\mathbf{X})^{-1}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C) \end{aligned}$
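
A minimal sketch of the second form of $M$, assuming NumPy is available and that $cov(\mathbf{X})$ is estimated by the sample covariance of the covariates. The function name `mahalanobis_balance` and the simulated data are hypothetical and used only for illustration.

```python
import numpy as np

def mahalanobis_balance(X, W):
    """Mahalanobis distance M between treated and control covariate means.

    X: (n, k) covariate matrix; W: length-n vector of 0/1 assignments.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # allow a single covariate
        X = X[:, None]
    W = np.asarray(W).astype(bool)
    n_t, n_c = W.sum(), (~W).sum()
    diff = X[W].mean(axis=0) - X[~W].mean(axis=0)   # X-bar_T minus X-bar_C
    cov = np.atleast_2d(np.cov(X, rowvar=False))    # sample cov(X), k x k
    scale = 1.0 / (1.0 / n_t + 1.0 / n_c)
    # solve() avoids forming the explicit inverse of cov(X)
    return float(scale * diff @ np.linalg.solve(cov, diff))

# Simulated example: 100 units, 3 covariates, 50 treated and 50 control
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
W = rng.permutation(np.repeat([1, 0], 50))
M = mahalanobis_balance(X, W)
print(f"M = {M:.3f}")
```

A value of $M$ near zero means the group means are nearly identical on every covariate.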

With a large sample size and “pure” randomization, $M \sim \chi^2_k$ approximately, where $k$ is the number of covariates to be balanced.

Let $p_a$ be the probability of accepting a randomization. Choosing $p_a$ is a tradeoff between balance and time: a smaller $p_a$ gives better balance but requires more draws before one is accepted.
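
As a rough illustration of the time side of this tradeoff, assume each candidate randomization is drawn independently and accepted with probability $p_a$. The number of draws needed until the first acceptance, written $N$ here purely for illustration, is then geometric:

$\begin{aligned} P(N = n) &= (1 - p_a)^{n-1} p_a, \qquad E[N] = \frac{1}{p_a} \\ p_a = 0.001 &\;\Rightarrow\; E[N] = 1000 \text{ draws on average} \end{aligned}$

So moving from $p_a = 0.1$ to $p_a = 0.001$ raises the expected number of draws from 10 to 1000.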

Given $p_a$, choose the threshold $a$ such that $P(\chi^2_k \le a) = p_a$. The rule is then to rerandomize whenever $M > a$.
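
The accept/reject rule can be sketched as a simple loop. This is an illustrative sketch under stated assumptions, not a definitive implementation: it assumes NumPy, simulated covariates, and a threshold $a$ set to the $p_a$ quantile of $\chi^2_k$. To stay free of extra dependencies it uses $k = 2$, where that quantile has the closed form $a = -2\ln(1 - p_a)$; for general $k$ a quantile function such as `scipy.stats.chi2.ppf` could be used instead. The names `mahalanobis_balance`, `rerandomize`, and `max_draws` are hypothetical.

```python
import math
import numpy as np

def mahalanobis_balance(X, W):
    """M for an (n, k) covariate array X and a 0/1 assignment vector W."""
    W = np.asarray(W).astype(bool)
    diff = X[W].mean(axis=0) - X[~W].mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    scale = 1.0 / (1.0 / W.sum() + 1.0 / (~W).sum())
    return float(scale * diff @ np.linalg.solve(cov, diff))

def rerandomize(X, n_treated, a, rng, max_draws=100_000):
    """Draw complete randomizations until M <= a; return (W, M, draws used)."""
    base = np.zeros(X.shape[0], dtype=int)
    base[:n_treated] = 1
    for draw in range(1, max_draws + 1):
        W = rng.permutation(base)        # one candidate allocation
        M = mahalanobis_balance(X, W)
        if M <= a:                       # acceptable balance: keep it
            return W, M, draw
    raise RuntimeError("no acceptable randomization found within max_draws")

rng = np.random.default_rng(2012)
X = rng.normal(size=(200, 2))            # k = 2 simulated covariates
p_a = 0.1
a = -2.0 * math.log(1.0 - p_a)           # chi-square quantile, valid for k = 2 only
W, M, draws = rerandomize(X, n_treated=100, a=a, rng=rng)
print(f"a = {a:.3f}, accepted M = {M:.3f} after {draws} draw(s)")
```

The cap on the number of draws is a design choice for the sketch: with a very small $p_a$ the loop could otherwise run for a long time.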

References

Greevy, Robert, Bo Lu, Jeffrey H Silber, and Paul Rosenbaum. 2004. “Optimal Multivariate Matching Before Randomization.” Biostatistics 5 (2): 263–75.

Johansson, Per, and Mårten Schultzberg. 2022. “Rerandomization: A Complement or Substitute for Stratification in Randomized Experiments?” Journal of Statistical Planning and Inference 218: 43–58.

Kapelner, Adam, and Abba Krieger. 2014. “Matching on-the-Fly: Sequential Allocation with Higher Power and Efficiency.” Biometrics 70 (2): 378–88.

Morgan, Kari Lock, and Donald B Rubin. 2012. “Rerandomization to Improve Covariate Balance in Experiments.”

Pocock, Stuart J, and Richard Simon. 1975. “Sequential Treatment Assignment with Balancing for Prognostic Factors in the Controlled Clinical Trial.” Biometrics, 103–15.