19.3 Rerandomization

  • Since randomization only balances baseline covariates on average, imbalance in variables due to random chance can still happen.

  • If you end up with a “bad” randomization (i.e., imbalance on important baseline covariates), Morgan and Rubin (2012) propose rerandomization.

  • Rerandomization means checking covariate balance during the randomization process (before the experiment is run) and discarding bad allocations (i.e., those with unacceptable balance).

  • The greater the number of variables, the greater the likelihood that at least one covariate would be imbalanced across treatment groups.

    • Example: For 10 independent covariates, the probability of a significant difference at \(\alpha = .05\) for at least one covariate is \(1 - (1-.05)^{10} \approx 0.40 = 40\%\)
  • Rerandomization increases the precision of the treatment effect estimate if the covariates are correlated with the outcome.

    • The improvement in precision depends on (1) the improvement in covariate balance and (2) the correlation between the covariates and the outcome.
  • You also need to account for rerandomization in your analysis when making inference.

  • In terms of precision, rerandomization acts like an increase in sample size.

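  • Alternatives include stratified randomization (Johansson and Schultzberg 2022), matched randomization (Greevy et al. 2004; Kapelner and Krieger 2014), and minimization (Pocock and Simon 1975).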

Figure from USC Schaeffer Center
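The 10-covariate example above is easy to check numerically. The following is a minimal sketch, assuming (as the formula does) that the balance tests for the covariates are independent; the function name `prob_any_imbalance` is hypothetical and introduced only for illustration.

```python
def prob_any_imbalance(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent balance tests rejects at level alpha."""
    return 1 - (1 - alpha) ** k


# The probability grows quickly with the number of covariates checked.
for k in (1, 5, 10, 20):
    print(k, round(prob_any_imbalance(k), 3))
```

With correlated covariates the tests are not independent, so the true probability will generally differ from this figure.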

Rerandomization Criterion

  • Acceptable randomization is based on a function of covariate matrix \(\mathbf{X}\) and vector of treatment assignments \(\mathbf{W}\)

\[ W_i = \begin{cases} 1 \text{ if treated} \\ 0 \text{ if control} \end{cases} \]

  • The Mahalanobis distance, \(M\), can be used as the criterion for acceptable balance

Let \(M\) be the multivariate distance between the group means:

\[ \begin{aligned} M &= (\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C)' cov(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C)^{-1} (\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C) \\ &= (\frac{1}{n_T}+ \frac{1}{n_C})^{-1}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C)' cov(\mathbf{X})^{-1}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C) \end{aligned} \]
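The second equality comes from the covariance of a difference in means. As a brief sketch, assuming the two group means behave like averages of \(n_T\) and \(n_C\) units that share the common covariance \(cov(\mathbf{X})\), the scalar factor can be pulled out of the inverse:

\[ \begin{aligned} cov(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C) &= \left(\frac{1}{n_T}+ \frac{1}{n_C}\right) cov(\mathbf{X}) \\ cov(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C)^{-1} &= \left(\frac{1}{n_T}+ \frac{1}{n_C}\right)^{-1} cov(\mathbf{X})^{-1} \end{aligned} \]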

With a large sample size and “pure” randomization, \(M \sim \chi^2_k\), where \(k\) is the number of covariates to be balanced.

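Let \(p_a\) be the probability of accepting a randomization. Choosing \(p_a\) is a tradeoff between balance and time: a smaller \(p_a\) gives better balance but requires more draws (on average \(1/p_a\)) before an allocation is accepted.

Given \(p_a\), choose the threshold \(a\) such that \(P(M \le a) = p_a\), i.e., the \(p_a\) quantile of \(\chi^2_k\) in large samples. The rule is then to re-randomize whenever \(M > a\).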
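The procedure can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation from Morgan and Rubin (2012): the function names, the simulated data, and the default values (`p_a=0.1`, `n_ref=2000`) are all hypothetical. Instead of the \(\chi^2_k\) quantile, the threshold \(a\) is estimated here as the empirical \(p_a\) quantile of \(M\) over pure randomizations, which avoids relying on the large-sample approximation.

```python
import numpy as np


def mahalanobis_balance(X, w):
    """M = (1/n_T + 1/n_C)^{-1} d' cov(X)^{-1} d, where d is the difference in group means."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w).astype(bool)
    n_t, n_c = w.sum(), (~w).sum()
    d = X[w].mean(axis=0) - X[~w].mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    return float(d @ np.linalg.solve(cov, d) / (1 / n_t + 1 / n_c))


def random_assignment(n, n_t, rng):
    """Pure randomization: exactly n_t of n units are treated (W_i = 1)."""
    w = np.zeros(n, dtype=int)
    w[rng.choice(n, size=n_t, replace=False)] = 1
    return w


def rerandomize(X, n_t, p_a=0.1, n_ref=2000, max_draws=10000, seed=0):
    """Redraw assignments until M <= a, with a the empirical p_a quantile of M."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Reference distribution of M under pure randomization (assumption: simulated,
    # rather than taken from the chi-square approximation).
    ref = [mahalanobis_balance(X, random_assignment(n, n_t, rng)) for _ in range(n_ref)]
    a = float(np.quantile(ref, p_a))
    for draws in range(1, max_draws + 1):
        w = random_assignment(n, n_t, rng)
        m = mahalanobis_balance(X, w)
        if m <= a:  # acceptable balance: keep this allocation
            return w, m, a, draws
    raise RuntimeError("no acceptable assignment found within max_draws")


# Hypothetical data: 100 units, 5 standard-normal covariates.
X = np.random.default_rng(1).normal(size=(100, 5))
w, m, a, draws = rerandomize(X, n_t=50, p_a=0.1)
print(f"treated={w.sum()}, M={m:.3f}, threshold a={a:.3f}, draws={draws}")
```

In large samples the empirical threshold should be close to the \(p_a\) quantile of \(\chi^2_k\), and the number of draws needed averages about \(1/p_a\).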

References

Greevy, Robert, Bo Lu, Jeffrey H Silber, and Paul Rosenbaum. 2004. “Optimal Multivariate Matching Before Randomization.” Biostatistics 5 (2): 263–75.
Johansson, Per, and Mårten Schultzberg. 2022. “Rerandomization: A Complement or Substitute for Stratification in Randomized Experiments?” Journal of Statistical Planning and Inference 218: 43–58.
Kapelner, Adam, and Abba Krieger. 2014. “Matching on-the-Fly: Sequential Allocation with Higher Power and Efficiency.” Biometrics 70 (2): 378–88.
Morgan, Kari Lock, and Donald B Rubin. 2012. “Rerandomization to Improve Covariate Balance in Experiments.” The Annals of Statistics 40 (2): 1263–82.
Pocock, Stuart J, and Richard Simon. 1975. “Sequential Treatment Assignment with Balancing for Prognostic Factors in the Controlled Clinical Trial.” Biometrics, 103–15.