13.6 Criteria for Choosing an Effective Approach
A method for handling missing data should be judged against the following criteria:
Unbiased Parameter Estimates: The technique should yield unbiased estimates of key quantities, such as means, variances, and regression coefficients. This matters most when the data are MAR or MNAR, where naive methods tend to fail.
Adequate Power: The method should preserve statistical power, enabling robust hypothesis testing and model estimation. This ensures that important effects are not missed due to inflated type II error.
Accurate Standard Errors: Accurate estimation of standard errors is critical for reliable p-values and confidence intervals. Single imputation treats imputed values as if they had been observed, so it typically underestimates standard errors and leads to overconfident conclusions.
Preferred Methods: Multiple Imputation and Full Information Maximum Likelihood
Multiple Imputation (MI):
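MI replaces each missing value with several plausible values drawn from a predictive distribution. This produces multiple complete datasets; each one is analyzed separately, and the results are then combined, usually with Rubin's rules.
Pros: Carries the uncertainty about the missing values through to the final results, provides valid standard errors, and gives unbiased estimates under MAR when the imputation model is adequate.
Cons: Computationally intensive and sensitive to misspecification of the imputation model.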
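The combining step in MI is commonly done with Rubin's rules: average the point estimates, then add the average within-imputation variance to the between-imputation variance inflated by 1 + 1/m. The sketch below illustrates only that pooling step and assumes each imputed dataset has already been analyzed. The function name `pool_rubin` and the five estimates are hypothetical, chosen for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()              # pooled point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, np.sqrt(total)

# Hypothetical results from m = 5 imputed datasets (illustrative numbers only)
est = [1.0, 1.2, 0.8, 1.1, 0.9]
se = [0.2, 0.2, 0.2, 0.2, 0.2]

pooled_est, pooled_se = pool_rubin(est, [s ** 2 for s in se])
print(pooled_est, pooled_se)
```

In this example the pooled standard error (about 0.26) exceeds the 0.2 reported within each dataset, because the spread of the estimates across imputations is counted as additional uncertainty.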
Full Information Maximum Likelihood (FIML):
FIML uses all available data to estimate parameters directly, avoiding the need to impute missing values explicitly.
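Pros: Statistically efficient, unbiased under MAR, and carried out in a single estimation step with no separate imputation and pooling stages.
Cons: Requires a correctly specified model and, like MI, can give biased estimates when the data are MNAR.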
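FIML software generally maximizes the observed-data likelihood numerically, but one simple case has a closed-form answer that shows the idea. For bivariate normal data where x is always observed and only y has gaps, the maximum-likelihood estimate of the mean of y combines the regression of y on x (from complete rows) with the mean of x (from all rows). The simulation below is a sketch under those assumptions; the variable names, coefficients, and missingness probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (illustrative assumption): y depends on x, true mean of y is 0
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)

# MAR mechanism: y is more likely to be missing when x is large; x is always observed
p_missing = np.where(x > 0, 0.8, 0.1)
observed = rng.random(n) > p_missing

# Listwise deletion: mean of y among complete rows only
cc_mean = y[observed].mean()

# Likelihood-based estimate for this pattern: regression of y on x from
# complete rows, evaluated at the mean of x computed from all rows
slope, intercept = np.polyfit(x[observed], y[observed], 1)
ml_mean = intercept + slope * x.mean()

print(cc_mean, ml_mean)
```

With this setup the complete-case mean should fall well below the true value of 0, while the likelihood-based estimate should sit close to it, since it uses the observed x values from rows where y is missing.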
Methods to Avoid
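- Single Imputation (e.g., Mean, Mode):
  - Shrinks variances, weakens correlations between variables, and underestimates standard errors, because every imputed value is treated as if it had been observed.
- Listwise Deletion:
  - Discards every row that contains a missing value, which reduces sample size and can introduce bias when the data are not MCAR.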
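The variance problem with mean imputation can be seen in a small simulation. The sketch below assumes a single normally distributed variable with 40% of its values removed completely at random; the sample size, mean, and standard deviation are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# Simulated variable with a true standard deviation of 2 (illustrative assumption)
y = rng.normal(loc=10, scale=2, size=n)

# Remove roughly 40% of values completely at random (MCAR)
missing = rng.random(n) < 0.4
y_obs = y[~missing]

# Mean imputation: fill every gap with the observed mean
y_imp = y.copy()
y_imp[missing] = y_obs.mean()

sd_obs = y_obs.std(ddof=1)
sd_imp = y_imp.std(ddof=1)

# Naive standard errors of the mean
se_obs = sd_obs / np.sqrt(y_obs.size)   # uses only the observed values
se_imp = sd_imp / np.sqrt(n)            # treats imputed values as real data

print(sd_obs, sd_imp, se_obs, se_imp)
```

The imputed series keeps the observed mean but has a smaller standard deviation, and its naive standard error is smaller still because the filled-in values are counted as if they were real observations.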
Practical Considerations
- Computational efficiency and ease of implementation.
- Compatibility with downstream analysis methods.
- Alignment with the data’s missingness mechanism.