Applications of Machine Learning in Imputation
2019
Chapter 1 Preface
1.1 Acknowledgements
I would like to acknowledge the following people for their help in producing this report:
- Emily Tew and Gareth Clews, for their guidance and support in getting XGBoost up and running.
- Fern Leather, for getting CANCEIS to work. I am really grateful for the time taken to run through the specification files with me.
- Luke Lorenzi and Vahe Nafilyan, for helping me put the pieces together and figure out how we can progress this work in the context of survey data.
1.2 Introduction
Editing and imputation are both methods of data processing. Editing refers to the detection and correction of errors in the data, whilst imputation is a method of replacing missing or erroneous values in a dataset with plausible estimates (Geron 2017). This document presents findings from work carried out on the use of machine learning in imputation. The chapters cover the following topics:
- What is imputation?
- What is machine learning?
- Why use machine learning?
- How does XGBoost work?
- Methods used for the investigation
- Results of the investigation
- Conclusions and future direction
References
Geron, Aurelien. 2017. Hands-On Machine Learning with Scikit-Learn and TensorFlow. 1st ed. Sebastopol, California: O’Reilly Media. https://oreilly.com/catalog/errata.csp?isbn=9781491962299.