Chapter 9 Resources

9.1 R scripts

All R scripts pertaining to this investigation are saved in the folder “R”. If you intend to run any of the code, first load all the packages and functions by running the scripts “WF1_package_load.R” and “WF2_function_load.R”.
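The loading step can be sketched as follows. This is a minimal illustration rather than part of the original scripts: it assumes the working directory is the project root containing the “R” folder, and the names setup_scripts and load_setup are hypothetical helpers introduced here.

```r
# Hypothetical helper (not part of the original scripts): source the two
# set-up scripts named above before running anything else.
# Assumption: the working directory is the project root containing "R/".
setup_scripts <- file.path("R", c("WF1_package_load.R", "WF2_function_load.R"))

load_setup <- function(paths = setup_scripts) {
  # Fail early with a clear message if a script cannot be found.
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Set-up script(s) not found: ", paste(not_found, collapse = ", "))
  }
  # source() evaluates each script in the global environment by default,
  # so the packages and functions become available to later scripts.
  for (p in paths) source(p)
  invisible(paths)
}

# Usage, from the project root:
# load_setup()
```

Checking for the files before sourcing them gives a readable error when the working directory is set incorrectly, which is the most likely cause of failure.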

The script “WF3_data_prep.R” cleans and edits the data for the investigation, whilst the scripts “WF4_missingness.R” and “WF5_data_load.R” simulate missingness and produce descriptive statistics of the data sets respectively. Scripts with the prefix “WFI” carry out the imputation for each variable.

9.2 Data

The folder named “data” includes:

  • the full raw Census Teaching File located in “data/source”
  • the clean Census Teaching File located in “data/core”
  • the label-encoded training and test data located in “data/label”
  • the label-encoded test data with missingness located in “data/label/missingness”
  • the one-hot encoded training and test data located in “data/ohe”
  • the one-hot encoded data with missingness located in “data/ohe/missingness”
  • the predicted values from XGBoost located in “data/predicted/XGBoost”
  • the pre-imputed and post-imputed data for CANCEIS located in “data/CANCEIS”
  • the pre-imputed and post-imputed data for the Mixed Methods approach located in “data/CANCEISXG”

9.3 XGBoost

The models produced for each imputable variable can be found in the following folder (located in the main directory):

  • models/XGBoost

9.4 Donor imputation

The CANCEIS specifications used for the two rounds of donor-based imputation can be found in the following folders (located in the main directory):

  • CANCEIS
  • MixedMethods