Following our recent literature review, "Construction and use of body weight measures from administrative data in a large national health system: A systematic review" (In Review), in which 492 published documents were examined and 39 subsequently analyzed, we provide a more in-depth analysis of 33(133) weight cleaning algorithms (7(1,5,10,18,28,31,32) of 39 did not have an appropriate algorithm for inclusion). From these, we chose 12 algorithms that represent the diversity of methods in the literature while eliminating redundancies:

  • Janney 2016(20)
  • Littman 2012(24)
  • Maciejewski 2016(25)
  • Breland 2017(7)
  • Maguen 2013(26)
  • Goodrich 2016(14)
  • Chan 2017(9)
  • Jackson 2015(19)
  • Buta 2018(8)
  • Kazerooni 2016(22)
  • Noel 2012(27)
  • Rosenberger 2011(30)

Weight samples come from the US Department of Veterans Affairs Corporate Data Warehouse, and the algorithms that handle these data are specific to that source. However, some could be applied to data from other sources, as long as the data structure is similar.[1]

NOTE: The algorithms here are written in R and rely on functions from these libraries: c("dplyr", "magrittr", "data.table", "lazyeval")
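As a minimal sketch of the setup this note implies, the snippet below checks that the four packages are installed and then attaches them. The variable names `pkgs` and `missing_pkgs` are illustrative assumptions, not part of the original code.

```r
# Packages named in the note above.
pkgs <- c("dplyr", "magrittr", "data.table", "lazyeval")

# Identify any packages that are not installed (hypothetical helper variable).
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Stop early with a readable message rather than failing mid-analysis.
if (length(missing_pkgs) > 0) {
  stop("Please install: ", paste(missing_pkgs, collapse = ", "))
}

# Attach each package; character.only = TRUE lets library() accept strings.
invisible(lapply(pkgs, library, character.only = TRUE))
```

Because dplyr and data.table export some functions with the same names (for example `first`, `last`, and `between`), the package attached later masks the earlier one; calling such functions with an explicit prefix, such as `dplyr::between()`, avoids ambiguity.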


  1. To be honest, most of these algorithms could be used for “cleaning” any set of continuous data.