28.1 Data wrangling: what and why

Data wrangling covers all the work put into cleaning, organising, and restructuring data. Few datasets arrive in an ideal shape, other than those found in textbooks. They have to be checked (correcting or excluding cases) and reorganised (reformatting, transforming, and restructuring them). A detailed example is described in Amaliah et al. (2022). In-depth studies require a range of analyses, involving grouping, subsetting, and aggregating datasets.
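
As a rough illustration of what checking, subsetting, grouping, and aggregating can look like in practice, here is a minimal sketch in Python using only the standard library. The records, the variable names (`site`, `year`, `value`), and the choice to exclude rather than correct the incomplete case are all assumptions made up for this example; they do not come from any dataset discussed in the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented purely for illustration.
records = [
    {"site": "A", "year": 2020, "value": 4.0},
    {"site": "A", "year": 2021, "value": None},  # missing measurement
    {"site": "B", "year": 2020, "value": 7.5},
    {"site": "B", "year": 2021, "value": 6.5},
    {"site": "A", "year": 2021, "value": 5.0},
    {"site": "C", "year": 2019, "value": 3.0},
]

# Checking: exclude cases with a missing value.
checked = [r for r in records if r["value"] is not None]

# Subsetting: keep only the records from 2020 onwards.
subset = [r for r in checked if r["year"] >= 2020]

# Grouping: collect the values for each site.
groups = defaultdict(list)
for r in subset:
    groups[r["site"]].append(r["value"])

# Aggregating: reduce each group to a single mean.
site_means = {site: mean(values) for site, values in groups.items()}
print(site_means)  # {'A': 4.5, 'B': 7.0}
```

Excluding the incomplete case is only one option here; depending on the dataset and the purpose of the analysis, correcting or imputing the value might be more appropriate.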

Data wrangling takes time and effort. Every dataset with problems has different problems. Sometimes they are immediately obvious; sometimes they only become apparent during analysis. There are many, many ways of restructuring datasets, and some will be better than others. Which is best will depend on the particular dataset and the reasons for analysing it.