28.4 Naming, renaming and reclassifying

Datasets

Merging, subsetting, grouping, and aggregating all create new versions of a dataset, as do transforming variables and creating new ones. It is helpful to name and store these new datasets if they are going to be used more than once. This can lead to a large number of stored objects (Chapter 16 had over thirty), but it provides structure for analyses, assuming supportive naming conventions are used.

Variables and categories

Variables may have abbreviated names when informative ones would be more helpful, for instance when software automatically uses names on plots and reports. Ideally, derived variables should be given names that reflect what they represent, so that it is easier to recognise what they measure. Software may classify numeric variables incorrectly, typically as text, if text is found amongst their values (perhaps notes or special codes like currency symbols). Exchanging data files between countries using decimal points and those using decimal commas also requires wrangling. It is common to have to reclassify variables as numeric after text has been replaced or removed.
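As an illustration of this kind of cleaning, here is a minimal sketch in Python using only the standard library. The function name `to_number`, its parameters, and the sample values are hypothetical and chosen for this example. It assumes each value contains at most one number; a note containing further digits would be misread, so real data should be checked after conversion.

```python
import re

def to_number(text, decimal=".", thousands=","):
    """Convert one text value to a float, or return None if that fails."""
    # Remove thousands separators, then standardise the decimal mark to a point.
    cleaned = text.strip().replace(thousands, "")
    if decimal != ".":
        cleaned = cleaned.replace(decimal, ".")
    # Drop everything that is not a digit, sign, or decimal point
    # (for example currency symbols or short notes).
    cleaned = re.sub(r"[^0-9.+-]", "", cleaned)
    try:
        return float(cleaned)
    except ValueError:
        return None  # nothing numeric was left, treat as missing

# Hypothetical values written with decimal commas and currency symbols.
raw_prices = ["€1.234,50", "$99", "n/a", " 12,5 "]
prices = [to_number(v, decimal=",", thousands=".") for v in raw_prices]
```

Values that cannot be converted are returned as `None` rather than being dropped, so the cleaned variable keeps the same length as the original and missing entries stay visible.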

Levels of categorical variables may be provided as numbers. It is better to give them meaningful names (which also ensures that the variables are not classified as numeric).
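Recoding numeric levels can be sketched as a simple lookup. The variable, its codes, and the labels below are hypothetical. The sketch assumes a codebook is available that says what each number means.

```python
# Hypothetical codebook: numeric codes as stored in the data file.
region_labels = {1: "North", 2: "South", 3: "West"}

region_codes = [1, 3, 2, 2, 9]  # 9 does not appear in the codebook

# Replace each code by its label and flag unexpected codes explicitly.
region = [region_labels.get(code, "Unknown") for code in region_codes]
```

Unexpected codes are mapped to an explicit "Unknown" level rather than silently dropped, so they can be investigated later.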