13.5 Recoding variables

  • Either do it manually (see below) or…
  • …use a recoding function
    • mapvalues() from the plyr package: Recode the values of a categorical vector
    • cut() from base R: Recode a continuous variable into a categorical one


  • Always check whether the recoding worked (errors here are very common!)
    • table(variable1, variable2): Contingency table of the original and the recoded variable
    • str() and summary(): Check whether the variables in the data set have the expected classes and distributions, and watch out for missing values (NA)!
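
The checks listed above can be sketched as follows. This is a minimal illustration that assumes the built-in swiss data set and a dummy variable named d.catholic, recoded at a cut-off of 50; other checks may suit other variables better.

```r
# Illustrative check of a recoded variable, using the built-in swiss data
swiss2 <- swiss
swiss2$d.catholic <- NA
swiss2$d.catholic[swiss2$Catholic <= 50] <- 0
swiss2$d.catholic[swiss2$Catholic > 50]  <- 1

# Cross-tabulate the recoding rule against the new variable;
# useNA = "ifany" adds a column for missing values if there are any
table(swiss2$Catholic > 50, swiss2$d.catholic, useNA = "ifany")

# Range of the original variable within each new category
tapply(swiss2$Catholic, swiss2$d.catholic, range)

# Number of observations that were not assigned a category
sum(is.na(swiss2$d.catholic))

str(swiss2$d.catholic)      # class of the new variable
summary(swiss2$d.catholic)  # distribution of the new variable
```

Without useNA, table() silently drops missing values, so observations that fell through the recoding rules would go unnoticed.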



13.5.1 Example: Recoding variables

# MANUAL CLASSIC WAY
swiss2 <- swiss # Make a copy of the data set
names(swiss) # Display variables
str(swiss)
summary(swiss)

swiss2$d.catholic <- NA # generate new variable in dataset
View(swiss2)
swiss2$d.catholic[swiss2$Catholic <= 50] <- 0 # replace values conditional on Catholic
swiss2$d.catholic[swiss2$Catholic > 50] <- 1  # replace values conditional on Catholic
table(swiss2$d.catholic, swiss2$Catholic) # check recoding
names(swiss2)   # show variable names
names(swiss2)[7] <- "dummy.catholic" # rename the 7th variable (d.catholic)

# "NEW" WAY: mapvalues() (plyr) and cut() (base R)
# To recode character variables, put the values in quotes, e.g. from = c("a", "b")
library(plyr)

# mapvalues(): recode the values 3 and 37 as missing (NA)
swiss2$Examination2 <- mapvalues(swiss2$Examination, from = c(3, 37), to = c(NA, NA))

# cut(): intervals are right-closed by default, i.e. (-Inf,12], (12,22], (22,Inf]
swiss2$Examination2 <- cut(swiss2$Examination2,
                     breaks=c(-Inf, 12, 22, Inf),
                     labels=c("low","medium","high"))
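
Which category a value lying exactly on a break falls into depends on the right argument of cut(). The following sketch uses a made-up toy vector x (not part of the swiss data) to compare the default with right = FALSE.

```r
x <- c(12, 12.5, 22, 30)  # toy values chosen to sit on and off the breaks

# Default right = TRUE: intervals are (-Inf,12], (12,22], (22,Inf]
right_closed <- cut(x, breaks = c(-Inf, 12, 22, Inf),
                    labels = c("low", "medium", "high"))

# right = FALSE: intervals are [-Inf,12), [12,22), [22,Inf)
left_closed <- cut(x, breaks = c(-Inf, 12, 22, Inf),
                   labels = c("low", "medium", "high"), right = FALSE)

# Compare both recodings side by side
data.frame(x, right_closed, left_closed)
```

With the default, 12 is coded as "low" and 22 as "medium"; with right = FALSE they move up to "medium" and "high". Recoding rules of the form "values <= 18" therefore match the default setting.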


13.5.2 Exercise: Recoding variables

  1. Save the data set swiss in a new object called swiss2.
  2. Recode the variable Infant.Mortality in your new data set swiss2 so that values <= 18 are coded as 0, 18 < values <= 20 as 1, 20 < values <= 21 as 2 and 21 < values <= 27 as 3. Do this using both the classic way and the cut() function and name the respective variables inf.mort.cla and inf.mort.cut.
  3. Check if your coding worked and check the class of the two new variables/objects.


13.5.3 Solution: Recoding variables