6 How to do simple re-coding

At work, we sometimes need to recode a categorical variable into another one according to some mapping rules. This is the so-called “re-coding”. Below is an R function that I wrote for re-coding.

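simple_recoding <- function(v, from, to, mapping_rule_data = NULL)
{L <- length(v)   # number of values to re-code
 N <- length(to)  # number of target categories
 
 # Row j of mapping_rule holds the first and last index in `from`
 # whose values are mapped to to[j].
 if(is.null(mapping_rule_data)) 
  {mapping_rule <- matrix(rep(1:N, each = 2), N, 2, byrow = TRUE)}  # one-to-one: from[j] -> to[j]
 else
  {mapping_rule <- matrix(mapping_rule_data, N, 2, byrow = TRUE)}
 
 # Values of v that match no rule stay as "".
 the_result <- rep("", L)
 
 # seq_len(L) yields no iterations when v is empty, whereas 1:L would give c(1, 0)
 for(i in seq_len(L))
   for(j in seq_len(N))
   {a <- mapping_rule[j, 1]
    b <- mapping_rule[j, 2]
    if(v[i] %in% from[a : b]) the_result[i] <- to[j]
   }
 return(the_result)
}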

Let me use two examples to explain.

Example 1:

(x <- sample(letters, 30, replace = TRUE))

##  [1] "d" "d" "o" "e" "c" "q" "h" "a" "k" "m" "l" "m" "m" "l" "p" "a" "h"
## [18] "j" "r" "p" "m" "a" "l" "b" "p" "d" "b" "s" "z" "j"

We want to re-code x to y, where x has lowercase letters and y has the corresponding uppercase letters. In this case,

from = letters

and

to = LETTERS

We can use the default

mapping_rule_data = NULL

since this is a one-to-one mapping.

(y <- simple_recoding(x, from = letters, to = LETTERS))

##  [1] "D" "D" "O" "E" "C" "Q" "H" "A" "K" "M" "L" "M" "M" "L" "P" "A" "H"
## [18] "J" "R" "P" "M" "A" "L" "B" "P" "D" "B" "S" "Z" "J"
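
To see what the default rule in Example 1 looks like, the matrix that the function builds when mapping_rule_data = NULL can be reproduced on its own. The sketch below uses a target vector of length 3 purely for illustration; the name default_rule is not part of the function.

```r
# Rebuild the default mapping rule for N = 3 target categories (illustrative size)
N <- 3
default_rule <- matrix(rep(1:N, each = 2), N, 2, byrow = TRUE)

# Each row is (start index, end index) into `from`; here start equals end,
# so from[j] is mapped to to[j] and nothing else.
default_rule
```

Every row has the same start and end index, which is why the default only works for a one-to-one mapping where from and to are listed in matching order.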

Example 2:

(u <- sample(letters, 30, replace = TRUE))

##  [1] "a" "h" "n" "e" "k" "v" "t" "s" "c" "s" "i" "b" "y" "w" "z" "q" "a"
## [18] "m" "g" "i" "u" "j" "z" "v" "b" "h" "t" "k" "l" "d"

We want to re-code u to w, where the mapping rule is as follows. $\begin{array}{ccc} \left\{\hbox{a, b, c, d, e}\right\} &\longrightarrow & \left\{\hbox{A}\right\}\\ \left\{\hbox{f, g, h, i, j}\right\} &\longrightarrow & \left\{\hbox{B}\right\}\\ \left\{\hbox{k, l, m, n, o}\right\} &\longrightarrow & \left\{\hbox{C}\right\}\\ \left\{\hbox{p, q, r, s, t}\right\} &\longrightarrow& \left\{\hbox{D}\right\}\\ \left\{\hbox{u, v, w, x, y}\right\} &\longrightarrow& \left\{\hbox{E}\right\}\\ \left\{\hbox{z}\right\} &\longrightarrow& \left\{\hbox{Z}\right\} \end{array}$ In this case,

from = letters

and

to = c(LETTERS[1:5], "Z")

But note that

mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26 )

because letters[1:5] are mapped to “A”, letters[6:10] are mapped to “B”, and so on.
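
It may help to look at how this flat vector is turned into start and end indices. The sketch below repeats the matrix() call from the function body and lists the letters covered by each row; the names mapping_rule and groups are used here only for illustration.

```r
# The flat vector is read two numbers at a time: (start, end) for each target
mapping_rule_data <- c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)
to <- c(LETTERS[1:5], "Z")
mapping_rule <- matrix(mapping_rule_data, length(to), 2, byrow = TRUE)

# For each row j, collect the source letters from[start:end] mapped to to[j]
groups <- lapply(seq_len(nrow(mapping_rule)),
                 function(j) letters[mapping_rule[j, 1]:mapping_rule[j, 2]])
names(groups) <- to
groups
```

Printing groups gives one element per target category, which makes it easy to check the index pairs against the mapping table before running the re-coding.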

(simple_recoding(u, from = letters, to = c(LETTERS[1:5], "Z"),
                 mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26 )))

##  [1] "A" "B" "C" "A" "C" "E" "D" "D" "A" "D" "B" "A" "E" "E" "Z" "D" "A"
## [18] "C" "B" "B" "E" "B" "Z" "E" "A" "B" "D" "C" "C" "A"
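
As a cross-check, the same mapping can be written with a named lookup vector in base R. This is an alternative technique, not the function above: the names lookup, group_sizes and u_small are assumptions made for this sketch, and u_small holds just a few values picked from u.

```r
# Alternative sketch: a named character vector used as a lookup table
to <- c(LETTERS[1:5], "Z")
group_sizes <- c(5, 5, 5, 5, 5, 1)            # how many source letters per target
lookup <- setNames(rep(to, times = group_sizes), letters)

u_small <- c("a", "h", "n", "z")              # a few values taken from u
unname(lookup[u_small])
```

One difference worth noting: a value with no entry in lookup comes back as NA, whereas simple_recoding leaves unmatched values as "".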

Exercise:

fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))

Create a new variable called “RGB” following the mapping rules $\begin{array}{ccc} \left\{\hbox{red, orange, yellow}\right\} &\longrightarrow& \left\{\hbox{R}\right\}\\ \left\{\hbox{green}\right\} &\longrightarrow& \left\{\hbox{G}\right\}\\ \left\{\hbox{blue}\right\} &\longrightarrow& \left\{\hbox{B}\right\} \end{array}$

Answer to the Exercise:

rm(list = ls())

# load package
library(dplyr)

source("simple_recoding.R")

# create a fake data set
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))

fk_data_1 <- 
  fk_data %>% 
  mutate(RGB = simple_recoding(my_colors, 
                               from = my_colors, 
                               to = c("R", "G", "B"),
                               mapping_rule_data = c(1, 3, 4, 4, 5, 5)))
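
The answer passes from = my_colors, which works here because the column happens to list each color once and in the order the rule expects. To verify the expected RGB values independently, here is a minimal base R sketch that uses a named lookup vector instead of simple_recoding; rgb_lookup and fk_check are hypothetical names introduced for this check.

```r
# Independent check of the exercise using a named lookup vector (base R only)
rgb_lookup <- c(red = "R", orange = "R", yellow = "R", green = "G", blue = "B")

fk_check <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))

# as.character() guards against the column being a factor in older R versions
fk_check$RGB <- unname(rgb_lookup[as.character(fk_check$my_colors)])
fk_check
```

The RGB column should agree with the one in fk_data_1.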