# 6 How to do simple re-coding

At work, we sometimes need to recode a categorical variable into another one according to some mapping rules. This is so-called “re-coding”. Below is an R function that I wrote for re-coding.

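simple_recoding <- function(v, from, to, mapping_rule_data = NULL) {
  L <- length(v)
  N <- length(to)

  # Row j of mapping_rule holds the first and last index (a, b) of the
  # block from[a:b] that is mapped to to[j].
  if (is.null(mapping_rule_data)) {
    # Default: one-to-one mapping, from[j] -> to[j]
    mapping_rule <- matrix(rep(seq_len(N), each = 2), N, 2, byrow = TRUE)
  } else {
    mapping_rule <- matrix(mapping_rule_data, N, 2, byrow = TRUE)
  }

  # Values of v that match no rule stay as ""
  the_result <- rep("", L)

  # seq_len() keeps the loops from running when v or to is empty
  for (i in seq_len(L)) {
    for (j in seq_len(N)) {
      a <- mapping_rule[j, 1]
      b <- mapping_rule[j, 2]
      if (v[i] %in% from[a:b]) the_result[i] <- to[j]
    }
  }
  return(the_result)
}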
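The default mapping rule built inside the function can look cryptic at first. The small sketch below rebuilds it on its own; N = 3 is an arbitrary value chosen for illustration, and default_rule is just a name used here.

```r
# Rebuild the default mapping rule for N = 3 target categories.
N <- 3
default_rule <- matrix(rep(1:N, each = 2), N, 2, byrow = TRUE)

# Row j is (j, j), so from[j:j] is mapped to to[j]: a one-to-one mapping.
print(default_rule)
```

Each row holds the start and end index of one block of from, which is why a one-to-one mapping needs no mapping_rule_data.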

Let me use two examples to explain.

Example 1:

(x <- sample(letters, 30, replace = TRUE))
##   "d" "d" "o" "e" "c" "q" "h" "a" "k" "m" "l" "m" "m" "l" "p" "a" "h"
##  "j" "r" "p" "m" "a" "l" "b" "p" "d" "b" "s" "z" "j"

We want to re-code x to y, where x has lowercase letters and y has the corresponding uppercase letters. In this case,

from = letters

and

to = LETTERS

We can use the default

mapping_rule_data = NULL

since this is a one-to-one mapping.

(y <- simple_recoding(x, from = letters, to = LETTERS))
##   "D" "D" "O" "E" "C" "Q" "H" "A" "K" "M" "L" "M" "M" "L" "P" "A" "H"
##  "J" "R" "P" "M" "A" "L" "B" "P" "D" "B" "S" "Z" "J"
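For a one-to-one mapping like this one, base R's match() gives the same kind of result without a loop. This is only a sketch of an alternative, not the function above; the short vectors x_small and y_small are made up for illustration.

```r
# One-to-one re-coding with match(): find each value's position in
# `from` (letters) and pick the element of `to` (LETTERS) at that position.
x_small <- c("d", "o", "e", "a")
y_small <- LETTERS[match(x_small, letters)]
print(y_small)
```

One difference to keep in mind: a value that is not found in from becomes NA here, whereas simple_recoding leaves it as an empty string.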

Example 2:

(u <- sample(letters, 30, replace = TRUE))
##   "a" "h" "n" "e" "k" "v" "t" "s" "c" "s" "i" "b" "y" "w" "z" "q" "a"
##  "m" "g" "i" "u" "j" "z" "v" "b" "h" "t" "k" "l" "d"

We want to re-code u to w, where the mapping rule is as follows.

$\begin{array}{ccc} \left\{\hbox{a, b, c, d, e}\right\} &\longrightarrow & \left\{\hbox{A}\right\}\\ \left\{\hbox{f, g, h, i, j}\right\} &\longrightarrow & \left\{\hbox{B}\right\}\\ \left\{\hbox{k, l, m, n, o}\right\} &\longrightarrow & \left\{\hbox{C}\right\}\\ \left\{\hbox{p, q, r, s, t}\right\} &\longrightarrow& \left\{\hbox{D}\right\}\\ \left\{\hbox{u, v, w, x, y}\right\} &\longrightarrow& \left\{\hbox{E}\right\}\\ \left\{\hbox{z}\right\} &\longrightarrow& \left\{\hbox{Z}\right\} \end{array}$

In this case,

from = letters

and

to = c(LETTERS[1:5], "Z")

But note that

mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)

because letters[1:5] are mapped to “A”, letters[6:10] are mapped to “B”, and so on.
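To check a rule like this before using it, it can help to expand each (start, end) pair into the letters it covers. The sketch below does that; rule and groups are names introduced here for illustration.

```r
# Reshape the rule into one (start, end) row per target category.
rule <- matrix(c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26),
               ncol = 2, byrow = TRUE)

# Expand each row into the block of letters it covers.
groups <- lapply(seq_len(nrow(rule)),
                 function(j) letters[rule[j, 1]:rule[j, 2]])
names(groups) <- c(LETTERS[1:5], "Z")

print(groups$B)  # the letters mapped to "B"
print(groups$Z)  # the letters mapped to "Z"
```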

(simple_recoding(u, from = letters, to = c(LETTERS[1:5], "Z"),
                 mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)))
##   "A" "B" "C" "A" "C" "E" "D" "D" "A" "D" "B" "A" "E" "E" "Z" "D" "A"
##  "C" "B" "B" "E" "B" "Z" "E" "A" "B" "D" "C" "C" "A"
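The same many-to-one re-coding can also be written without nested loops by expanding the rule into a named lookup vector. This is a sketch of an alternative approach, not the function above; lookup, u_small and w_small are hypothetical names, and the code assumes that no value of from appears in more than one block.

```r
from <- letters
to <- c(LETTERS[1:5], "Z")
rule <- matrix(c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26),
               ncol = 2, byrow = TRUE)

# Indices of `from` covered by each block, and the size of each block.
idx <- unlist(Map(seq, rule[, 1], rule[, 2]))
block_sizes <- rule[, 2] - rule[, 1] + 1

# Named vector: names are the old codes, values are the new codes.
lookup <- setNames(rep(to, times = block_sizes), from[idx])

u_small <- c("a", "h", "n", "z", "v")
w_small <- unname(lookup[u_small])
print(w_small)
```

Because the lookup is indexed by name, the whole vector is re-coded in one step, which tends to matter once v is long.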

Exercise:

fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))

Create a new variable called “RGB” following the mapping rules $\begin{array}{ccc} \left\{\hbox{red, orange, yellow}\right\} &\longrightarrow& \left\{\hbox{R}\right\}\\ \left\{\hbox{green}\right\} &\longrightarrow& \left\{\hbox{G}\right\}\\ \left\{\hbox{blue}\right\} &\longrightarrow& \left\{\hbox{B}\right\} \end{array}$

rm(list = ls())

Hint: with the colors taken in the order listed above, the mapping rule is

mapping_rule_data = c(1, 3, 4, 4, 5, 5)