6 How to do simple re-coding
At work, we sometimes need to recode a categorical variable into another one according to some mapping rules. This is the so-called “re-coding”. Below is an R function that I wrote for re-coding.
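simple_recoding <- function(v, from, to, mapping_rule_data = NULL)
{
  L <- length(v)
  N <- length(to)
  # Default rule: the j-th element of `from` maps to the j-th element of `to`
  if (is.null(mapping_rule_data)) {
    mapping_rule <- matrix(rep(1:N, each = 2), N, 2, byrow = TRUE)
  } else {
    # Row j holds the first and last positions in `from` that map to to[j]
    mapping_rule <- matrix(mapping_rule_data, N, 2, byrow = TRUE)
  }
  # Values of v that match no rule are left as ""
  the_result <- rep("", L)
  # seq_len() keeps the loops from running when v or to is empty
  for (i in seq_len(L)) {
    for (j in seq_len(N)) {
      a <- mapping_rule[j, 1]
      b <- mapping_rule[j, 2]
      if (v[i] %in% from[a:b]) the_result[i] <- to[j]
    }
  }
  return(the_result)
}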
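To see what the function does with mapping_rule_data, it may help to look at the rule matrix on its own. The sketch below only rebuilds that matrix outside the function; the names default_rule and custom_rule are mine and do not appear in the function, and N = 3 is an arbitrary choice for illustration.

```r
# Number of target categories (arbitrary, for illustration only)
N <- 3

# Default rule: row j is (j, j), so from[j] alone is mapped to to[j]
default_rule <- matrix(rep(1:N, each = 2), N, 2, byrow = TRUE)
print(default_rule)

# Custom rule: consecutive pairs become rows, so positions 1 to 3 form
# the first group, position 4 the second, and position 5 the third
custom_rule <- matrix(c(1, 3, 4, 4, 5, 5), N, 2, byrow = TRUE)
print(custom_rule)
```

Each row gives the first and last positions in from that belong to one target value.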
Let me use two examples to explain.
Example 1:
(x <- sample(letters, 30, replace = TRUE))
## [1] "d" "d" "o" "e" "c" "q" "h" "a" "k" "m" "l" "m" "m" "l" "p" "a" "h"
## [18] "j" "r" "p" "m" "a" "l" "b" "p" "d" "b" "s" "z" "j"
We want to re-code x to y, where x has lowercase letters and y has the corresponding uppercase letters. In this case,
from = letters
and
to = LETTERS
We can use the default
mapping_rule_data = NULL
since this is a one-to-one mapping.
(y <- simple_recoding(x, from = letters, to = LETTERS))
## [1] "D" "D" "O" "E" "C" "Q" "H" "A" "K" "M" "L" "M" "M" "L" "P" "A" "H"
## [18] "J" "R" "P" "M" "A" "L" "B" "P" "D" "B" "S" "Z" "J"
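For a one-to-one mapping like this one, the result can be cross-checked with base R alone. The sketch below is not part of simple_recoding; it assumes from and to have the same length and uses match() to find the position of each value. The seed and the name y_match are my own choices.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
x <- sample(letters, 30, replace = TRUE)

# Position of each element of x in `letters`, then the same position in LETTERS
y_match <- LETTERS[match(x, letters)]

# For this particular mapping, toupper() must give the same answer
stopifnot(identical(y_match, toupper(x)))
```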
Example 2:
(u <- sample(letters, 30, replace = TRUE))
## [1] "a" "h" "n" "e" "k" "v" "t" "s" "c" "s" "i" "b" "y" "w" "z" "q" "a"
## [18] "m" "g" "i" "u" "j" "z" "v" "b" "h" "t" "k" "l" "d"
We want to re-code u to w, where the mapping rule is as follows. \[ \begin{array}{ccc} \left\{\hbox{a, b, c, d, e}\right\} &\longrightarrow & \left\{\hbox{A}\right\}\\ \left\{\hbox{f, g, h, i, j}\right\} &\longrightarrow & \left\{\hbox{B}\right\}\\ \left\{\hbox{k, l, m, n, o}\right\} &\longrightarrow & \left\{\hbox{C}\right\}\\ \left\{\hbox{p, q, r, s, t}\right\} &\longrightarrow& \left\{\hbox{D}\right\}\\ \left\{\hbox{u, v, w, x, y}\right\} &\longrightarrow& \left\{\hbox{E}\right\}\\ \left\{\hbox{z}\right\} &\longrightarrow& \left\{\hbox{Z}\right\} \end{array} \] In this case,
from = letters
and
to = c(LETTERS[1:5], "Z")
But note that
mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26 )
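because each consecutive pair of numbers gives the first and last positions in from that are mapped to one element of to: letters[1:5] are mapped to “A”, letters[6:10] are mapped to “B”, and so on.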
(w <- simple_recoding(u, from = letters, to = c(LETTERS[1:5], "Z"),
                      mapping_rule_data = c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26)))
## [1] "A" "B" "C" "A" "C" "E" "D" "D" "A" "D" "B" "A" "E" "E" "Z" "D" "A"
## [18] "C" "B" "B" "E" "B" "Z" "E" "A" "B" "D" "C" "C" "A"
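The same many-to-one rule can also be written as a named lookup vector, which avoids the double loop. This is only a sketch of an alternative, not the function above. It assumes the groups are contiguous, non-overlapping, and cover from in order, as they do in this example; the names rule, group_sizes, lookup, and u_small are hypothetical.

```r
from <- letters
to <- c(LETTERS[1:5], "Z")

# Same rule as in Example 2, one row per target value
rule <- matrix(c(1, 5, 6, 10, 11, 15, 16, 20, 21, 25, 26, 26),
               ncol = 2, byrow = TRUE)

# Number of source values covered by each row of the rule
group_sizes <- rule[, 2] - rule[, 1] + 1

# lookup["a"] is the target for "a", lookup["b"] for "b", and so on
lookup <- rep(to, times = group_sizes)
names(lookup) <- from

# A small input vector for illustration
u_small <- c("a", "h", "n", "z")
w_small <- unname(lookup[u_small])
print(w_small)
# "A" "B" "C" "Z"
```

Indexing a named vector by name is vectorized, so the whole input is recoded in one step.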
Exercise:
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))
Create a new variable called “RGB” following the mapping rules \[ \begin{array}{ccc} \left\{\hbox{red, orange, yellow}\right\} &\longrightarrow& \left\{\hbox{R}\right\}\\ \left\{\hbox{green}\right\} &\longrightarrow& \left\{\hbox{G}\right\}\\ \left\{\hbox{blue}\right\} &\longrightarrow& \left\{\hbox{B}\right\} \end{array} \]
Answer to the Exercise:
rm(list = ls())
# load package
library(dplyr)
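# load the function defined above (assumes it was saved as simple_recoding.R
# in the working directory)
source("simple_recoding.R")
# create a fake data set
fk_data <- data.frame(my_colors = c("red", "orange", "yellow", "green", "blue"))
# positions 1-3 of my_colors map to "R", position 4 to "G", position 5 to "B"
fk_data_1 <-
  fk_data %>%
  mutate(RGB = simple_recoding(my_colors,
                               from = my_colors,
                               to = c("R", "G", "B"),
                               mapping_rule_data = c(1, 3, 4, 4, 5, 5)))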
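One way to check the answer without dplyr or the sourced file is to spell the mapping out as a named vector in base R. This is a sketch under my own naming (rgb_lookup and fk_data_check are hypothetical), and it is not the method used in the answer above.

```r
fk_data_check <- data.frame(
  my_colors = c("red", "orange", "yellow", "green", "blue")
)

# Each color name points to its target code
rgb_lookup <- c(red = "R", orange = "R", yellow = "R",
                green = "G", blue = "B")

# as.character() guards against my_colors being stored as a factor,
# which was the data.frame() default before R 4.0
fk_data_check$RGB <- unname(rgb_lookup[as.character(fk_data_check$my_colors)])
print(fk_data_check)
```

The RGB column here should agree with the RGB column of fk_data_1.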