3.5 Convert numeric to binary

Sometimes you have a numeric variable that takes on values over a range (e.g., BMI, age) and you would like to create a binary (0/1, yes/no) categorical variable whose levels correspond to specific ranges. For example, let’s convert the variable Age (years) into an indicator of being “elderly”, defined as age 65 years or older. The result can be left as a 0/1 variable or converted to a factor.

# As a 0/1 variable
mydat$elderly <- as.numeric(mydat$Age >= 65)

# Examine new variable
table(mydat$elderly, useNA = "ifany")
## 
##   0   1 
## 375 155

# Check range of original variable at each level of the new variable
tapply(mydat$Age, mydat$elderly, range)
## $`0`
## [1] 42 64
## 
## $`1`
## [1] 65 90

# As a factor
mydat$elderly_fac <- factor(mydat$elderly,
                            levels = 0:1,
                            labels = c("Age < 65y", "Age 65y+"))

# Examine new variable
table(mydat$elderly_fac, useNA = "ifany")
## 
## Age < 65y  Age 65y+ 
##       375       155

# Check range of original variable at each level of the new variable
tapply(mydat$Age, mydat$elderly_fac, range)
## $`Age < 65y`
## [1] 42 64
## 
## $`Age 65y+`
## [1] 65 90
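
The following is a minimal sketch of two details worth knowing about the base R approach: how missing values behave, and how the factor could be created in one step. It uses a small made-up vector named age (hypothetical, not taken from mydat), and the cut() call is just one possible alternative to the two-step code above.

```r
# Hypothetical toy ages (not from mydat), including one missing value
age <- c(42, 64, 65, 90, NA)

# A comparison with NA returns NA, so the 0/1 variable is
# missing wherever the original variable is missing
elderly <- as.numeric(age >= 65)
elderly

# One-step alternative: cut() with left-closed intervals,
# [-Inf, 65) and [65, Inf), so that exactly 65 counts as "Age 65y+"
elderly_fac <- cut(age,
                   breaks = c(-Inf, 65, Inf),
                   right  = FALSE,
                   labels = c("Age < 65y", "Age 65y+"))

# useNA = "ifany" shows the missing value as its own column
table(elderly_fac, useNA = "ifany")
```

Setting right = FALSE matters here: with the default right = TRUE, the intervals would be closed on the right and an age of exactly 65 would fall in the lower group.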

In the tidyverse, you would use the following code, which uses two new functions, group_by() and summarize().

# As a 0/1 variable
mydat_tibble <- mydat_tibble %>% 
  mutate(elderly = case_when(Age <  65 ~ 0,
                             Age >= 65 ~ 1))

# Examine new variable
mydat_tibble %>% 
  count(elderly)

# Check range of original variable at each level of the new variable
mydat_tibble %>% 
  group_by(elderly) %>% 
  summarize(min = min(Age),
            max = max(Age))

# As a factor
mydat_tibble <- mydat_tibble %>% 
  mutate(elderly = case_when(Age <  65 ~ 0,
                             Age >= 65 ~ 1)) %>% 
  mutate(elderly_fac = factor(elderly,
                              levels = 0:1,
                              labels = c("Age < 65y", "Age 65y+")))

# Examine new variable
mydat_tibble %>% 
  count(elderly_fac)

# Check range of original variable at each level of the new variable
mydat_tibble %>% 
  group_by(elderly_fac) %>% 
  summarize(min = min(Age),
            max = max(Age))
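
For a two-level split, if_else() is a possible alternative to case_when(). The sketch below assumes the dplyr package is installed and uses a small made-up tibble named toy (hypothetical, not mydat_tibble) that includes a missing age, so you can see that the new variables are missing wherever Age is missing.

```r
library(dplyr)

# Hypothetical toy data (not mydat_tibble), including one missing age
toy <- tibble(Age = c(42, 64, 65, 90, NA))

# if_else() returns NA when the condition is NA,
# so a missing Age gives a missing elderly and elderly_fac
toy <- toy %>% 
  mutate(elderly     = if_else(Age >= 65, 1, 0),
         elderly_fac = factor(elderly,
                              levels = 0:1,
                              labels = c("Age < 65y", "Age 65y+")))

# Examine new variable (the missing value appears as its own row)
toy %>% 
  count(elderly_fac)

# Check range of original variable at each level of the new variable,
# dropping rows with missing Age first
toy %>% 
  filter(!is.na(Age)) %>% 
  group_by(elderly_fac) %>% 
  summarize(min = min(Age),
            max = max(Age))
```

Filtering out the missing ages before summarizing avoids a group in which min() and max() have nothing to compute on.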