# Chapter 10 Factor Variables

## 10.1 What Is a Factor?

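• A data type for storing categorical data
• Represents a discrete variable
• The set of levels is finite: every element must be one of the levels or missing (NA)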
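One way to see these properties is to look inside a small factor. The sketch below uses only base R. The vector f and its copy g are illustrative names that do not appear elsewhere in this chapter.

```r
# A factor stores integer codes plus a fixed set of allowed levels
f <- factor(c("low", "high", "low"))
levels(f)      # "high" "low"  (alphabetical by default)
as.integer(f)  # 2 1 2         (each element is an index into levels(f))
typeof(f)      # "integer"

# Assigning a value that is not a level is rejected:
# R warns ("invalid factor level, NA generated") and stores NA instead
g <- f
g[2] <- "medium"
is.na(g[2])    # TRUE
```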

## 10.2 Creating Factors

```r
library(tidyverse)
income <- c("low", "high", "medium", "medium", "low", "high", "high")
# By default, the levels are sorted alphabetically
factor(income)
## [1] low    high   medium medium low    high   high
## Levels: high low medium
```

```r
# Set the level order explicitly with the levels argument
factor(income, levels = c("low", "high", "medium"))
## [1] low    high   medium medium low    high   high
## Levels: low high medium
```

```r
# Values not listed in levels ("medium" here) silently become NA
factor(income, levels = c("low", "high"))
## [1] low  high <NA> <NA> low  high high
## Levels: low high
```
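factor() gives no error or warning when a value is missing from levels, so it can be worth checking which inputs were lost. Here is a minimal base-R sketch; f and lost are illustrative names.

```r
income <- c("low", "high", "medium", "medium", "low", "high", "high")
f <- factor(income, levels = c("low", "high"))

# Inputs that were not NA but became NA were absent from levels
lost <- unique(income[is.na(f) & !is.na(income)])
lost                       # "medium"

# Count each level, including the NAs that were introduced
table(f, useNA = "ifany")  # low 2, high 3, <NA> 2
```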

```r
# forcats provides the fct_*() helpers; library(tidyverse) already attaches it
library(forcats)
```

## 10.3 Reordering Factor Levels

```r
x <- factor(income)
x
## [1] low    high   medium medium low    high   high
## Levels: high low medium
```

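```r
# Give the complete new level order (fct_relevel() has no levels argument;
# the level names are passed through ...)
x %>% fct_relevel(c("high", "medium", "low"))
## [1] low    high   medium medium low    high   high
## Levels: high medium low
```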
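For comparison, base R can produce the same full reordering by rebuilding the factor with an explicit levels vector. This sketch assumes forcats is installed; a and b are illustrative names.

```r
library(forcats)
income <- c("low", "high", "medium", "medium", "low", "high", "high")
x <- factor(income)

a <- fct_relevel(x, c("high", "medium", "low"))      # forcats
b <- factor(x, levels = c("high", "medium", "low"))  # base R

levels(a)                                    # "high" "medium" "low"
identical(levels(a), levels(b))              # TRUE
identical(as.character(a), as.character(b))  # TRUE: the data are unchanged
```

One practical difference is how each handles typos. A misspelled level passed to factor() silently turns the matching values into NA, while fct_relevel() warns about a level it cannot find.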

```r
# Name only the levels to move: they go to the front, the rest keep their order
x %>% fct_relevel("medium")
## [1] low    high   medium medium low    high   high
## Levels: medium high low
```

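```r
# after = Inf moves the named level to the end
# ("medium" is already last here, so the order does not change)
x %>% fct_relevel("medium", after = Inf)
## [1] low    high   medium medium low    high   high
## Levels: high low medium
```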
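after is not limited to 0 and Inf. fct_relevel() also accepts a function, which it applies to the current levels. Here is a short sketch, assuming forcats is installed.

```r
library(forcats)
x <- factor(c("low", "high", "medium", "medium", "low", "high", "high"))
levels(x)                                    # "high" "low" "medium"

# Place "medium" after the first of the remaining levels
levels(fct_relevel(x, "medium", after = 1))  # "high" "medium" "low"

# Pass a function: it receives the current levels and returns the new order
levels(fct_relevel(x, rev))                  # "medium" "low" "high"
```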

```r
# Order the levels by their first appearance in the data
x %>% fct_inorder()
## [1] low    high   medium medium low    high   high
## Levels: low high medium
```
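In base R, the same first-appearance order comes from unique(), which keeps values in the order they first occur. This minimal sketch compares the two and assumes forcats is installed; base_version and fct_version are illustrative names.

```r
library(forcats)
income <- c("low", "high", "medium", "medium", "low", "high", "high")

unique(income)                                        # "low" "high" "medium"
base_version <- factor(income, levels = unique(income))
fct_version  <- fct_inorder(factor(income))

identical(levels(base_version), levels(fct_version))  # TRUE
```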

```r
# Order the levels by the median of a second vector, here the positions 1:7
x %>% fct_reorder(1:7, .fun = median)
## [1] low    high   medium medium low    high   high
## Levels: low medium high
```
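The order low, medium, high follows from the median position of each level in x. The base-R check below uses tapply(); meds is an illustrative name.

```r
income <- c("low", "high", "medium", "medium", "low", "high", "high")
x <- factor(income)

# Positions grouped by level: low = 1, 5; high = 2, 6, 7; medium = 3, 4
meds <- tapply(1:7, x, median)
meds                       # high 6, low 3, medium 3.5
names(meds)[order(meds)]   # "low" "medium" "high"
```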

## 10.4 Applications

```r
d <- tibble(
  x = c("a", "a", "b", "b", "c", "c"),
  y = c(2, 2, 1, 5, 0, 3)
)
d
## # A tibble: 6 x 2
##   x         y
##   <chr> <dbl>
## 1 a         2
## 2 a         2
## 3 b         1
## 4 b         5
## 5 c         0
## 6 c         3
```

```r
# A character x is placed on the axis in alphabetical order: a, b, c
d %>%
  ggplot(aes(x = x, y = y)) +
  geom_point()
```

### 10.4.1 fct_reorder()

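fct_reorder() orders the categories of x by a summary of the y values in each category, here the median, in ascending order. Specifically:

• a: y values c(2, 2), median(c(2, 2)) = 2
• b: y values c(1, 5), median(c(1, 5)) = 3
• c: y values c(0, 3), median(c(0, 3)) = 1.5

The ascending order is therefore c, a, b. With .desc = TRUE it becomes b, a, c.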
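The medians listed above can be computed directly with dplyr and compared with the level order that fct_reorder() produces. This sketch assumes the tidyverse is installed; meds is an illustrative name.

```r
library(tidyverse)

d <- tibble(
  x = c("a", "a", "b", "b", "c", "c"),
  y = c(2, 2, 1, 5, 0, 3)
)

# Median of y within each category of x, sorted ascending
meds <- d %>%
  group_by(x) %>%
  summarise(med = median(y)) %>%
  arrange(med)
meds$x                                        # "c" "a" "b"

# fct_reorder() with .fun = median yields the same level order
levels(fct_reorder(d$x, d$y, .fun = median))  # "c" "a" "b"
```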

```r
d %>%
  ggplot(aes(x = fct_reorder(x, y, .fun = median), y = y)) +
  geom_point()
```

```r
# .desc = TRUE sorts by descending median: b, a, c
d %>%
  ggplot(aes(x = fct_reorder(x, y, .fun = median, .desc = TRUE), y = y)) +
  geom_point()
```

```r
# Same order, but reordering inside mutate() keeps the axis label as "x"
d %>%
  mutate(x = fct_reorder(x, y, .fun = median, .desc = TRUE)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point()
```

```r
# Order by the minimum instead (a = 2, b = 1, c = 0), descending: a, b, c
d %>%
  mutate(x = fct_reorder(x, y, .fun = min, .desc = TRUE)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point()
```
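Only the summary function and .desc change the resulting order. The comparison below uses the same data and assumes forcats is installed; dx and dy are illustrative copies of the columns of d.

```r
library(forcats)
dx <- c("a", "a", "b", "b", "c", "c")
dy <- c(2, 2, 1, 5, 0, 3)

levels(fct_reorder(dx, dy, .fun = median, .desc = TRUE))  # "b" "a" "c"
levels(fct_reorder(dx, dy, .fun = min, .desc = TRUE))     # "a" "b" "c"  (mins 2, 1, 0)
levels(fct_reorder(dx, dy, .fun = max))                   # "a" "c" "b"  (maxes 2, 5, 3)
```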

### 10.4.2 fct_rev()

```r
# x is converted to a factor (levels a, b, c), then its levels are reversed: c, b, a
d %>%
  mutate(x = fct_rev(x)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point()
```
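fct_rev() simply reverses the existing level order. For comparison, here is a base-R version that builds the factor and reverses its levels by hand. It assumes forcats is installed; f is an illustrative name.

```r
library(forcats)
x <- c("a", "a", "b", "b", "c", "c")

levels(fct_rev(x))                          # "c" "b" "a"

# Base R: build the factor, then rebuild it with reversed levels
f <- factor(x)
levels(factor(f, levels = rev(levels(f))))  # "c" "b" "a"
```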

### 10.4.3 fct_relevel()

```r
# Set the level order by hand: c, a, b
d %>%
  mutate(x = fct_relevel(x, c("c", "a", "b"))) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point()
```
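fct_relevel() only needs the levels you want to move, and the remaining levels keep their relative order behind them. For this data, naming only "c" gives the same order as spelling out all three. This is a sketch assuming forcats is installed.

```r
library(forcats)
x <- c("a", "a", "b", "b", "c", "c")

levels(fct_relevel(x, c("c", "a", "b")))  # "c" "a" "b"
levels(fct_relevel(x, "c"))               # "c" "a" "b"
```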