## 6.6 Handling missing values

• Explicitly missing: flagged with NA in the data

• Implicitly missing: simply not present in the data

R for Data Science sums up the two kinds of missingness:
> An explicit missing value is the presence of an absence; an implicit missing value is the absence of a presence.

stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(   1,    2,    3,    4,    2,    3,    4),
  return = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)  # 2015 Q4: explicit NA; 2016 Q1: no row at all (implicit)
)
stocks
#> # A tibble: 7 x 3
#>    year   qtr return
#>   <dbl> <dbl>  <dbl>
#> 1  2015     1   1.88
#> 2  2015     2   0.59
#> 3  2015     3   0.35
#> 4  2015     4  NA
#> 5  2016     2   0.92
#> 6  2016     3   0.17
#> # ... with 1 more row

stocks %>%
  pivot_wider(names_from = year, values_from = return)
#> pivot_wider: reorganized (year, return) into (2015, 2016) [was 7x3, now 4x3]
#> # A tibble: 4 x 3
#>     qtr `2015` `2016`
#>   <dbl>  <dbl>  <dbl>
#> 1     1   1.88  NA
#> 2     2   0.59   0.92
#> 3     3   0.35   0.17
#> 4     4  NA      2.66

stocks %>%
  pivot_wider(names_from = year, values_from = return) %>%
  pivot_longer(-qtr, names_to = "year", values_to = "return")
#> pivot_wider: reorganized (year, return) into (2015, 2016) [was 7x3, now 4x3]
#> pivot_longer: reorganized (2015, 2016) into (year, return) [was 4x3, now 8x3]
#> # A tibble: 8 x 3
#>     qtr year  return
#>   <dbl> <chr>  <dbl>
#> 1     1 2015    1.88
#> 2     1 2016   NA
#> 3     2 2015    0.59
#> 4     2 2016    0.92
#> 5     3 2015    0.35
#> 6     3 2016    0.17
#> # ... with 2 more rows

## values_drop_na = TRUE drops both NAs, so the output has one row fewer than the original stocks (6 vs. 7)
stocks %>%
  pivot_wider(names_from = year, values_from = return) %>%
  pivot_longer(-qtr, names_to = "year", values_to = "return",
               values_drop_na = TRUE)
#> pivot_wider: reorganized (year, return) into (2015, 2016) [was 7x3, now 4x3]
#> pivot_longer: reorganized (2015, 2016) into (year, return) [was 4x3, now 6x3]
#> # A tibble: 6 x 3
#>     qtr year  return
#>   <dbl> <chr>  <dbl>
#> 1     1 2015    1.88
#> 2     2 2015    0.59
#> 3     2 2016    0.92
#> 4     3 2015    0.35
#> 5     3 2016    0.17
#> 6     4 2016    2.66
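
The round trip through pivot_wider() and pivot_longer() is one way to surface the implicit missing value. As a minimal sketch of a more direct route, tidyr's complete() expands a data frame to every combination of the listed columns and fills the new rows with NA. The `completed` name is only illustrative, and the block rebuilds `stocks` so that it runs on its own.

```r
library(tibble)
library(tidyr)

# Same data as above: 2015 Q4 is an explicit NA, 2016 Q1 has no row
stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(   1,    2,    3,    4,    2,    3,    4),
  return = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
)

# Expand to all year x qtr combinations; missing combinations get NA
completed <- complete(stocks, year, qtr)

nrow(completed)               # 8 rows: 2 years x 4 quarters
sum(is.na(completed$return))  # 2: the explicit NA plus the new 2016 Q1 row
```

Unlike the pivot round trip, this keeps `year` as a number, because the column never passes through the column names.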

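fill() is designed specifically for filling in missing values. It takes the columns whose missing values should be filled and replaces each NA with the nearest non-missing value in that column. The .direction argument controls the fill direction: .direction = "up" fills from the bottom up, so each NA is replaced by the next non-missing value below it; .direction = "down" (the default) does the opposite and carries the last non-missing value above it forward.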

treatment <- tribble(
  ~person,            ~treatment, ~response,
  "Derrick Whitmore", 1,          7,
  NA,                 2,          10,
  NA,                 3,          9,
  "Katherine Burke",  1,          4
)
treatment %>%
  fill(person, .direction = "up")
#> fill: changed 2 values (50%) of 'person' (2 fewer NA)
#> # A tibble: 4 x 3
#>   person           treatment response
#>   <chr>                <dbl>    <dbl>
#> 1 Derrick Whitmore         1        7
#> 2 Katherine Burke          2       10
#> 3 Katherine Burke          3        9
#> 4 Katherine Burke          1        4
treatment %>%
  fill(person, .direction = "down")
#> fill: changed 2 values (50%) of 'person' (2 fewer NA)
#> # A tibble: 4 x 3
#>   person           treatment response
#>   <chr>                <dbl>    <dbl>
#> 1 Derrick Whitmore         1        7
#> 2 Derrick Whitmore         2       10
#> 3 Derrick Whitmore         3        9
#> 4 Katherine Burke          1        4
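
One case the treatment example does not cover is an NA with no value on the side being filled from. The sketch below uses a made-up `sales` tibble (not part of the original data) to show that "down" leaves a leading NA untouched, while tidyr's "downup" option fills down first and then up to catch it.

```r
library(tibble)
library(tidyr)

# Hypothetical data: the first row has no earlier value to carry forward
sales <- tibble(
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  region  = c(NA, "East", NA, NA)
)

# "down" can only copy values that appear above an NA
down <- fill(sales, region, .direction = "down")
down$region    # NA "East" "East" "East"

# "downup" fills downward first, then upward for whatever is left
downup <- fill(sales, region, .direction = "downup")
downup$region  # "East" "East" "East" "East"
```

Whether filling upward is appropriate depends on how the data was recorded; for a table like treatment, where a name is written once and then left blank, "down" is usually the direction that matches the data entry convention.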

More useful methods for dealing with missing values (in tidyr and other packages) are discussed in Chapter 17.