6.8 Miscellaneous Functions

There are several remaining useful functions in tidyr that cannot be easily categorized.

6.8.1 chop() and unchop()

Chopping and unchopping preserve the width of a data frame but change its length. chop() makes df shorter by collapsing the rows within each group into list-columns; unchop() makes df longer by expanding list-columns so that each element of a list-column gets its own row in the output.

Note that we get one row of output for each unique combination of non-chopped variables:
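Here is a minimal sketch of chop() on a small made-up tibble. x takes three distinct values, so chopping y and z should leave one row per value of x, with y and z turned into list-columns. The data is illustrative only.

```r
library(tidyr)
library(tibble)

# Made-up data: x has three distinct values
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)

# Chop y and z: rows sharing the same x collapse into one row,
# and y and z become list-columns (groups keep their order of first appearance)
chopped <- chop(df, c(y, z))
chopped
# 3 rows: x = 1, 2, 3; chopped$y[[1]] holds 1:3
```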

chop() differs from nest() in section 6.3 in that it does not collapse the selected columns into a single list-column of tibbles; instead, each chopped column becomes its own list-column of vectors:
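To see the difference side by side, the sketch below (made-up data) chops and nests the same two columns. chop() keeps y and z as two separate list-columns, while nest() packs them into one list-column; the name data for that column is our own choice here.

```r
library(tidyr)
library(tibble)

df <- tibble(x = c(1, 1, 2), y = 1:3, z = c("a", "b", "c"))

# chop(): each chopped column becomes its own list-column of vectors
chopped <- chop(df, c(y, z))

# nest(): the selected columns are packed into ONE list-column of tibbles
nested <- nest(df, data = c(y, z))

names(chopped)  # "x" "y" "z"
names(nested)   # "x" "data"
```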

unchop() reverses this, expanding each list-column back into ordinary rows:
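A small sketch of unchop() on a made-up tibble whose list-column y holds vectors of different lengths. Each element becomes its own row, and x is repeated accordingly.

```r
library(tidyr)
library(tibble)

# Made-up data: y is a list-column with elements of length 2, 1 and 3
df <- tibble(x = 1:3, y = list(1:2, 3L, 4:6))

# Each element of y gets its own row; x is repeated as needed
unchopped <- unchop(df, y)
unchopped
# 6 rows: x = 1, 1, 2, 3, 3, 3 and y = 1:6
```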

If there’s a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.
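The sketch below (made-up data) puts both a NULL and a length-0 integer vector into y, so the two calls differ only in whether those rows survive.

```r
library(tidyr)
library(tibble)

# Made-up data: the first and third elements of y have size 0
df <- tibble(x = 1:3, y = list(NULL, 1:2, integer()))

# Default: rows whose element has size 0 are dropped
dropped <- unchop(df, y)
dropped   # 2 rows: only x = 2 survives

# keep_empty = TRUE keeps those rows, filling y with NA
kept <- unchop(df, y, keep_empty = TRUE)
kept      # 4 rows: x = 1 and x = 3 come back with y = NA
```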

The ptype argument optionally supplies a data frame prototype for the output cols, overriding the default type that would otherwise be guessed from combining the individual values.
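A sketch of ptype, assuming a recent tidyr release whose documentation describes ptype as a named list of column name-prototype pairs (check the help page of your installed version). Without ptype, the integer elements give an integer column; supplying double() as the prototype for y casts the result to double.

```r
library(tidyr)
library(tibble)

df <- tibble(x = 1:2, y = list(1L, 2:3))

# Without ptype, y is guessed as integer from its elements
out_default <- unchop(df, y)

# With ptype (assumed named-list form), y is cast to double instead
out_double <- unchop(df, y, ptype = list(y = double()))

class(out_default$y)  # "integer"
class(out_double$y)   # "numeric"
```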

6.8.2 uncount()

uncount() performs the opposite operation to dplyr::count(), duplicating rows according to a weighting variable (or expression):
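A minimal sketch with made-up data. The weights can be a bare column name, in which case that column is dropped from the result by default (.remove = TRUE), or an expression evaluated in the context of the data.

```r
library(tidyr)
library(tibble)

df <- tibble(x = c("a", "b"), n = c(1, 2))

# Bare column name: each row is repeated n times and n is dropped
by_column <- uncount(df, n)
by_column$x      # "a" "b" "b"

# Expression: weights are 2 / n, i.e. 2 and 1
by_expr <- uncount(df, 2 / n)
by_expr$x        # "a" "a" "b"
```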

We can supply a string to .id to create a new variable that gives a unique identifier for each created row:
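Continuing with the same made-up data, .id = "id" adds a column that numbers the copies produced from each original row.

```r
library(tidyr)
library(tibble)

df <- tibble(x = c("a", "b"), n = c(1, 2))

# id counts 1, 2, ... within the copies of each original row
out <- uncount(df, n, .id = "id")
out
# x = "a" "b" "b", id = 1 1 2
```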

uncount() can be helpful for converting frequency-form data to case-form data, for example:
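A sketch of that conversion with a made-up frequency table; the columns sex, smoker and count are hypothetical. Each row records how many individuals share a combination, and uncount() expands it to one row per individual.

```r
library(tidyr)
library(tibble)

# Frequency form (made-up): one row per combination plus a count
freq <- tibble(
  sex    = c("F", "F", "M"),
  smoker = c("yes", "no", "no"),
  count  = c(2, 1, 3)
)

# Case form: one row per individual; the count column is dropped
cases <- uncount(freq, count)
nrow(cases)  # 6
```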

Another way to achieve this transformation is with base R's rep():
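A sketch of the rep() approach on the same hypothetical frequency table. rep() builds a row index that repeats row i count[i] times, and subsetting with that index duplicates the rows much as uncount() does. Here the count column is dropped by selecting the other columns explicitly.

```r
library(tidyr)
library(tibble)

freq <- tibble(
  sex    = c("F", "F", "M"),
  smoker = c("yes", "no", "no"),
  count  = c(2, 1, 3)
)

# Row index 1, 1, 2, 3, 3, 3: row i repeated count[i] times
idx <- rep(seq_len(nrow(freq)), times = freq$count)
cases_rep <- freq[idx, c("sex", "smoker")]

# Compare column by column with the uncount() result
cases_unc <- uncount(freq, count)
identical(cases_rep$sex, cases_unc$sex) &&
  identical(cases_rep$smoker, cases_unc$smoker)  # TRUE
```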

6.8.3 Exercises

Exercise 6.6 When cleaning the who dataset, we said that iso2 and iso3 were redundant. Prove this.

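If iso2 and iso3 are redundant, then each value of the variable combination (country, year) uniquely determines one observation in the dataset (because (country, year) by itself can serve as a key).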
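This key-based argument can be checked with dplyr's count(), assuming dplyr is loaded alongside tidyr (which ships the who dataset). If (country, year) is a key, the first pipeline returns no rows. If iso2 and iso3 are redundant, adding them to the grouping should not change the number of groups.

```r
library(tidyr)  # provides the who dataset
library(dplyr)

# (country, year) is a key if no combination occurs more than once;
# this should print a tibble with 0 rows
who %>% count(country, year) %>% filter(n > 1)

# Adding iso2 and iso3 should not split any (country, year) group
n_key  <- who %>% count(country, year) %>% nrow()
n_full <- who %>% count(country, year, iso2, iso3) %>% nrow()
n_key == n_full
```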

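Another approach uses distinct(), which returns all the unique combinations of values that actually occur in the chosen columns of a data frame (note that complete(), by contrast, "manufactures" every possible combination). It is similar to unique(), but faster: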
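A sketch of the distinct() approach, again assuming dplyr is available: keep only the (country, iso2, iso3) combinations that occur in who, then check that no country appears in more than one of them. The last line compares the result with base R's unique(), which should return the same number of rows.

```r
library(tidyr)  # provides the who dataset
library(dplyr)

# Combinations of country, iso2 and iso3 that actually occur
pairs <- distinct(who, country, iso2, iso3)

# If the claim holds, no country has more than one (iso2, iso3) pair
pairs %>% count(country) %>% filter(n > 1)  # 0 rows if redundant
nrow(pairs) == n_distinct(who$country)

# Base R's unique() gives the same set of combinations
nrow(unique(who[c("country", "iso2", "iso3")])) == nrow(pairs)
```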