1.5 vroom

The vroom package reads rectangular data into R, such as comma-separated (csv), tab-separated (tsv), and fixed-width files (fwf). It serves the same purpose as readr::read_csv(), data.table::fread(), and read.csv(), but for many datasets vroom::vroom() reads much faster.

vroom project page

1.5.1 Installation

# Install the released version from CRAN
install.packages("vroom")
# Or install the development version (requires devtools)
# install.packages("devtools")
devtools::install_dev("vroom")

1.5.2 Usage

  1. Read a file
library(vroom)
file <- vroom_example("mtcars.csv")
file
#> [1] "C:/R/R-4.1.0/library/vroom/extdata/mtcars.csv"

vroom(file)
#> Rows: 32
#> Columns: 12
#> Delimiter: ","
#> chr [ 1]: model
#> dbl [11]: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> 
#> Use `spec()` to retrieve the guessed column specification
#> Pass a specification to the `col_types` argument to quiet this message
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows
vroom(file, delim = ",")
#> Rows: 32
#> Columns: 12
#> Delimiter: ","
#> chr [ 1]: model
#> dbl [11]: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> 
#> Use `spec()` to retrieve the guessed column specification
#> Pass a specification to the `col_types` argument to quiet this message
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows
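
The message printed above suggests passing a specification to `col_types`. The following is a minimal sketch of that, assuming the bundled mtcars.csv example file and vroom's `cols()`, `col_character()`, and `col_double()` helpers; the variable name `cars` is only illustrative.

```r
library(vroom)

file <- vroom_example("mtcars.csv")

# Declare the column types up front instead of letting vroom guess them:
# `model` is text, every other column is parsed as a double.
cars <- vroom(
  file,
  delim = ",",
  col_types = cols(
    model = col_character(),
    .default = col_double()
  )
)

# Inspect the specification that was actually used
spec(cars)
```

Stating the types explicitly also protects against a column being guessed differently when the file contents change.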
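  2. Read multiple files

vroom::vroom() accepts a vector of file paths, reads each file, and combines the results row-wise, as shown below: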

mt <- tibble::rownames_to_column(mtcars, "model")
purrr::iwalk(
  split(mt, mt$cyl),
  ~ vroom_write(.x, glue::glue("mtcars_{.y}.csv"), "\t")
)

files <- fs::dir_ls(glob = "mtcars*csv")
files

# With readr: read each file and row-bind the results
purrr::map_dfr(files, readr::read_delim, delim = "\t")

# With vroom: the same result in a single call
vroom(files)
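
When several files are combined, it is often useful to know which file each row came from. The sketch below assumes the `id` argument of `vroom()` for this purpose; the temporary directory, the `.tsv` file names, and the column name `path` are illustrative choices, not part of the original example.

```r
library(vroom)

# Write one tab-separated file per cylinder count into a temporary directory
out_dir <- tempfile("vroom_multi_")
dir.create(out_dir)

mt <- tibble::rownames_to_column(mtcars, "model")
parts <- split(mt, mt$cyl)
paths <- file.path(out_dir, paste0("mtcars_", names(parts), ".tsv"))

for (i in seq_along(parts)) {
  vroom_write(parts[[i]], paths[i], delim = "\t")
}

# Read all files in one call; `id` adds a column holding the source file path
combined <- vroom(paths, delim = "\t", id = "path")
table(basename(combined$path))
```

Writing to a temporary directory keeps the working directory clean, which the `fs::dir_ls()` pattern above would otherwise pick up on later runs.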
  3. Read compressed files

vroom supports compressed files such as zip, gz, bz2, and xz; simply pass the name of the compressed file to vroom().

file <- vroom_example("mtcars.csv.gz")

vroom(file)
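
A round trip can illustrate the same point with a file you create yourself. This is a minimal sketch, assuming that `vroom_write()` compresses its output when the target file name ends in `.gz`; the temporary path and the variable names are hypothetical.

```r
library(vroom)

# Hypothetical temporary path; the .gz extension is assumed to trigger gzip output
gz_path <- tempfile(fileext = ".tsv.gz")

mt <- tibble::rownames_to_column(mtcars, "model")
vroom_write(mt, gz_path, delim = "\t")

# Reading the compressed file back needs no extra arguments
mt_back <- vroom(gz_path, delim = "\t")
dim(mt_back)
```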
  4. Read files from the web
#file <- "https://raw.githubusercontent.com/r-lib/vroom/master/inst/extdata/mtcars.csv"
file <- "http://www.zhongyufei.com/Rbook/data/data-table-groupingsets.csv"
vroom(file)
  5. Select columns when reading

vroom provides the same column selection and renaming interface as dplyr::select().

file <- vroom_example("mtcars.csv.gz")

vroom(file, col_select = c(model, cyl, gear))
#> # A tibble: 32 x 3
#>   model               cyl  gear
#>   <chr>             <dbl> <dbl>
#> 1 Mazda RX4             6     4
#> 2 Mazda RX4 Wag         6     4
#> 3 Datsun 710            4     4
#> 4 Hornet 4 Drive        6     3
#> 5 Hornet Sportabout     8     3
#> 6 Valiant               6     3
#> # ... with 26 more rows

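# Select columns by position instead of by name
# vroom(file, col_select = c(1, 3, 11))

# Rename model to car while keeping all other columns
# vroom(file, col_select = list(car = model, everything()))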
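
Because `col_select` follows the dplyr::select() interface, selection helpers should work as well. The sketch below assumes the `starts_with()` and `everything()` helpers are available once vroom is attached; the result names `d_cols` and `renamed` are illustrative.

```r
library(vroom)

file <- vroom_example("mtcars.csv.gz")

# Keep only the columns whose names start with "d" (disp and drat in this file)
d_cols <- vroom(file, col_select = starts_with("d"))
names(d_cols)

# Rename `model` to `car` while keeping every other column
renamed <- vroom(file, col_select = list(car = model, everything()))
names(renamed)
```

Selecting columns at read time means the remaining columns never have to be parsed, which suits wide files where only a few fields are needed.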