1.5 vroom
The vroom package reads rectangular data into R, such as comma-separated (csv), tab-separated (tsv), and fixed-width (fwf) files. It serves the same purpose as readr::read_csv(), data.table::fread(), and read.csv(), but for many datasets vroom::vroom() reads considerably faster.
1.5.1 Installation
# Install the released version from CRAN
install.packages("vroom")
# Install the development version (requires devtools)
# install.packages("devtools")
devtools::install_dev("vroom")
1.5.2 Usage
- Reading a file
library(vroom)
# Path to an example file bundled with vroom
file <- vroom_example("mtcars.csv")
file
#> [1] "C:/R/R-4.1.0/library/vroom/extdata/mtcars.csv"
vroom(file)
#> Rows: 32
#> Columns: 12
#> Delimiter: ","
#> chr [ 1]: model
#> dbl [11]: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#>
#> Use `spec()` to retrieve the guessed column specification
#> Pass a specification to the `col_types` argument to quiet this message
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows
vroom(file, delim = ",")
#> Rows: 32
#> Columns: 12
#> Delimiter: ","
#> chr [ 1]: model
#> dbl [11]: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#>
#> Use `spec()` to retrieve the guessed column specification
#> Pass a specification to the `col_types` argument to quiet this message
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows
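The message in the output above suggests passing a specification through `col_types`. A minimal sketch of that, assuming the bundled mtcars.csv has the 12 columns shown (one character column followed by 11 doubles); the object name `cars` is only illustrative:

```r
library(vroom)

# Same bundled example file as above
file <- vroom_example("mtcars.csv")

# Compact specification: one letter per column,
# "c" = character, "d" = double (1 + 11 = 12 columns)
cars <- vroom(file, delim = ",", col_types = "cddddddddddd")

# Inspect the column specification that was actually used
spec(cars)
```

Declaring the types up front also protects against a column being guessed differently when the data changes.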
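- Reading multiple files
vroom::vroom() accepts a vector of file paths and combines the files into one data frame, so no explicit loop is needed:
mt <- tibble::rownames_to_column(mtcars, "model")
# Write one tab-separated file per value of cyl
purrr::iwalk(
  split(mt, mt$cyl),
  ~ vroom_write(.x, glue::glue("mtcars_{.y}.csv"), "\t")
)
files <- fs::dir_ls(glob = "mtcars*csv")
files
# readr: read each file, then row-bind the results
purrr::map_dfr(files, readr::read_delim, delim = "\t")
# vroom: same result as above
vroom(files)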
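When several files are combined, it is often useful to know which file each row came from. The sketch below uses vroom's `id` argument for that. The temporary directory, the file names, and the column name `source` are assumptions made for illustration, and base R replaces the purrr/fs helpers to keep the example self-contained:

```r
library(vroom)

# Hypothetical output directory inside the session's temp folder
out_dir <- file.path(tempdir(), "vroom_multi")
dir.create(out_dir, showWarnings = FALSE)

# Move the row names into a regular column, then write one file per cyl value
mt <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)
for (cyl in unique(mt$cyl)) {
  path <- file.path(out_dir, paste0("mtcars_", cyl, ".tsv"))
  vroom_write(mt[mt$cyl == cyl, ], path, delim = "\t")
}

files <- list.files(out_dir, pattern = "^mtcars_.*\\.tsv$", full.names = TRUE)

# id = "source" adds a column holding the path each row was read from
combined <- vroom(files, delim = "\t", id = "source")

# Number of rows contributed by each file
table(basename(combined$source))
```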
- Reading compressed files
vroom supports compressed files such as zip, gz, bz2, and xz; simply pass the name of the compressed file to vroom().
file <- vroom_example("mtcars.csv.gz")
vroom(file)
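The reverse direction can be sketched as a round trip: `vroom_write()` picks the compression from the file extension, and `vroom()` reads the result back without extra arguments. The output path and object names below are hypothetical:

```r
library(vroom)

# Hypothetical output path; the .gz extension requests gzip compression
gz_path <- file.path(tempdir(), "mtcars_copy.csv.gz")

# Move the row names into a regular column before writing
mt <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)
vroom_write(mt, gz_path, delim = ",")

# Read the compressed file back; the compression is detected automatically
back <- vroom(gz_path, delim = ",")
dim(back)
```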
- Reading files from the web
# file <- "https://raw.githubusercontent.com/r-lib/vroom/master/inst/extdata/mtcars.csv"
file <- "http://www.zhongyufei.com/Rbook/data/data-table-groupingsets.csv"
vroom(file)
- Reading selected columns
vroom provides the same column selection and renaming interface as dplyr::select().
file <- vroom_example("mtcars.csv.gz")
vroom(file, col_select = c(model, cyl, gear))
#> # A tibble: 32 x 3
#>   model               cyl  gear
#>   <chr>             <dbl> <dbl>
#> 1 Mazda RX4             6     4
#> 2 Mazda RX4 Wag         6     4
#> 3 Datsun 710            4     4
#> 4 Hornet 4 Drive        6     3
#> 5 Hornet Sportabout     8     3
#> 6 Valiant               6     3
#> # ... with 26 more rows
# vroom(file, col_select = c(1, 3, 11))
# vroom(file, col_select = list(car = model, everything()))
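The commented-out calls above hint at other selection styles. A short sketch of two of them, a tidyselect helper and renaming during selection, using the bundled example file; the object names `d_cols` and `renamed` are illustrative only:

```r
library(vroom)

file <- vroom_example("mtcars.csv")

# Helper-based selection: every column whose name starts with "d"
d_cols <- vroom(file, delim = ",", col_select = starts_with("d"))
names(d_cols)

# Rename while selecting: `car` becomes the new name of the `model` column
renamed <- vroom(file, delim = ",", col_select = list(car = model, mpg, cyl))
names(renamed)
```

Selecting columns at read time means the unselected columns are never materialized, which is usually cheaper than reading everything and dropping columns afterwards.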