3.2 Other data formats

Data stored in other formats, such as JSON, XML and YAML, has to be converted and cleaned into R's data frame structure, data.frame.

  1. Data Rectangling with jq
  2. Mongolite User Manual: introduction to using MongoDB with the mongolite client in R

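The jsonlite package reads *.json files. jsonlite::read_json imports a JSON file, jsonlite::write_json saves an R object as a JSON file, jsonlite::fromJSON converts a JSON string or file into an R object, and jsonlite::toJSON does the opposite.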

library(jsonlite)
# Import from a JSON file:
# jsonlite::read_json(path = "path/to/filename.json")
# A JSON array of primitives
json <- '["Mario", "Peach", null, "Bowser"]'

# By default (simplifyVector = TRUE) the array is simplified to an atomic vector
fromJSON(json)
## [1] "Mario"  "Peach"  NA       "Bowser"
# With simplifyVector = FALSE a list is returned instead
fromJSON(json, simplifyVector = FALSE)
## [[1]]
## [1] "Mario"
## 
## [[2]]
## [1] "Peach"
## 
## [[3]]
## NULL
## 
## [[4]]
## [1] "Bowser"
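Since the target is usually a data.frame, a minimal round-trip sketch may help: a JSON array of objects is simplified by fromJSON into a data frame, while toJSON and write_json/read_json go the other way. The data frame df and its values are made up for illustration, and the file is written to a tempfile().

```r
library(jsonlite)

# Hypothetical example data
df <- data.frame(name = c("Mario", "Peach"), score = c(10L, 8L))

# toJSON(): data frame -> JSON string; by default each row becomes one object
js <- toJSON(df)
js

# fromJSON(): JSON array of objects -> data frame (simplifyVector = TRUE by default)
df2 <- fromJSON(js)
str(df2)

# write_json()/read_json() do the same through a file;
# read_json() does not simplify by default (simplifyVector = FALSE), so ask for it
path <- tempfile(fileext = ".json")
write_json(df, path)
df3 <- read_json(path, simplifyVector = TRUE)
all.equal(df3, df2)
```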

The yaml package reads *.yml files: yaml::read_yaml returns a list, and yaml::write_yaml writes an R object in YAML format.

library(yaml)
yaml::read_yaml(file = '_bookdown.yml')
## $book_filename
## [1] "notesdown"
## 
## $delete_merged_file
## [1] TRUE
## 
## $language
## $language$label
## $language$label$fig
## [1] "图 "
## 
## $language$label$tab
## [1] "表 "
## 
## 
## $language$ui
## $language$ui$edit
## [1] "编辑"
## 
## $language$ui$chapter_name
## [1] "第 " " 章"
## 
## $language$ui$appendix_name
## [1] "附录 "
## 
## 
## 
## $new_session
## [1] TRUE
## 
## $before_chapter_script
## [1] "_common.R"
## 
## $rmd_files
##  [1] "index.Rmd"                    "preface.Rmd"                 
##  [3] "data-wrangling.Rmd"           "data-structure.Rmd"          
##  [5] "data-transportation.Rmd"      "string-operations.Rmd"       
##  [7] "regular-expressions.Rmd"      "data-manipulation.Rmd"       
##  [9] "advanced-manipulation.Rmd"    "parallel-manipulation.Rmd"   
## [11] "other-manipulation.Rmd"       "statistical-graphics.Rmd"    
## [13] "graphics-foundations.Rmd"     "visualization-colors.Rmd"    
## [15] "visualization-gallery.Rmd"    "interactive-web-graphics.Rmd"
## [17] "statistical-computation.Rmd"  "numerical-optimization.Rmd"  
## [19] "differential-equations.Rmd"   "appendix.Rmd"                
## [21] "references.Rmd"
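The reverse direction can be sketched with a small nested list. Here cfg is a hypothetical configuration that only imitates a few fields of the _bookdown.yml shown above. as.yaml() returns the YAML text, and write_yaml()/read_yaml() round-trip it through a temporary file.

```r
library(yaml)

# Hypothetical config imitating part of _bookdown.yml
cfg <- list(book_filename = "notesdown", new_session = TRUE,
            rmd_files = c("index.Rmd", "preface.Rmd"))

# as.yaml(): R object -> YAML text
cat(as.yaml(cfg))

# write_yaml()/read_yaml(): round trip through a temporary file
path <- tempfile(fileext = ".yml")
write_yaml(cfg, path)
cfg2 <- read_yaml(path)
str(cfg2)
```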
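
Table 3.2: Importing datasets produced by other data analysis software

| Software | R function | R package |
|---|---|---|
| ESRI ArcGIS (shapefile) | read.shapefile | shapefiles |
| Matlab | readMat | R.matlab |
| Minitab | read.mtp | foreign |
| SAS (permanent data) | read.ssd | foreign |
| SAS (XPORT format) | read.xport | foreign |
| SPSS | read.spss | foreign |
| Stata | read.dta | foreign |
| Systat | read.systat | foreign |
| Octave | read.octave | foreign |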
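As a sketch of how the foreign readers are used, the snippet below writes a few rows of the built-in mtcars data to a Stata file with foreign::write.dta and reads the file back with read.dta. The Stata file only stands in for data that would normally come from Stata itself. read.dta handles the older Stata binary formats (versions 5 to 12), and the round trip works because write.dta writes one of those older formats.

```r
library(foreign)  # recommended package shipped with R

# Write a small data frame in Stata .dta format to a temporary file
path <- tempfile(fileext = ".dta")
write.dta(mtcars[1:5, c("mpg", "cyl", "disp")], path)

# Read it back; the result is a data.frame
d <- read.dta(path)
str(d)
```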
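
Table 3.3: Importing datasets in other formats

| File format | R function | R package |
|---|---|---|
| Contingency table data | read.ftable | stats |
| Binary data | readBin | base |
| Character (string) data | readChar | base |
| Clipboard data (Windows only) | readClipboard | utils |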
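The readers in Table 3.3 can be tried without external files. The sketch below first writes temporary files with the matching writers (writeBin, writeChar, write.ftable) and then reads them back. The built-in Titanic table is used only as sample data. readClipboard is left out because it requires the Windows clipboard.

```r
# Binary data: write three integers, then read them back with readBin
bin_path <- tempfile(fileext = ".bin")
writeBin(c(1L, 2L, 3L), bin_path)
ints <- readBin(bin_path, what = "integer", n = 3)

# Character data: write a string with no terminator (eos = NULL),
# then read only its first 5 characters with readChar
chr_path <- tempfile(fileext = ".txt")
writeChar("hello world", chr_path, eos = NULL)
first5 <- readChar(chr_path, nchars = 5)

# Contingency table: write a flat table in ftable layout, then parse it back
ft <- ftable(Titanic, row.vars = 1:3)
ft_path <- tempfile(fileext = ".txt")
write.ftable(ft, file = ft_path)
ft2 <- read.ftable(ft_path)
ft2
```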

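The read.dcf function reads files in Debian Control File (DCF) format, which stores data in a human-readable form. Examples are the DESCRIPTION file of an R package and the file that describes every package on CRAN, https://cran.r-project.org/src/contrib/PACKAGES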

x <- read.dcf(file = system.file("DESCRIPTION", package = "splines"),
              fields = c("Package", "Version", "Title"))
x
##      Package   Version Title                                    
## [1,] "splines" "4.2.3" "Regression Spline Functions and Classes"
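A file such as PACKAGES holds many records, separated by blank lines. read.dcf also accepts a connection and returns one row per record in a character matrix, which converts directly to a data.frame. In the minimal sketch below the two package entries are made up, and a field missing from a record comes back as NA.

```r
# Two hypothetical DCF records separated by a blank line
txt <- "Package: foo
Version: 1.0.0
Depends: R (>= 3.5.0)

Package: bar
Version: 0.2.1
"

con <- textConnection(txt)
m <- read.dcf(con, fields = c("Package", "Version", "Depends"))
close(con)

# Character matrix (one row per record) -> data frame
pkgs <- as.data.frame(m)
pkgs
```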

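Finally, the rio package, nicknamed the Swiss army knife of data import, deserves a mention. It gathers support for most of the statistical software formats that R can currently read behind a single pair of functions, rio::import() and rio::export(), which choose the format from the file extension.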
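Here is a minimal sketch of that single interface. It assumes rio is installed, and because rio is not part of base R the snippet checks for it first. The same export()/import() pair handles CSV here and should handle other supported formats, such as .dta or .sav, when only the extension changes.

```r
# rio is a contributed package; skip gracefully if it is not installed
if (requireNamespace("rio", quietly = TRUE)) {
  path <- tempfile(fileext = ".csv")
  # export(): the writer is chosen from the ".csv" extension
  rio::export(mtcars[1:5, c("mpg", "cyl")], path)
  # import(): the reader is chosen the same way; a data.frame is returned
  d <- rio::import(path)
  print(d)
}
```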