1.3 openxlsx

openxlsx 包是当我需要定制输出 Excel 表格或报表时常用R包。目前该包的版本4.2.3,通过使用 Rcpp 加速,包的读写速度在 Excel 的百万级下是可接受状态,包的相关函数功能完善且简易好用,并且正在积极开发中,相信它以后功能会越来越强大。

项目官方地址:https://ycphs.github.io/openxlsx/index.html

openxlsx 主要优势:

  • 不依赖 java 环境
  • 读写速度可接受
  • 可设置条件格式,与 Excel 中『开始』选项卡的条件格式功能接近
  • 可批量插入 ggplot2 图
  • 可插入公式
  • 可渲染大部分 Excel 格式,并且效率相比部分 python 包高效
  • 可添加页眉页脚以及其他格式,方便直接打印
  • 功能稳定可用并且在积极开发中

版本信息查看

packageVersion("openxlsx")
#> [1] '4.2.3'

关于 openxlsx 更加详细的阐述:https://mp.weixin.qq.com/s/ZD0dJb0y8fsWGI1dCPh2mQ

1.3.1 安装

从CRAN安装发行版

# 稳定版
install.packages("openxlsx", dependencies = TRUE, repos = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/")

从Github安装开发版

install.packages(c("Rcpp", "devtools"), dependencies = TRUE)
library(devtools)
install_github("ycphs/openxlsx")

1.3.2 基础功能

本文仅呈现基础功能部分,即读写 Excel 文件。其它功能,请查阅项目官方地址或微信公众号文章R包-openxlsx-学习笔记

1.3.2.1 读取 Excel

read.xlsx()函数读取 Excel 文件。

library(openxlsx) 
read.xlsx('./data/read-write/批量读写.xlsx')
#>   name num
#> 1    a   1
#> 2    a   2
#> 3    a   3
#> 4    b   4
#> 5    b   5
#> 6    b   6
#> 7    d   7
#> 8    d   8
#> 9    d   9

read.xlsx()参数如下:

library(openxlsx)
read.xlsx(
  xlsxFile,
  sheet = 1,
  startRow = 1,
  colNames = TRUE,
  rowNames = FALSE,
  detectDates = FALSE,
  skipEmptyRows = TRUE,
  skipEmptyCols = TRUE,
  rows = NULL,
  cols = NULL,
  check.names = FALSE,
  sep.names = ".",
  namedRegion = NULL,
  na.strings = "NA",
  fillMergedCells = FALSE
)

以上参数中需要注意:detecDates 参数,当你的 Excel 表格中带日期列时需要将参数设置为TRUE,不然将会把日期识别为数字读入。

fillMergedCells参数,当你读取的表格中存在合并单元格,将用值填充其他全部单元格,如下所示:

merge-cell

read.xlsx('./test.xlsx',detectDates = TRUE,fillMergedCells = TRUE)

读取后如下所示:

openxlsx-merge-xlsx

readWorkbook()也可以读取 Excel 表格数据,参数与read.xlsx基本一致。

xlsxFile <- system.file("extdata", "readTest.xlsx", package = "openxlsx")
df1 <- readWorkbook(xlsxFile = xlsxFile, sheet = 1)

1.3.2.2 写入Excel

数据清洗完后,或者是分组聚合完成,需要将结果从R导出到Excle,这时就利用write.xlsx()函数将结果数据集写入 Excel 中。

write.xlsx()函数写入

write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "columns")

带格式输出

hs <- createStyle(
  textDecoration = "BOLD", fontColour = "#FFFFFF", fontSize = 12,
  fontName = "Arial Narrow", fgFill = "#4F80BD"
)
## Not run: 
write.xlsx(iris,
  file = "writeXLSX3.xlsx",
  colNames = TRUE, borders = "rows", headerStyle = hs
)

#4F80BD格式属于十六进制颜色码。

1.3.3 定制格式输出

openxlsx包有两套输出函数,wrire.xlsx()是其中一种,还有如下一种。

输出过程共分为四步,第一步创建workbook,第二步添加sheet,第三步写入数据,第四步保存workbook。在输出的过程中可以通过addStyle()createStyle()conditionalFormatting添加格式或条件格式。

df <- data.frame(a=1:10,b=1:10,d=1:10)
wb <- createWorkbook(creator = 'zhongyf',title = 'test')
addWorksheet(wb,sheetName = 'test')
writeDataTable(wb,sheet = 'test',x = df)
saveWorkbook(wb, "test.xlsx", overwrite = TRUE)

我们以上面四步输出的方式,查看包自带的例子。

  • createWorkbook()

  • addWorksheet()

  • writeDataTable()

  • saveWorkbook()

wb <- createWorkbook("Fred")

## Add 3 worksheets
addWorksheet(wb, "Sheet 1")
addWorksheet(wb, "Sheet 2", gridLines = FALSE)
addWorksheet(wb, "Sheet 3", tabColour = "red")
addWorksheet(wb, "Sheet 4", gridLines = FALSE, tabColour = "#4F81BD")

## Headers and Footers
addWorksheet(wb, "Sheet 5",
  header = c("ODD HEAD LEFT", "ODD HEAD CENTER", "ODD HEAD RIGHT"),
  footer = c("ODD FOOT RIGHT", "ODD FOOT CENTER", "ODD FOOT RIGHT"),
  evenHeader = c("EVEN HEAD LEFT", "EVEN HEAD CENTER", "EVEN HEAD RIGHT"),
  evenFooter = c("EVEN FOOT RIGHT", "EVEN FOOT CENTER", "EVEN FOOT RIGHT"),
  firstHeader = c("TOP", "OF FIRST", "PAGE"),
  firstFooter = c("BOTTOM", "OF FIRST", "PAGE")
)

addWorksheet(wb, "Sheet 6",
  header = c("&[Date]", "ALL HEAD CENTER 2", "&[Page] / &[Pages]"),
  footer = c("&[Path]&[File]", NA, "&[Tab]"),
  firstHeader = c(NA, "Center Header of First Page", NA),
  firstFooter = c(NA, "Center Footer of First Page", NA)
)

addWorksheet(wb, "Sheet 7",
  header = c("ALL HEAD LEFT 2", "ALL HEAD CENTER 2", "ALL HEAD RIGHT 2"),
  footer = c("ALL FOOT RIGHT 2", "ALL FOOT CENTER 2", "ALL FOOT RIGHT 2")
)

addWorksheet(wb, "Sheet 8",
  firstHeader = c("FIRST ONLY L", NA, "FIRST ONLY R"),
  firstFooter = c("FIRST ONLY L", NA, "FIRST ONLY R")
)

addWorksheet(wb,'dataframe')

## Need data on worksheet to see all headers and footers
writeData(wb, sheet = 5, 1:400)
writeData(wb, sheet = 6, 1:400)
writeData(wb, sheet = 7, 1:400)
writeData(wb, sheet = 8, 1:400)
writeDataTable(wb,sheet = 'dataframe',iris)
## Save workbook
## Not run: 
saveWorkbook(wb, "addWorksheetExample.xlsx", overwrite = TRUE)

1.3.4 函数参数

本小节主要讲createWorkbook()addWorksheet(),writeDataTable(),saveWorkbook()四个函数的参数以及用法。

  • createWorkbook
createWorkbook(
  creator = ifelse(.Platform$OS.type == "windows", Sys.getenv("USERNAME"),
    Sys.getenv("USER")),
  title = NULL,
  subject = NULL,
  category = NULL
)
wb <- createWorkbook(
  creator = "宇飞的世界",
  title = "标题",
  subject = "主题",
  category = "类别目录"
)

在win系统上右击 Excel 文件属性,点击详细信息:

win-excel-attribute

  • addWorksheet

addWorksheet()函数添加sheet。

addWorksheet(
  wb,
  sheetName,
  gridLines = TRUE,
  tabColour = NULL,
  zoom = 100,
  header = NULL,
  footer = NULL,
  evenHeader = NULL,
  evenFooter = NULL,
  firstHeader = NULL,
  firstFooter = NULL,
  visible = TRUE,
  paperSize = getOption("openxlsx.paperSize", default = 9),
  orientation = getOption("openxlsx.orientation", default = "portrait"),
  vdpi = getOption("openxlsx.vdpi", default = getOption("openxlsx.dpi", default = 300)),
  hdpi = getOption("openxlsx.hdpi", default = getOption("openxlsx.dpi", default = 300))
)

addWorksheet()参数解释如下:

gridLines参数:表格中是否有网格线,在Excle『视图』选项卡下面的网格线去除打勾的效果一致

tabColour参数:输出表格sheet标签颜色

zoom:发大缩小,默认是100,可选范围10-400

header:页眉 长度为3的字符向量,左、中、右三个位置,用Na可跳过一位置,以下页眉页脚相同。

footer: 页脚

evenHeader: 每页页眉

evenFooter: 每页页脚

firstHeader: 第一页页眉

firstFooter: 第一页页脚

visible:sheet是否隐藏,如果为否sheet将被隐藏

paperSize:页面大小,详见 ?pageSetup 

orientation:One of "portrait" or "landscape" 不清楚干嘛用

vdpi: 屏幕分辨率 默认值即可,不用调整

hdpi: 屏幕分辨率 默认值即可,不用调整
  • writeDataTable

writeDataTable()函数将data.frame写入 Excel 。

writeDataTable(
  wb,
  sheet,
  x,
  startCol = 1,
  startRow = 1,
  xy = NULL,
  colNames = TRUE,
  rowNames = FALSE,
  tableStyle = "TableStyleLight9",
  tableName = NULL,
  headerStyle = NULL,
  withFilter = TRUE,
  keepNA = FALSE,
  na.string = NULL,
  sep = ", ",
  stack = FALSE,
  firstColumn = FALSE,
  lastColumn = FALSE,
  bandedRows = TRUE,
  bandedCols = FALSE
)

writeDataTable()参数解释如下:

wb:createWorkbook()函数创建的workbook对象

sheet:将输入写入的sheet名,由addWorksheet()函数创建

x : a data.frame

startCol: 从哪一列开始写入,接受数字

startRow:从哪一行开始写入,接受数字

xy: 指定开始行以及开始列的向量,xy = c(2,3)即从2列3行位置开始写入

colNames: 如果TRUE,列名写入

rowNames: 如果TRUE,行名写入

tableStyle:一些表格样式,在中文版Excle『开始』选项卡下面的套用表格格式,可以使用预定义的表格样式

tableName:workbook中的表中,表名必须是唯一的,熟悉Power Pivot的人应该很清晰“表”概念

headerStyle:自定义列样式应用,用createStyle()函数创建

withFilter: 在第一行是否有筛选

keepNA:如果TRUE,NA值转化为#NA,如果否,变为空

na.string :如果参数不是NULL,如果keepNA是TRUE,NA将会在 Excel 中转为该参数指定的字符

sep:列表列的分隔符,默认为","

stack :详情请参照函数使用

firstColumn:logical. If TRUE, the first column is bold

lastColumn :logical. If TRUE, the last column is bold

bandedRows:logical. If TRUE, rows are colour banded 即行线,默认为TRUE

bandedCols:logical. If TRUE, the columns are colour banded 即列线 ,默认为TRUE
  • saveWorkbook
saveWorkbook(wb, file, overwrite = FALSE, returnValue = FALSE)

参数较为简单,wb即上文中的workbook对象,file即输出的文件名,overwrite即如果存在是否覆盖,returnValue如果设置为TRUE,返回TRUE代表保存成功

1.3.5 总结

openxlsx包功能较为强大,更多详细用法大家可自行探索,或关注我的语雀笔记,笔记会不定期持续更新。

R包openxlsx学习笔记:https://www.yuque.com/docs/share/7a768e6f-95e0-417c-a9b5-dfc8862dc6be?#

语雀个人主页:https://www.yuque.com/zyufei