Chapter 14 Dynamic Documents

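Python code chunks in an R Markdown document (???) are dispatched by the knitr package (???). To display Matplotlib figures, this book uses the Python engine provided by the reticulate package (???) rather than the one built into knitr.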

Software information

The Python 3 modules used to build this book are listed with

pip3 list --format=columns

The R environment in which the code was run

sessionInfo()
#> R Under development (unstable) (2019-11-11 r77397)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 8.1 x64 (build 9600)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936 
#> [2] LC_CTYPE=Chinese (Simplified)_China.936   
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C                              
#> [5] LC_TIME=Chinese (Simplified)_China.936    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.0  magrittr_1.5    bookdown_0.15   tools_4.0.0    
#>  [5] htmltools_0.4.0 curl_4.2        yaml_2.2.0      Rcpp_1.0.3     
#>  [9] stringi_1.4.3   rmarkdown_1.17  knitr_1.26      stringr_1.4.0  
#> [13] xfun_0.11       digest_0.6.22   rlang_0.4.1     evaluate_0.14

Setting python.reticulate = TRUE in knitr::opts_chunk means that all Python code chunks share a single Python session, whereas python.reticulate = FALSE means that knitr’s own Python engine is used and every Python code chunk runs independently.

With python.reticulate = TRUE, the Python engine provided by reticulate is used. It supports matplotlib plotting but does not support figure captions, whereas knitr’s Python engine does support captions.
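To make the difference between a shared session and independent chunks concrete, the following minimal sketch imitates it in plain Python with exec. It is only an analogy and not how knitr or reticulate is implemented; the chunk strings and the helper names run_shared and run_independent are hypothetical.

```python
# Two "chunks" of Python source, as they might appear in an R Markdown file.
chunks = ["x = 21", "y = x * 2"]


def run_shared(sources):
    """Run all chunks in one namespace, like python.reticulate = TRUE."""
    session = {}
    for src in sources:
        exec(src, session)
    return session


def run_independent(sources):
    """Run each chunk in a fresh namespace, like python.reticulate = FALSE."""
    errors = []
    for src in sources:
        try:
            exec(src, {})
        except NameError as err:
            errors.append(str(err))
    return errors


shared = run_shared(chunks)
print(shared["y"])              # the second chunk can see x from the first
print(run_independent(chunks))  # the second chunk fails: x is not defined
```

In the shared case the second chunk reuses x; in the independent case each chunk has to define everything it needs.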

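R and Python can work together: Python handles data processing and modeling, and R handles plotting. Some complex machine learning models and the related data manipulation have to be done in Python. Once the data set has been cleaned into data frame form, it is imported into R to draw static or dynamic graphics. For this the reticulate package must be loaded; setting python.reticulate = TRUE alone is not enough.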

Calling Python from R

pandas reads the data; after tidying, the reticulate package passes it to the R environment as a data.frame object, and ggplot2 is loaded to plot it.

import pandas as pd
iris2 = pd.read_csv('iris.csv')
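The pandas chunk above assumes that a file named iris.csv exists in the working directory. As a stand-in for that file (not the author's actual data preparation), the sketch below writes a three-row CSV using only the standard library. The column names Sepal.Length, Sepal.Width, and Species are the ones used by the ggplot2 call in this section; the remaining columns and the values are the first three rows of R's built-in iris data set. The helper name write_iris_csv is hypothetical.

```python
import csv
import os
import tempfile

# First three rows of R's built-in iris data set; the column names match
# those used by the ggplot2 call (Sepal.Length, Sepal.Width, Species).
HEADER = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
ROWS = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3.0, 1.4, 0.2, "setosa"],
    [4.7, 3.2, 1.3, 0.2, "setosa"],
]


def write_iris_csv(path):
    """Write the sample rows to `path` as a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(ROWS)


# Written to a temporary directory here; pass "iris.csv" to match the chunk above.
csv_path = os.path.join(tempfile.mkdtemp(), "iris.csv")
write_iris_csv(csv_path)

# Read the file back to check what was written.
with open(csv_path, newline="") as f:
    table = list(csv.reader(f))
print(table[0])
print(len(table) - 1, "data rows")
```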

As shown in Figure ??

library(reticulate)
library(ggplot2)
ggplot(py$iris2, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species)) +
  scale_color_viridis_d()
# A second chunk: raster plot of the faithfuld data set that ships with ggplot2
library(ggplot2)
ggplot(faithfuld, aes(waiting, eruptions)) +
  geom_raster(aes(fill = density)) +
  scale_fill_continuous()

shiny

The trio: Markdown, Pandoc’s Markdown, and R Markdown. Markdown for scientific writing.

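We first show how Markdown handles emphasis, headings, lists, line breaks, links, images, block quotes, code blocks, and LaTeX formulas. Building on Markdown, we then describe what Pandoc’s Markdown adds, and finally what R Markdown adds on top of Pandoc’s Markdown.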

14.1 Markdown

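For basic Markdown syntax, see the Markdown reference bundled with the RStudio IDE: RStudio top menu bar -> Help -> Markdown Quick Reference. This section focuses on more advanced Markdown syntax, in particular Pandoc’s Markdown, that is, the many extensions to Markdown that Pandoc provides. Below we cover tables, images, and formulas as enhanced by Pandoc.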

14.1.1 Lists

  • Ordered list

    1. First item
    2. Second item
  • Unordered list

    • First item

    • Second item

    • here is my first list item.

    • and my second.

  • Nested list

    1. Ordered
    2. Item 2
    3. Item 3
      • Item 3a
      • Item 3b
    • Unordered
    • Item 2
      • Item 2a
      • Item 2b

A definition list that contains code

Term 1

Definition 1

Term 2 with inline markup

Definition 2

{ some code, part of Definition 2 }

Third paragraph of definition 2.

A definition list in compact form

Term 1
Definition 1
Term 2
Definition 2a
Definition 2b

Unordered list

  • fruits
    • apples
      • macintosh
      • red delicious
    • pears
    • peaches
  • vegetables
    • broccoli
    • chard

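Numbered example lists, the counterpart of the ordered list environment in LaTeX, are numbered continuously through the whole document

  (1) My first example will be numbered (1).
  (2) My second example will be numbered (2).

Explanation of examples.

  (3) My third example will be numbered (3).

Items in a (@) example list can be referenced

  (4) This is a good example

As (4) points out,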

A list that contains a code block

  • item one
  • item two
{ my code block }

Displaying a literal backtick: `

14.1.2 Emphasis

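Light emphasis
This is italic text: emphasis marked with underscores, and this is emphasis marked with asterisks.
Strong emphasis
This is bold text: strong emphasis, and with underscores.
Very strong emphasis
This is bold italic text, marked with three asterisks.
Strikethrough
This is deleted text.
Subscripts and superscripts
H~2~O is a liquid. 2^10^ is 1024. C^137^ is a radioactive element.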

14.1.3 Block quotes

Note: end the quotation with two trailing spaces so that the attribution starts on a new line. Quoting a well-known saying:

It’s always better to give than to receive.

or

A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.
— John Gruber

Trellis graphics are a bit like hash functions: you can be close to the target, but get a far-off result. [39]

— Dieter Menne

If you imagine that this pen is Trellis, then Lattice is not this pen. [40]
— Paul Murrell

You’re overlooking something like line 800 of the documentation for xyplot. […] It’s probably in the R-FAQ as well, since my original feeling was that this behaviour was chosen in order to confuse people and see how many people read the FAQ… :) [41]
— Barry Rowlingson

14.1.4 Tables

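Inserting a table is simple, as Table 14.1 shows, and it can even carry a footnote. More complex tables can be built with the kable function from the R package knitr together with the kableExtra package [42]. Yihui Xie’s book bookdown: Authoring Books and Technical Documents with R Markdown also has a section devoted to tables: https://bookdown.org/yihui/bookdown/tables.html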

Table 14.1: Table caption [43]

| First Header | Second Header |
|--------------|---------------|
| Content Cell | Content Cell  |
| Content Cell | Content Cell  |
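When the cell contents come from a program, the source of a simple table such as Table 14.1 can be generated rather than typed by hand. The helper below is a hypothetical sketch (pipe_table is not part of knitr or Pandoc); it assumes Pandoc's pipe-table syntax and rectangular input, and pads the columns so that the Markdown source stays readable.

```python
def pipe_table(header, rows):
    """Return a pipe table as a string; every row must have len(header) cells."""
    cells = [[str(c) for c in header]] + [[str(c) for c in r] for r in rows]
    # Column width: the longest cell in the column, and at least 3 characters.
    widths = [max(3, *(len(row[i]) for row in cells)) for i in range(len(header))]

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    # Separator line between the header and the body.
    rule = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(cells[0]), rule] + [fmt(r) for r in cells[1:]])


table = pipe_table(
    ["First Header", "Second Header"],
    [["Content Cell", "Content Cell"], ["Content Cell", "Content Cell"]],
)
print(table)
```

For tables built from R data frames, knitr::kable, discussed next, is the more natural tool.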

kable can place several tables side by side, as shown in Table 14.2

knitr::kable(
  list(
    head(iris[, 1:2], 3),
    head(mtcars[, 1:3], 5)
  ),
  caption = 'A Tale of Two Tables.', booktabs = TRUE
)
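Table 14.2: A Tale of Two Tables.

| Sepal.Length | Sepal.Width |
|--------------|-------------|
| 5.1          | 3.5         |
| 4.9          | 3.0         |
| 4.7          | 3.2         |

|                   | mpg  | cyl | disp |
|-------------------|------|-----|------|
| Mazda RX4         | 21.0 | 6   | 160  |
| Mazda RX4 Wag     | 21.0 | 6   | 160  |
| Datsun 710        | 22.8 | 4   | 108  |
| Hornet 4 Drive    | 21.4 | 6   | 258  |
| Hornet Sportabout | 18.7 | 8   | 360  |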

Using mathematical symbols in a table

knitr::kable(
  rbind(
    c("", "continuous", "discrete"),
    c("nominal", "", "$\\checkmark$"),
    c("ordinal", "", "$\\checkmark$"),
    c("interval", "$\\checkmark$", "$\\checkmark$"),
    c("ratio", "$\\checkmark$", "$\\checkmark$")
  ),
  caption = 'The relationship between the scales of measurement and the discrete/continuity distinction. Cells with a tick mark correspond to things that are possible.',
  align = "lcc", booktabs = TRUE
)
Table 14.3: The relationship between the scales of measurement and the discrete/continuity distinction. Cells with a tick mark correspond to things that are possible.

|          | continuous     | discrete       |
|:---------|:--------------:|:--------------:|
| nominal  |                | \(\checkmark\) |
| ordinal  |                | \(\checkmark\) |
| interval | \(\checkmark\) | \(\checkmark\) |
| ratio    | \(\checkmark\) | \(\checkmark\) |

The kableExtra, broom, and pixiedust packages allow fine-grained control over table styling; see, for example, the samples prepared by Xiangyun Huang

14.1.5 Images

The general syntax for inserting an image is

![...](...){...}

The square brackets hold the image caption, the parentheses hold the image path, and the curly braces set the image attributes.
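To make the three parts concrete, here is a small sketch that assembles such an image tag from a caption, a path, and optional attributes. The function name image_markdown and the captions are hypothetical; the path figures/mai.png and the attribute forms width=45% and .full are the ones used later in this section.

```python
def image_markdown(caption, path, attrs=None):
    """Build ![caption](path){attrs}; the braces are omitted when attrs is empty."""
    markup = "![{}]({})".format(caption, path)
    if attrs:
        markup += "{" + " ".join(attrs) + "}"
    return markup


print(image_markdown("A full-width image", "figures/mai.png", [".full"]))
print(image_markdown("Left-aligned image", "figures/mai.png", ["width=45%"]))
print(image_markdown("Plain image", "figures/mai.png"))
```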

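Inserting an image from a code chunk with knitr::include_graphics is straightforward, as Figure 14.1 shows. When a figure or table caption is long or needs a footnote, use a [text reference][text-references]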

knitr::include_graphics(path = system.file("help/figures", "mai.png", package = "graphics"))

Figure 14.1: (ref:footnote)

par(mar = c(4.1, 4.1, 0.5, 0.5))
plot(rnorm(10), xlab = "", ylab = "")

Figure 8.6: (ref:fig-cap)

An ordinary image can be inserted with its width controlled in two different ways, as shown in Figure 14.2 and Figure 14.3[^css-position]

![(\#fig:left-fig) By default the image is left-aligned^[This is a footnote]](figures/mai.png){ width=45% }

Figure 14.2: By default the image is left-aligned [44]

![(\#fig:full-fig) A full-width image](figures/mai.png){.full}

Figure 14.3: A full-width image

  • One

  • Two

    An image can also be inserted inside a list item

  • Three

Figures can be generated dynamically from code and inserted into the document; external image files can be inserted as well.

plot(AirPassengers)

Figure 14.4: Time series plot

plot(pressure)
plot(AirPassengers)

Figure 14.5: Layout with 2 rows and 1 column

plot(pressure)
plot(AirPassengers)

Figure 14.6: Layout with 1 row and 2 columns

plot(pressure)
plot(AirPassengers)
plot(pressure)
plot(AirPassengers)

Figure 14.7: 2x2 layout

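(ref:fig-cap) Testing a text reference

(ref:text-references) When a figure or table caption is long, use a [text reference][text-references]

(ref:footnote) A footnote inserted in a table caption; note that ebooks do not support footnotes inserted this way[^longnote]

[^longnote]: Here’s one with multiple blocks.

[text-references]: https://bookdown.org/yihui/bookdown/markdown-extensions-by-bookdown.html#text-references

[^css-position]: See Yihui Xie’s blog post on the CSS position property and how to center over-wide elements: https://yihui.name/cn/2018/05/css-position/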

14.1.6 Formulas

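Inline formulas are wrapped in a pair of dollar signs, for example \(\alpha\) or \(\alpha+\beta\); display formulas look like \[\alpha\] or \[\alpha + \beta\]. Formulas can be numbered, as in Equation (14.1)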

\[\begin{equation} L(\beta,\boldsymbol{\theta}) = f(y;\beta,\boldsymbol{\theta}) = \int_{\mathbb{R}^{n}}N(t;D\beta,\Sigma(\boldsymbol{\theta}))f(y|t)dt \tag{14.1} \end{equation}\]

Each line of a multi-line formula can be numbered separately, as in Equations (14.2) and (14.3)

\[\begin{align} \log\{\frac{p_i}{1-p_i}\} & = T_{i} = d(x_i)'\beta + S(x_i) + Z_i \tag{14.2}\\ \log(\lambda_i) & = T_{i} = d(x_i)'\beta + S(x_i) + Z_i \tag{14.3} \end{align}\]

Only some lines of a multi-line formula need to be numbered, as in Equations (14.4) and (14.5)

\[\begin{align} g(X_{n}) &= g(\theta)+g'({\tilde{\theta}})(X_{n}-\theta) \\ \sqrt{n}[g(X_{n})-g(\theta)] &= g'\left({\tilde{\theta}}\right) \sqrt{n}[X_{n}-\theta ] \tag{14.4} \\ \log(\lambda_i) & = T_{i} = d(x_i)'\beta + S(x_i) + Z_i \tag{14.5} \end{align}\]

A multi-line formula can share a single number, as in Equation (14.6)

\[\begin{equation} \begin{aligned} L(\beta,\boldsymbol{\theta}) & = \int_{\mathbb{R}^{n}} \frac{N(t;D\beta,\Sigma(\boldsymbol{\theta}))f(y|t)}{N(t;D\beta_{0},\Sigma(\boldsymbol{\theta}_{0}))f(y|t)}f(y,t)dt\\ & \varpropto \int_{\mathbb{R}^{n}} \frac{N(t;D\beta,\Sigma(\boldsymbol{\theta}))}{N(t;D\beta_{0},\Sigma(\boldsymbol{\theta}_{0}))}f(t|y)dt \\ &= E_{T|y}\left[\frac{N(t;D\beta,\Sigma(\boldsymbol{\theta}))}{N(t;D\beta_{0},\Sigma(\boldsymbol{\theta}_{0}))}\right] \end{aligned} \tag{14.6} \end{equation}\]

Inside equation, the split environment is recommended when a single long formula has to be broken across several lines, as in Equation (14.7)

\[\begin{equation} \begin{split} \mathrm{Var}(\hat{\beta}) & =\mathrm{Var}((X'X)^{-1}X'y)\\ & =(X'X)^{-1}X'\mathrm{Var}(y)((X'X)^{-1}X')'\\ & =(X'X)^{-1}X'\mathrm{Var}(y)X(X'X)^{-1}\\ & =(X'X)^{-1}X'\sigma^{2}IX(X'X)^{-1}\\ & =(X'X)^{-1}\sigma^{2} \end{split} \tag{14.7} \end{equation}\]

Note that \mathbf only makes letters such as \(a,b,c,A,B,C\) bold. MathJax does not support \bm for making Greek letters such as \(\theta,\alpha,\beta,\ldots,\gamma\) bold in formulas; use \boldsymbol instead
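As a small illustration of this note, the display below writes a linear model with bold Latin letters via \mathbf and bold Greek letters via \boldsymbol. The model is only an example chosen for this note; it is not one of the numbered equations in this chapter.

```latex
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\]
```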

14.2 Pandoc’s Markdown

This section introduces the features that Pandoc adds on top of Markdown

14.3 R Markdown

R Markdown stands on the shoulders of giants, among them Markdown, Pandoc, and LaTeX.

Ecosystem

  1. Reports
  2. Web services
  3. Presentations

Create a Book project in a given directory:

bookdown:::bookdown_skeleton("~/bookdown-demo")

Files in the project root directory

directory/
├── index.Rmd
├── 01-intro.Rmd
├── 02-literature.Rmd
├── 03-method.Rmd
├── 04-application.Rmd
├── 05-summary.Rmd
├── 06-references.Rmd
├── _bookdown.yml
├── _output.yml
├── book.bib
├── preamble.tex
├── README.md
└── style.css

14.3.1 Syntax highlighting

Pandoc supports syntax highlighting through the LaTeX lstlisting environment, for example

```TeX
\begin{lstlisting}
\documentclass[cn]{elegantbook} 
\documentclass[lang=cn]{elegantbook}
\end{lstlisting}

\begin{lstlisting}[frame=single]
\nocite{EINAV2010,Havrylchyk2018} %or include some bibitems
\nocite{*} %include all the bibitems
\end{lstlisting}
```
# Programming languages (engines) supported by knitr and their syntax highlighting environments
names(knitr::knit_engines$get())
#>  [1] "awk"         "bash"        "coffee"      "gawk"        "groovy"     
#>  [6] "haskell"     "lein"        "mysql"       "node"        "octave"     
#> [11] "perl"        "psql"        "Rscript"     "ruby"        "sas"        
#> [16] "scala"       "sed"         "sh"          "stata"       "zsh"        
#> [21] "highlight"   "Rcpp"        "tikz"        "dot"         "c"          
#> [26] "fortran"     "fortran95"   "asy"         "cat"         "asis"       
#> [31] "stan"        "block"       "block2"      "js"          "css"        
#> [36] "sql"         "go"          "python"      "julia"       "sass"       
#> [41] "scss"        "theorem"     "lemma"       "corollary"   "proposition"
#> [46] "conjecture"  "definition"  "example"     "exercise"    "proof"      
#> [51] "remark"      "solution"
# Syntax highlighting themes supported by knitr
# Syntax highlighting environments supported by Pandoc
c(
  "ABAP", "IDL", "Plasm", "ACSL",
  "inform", "POV", "Ada", "Java", "Prolog",
  "Algol", "JVMIS", "Promela", "Ant", "ksh",
  "Python", "Assembler", "Lisp", "R", "Awk",
  "Logo", "Reduce", "bash", "make", "Rexx",
  "Basic", "Mathematica", "RSL", "C", "Matlab",
  "Ruby", "C++", "Mercury", "S", "Caml",
  "MetaPost", "SAS", "Clean", "Miranda", "Scilab",
  "Cobol", "Mizar", "sh", "Comal", "ML", "SHELXL",
  "csh", "Modula-2", "Simula", "Delphi",
  "MuPAD", "SQL", "Eiffel", "NASTRAN", "tcl",
  "Elan", "Oberon-2", "TeX", "erlang",
  "OCL", "VBScript", "Euphoria", "Octave",
  "Verilog", "Fortran", "Oz", "VHDL", "GCL",
  "Pascal", "VRML", "Gnuplot", "Perl", "XML",
  "Haskell", "PHP", "XSLT", "HTML", "PL/I"
)
#>  [1] "ABAP"        "IDL"         "Plasm"       "ACSL"        "inform"     
#>  [6] "POV"         "Ada"         "Java"        "Prolog"      "Algol"      
#> [11] "JVMIS"       "Promela"     "Ant"         "ksh"         "Python"     
#> [16] "Assembler"   "Lisp"        "R"           "Awk"         "Logo"       
#> [21] "Reduce"      "bash"        "make"        "Rexx"        "Basic"      
#> [26] "Mathematica" "RSL"         "C"           "Matlab"      "Ruby"       
#> [31] "C++"         "Mercury"     "S"           "Caml"        "MetaPost"   
#> [36] "SAS"         "Clean"       "Miranda"     "Scilab"      "Cobol"      
#> [41] "Mizar"       "sh"          "Comal"       "ML"          "SHELXL"     
#> [46] "csh"         "Modula-2"    "Simula"      "Delphi"      "MuPAD"      
#> [51] "SQL"         "Eiffel"      "NASTRAN"     "tcl"         "Elan"       
#> [56] "Oberon-2"    "TeX"         "erlang"      "OCL"         "VBScript"   
#> [61] "Euphoria"    "Octave"      "Verilog"     "Fortran"     "Oz"         
#> [66] "VHDL"        "GCL"         "Pascal"      "VRML"        "Gnuplot"    
#> [71] "Perl"        "XML"         "Haskell"     "PHP"         "XSLT"       
#> [76] "HTML"        "PL/I"

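14.4 Table styling

In a data analysis report, the way data is presented depends on the output format of the report: HTML, LaTeX, or even DOCX.

Table styling tools: gt, kableExtra, flextable, and DT.

remedy formats Markdown syntax; beautifyR tidies Markdown tables.

14.4.1 HTML styling

14.4.2 LaTeX styling

14.5 Add-ins

R packages and RStudio add-ins that make writing more productive

  • Simplify Markdown writing: remedy
  • Take screenshots of source code: carbonate
  • Tidy Markdown tables: beautifyR
  • Insert bibliography citations: citr
  • Format R code chunks: styler
  • Prepare reproducible examples for asking questions on forums or GitHub: reprex
  • Quickly retrieve activity records from GitHub and other social networks: butteRfly
  • Count the words in an R Markdown document: wordcountaddin
  • Write reproducible research reports: rrtools
  • A collection of RStudio add-ins: addinslist
  • Syntax-highlighted R help pages: rdoc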

  39. (about problems with creating a suitable lattice panel function) R-help (August 2008)

  40. (on the difference of Lattice (which eventually was called grid) and Trellis) DSC 2001, Wien (March 2001)

  41. (about the fact that lattice objects have to be print()ed) R-help (May 2005)

  42. https://xiangyunhuang.github.io/bookdown-kableExtra/

  43. With a footnote attached

  44. This is a footnote