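Chapter 14 Dynamic Documents
Python code chunks in an R Markdown document are dispatched by the knitr package. The Matplotlib plots shown here are rendered with the Python engine provided by the reticulate package rather than the one built into knitr.
- The Matplotlib tutorial translated by LaTeX expert 黄晨成
- The 莫烦 Python video tutorial series by 周沫凡: Matplotlib, a powerful data visualization tool
- The online Chinese Matplotlib documentation maintained by 陈治兵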
Software information
Python 3 modules used to compile the book:
R runtime environment
sessionInfo()
#> R Under development (unstable) (2019-11-11 r77397)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 8.1 x64 (build 9600)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936
#> [2] LC_CTYPE=Chinese (Simplified)_China.936
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.936
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.0.0 magrittr_1.5 bookdown_0.15 tools_4.0.0
#> [5] htmltools_0.4.0 curl_4.2 yaml_2.2.0 Rcpp_1.0.3
#> [9] stringi_1.4.3 rmarkdown_1.17 knitr_1.26 stringr_1.4.0
#> [13] xfun_0.11 digest_0.6.22 rlang_0.4.1 evaluate_0.14
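Setting python.reticulate = TRUE in knitr::opts_chunk means that all Python chunks share a single Python session, whereas python.reticulate = FALSE means that knitr's own Python engine is used and every Python chunk runs independently.
With python.reticulate = TRUE the Python engine provided by reticulate is used. It supports matplotlib plots but not figure captions, while knitr's own Python engine does support captions.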
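A minimal sketch of switching between the two engines, assuming knitr is installed. In practice the option would normally be set once in the document's setup chunk; the variable name `engine_shared` is only for illustration.

```r
# Use reticulate's Python engine: all Python chunks share one session.
knitr::opts_chunk$set(python.reticulate = TRUE)

# Read the current value back to confirm the setting.
engine_shared <- knitr::opts_chunk$get("python.reticulate")
print(engine_shared)

# To fall back to knitr's own engine (independent chunks), one would set:
# knitr::opts_chunk$set(python.reticulate = FALSE)
```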
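R and Python can split the work: Python handles data processing and modeling, and R handles plotting. Some complex machine learning models and the data operations around them have to be done in Python. Once the data set has been cleaned into data-frame form, it is imported into R to draw static or dynamic graphics. For this the reticulate package has to be loaded; setting python.reticulate = TRUE alone is not enough.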
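Calling Python from R
pandas reads and tidies the data, reticulate passes the result to the R environment as a data.frame object, and ggplot2 is loaded to draw the plot, as shown in Figure ??.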
library(reticulate)
library(ggplot2)
ggplot(py$iris2, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species)) +
  scale_color_viridis_d()
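The `py$iris2` lookup in the chunk above only works if a Python chunk has already defined `iris2`. The following is a hedged sketch of that hand-off driven from R. It assumes reticulate can find a Python installation with pandas; the three rows are copied from R's built-in `iris` data purely as a stand-in for whatever the Python side would really load.

```r
library(reticulate)

# Assumption: Python with pandas is discoverable by reticulate.
if (py_module_available("pandas")) {
  # Define `iris2` in the Python main module, as a Python chunk would.
  py_run_string("
import pandas as pd
iris2 = pd.DataFrame({
    'Sepal.Length': [5.1, 7.0, 6.3],
    'Sepal.Width': [3.5, 3.2, 3.3],
    'Species': ['setosa', 'versicolor', 'virginica'],
})
")
  # Accessing py$iris2 converts the pandas DataFrame to an R data.frame.
  iris2 <- py$iris2
  str(iris2)
}
```

Inside an R Markdown document the `py_run_string()` call would usually be replaced by an ordinary Python chunk; the access through `py$iris2` stays the same.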
library(ggplot2)
ggplot(faithfuld, aes(waiting, eruptions)) +
  geom_raster(aes(fill = density)) +
  scale_fill_continuous()
shiny
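The three musketeers: Markdown, Pandoc’s Markdown, and R Markdown (Markdown for scientific writing)
We first cover how plain Markdown handles emphasis, headings, lists, line breaks, links, images, quotations, code blocks, and LaTeX formulas, then what Pandoc’s Markdown adds on top of Markdown, and finally what R Markdown adds on top of Pandoc’s Markdown.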
14.1 Markdown
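For basic Markdown syntax, see the Markdown manual bundled with the RStudio IDE: RStudio top menu bar -> Help -> Markdown Quick Reference. This section focuses on advanced Markdown syntax, in particular Pandoc’s Markdown, that is, the many extensions to Markdown that Pandoc provides. Below we look at tables, images, and formulas as enhanced by Pandoc.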
14.1.1 Lists
Ordered list
1. First item
2. Second item
Unordered list
- First item
- Second item
- here is my first list item.
- and my second.
Nested lists
1. Ordered
2. Item 2
3. Item 3
    - Item 3a
    - Item 3b

- Unordered
- Item 2
    - Item 2a
    - Item 2b
A definition list containing code
- Term 1
Definition 1
- Term 2 with inline markup
Definition 2
{ some code, part of Definition 2 }
Third paragraph of definition 2.
A definition list in compact form
- Term 1
- Definition 1
- Term 2
- Definition 2a
- Definition 2b
Unordered list
- fruits
- apples
- macintosh
- red delicious
- pears
- peaches
- apples
- vegetables
- broccoli
- chard
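Numbered example lists correspond to the ordered list environment in LaTeX, with numbering that continues through the whole document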
- My first example will be numbered (1).
- My second example will be numbered (2).
Explanation of examples.
- My third example will be numbered (3).
Items in a (@) list can be labeled and referenced
- This is a good example.
As (4) illustrates,
A list containing a code block
- item one
- item two
{ my code block }
Displaying a literal backtick: `
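14.1.2 Emphasis
- Light emphasis
    - This is italic text: emphasis marked with underscores, and this is emphasis marked with asterisks.
- Strong emphasis
    - This is bold text: strong emphasis, and with underscores.
- Strongest emphasis
    - This is bold italic text, marked with three asterisks.
- Strikethrough
    - This is deleted text.
- Superscripts and subscripts
    - H~2~O is a liquid. 2^10^ is 1024. Cs^137^ is a radioactive isotope.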
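14.1.3 Quotations
Note: end the quotation with two spaces (a hard line break) and put the attribution on a new line. Quoting well-known sayings: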
It’s always better to give than to receive.
or
A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.
— John Gruber
Trellis graphics are a bit like hash functions: you can be close to the target, but get a far-off result.^39^
— Dieter Menne
If you imagine that this pen is Trellis, then Lattice is not this pen.^40^
— Paul Murrell
You’re overlooking something like line 800 of the documentation for xyplot. […] It’s probably in the R-FAQ as well, since my original feeling was that this behaviour was chosen in order to confuse people and see how many people read the FAQ… :)^41^
— Barry Rowlingson
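14.1.4 Tables
Inserting a table is simple, as Table 14.1 shows, and a table can even carry a footnote. Complex tables can be built with the kable function from the R package knitr together with the kableExtra package^42^. Yihui Xie's book bookdown: Authoring Books and Technical Documents with R Markdown also has a section devoted to tables: https://bookdown.org/yihui/bookdown/tables.html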
| First Header | Second Header |
|---|---|
| Content Cell | Content Cell |
| Content Cell | Content Cell |
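The same kind of pipe table can be generated from R instead of being typed by hand. The sketch below assumes a reasonably recent knitr in which `format = "pipe"` is available (older releases called this format "markdown"); the object names `tab` and `md` are illustrative only.

```r
# Two-column data frame matching the table above; check.names = FALSE
# keeps the spaces in the column headers.
tab <- data.frame(
  "First Header" = c("Content Cell", "Content Cell"),
  "Second Header" = c("Content Cell", "Content Cell"),
  check.names = FALSE
)

# format = "pipe" asks kable for Pandoc pipe-table syntax.
md <- knitr::kable(tab, format = "pipe")
print(md)
```

Inside an R Markdown chunk the `format` argument can usually be left out, since kable picks a format that suits the output document.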
kable supports placing several tables side by side, as shown in Table 14.2
knitr::kable(
  list(
    head(iris[, 1:2], 3),
    head(mtcars[, 1:3], 5)
  ),
  caption = 'A Tale of Two Tables.', booktabs = TRUE
)
Using mathematical symbols in a table
knitr::kable(
  rbind(
    c("", "continuous", "discrete"),
    c("nominal", "", "$\\checkmark$"),
    c("ordinal", "", "$\\checkmark$"),
    c("interval", "$\\checkmark$", "$\\checkmark$"),
    c("ratio", "$\\checkmark$", "$\\checkmark$")
  ),
  caption = 'The relationship between the scales of measurement and the discrete/continuity distinction. Cells with a tick mark correspond to things that are possible.',
  align = "lcc",
  booktabs = TRUE
)
| | continuous | discrete |
|:---|:---:|:---:|
| nominal | | \(\checkmark\) |
| ordinal | | \(\checkmark\) |
| interval | \(\checkmark\) | \(\checkmark\) |
| ratio | \(\checkmark\) | \(\checkmark\) |
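The kableExtra, broom, and pixiedust packages allow fine-grained control over table styling; see, for example, the samples prepared by 黄湘云.
14.1.5 Images
The general syntax for inserting an image is `![caption](path/to/image.png){width=50%}`, where the path and the width value are placeholders.
The square brackets hold the image caption, the parentheses hold the image path, and the braces control the image attributes.
Inserting an image from a code chunk with knitr::include_graphics is simple, as shown in Figure 14.1. When a figure or table caption is long or needs a footnote, you can use [text references][text-references].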
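A hedged sketch of the `knitr::include_graphics` call, assuming knitr is installed. The file name `air-passengers.pdf` is made up, and the plot is written to a temporary PDF only so that the example has a real file to point at.

```r
# Draw a plot into a temporary PDF so the example has a file to include.
path <- file.path(tempdir(), "air-passengers.pdf")
pdf(path, width = 6, height = 4)
plot(AirPassengers)
invisible(dev.off())

# Inside an R Markdown chunk this call is what inserts the figure; chunk
# options such as fig.cap and out.width control its caption and size.
img <- knitr::include_graphics(path)
```

In a real document the chunk header would carry the caption, for example `fig.cap = "(ref:fig-cap)"` when a text reference is used.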

Figure 14.1: (ref:footnote)

Figure 8.6: (ref:fig-cap)
Insert an ordinary image; Figures 14.2 and 14.3 show two ways of controlling the width of the inserted image[^css-position]

Figure 14.2: By default the image is left-aligned^44^

Figure 14.3: A full-width image
One
Two
Images can also be inserted inside a list environment
Three
Figures can be generated dynamically from code and inserted into the document; external images can be inserted as well

Figure 14.4: Time series plot


Figure 14.5: Layout with 2 rows and 1 column


Figure 14.6: Layout with 1 row and 2 columns




Figure 14.7: 2x2 layout
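(ref:fig-cap) Testing a text reference

(ref:text-references) Long figure and table captions can use [text references][text-references]

(ref:footnote) A footnote inserted into a table caption; note that e-books do not support footnotes inserted this way[^longnote]

[^longnote]: Here’s one with multiple blocks.

[text-references]: https://bookdown.org/yihui/bookdown/markdown-extensions-by-bookdown.html#text-references

[^css-position]: See Yihui Xie's blog post on the CSS position property and how to center over-wide elements: https://yihui.name/cn/2018/05/css-position/

14.1.6 Formulas
Inline formulas are written between a pair of dollar signs, for example \(\alpha\) or \(\alpha+\beta\); display formulas look like \[\alpha\] or \[\alpha + \beta\]. Formulas can also be numbered, as in equation (14.1)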
\[\begin{equation} L(\beta,\boldsymbol{\theta}) = f(y;\beta,\boldsymbol{\theta}) = \int_{\mathbb{R}^{n}}N(t;D\beta,\Sigma(\boldsymbol{\theta}))f(y|t)dt \tag{14.1} \end{equation}\]
\[\begin{align} \log\{\frac{p_i}{1-p_i}\} & = T_{i} = d(x_i)'\beta + S(x_i) + Z_i \tag{14.2}\\ \log(\lambda_i) & = T_{i} = d(x_i)'\beta + S(x_i) + Z_i \tag{14.3} \end{align}\]
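The numbers shown above are what the rendered page displays. In the bookdown source, a number is normally requested by attaching a label of the form `(\#eq:label)` inside the environment, and the equation is then cited with `\@ref(eq:label)`. A sketch follows, in which the label name `likelihood` is hypothetical:

```latex
\begin{equation}
L(\beta,\boldsymbol{\theta}) = f(y;\beta,\boldsymbol{\theta})
(\#eq:likelihood)
\end{equation}
```

Writing `\@ref(eq:likelihood)` in the surrounding text would then produce the equation number. The `(\#eq:...)` marker is bookdown syntax rather than plain LaTeX, so it only works in documents built with bookdown output formats.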
Number only some of the lines in a multi-line formula, as in equations (14.4) and (14.5)
\[\begin{align} g(X_{n}) &= g(\theta)+g'({\tilde{\theta}})(X_{n}-\theta) \\ \sqrt{n}[g(X_{n})-g(\theta)] &= g'\left({\tilde{\theta}}\right) \sqrt{n}[X_{n}-\theta ] \tag{14.4} \\ \log(\lambda_i) & = T_{i} = d(x_i)'\beta + S(x_i) + Z_i \tag{14.5} \end{align}\]
A multi-line formula sharing a single number, as in equation (14.6)
\[\begin{equation} \begin{aligned} L(\beta,\boldsymbol{\theta}) & = \int_{\mathbb{R}^{n}} \frac{N(t;D\beta,\Sigma(\boldsymbol{\theta}))f(y|t)}{N(t;D\beta_{0},\Sigma(\boldsymbol{\theta}_{0}))f(y|t)}f(y,t)dt\\ & \varpropto \int_{\mathbb{R}^{n}} \frac{N(t;D\beta,\Sigma(\boldsymbol{\theta}))}{N(t;D\beta_{0},\Sigma(\boldsymbol{\theta}_{0}))}f(t|y)dt \\ &= E_{T|y}\left[\frac{N(t;D\beta,\Sigma(\boldsymbol{\theta}))}{N(t;D\beta_{0},\Sigma(\boldsymbol{\theta}_{0}))}\right] \end{aligned} \tag{14.6} \end{equation}\]
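Inside equation, the split environment is recommended. It is meant for a single long formula that has to be broken across several lines, as in equation (14.7)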
\[\begin{equation} \begin{split} \mathrm{Var}(\hat{\beta}) & =\mathrm{Var}((X'X)^{-1}X'y)\\ & =(X'X)^{-1}X'\mathrm{Var}(y)((X'X)^{-1}X')'\\ & =(X'X)^{-1}X'\mathrm{Var}(y)X(X'X)^{-1}\\ & =(X'X)^{-1}X'\sigma^{2}IX(X'X)^{-1}\\ & =(X'X)^{-1}\sigma^{2} \end{split} \tag{14.7} \end{equation}\]
Note that \mathbf only makes Latin letters such as \(a,b,c,A,B,C\) bold. MathJax does not support \bm for making \(\theta,\alpha,\beta,\ldots,\gamma\) bold in formulas; use \boldsymbol instead.
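A short side-by-side illustration of the difference, written as a sketch to paste into a display formula:

```latex
% \mathbf bolds the Latin letters; the Greek \theta inside it stays unbolded.
\mathbf{A}\mathbf{x} = \mathbf{b}, \qquad \mathbf{\theta}
% \boldsymbol also bolds Greek letters and is understood by MathJax.
\boldsymbol{\theta}, \qquad \boldsymbol{\alpha} + \boldsymbol{\beta}
```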
14.2 Pandoc’s Markdown
This section covers the features that Pandoc adds on top of Markdown
14.3 R Markdown
R Markdown stands on the shoulders of giants, among them Markdown, Pandoc, and LaTeX.
Ecosystem
- Reports
    - learnr: Interactive Tutorials with R Markdown https://rstudio.github.com/learnr/
    - r2d3: R Interface to D3 Visualizations https://rstudio.github.io/r2d3/
    - radix: Radix combines the technical authoring features of Distill with R Markdown, enabling a fully reproducible workflow based on literate programming https://github.com/radixpub/radix-r
- Web services
    - RestRserve: RestRserve is an R web API framework for building high-performance microservices and app backends https://github.com/dselivanov/RestRserve Built on Rserve, it is reported to handle about 10,000 requests per second on a laptop, roughly 20 times faster than plumber
    - plumber: Turn your R code into a web API. https://www.rplumber.io
- Presentations
    - revealjs: R Markdown Format for reveal.js Presentations https://github.com/rstudio/revealjs
    - xaringan: Presentation Ninja 幻灯忍者写轮眼 https://slides.yihui.name/xaringan/
Create a Book project in the chosen directory.
Files in the project root:
directory/
├── index.Rmd
├── 01-intro.Rmd
├── 02-literature.Rmd
├── 03-method.Rmd
├── 04-application.Rmd
├── 05-summary.Rmd
├── 06-references.Rmd
├── _bookdown.yml
├── _output.yml
├── book.bib
├── preamble.tex
├── README.md
└── style.css
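The chapter files in the listing above are tied together by `_bookdown.yml`, whose `rmd_files` field gives the build order. The sketch below writes such a file into a temporary directory, assuming only base R. The output name `demo-book` is made up, and the directory name simply mirrors the listing.

```r
# Work in a temporary directory so no real project is touched.
book_dir <- file.path(tempdir(), "directory")
dir.create(book_dir, showWarnings = FALSE)

# Chapter files in build order, named as in the listing above.
chapters <- c(
  "index.Rmd", "01-intro.Rmd", "02-literature.Rmd", "03-method.Rmd",
  "04-application.Rmd", "05-summary.Rmd", "06-references.Rmd"
)
invisible(file.create(file.path(book_dir, chapters)))

# Minimal configuration: output file name plus the ordered chapter list.
config <- c(
  'book_filename: "demo-book"',
  "rmd_files:",
  paste0("  - ", chapters)
)
writeLines(config, file.path(book_dir, "_bookdown.yml"))
print(list.files(book_dir))
```

When `rmd_files` is omitted, bookdown falls back to merging the Rmd files in file-name order with `index.Rmd` first, which is why the chapters in the listing carry numeric prefixes.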
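14.3.1 Syntax highlighting
When its --listings option is used, Pandoc supports syntax highlighting through the LaTeX environment lstlisting, for example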
```TeX
\begin{lstlisting}
\documentclass[cn]{elegantbook}
\documentclass[lang=cn]{elegantbook}
\end{lstlisting}
\begin{lstlisting}[frame=single]
\nocite{EINAV2010,Havrylchyk2018} %or include some bibitems
\nocite{*} %include all the bibitems
\end{lstlisting}
```
# Language engines supported by knitr and their syntax highlighting environments
names(knitr::knit_engines$get())
#> [1] "awk" "bash" "coffee" "gawk" "groovy"
#> [6] "haskell" "lein" "mysql" "node" "octave"
#> [11] "perl" "psql" "Rscript" "ruby" "sas"
#> [16] "scala" "sed" "sh" "stata" "zsh"
#> [21] "highlight" "Rcpp" "tikz" "dot" "c"
#> [26] "fortran" "fortran95" "asy" "cat" "asis"
#> [31] "stan" "block" "block2" "js" "css"
#> [36] "sql" "go" "python" "julia" "sass"
#> [41] "scss" "theorem" "lemma" "corollary" "proposition"
#> [46] "conjecture" "definition" "example" "exercise" "proof"
#> [51] "remark" "solution"
# Syntax highlighting themes supported by knitr
# Syntax highlighting environments supported by Pandoc
c(
"ABAP", "IDL", "Plasm", "ACSL",
"inform", "POV", "Ada", "Java", "Prolog",
"Algol", "JVMIS", "Promela", "Ant", "ksh",
"Python", "Assembler", "Lisp", "R", "Awk",
"Logo", "Reduce", "bash", "make", "Rexx",
"Basic", "Mathematica", "RSL", "C", "Matlab",
"Ruby", "C++", "Mercury", "S", "Caml",
"MetaPost", "SAS", "Clean", "Miranda", "Scilab",
"Cobol", "Mizar", "sh", "Comal", "ML", "SHELXL",
"csh", "Modula-2", "Simula", "Delphi",
"MuPAD", "SQL", "Eiffel", "NASTRAN", "tcl",
"Elan", "Oberon-2", "TeX", "erlang",
"OCL", "VBScript", "Euphoria", "Octave",
"Verilog", "Fortran", "Oz", "VHDL", "GCL",
"Pascal", "VRML", "Gnuplot", "Perl", "XML",
"Haskell", "PHP", "XSLT", "HTML", "PL/I"
)
#> [1] "ABAP" "IDL" "Plasm" "ACSL" "inform"
#> [6] "POV" "Ada" "Java" "Prolog" "Algol"
#> [11] "JVMIS" "Promela" "Ant" "ksh" "Python"
#> [16] "Assembler" "Lisp" "R" "Awk" "Logo"
#> [21] "Reduce" "bash" "make" "Rexx" "Basic"
#> [26] "Mathematica" "RSL" "C" "Matlab" "Ruby"
#> [31] "C++" "Mercury" "S" "Caml" "MetaPost"
#> [36] "SAS" "Clean" "Miranda" "Scilab" "Cobol"
#> [41] "Mizar" "sh" "Comal" "ML" "SHELXL"
#> [46] "csh" "Modula-2" "Simula" "Delphi" "MuPAD"
#> [51] "SQL" "Eiffel" "NASTRAN" "tcl" "Elan"
#> [56] "Oberon-2" "TeX" "erlang" "OCL" "VBScript"
#> [61] "Euphoria" "Octave" "Verilog" "Fortran" "Oz"
#> [66] "VHDL" "GCL" "Pascal" "VRML" "Gnuplot"
#> [71] "Perl" "XML" "Haskell" "PHP" "XSLT"
#> [76] "HTML" "PL/I"