# Preface

Book in early development. Planned release in 202X.

## Style of This Book

In 1934, C. J. Clopper and E. S. Pearson gave confidence belts for the parameter $$p$$ of the binomial distribution $$B(n, p)$$ (Clopper and Pearson 1934). Figure 0.2 distills the main results of that paper.

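The uniroot() function in base R finds only one root of a univariate nonlinear equation, whereas uniroot.all() from the rootSolve package can find all roots within an interval. For a given quantile, we need the smallest probability value that satisfies the equation.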
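
As a minimal sketch of how uniroot() could be applied to this kind of problem, the snippet below searches for the probability at which the binomial distribution function reaches a target level. The values x = 2, n = 10 and the level 0.025 are illustrative assumptions, and the helper name `f` is hypothetical. Since pbinom() is monotone decreasing in `prob` here, the equation has a single root on (0, 1), so uniroot() suffices and uniroot.all() is not needed.

```r
# Illustrative values (assumptions chosen for this sketch)
x <- 2; n <- 10; level <- 0.025

# f(p) = P(X <= x | p) - level; its root is the probability we are after
f <- function(p) pbinom(x, size = n, prob = p) - level

# f(0) > 0 and f(1) < 0, so the interval (0, 1) brackets the root
sol <- uniroot(f, interval = c(0, 1), tol = 1e-10)
upper <- sol$root
upper
```

The root coincides with the beta quantile `qbeta(0.975, x + 1, n - x)`, which gives a closed-form cross-check of the numerical search.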

The binom.test() function in base R computes an exact confidence interval, while prop.test() computes an approximate one.

# Approximate computation: Wilson interval (with continuity correction)
prop.test(x = 2, n = 10, p = 0.95, conf.level = 0.95, correct = TRUE)
## Warning in prop.test(x = 2, n = 10, p = 0.95, conf.level = 0.95,
## correct = TRUE): Chi-squared approximation may be incorrect
##
##  1-sample proportions test with continuity correction
##
## data:  2 out of 10, null probability 0.95
## X-squared = 103, df = 1, p-value <2e-16
## alternative hypothesis: true p is not equal to 0.95
## 95 percent confidence interval:
##  0.03543 0.55782
## sample estimates:
##   p
## 0.2
# Exact computation
binom.test(x = 2, n = 10, p = 0.95, conf.level = 0.95)
##
##  Exact binomial test
##
## data:  2 and 10
## number of successes = 2, number of trials = 10, p-value =
## 2e-09
## alternative hypothesis: true probability of success is not equal to 0.95
## 95 percent confidence interval:
##  0.02521 0.55610
## sample estimates:
## probability of success
##                    0.2
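
To make the approximation behind prop.test() more concrete, here is a small sketch that computes the Wilson score interval by hand and compares it with prop.test(). It assumes `correct = FALSE`, that is, no continuity correction, so the limits differ slightly from the continuity-corrected output above. The variable names are illustrative.

```r
x <- 2; n <- 10; conf.level <- 0.95
p_hat <- x / n
z <- qnorm((1 + conf.level) / 2)

# Wilson score interval without continuity correction
center <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(center - half, center + half)

# prop.test() with correct = FALSE is expected to report the same limits
ref <- as.numeric(prop.test(x, n, conf.level = conf.level, correct = FALSE)$conf.int)
rbind(wilson, ref)
```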

library(ggplot2)
ggplot(data.frame(x = c(0, 1)), aes(x)) +
stat_function(
fun = pbinom, geom = "path",
args = list(size = 10, q = 2),
color = "gray70", alpha = .8 # lightest color
) +
stat_function(
fun = pbinom, geom = "path",
args = list(size = 10, q = 4),
color = "gray50", alpha = .8
) +
stat_function(
fun = pbinom, geom = "path",
args = list(size = 10, q = 6),
color = "gray30", alpha = .8
) +
labs(x = expression(theta), y = expression(p[theta]),
title = "pbinom() with fixed sample size = 10") +
annotate("text", label = "q = 2", x = 0.32, y = 0.50, colour = "gray70") +
annotate("text", label = "q = 4", x = 0.50, y = 0.50, colour = "gray50") +
annotate("text", label = "q = 6", x = 0.70, y = 0.50, colour = "gray30") +
theme_minimal(base_size = 16)

set.seed(2020)
rbinom(1, size = 30, prob = 0.2) # yields the observed value 7
## [1] 7
7 + qnorm(1-0.95/2)*sqrt(0.2*0.8/30)
## [1] 7.005
prop.test(x = 7, n = 30, p = 0.2, conf.level = 0.95, correct = TRUE) # interval estimate for the observed value 7
##
##  1-sample proportions test with continuity correction
##
## data:  7 out of 30, null probability 0.2
## X-squared = 0.052, df = 1, p-value = 0.8
## alternative hypothesis: true p is not equal to 0.2
## 95 percent confidence interval:
##  0.1064 0.4270
## sample estimates:
##      p
## 0.2333
pbinom(7, size = 30, prob = 0.2, lower.tail = TRUE)
## [1] 0.7608
# Compute the quantile
qbinom(p = 0.95, size = 30, prob = 0.2, lower.tail = TRUE)
## [1] 10
binom.test(x = 10, n = 30, p = 0.2, conf.level = 0.95)
##
##  Exact binomial test
##
## data:  10 and 30
## number of successes = 10, number of trials = 30, p-value =
## 0.1
## alternative hypothesis: true probability of success is not equal to 0.2
## 95 percent confidence interval:
##  0.1729 0.5281
## sample estimates:
## probability of success
##                 0.3333
prop.test(x = 5, n = 10, p = 0.2, conf.level = 0.95, correct = TRUE) # interval estimate for 5 successes in 10 trials
## Warning in prop.test(x = 5, n = 10, p = 0.2, conf.level = 0.95,
## correct = TRUE): Chi-squared approximation may be incorrect
##
##  1-sample proportions test with continuity correction
##
## data:  5 out of 10, null probability 0.2
## X-squared = 3.9, df = 1, p-value = 0.05
## alternative hypothesis: true p is not equal to 0.2
## 95 percent confidence interval:
##  0.2014 0.7986
## sample estimates:
##   p
## 0.5
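
The exact interval reported by binom.test() can also be written in terms of beta quantiles. The sketch below is an illustration under that standard Clopper-Pearson formulation, using x = 10 and n = 30 as in the binom.test() call above. The names `exact` and `ref` are illustrative.

```r
x <- 10; n <- 30; alpha <- 0.05

# Clopper-Pearson limits expressed through beta quantiles
lower <- qbeta(alpha / 2, x, n - x + 1)
upper <- qbeta(1 - alpha / 2, x + 1, n - x)
exact <- c(lower, upper)

# Compare with the interval returned by binom.test()
ref <- as.numeric(binom.test(x, n, conf.level = 1 - alpha)$conf.int)
rbind(exact, ref)
```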

$\sum_{x = C_1 + 1}^{C_2 -1} x \binom{n}{x} p^{x}(1-p)^{n-x} = np \sum_{x = C_1 + 1}^{C_2 -1} \binom{n -1}{x -1} p^{x -1}(1-p)^{(n -1)-(x-1)}$

n <- 30
c2 <- 20
c1 <- 10
p <- 0.2
n * p * (pbinom(c2 - 2, n - 1, p) - pbinom(c1 - 1, n - 1, p))
## [1] 0.2956
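
As a quick numerical check of the identity above, one can sum the left-hand side term by term and compare it with the closed form. This is only a sketch; the variable names (`n_trials`, `lo`, `hi`, `prob`) are illustrative stand-ins for $n$, $C_1$, $C_2$ and $p$.

```r
n_trials <- 30; lo <- 10; hi <- 20; prob <- 0.2

# Left-hand side: direct summation over x = C1 + 1, ..., C2 - 1
xs <- (lo + 1):(hi - 1)
lhs <- sum(xs * dbinom(xs, size = n_trials, prob = prob))

# Right-hand side: n * p times a probability for Binomial(n - 1, p)
rhs <- n_trials * prob *
  (pbinom(hi - 2, n_trials - 1, prob) - pbinom(lo - 1, n_trials - 1, prob))

c(lhs = lhs, rhs = rhs)
```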

set.seed(123)
x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
p <- 2 * pnorm(sort(-abs(x)))
round(p, 3)
##  [1] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## [11] 0.001 0.002 0.003 0.004 0.005 0.007 0.007 0.009 0.009 0.011
## [21] 0.021 0.049 0.061 0.063 0.074 0.083 0.086 0.119 0.189 0.206
## [31] 0.221 0.286 0.305 0.466 0.483 0.492 0.532 0.575 0.578 0.619
## [41] 0.636 0.645 0.656 0.689 0.719 0.818 0.827 0.897 0.912 0.944
# round(p.adjust(p), 3)
# round(p.adjust(p, "BH"), 3)
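
The two commented-out calls point at multiplicity adjustment with p.adjust(). A minimal sketch of how the adjusted p-values might be compared follows. It regenerates the same simulated p-values under a different name (`pvals`) so that it runs on its own, and the 0.05 threshold is an assumption made for illustration.

```r
set.seed(123)
z_obs <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
pvals <- 2 * pnorm(sort(-abs(z_obs)))

# Holm controls the family-wise error rate (it is the default method)
p_holm <- p.adjust(pvals, method = "holm")
# Benjamini-Hochberg controls the false discovery rate
p_bh <- p.adjust(pvals, method = "BH")

# Number of rejections at the 0.05 level: raw, Holm, BH
c(raw = sum(pvals < 0.05), holm = sum(p_holm < 0.05), BH = sum(p_bh < 0.05))
```

Both adjustments can only increase the p-values, and the Benjamini-Hochberg values never exceed the Holm values, so BH rejects at least as many hypotheses.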

For the computation of the nominal coverage probability shown in Figure 1 of Charles J. Geyer's article Fuzzy and Randomized Confidence Intervals and P-Values (Geyer and Meeden 2005), see Blyth and Hutchinson (1960).

binom.test(x = 2, n = 10, p = 0.95)
##
##  Exact binomial test
##
## data:  2 and 10
## number of successes = 2, number of trials = 10, p-value =
## 2e-09
## alternative hypothesis: true probability of success is not equal to 0.95
## 95 percent confidence interval:
##  0.02521 0.55610
## sample estimates:
## probability of success
##                    0.2
binom.test(x = 1, n = 10, p = 0.95)
##
##  Exact binomial test
##
## data:  1 and 10
## number of successes = 1, number of trials = 10, p-value =
## 2e-11
## alternative hypothesis: true probability of success is not equal to 0.95
## 95 percent confidence interval:
##  0.002529 0.445016
## sample estimates:
## probability of success
##                    0.1

# Confidence level 0.95 (the default)
binom.test(x = 1, n = 10, p = 0.95)$conf.int
## [1] 0.002529 0.445016
## attr(,"conf.level")
## [1] 0.95
binom.test(x = 1, n = 10, p = 0.95)$p.value
## [1] 1.865e-11

Quantile q

# Lower-tail probability P(X <= 1), which matches the p-value above
pbinom(q = 1, size = 10, prob = 0.95, lower.tail = TRUE)
## [1] 1.865e-11
pbinom(q = 0:10/10, size = 10, prob = 0.95, lower.tail = TRUE)
##  [1] 9.766e-14 9.766e-14 9.766e-14 9.766e-14 9.766e-14 9.766e-14
##  [7] 9.766e-14 9.766e-14 9.766e-14 9.766e-14 1.865e-11
round(pbinom(0:4, 10, 1 / 2), 5)
## [1] 0.00098 0.01074 0.05469 0.17188 0.37695
# power.prop.test() # power of the test of proportions
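
For the coverage probability itself, one possible sketch is to enumerate all outcomes x = 0, ..., n, check which exact intervals contain the true p, and add up the corresponding binomial probabilities. The function name `coverage` and the choice p = 0.2, n = 10 are assumptions made for illustration, not taken from the cited papers.

```r
# Actual coverage probability of the exact (binom.test) interval at a given p
coverage <- function(p, n, conf.level = 0.95) {
  x <- 0:n
  # One row per outcome: lower and upper confidence limits
  ci <- t(sapply(x, function(k) {
    binom.test(k, n, conf.level = conf.level)$conf.int
  }))
  covered <- ci[, 1] <= p & p <= ci[, 2]
  # Sum the probabilities of the outcomes whose interval covers p
  sum(dbinom(x, size = n, prob = p)[covered])
}

cov_02 <- coverage(p = 0.2, n = 10)
cov_02
```

Because the exact interval is conservative, the value returned is at least the nominal level 0.95.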

## Choice of Language

Let’s not kid ourselves: the most widely used piece of software for statistics is Excel.

— Brian D. Ripley (Ripley 2002)

Some people familiar with R describe it as a supercharged version of Microsoft’s Excel spreadsheet software.

— Ashlee Vance

R offers a rich set of graphical interfaces, including Tcl/Tk, Gtk, and Shiny, along with packages built on them: rattle (RGtk2), Rcmdr (tcl/tk), and radiant (shiny). For lower-level details, see John Chambers's book Extending R.

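TikZ has a strong advantage in drawing schematic diagrams, especially diagrams that contain mathematical formulas, which is exactly where LaTeX excels.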

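JASP https://jasp-stats.org is a free statistical software package whose source code is hosted on GitHub at https://github.com/jasp-stats/jasp-desktop. It is maintained and developed mainly by a team led by Professor E. J. Wagenmakers https://www.ejwagenmakers.com/ at the University of Amsterdam, and it implements many Bayesian and frequentist methods. Its similar graphical user interface lets JASP serve as an alternative to SPSS. The currently implemented functionality is listed at https://jasp-stats.org/current-functionality/, and the statistical methods are discussed on the blog https://www.bayesianspectacles.org/.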

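Patrick Burns collected the oddities of the R language and wrote them up as The R Inferno. These quirks can be regarded as part of R's style; for programmers they serve as advice and tips that help avoid certain pitfalls. Paul E. Johnson compiled a set of practical R tips that records the problems he ran into while moving from SAS to R http://pj.freefaculty.org/R/Rtips.html. Michail Tsagris and Manos Papadakis also collected more than 70 tips on R programming, aiming to push the language's features to their limits in an idiomatic R way (Tsagris and Papadakis 2018). In the Python community, Tim Peters's The Zen of Python circulates widely. It ships with every Python release and can be displayed by running import this in the Python console.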

1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
5. Flat is better than nested.
6. Sparse is better than dense.
7. Readability counts.
8. Special cases aren’t special enough to break the rules.
9. Although practicality beats purity.
10. Errors should never pass silently.
11. Unless explicitly silenced.
12. In the face of ambiguity, refuse the temptation to guess.
13. There should be one– and preferably only one –obvious way to do it.
14. Although that way may not be obvious at first unless you’re Dutch.
15. Now is better than never.
16. Although never is often better than right now.
17. If the implementation is hard to explain, it’s a bad idea.
18. If the implementation is easy to explain, it may be a good idea.
19. Namespaces are one honking great idea – let’s do more of those!

— The Zen of Python

## Getting Help

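The R community offers plenty of help resources. You can look through the frequently asked questions collected on the R website https://cran.r-project.org/faqs.html, or search online at https://cran.r-project.org/search.html, https://rseek.org/, and https://stackoverflow.com/questions/tagged/r. More ways to get help are listed at https://www.r-project.org/help.html. In this age of information overload, the one thing not in short supply is learning resources.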

## Writing Environment

xfun::session_info(packages = c(
"knitr", "rmarkdown", "bookdown"
), dependencies = FALSE)
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 8 (Core)
##
## Locale:
##   LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##   LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##   LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##   LC_PAPER=en_US.UTF-8       LC_NAME=C
##   LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## Package version:
##   bookdown_0.19   knitr_1.28.5    rmarkdown_2.1.3
##
## Pandoc version: 2.7.3

sudo dnf copr enable simc/stable # gdal-devel
sudo dnf install -y sqlite-devel gdal-devel \
proj-devel geos-devel udunits2-devel

install.packages('sf')

# ImageMagick-c++-devel is required by magick, poppler-cpp-devel by pdftools,
# and cargo by gifski
sudo dnf install -y \
ImageMagick-c++-devel \
poppler-cpp-devel \
cargo

## Notation Conventions

ruler()
----+----1----+----2----+----3----+----4----+----5----+----6----+----
123456789012345678901234567890123456789012345678901234567890123456789
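
The definition of ruler() is not shown here. As a purely hypothetical reconstruction, a function along the following lines would print the same two lines. The name `ruler_sketch` and the default width of 69 are assumptions inferred from the output above.

```r
# Hypothetical stand-in for ruler(): prints a column ruler of the given width
ruler_sketch <- function(width = 69) {
  i <- seq_len(width)
  # Top line: tens digit at every 10th column, "+" at every 5th, "-" elsewhere
  top <- ifelse(i %% 10 == 0, as.character((i %/% 10) %% 10),
                ifelse(i %% 5 == 0, "+", "-"))
  # Bottom line: last digit of each column index
  bottom <- as.character(i %% 10)
  lines <- c(paste(top, collapse = ""), paste(bottom, collapse = ""))
  cat(lines, sep = "\n")
  invisible(lines)
}

out <- ruler_sketch()
```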

Winston Chang compiled a cheat sheet of commonly used LaTeX commands https://wch.github.io/latexsheet/latexsheet.pdf

### References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2020. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.

Blyth, Colin R., and David W. Hutchinson. 1960. “Table of Neyman-Shortest Unbiased Confidence Intervals for the Binomial Parameter.” Biometrika 47 (3/4): 381–91. http://www.jstor.org/stable/2333308.

Brown, Lawrence D., T. Tony Cai, and Anirban DasGupta. 2001. “Interval Estimation for a Binomial Proportion.” Statistical Science 16 (2): 101–33. https://projecteuclid.org/euclid.ss/1009213286.

Clopper, C. J., and E. S. Pearson. 1934. “The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial.” Biometrika 26 (4): 404–13. https://doi.org/10.1093/biomet/26.4.404.

Geyer, Charles J., and Glen D. Meeden. 2005. “Fuzzy and Randomized Confidence Intervals and P-Values.” Statistical Science 20 (4): 358–66. https://doi.org/10.1214/088342305000000340.

Hornik, Kurt. 2020. “R FAQ: Frequently Asked Questions on R.” https://CRAN.R-project.org/doc/FAQ/R-FAQ.html.

Ripley, Brian D. 2002. “Statistical Methods Need Software: A View of Statistical Computing.” Opening Lecture Royal Statistical Society. Plymouth. https://www.stats.ox.ac.uk/~ripley/RSS2002.pdf.

Tsagris, Michail, and Manos Papadakis. 2018. “Taking R to Its Limits: 70+ Tips.” PeerJ Preprints 6: e26605v1. https://doi.org/10.7287/peerj.preprints.26605v1.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. 2nd ed. New York: Springer-Verlag. https://ggplot2-book.org/.

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.

Xie, Yihui. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.

Xie, Yihui. 2019. “TinyTeX: A Lightweight, Cross-Platform, and Easy-to-Maintain LaTeX Distribution Based on TeX Live.” TUGboat 40 (1): 30–32. https://tug.org/TUGboat/Contents/contents40-1.html.