Chapter 20 Bayesian Data Analysis

“If you want to master something, teach it.”

— Yogi Bhajan

Bayesian analysis is a rich subject with a very wide range of applications, especially in psychology. Next semester we will try to lift a corner of the veil, and everyone is welcome to continue taking the course. In this chapter I translate as little as possible, partly because my translation skills are limited and partly to preserve the original flavor of the source material.

Main topics:

  • Bayes' theorem
  • Application scenarios for Bayesian methods
  • Differences between frequentist and Bayesian data analysis
  • Advantages of Bayesian data analysis
  • What is bootstrapping?
  • What is Markov chain Monte Carlo?
  • Posterior, prior, likelihood, sample size
  • Steps of Bayesian data analysis
  • Grid approximation and its limitations
  • Stan and brms
  • The effect of sample size on the posterior
  • The effect of the prior on the posterior

20.1 A Bayesian Application Scenario

I often use ride-hailing apps such as DiDi. Suppose there are two available drivers:

  1. rated 5 stars, with a history of 5 completed trips
  2. rated 4.68 stars, with a history of 1000 completed trips

Which one should we choose?
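One way to make the intuition precise is Bayesian. Below is a minimal sketch in R, under assumptions that are not part of the original example: each trip is treated as an independent "satisfied / not satisfied" outcome, the star rating is rescaled to a satisfaction rate, and both drivers get the same Beta(1, 1) prior.

# Hypothetical conversion of star ratings into success counts (an assumption)
trips     <- c(5, 1000)
satisfied <- round(c(5.00 / 5, 4.68 / 5) * trips)   # roughly 5 and 936 "satisfied" trips

# Beta(1, 1) prior + binomial likelihood gives a Beta posterior for each driver
post_alpha <- 1 + satisfied
post_beta  <- 1 + trips - satisfied

post_alpha / (post_alpha + post_beta)   # posterior means: about 0.86 vs 0.94

# 95% credible intervals show the difference in uncertainty as well
rbind(driver1 = qbeta(c(0.025, 0.975), post_alpha[1], post_beta[1]),
      driver2 = qbeta(c(0.025, 0.975), post_alpha[2], post_beta[2]))

Under these made-up assumptions, the driver with 1000 trips has both a higher posterior mean and a much narrower interval: a long history of slightly imperfect ratings is stronger evidence than a handful of perfect ones.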

20.2 Bayes' Theorem

For a parameter \(\theta\) and data \(D\), Bayes' theorem can be written as

\[ \begin{align*} p(\theta|D) & = \frac{p(D|\theta)p(\theta)}{p(D)} \;\; \text{and since} \\ p(D) & = \sum\limits_{\theta^*}p(D|\theta^*)p(\theta^*) \;\; \text{it's also the case that} \\ p(\theta|D) & = \frac{p(D|\theta)p(\theta)}{\sum\limits_{\theta^*}p(D|\theta^*)p(\theta^*)}. \end{align*} \]

Here, \[ \underbrace{p(\theta|D)}_\text{posterior} \; = \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; / \; \underbrace{p(D)}_\text{evidence}. \] Kruschke (2015) explains these terms in the original book as follows:

The “prior,” \(p(\theta)\), is the credibility of the \(\theta\) values without the data \(D\). The “posterior,” \(p(\theta|D)\), is the credibility of \(\theta\) values with the data \(D\) taken into account. The “likelihood,” \(p(D|\theta)\), is the probability that the data could be generated by the model with parameter value \(\theta\). The “evidence” for the model, \(p(D)\), is the overall probability of the data according to the model, determined by averaging across all possible parameter values weighted by the strength of belief in those parameter values. (pp. 106–107)
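As a tiny numerical illustration of the formula, consider a discrete parameter with only three candidate values and a made-up data set of 6 successes in 9 Bernoulli trials (both choices are assumptions purely for illustration):

theta <- c(0.25, 0.50, 0.75)                  # three candidate parameter values
prior <- rep(1 / 3, 3)                        # p(theta): a uniform prior
lik   <- dbinom(6, size = 9, prob = theta)    # p(D | theta): 6 successes in 9 trials
evidence  <- sum(lik * prior)                 # p(D): likelihood averaged over the prior
posterior <- lik * prior / evidence           # p(theta | D)
round(posterior, 3)                           # credibility shifts toward theta = 0.75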

20.3 Frequentists vs Bayesians

Frequentists treat parameters as fixed and the data as random; Bayesians treat parameters as random and the data as fixed.

20.4 Markov Chain Monte Carlo

Two core MCMC algorithms:

20.4.1 Metropolis-Hastings:

  • Choose a proposal distribution (a stochastic perturbation of the current state).
  • Draw a candidate state from it.
  • Calculate the acceptance probability of the candidate.
  • Accept the candidate with that probability; otherwise stay at the current state. This correction step rejects proposals that would stray too far from the typical set (a minimal sketch follows this list).
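A minimal random-walk Metropolis sketch in R; the target distribution (a standard normal), the proposal scale, and the chain length are all illustrative assumptions:

log_target <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE)

set.seed(20)
n_iter <- 5000
draws  <- numeric(n_iter)
theta  <- -5                                    # a deliberately bad starting value

for (i in seq_len(n_iter)) {
  proposal   <- theta + rnorm(1, sd = 0.5)      # stochastic perturbation of the current state
  log_accept <- log_target(proposal) - log_target(theta)   # acceptance probability (log scale)
  if (log(runif(1)) < log_accept) theta <- proposal         # accept; otherwise stay put
  draws[i] <- theta
}

c(mean = mean(draws[-(1:500)]), sd = sd(draws[-(1:500)]))   # close to 0 and 1 after warmup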

20.4.2 Hamiltonian Monte Carlo:

  • In the multivariate setting, HMC augments the parameters with auxiliary momentum variables and uses the gradient of the log posterior to simulate Hamiltonian dynamics, so it can propose distant states that are still accepted with high probability; this is the sampler behind Stan and brms (a small sketch follows).
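A leapfrog HMC sketch for a two-dimensional standard normal target; the target, step size, and number of leapfrog steps are all illustrative assumptions (Stan's actual implementation additionally tunes these automatically):

log_post <- function(theta) -0.5 * sum(theta^2)   # log density, up to a constant
grad     <- function(theta) -theta                # gradient of the log density

hmc_step <- function(theta, eps = 0.1, L = 25) {
  r0 <- rnorm(length(theta))                      # draw auxiliary momentum
  th <- theta
  r  <- r0 + 0.5 * eps * grad(th)                 # half step for the momentum
  for (l in seq_len(L)) {                         # leapfrog integration
    th <- th + eps * r
    if (l < L) r <- r + eps * grad(th)
  }
  r <- r + 0.5 * eps * grad(th)
  # Metropolis correction based on the change in total energy (the Hamiltonian)
  log_accept <- (log_post(th) - 0.5 * sum(r^2)) - (log_post(theta) - 0.5 * sum(r0^2))
  if (log(runif(1)) < log_accept) th else theta
}

set.seed(2022)
draws <- matrix(NA_real_, nrow = 2000, ncol = 2)
theta <- c(4, -4)
for (i in seq_len(nrow(draws))) {
  theta <- hmc_step(theta)
  draws[i, ] <- theta
}
colMeans(draws)                                   # both coordinates close to 0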

20.5 Steps of Bayesian Data Analysis

In general, Bayesian data analysis proceeds in the following five steps (excerpted from Kruschke, 2015):

  • Identify the data relevant to the research questions. What are the measurement scales of the data? Which data variables are to be predicted, and which data variables are supposed to act as predictors?
  • Define a descriptive model for the relevant data. The mathematical form and its parameters should be meaningful and appropriate to the theoretical purposes of the analysis.
  • Specify a prior distribution on the parameters. The prior must pass muster with the audience of the analysis, such as skeptical scientists.
  • Use Bayesian inference to re-allocate credibility across parameter values. Interpret the posterior distribution with respect to theoretically meaningful issues (assuming that the model is a reasonable description of the data; see next step).
  • Check that the posterior predictions mimic the data with reasonable accuracy (i.e., conduct a “posterior predictive check”). If not, then consider a different descriptive model.

Below, we demonstrate this workflow with a concrete example.

20.6 Real Data Example

Here we simulate the heights and weights of male and female students at Sichuan Normal University.
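The original simulation code is not shown in this text. Below is a minimal sketch of what such a data set could look like; the group sizes, means, standard deviations, and the height-weight relationship are all made-up assumptions:

library(tidyverse)

set.seed(2022)
n_male <- 100; n_female <- 100                       # hypothetical group sizes

d <- tibble(
  sex    = rep(c("male", "female"), times = c(n_male, n_female)),
  height = c(rnorm(n_male, mean = 170, sd = 6),      # male students
             rnorm(n_female, mean = 160, sd = 5)),   # female students
  weight = 0.9 * (height - 100) + rnorm(n_male + n_female, sd = 4)
)

glimpse(d)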

We want to know how the students' heights are distributed.

Based on this exploration, we have reason to conjecture that height follows a normal distribution, with its mean and variance falling within the following ranges.

20.6.2 likelihood

Here, for every (mu, sigma) combination in the grid, we compute the probability density of each height in the data (d$height) under the corresponding normal distribution, dnorm(d2$height, mean = mu, sd = sigma), and sum these densities on the log scale (equivalent to multiplying them, i.e., the likelihood of the data). Clearly, different (mu, sigma) combinations give different sums; since we have 1000 × 1000 (mu, sigma) combinations, this produces 1000 × 1000 such sums.
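A sketch of this computation; to keep it fast, a 200 × 200 grid is used here instead of the 1000 × 1000 grid described above, the grid ranges are assumptions, and the densities are summed on the log scale to avoid numerical underflow:

library(tidyverse)

d_grid <- crossing(
  mu    = seq(from = 150, to = 180, length.out = 200),
  sigma = seq(from = 4,   to = 12,  length.out = 200)
)

d_grid <- d_grid %>%
  mutate(
    log_likelihood = map2_dbl(
      mu, sigma,
      ~ sum(dnorm(d$height, mean = .x, sd = .y, log = TRUE))
    )
  )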

20.6.4 posterior
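The original code is not shown here. Continuing the sketch above, one way to turn the log likelihoods into a grid-approximate posterior is to add a log prior (flat priors over the grid are assumed here purely for simplicity) and normalize:

d_grid <- d_grid %>%
  mutate(
    log_posterior = log_likelihood +
      dunif(mu, 150, 180, log = TRUE) +            # assumed flat prior on mu
      dunif(sigma, 4, 12, log = TRUE),             # assumed flat prior on sigma
    probability = exp(log_posterior - max(log_posterior))   # rescale before exponentiating
  ) %>%
  mutate(probability = probability / sum(probability))      # normalize so it sums to 1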

20.6.5 sampling from posterior

We sample from the posterior grid in proportion to its probability values: the distribution comes first, and the sampling is then carried out based on that distribution; MCMC sampling rests on the same idea.
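A sketch of this step, continuing from the d_grid data frame above; the number of draws is arbitrary:

set.seed(2022)
post_samples <- d_grid %>%
  slice_sample(n = 10000, weight_by = probability, replace = TRUE) %>%
  select(mu, sigma)

post_samples %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))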

We can also use tidybayes::mode_hdi() to obtain the highest-density interval of the posterior.
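For example, with the post_samples draws from the sketch above:

library(tidybayes)

post_samples %>%
  mode_hdi(mu, sigma, .width = 0.95)   # mode and 95% highest-density interval per parameter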

Above, we used grid approximation to obtain the posterior distribution of the parameters of the height distribution. However, this approach requires building a grid over the parameters: for more complex models the amount of computation grows explosively, and it is memory-hungry and slow. It is therefore rarely used with real data, but grid approximation is an excellent way to build intuition for Bayesian data analysis.

20.7 Using brms

brms provides an excellent framework for Bayesian data analysis. Below we redo the analysis above with brms.
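A sketch of the kind of call that could produce the summary below; the object name, iteration settings, and seed are assumptions, while the priors follow the "case 1" summary later in this chapter:

library(brms)

b_height <- brm(
  data   = d,
  family = gaussian,
  height ~ 1,
  prior = c(prior(normal(168, 20), class = Intercept),
            prior(uniform(0, 50), class = sigma, ub = 50)),   # ub makes the bound explicit
  iter = 2000, warmup = 1000, chains = 4, cores = 4,
  seed = 2022
)

posterior_summary(b_height)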

##             Estimate Est.Error     Q2.5    Q97.5
## b_Intercept  164.886    0.5183  163.817  165.889
## sigma          7.355    0.3835    6.652    8.155
## lp__        -687.910    1.0589 -690.699 -686.905

20.8 Adding a Predictor

brms is very powerful; it can carry out Bayesian analysis for almost every kind of regression model.

20.8.1 Choosing a Model

In mathematical notation (the corresponding brm() call is sketched after this list):

  • \(h_i \sim \text{Normal}(\mu_i, \sigma)\): family = gaussian
  • \(\mu_i = \alpha + \beta x_i\): height ~ 1 + weight
  • \(\alpha \sim \text{Normal}(166, 100)\): prior(normal(166, 100), class = Intercept)
  • \(\beta \sim \text{Normal}(0, 10)\): prior(normal(0, 10), class = b)
  • \(\sigma \sim \text{Uniform}(0, 50)\): prior(uniform(0, 50), class = sigma)
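Put together, these choices correspond to a brm() call along the following lines; the object name and sampler settings are assumptions:

b_height_weight <- brm(
  data   = d,
  family = gaussian,
  height ~ 1 + weight,
  prior = c(prior(normal(166, 100), class = Intercept),
            prior(normal(0, 10), class = b),
            prior(uniform(0, 50), class = sigma, ub = 50)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4,
  seed = 2022
)

summary(b_height_weight)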

20.8.4 Posterior Distribution

It can also be plotted like this:
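The original figure and code are not reproduced here; one illustrative option (not necessarily the one used in the original) is brms' built-in plotting helpers, applied to the b_height_weight sketch above:

plot(b_height_weight)                       # default trace and density plots per parameter

mcmc_plot(b_height_weight, type = "dens")   # marginal posterior densities via bayesplot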

20.8.5 posterior predictive check
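In brms the check itself is one line, here applied to the b_height_weight sketch from above; ndraws controls how many simulated data sets are overlaid on the observed data, and 50 is an arbitrary choice:

pp_check(b_height_weight, ndraws = 50)   # in older brms versions the argument was nsamples

If the simulated curves track the observed density reasonably well, the model passes this check; otherwise, a different descriptive model should be considered.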

20.8.6 Summary

20.8.6.1 case 1

The mathematical model and the corresponding brms syntax:

  • \(h_i \sim \text{Normal}(\mu, \sigma)\): family = gaussian
  • \(\mu_i = \alpha\): height ~ 1
  • \(\mu \sim \text{Normal}(168, 20)\): prior(normal(168, 20), class = Intercept)
  • \(\sigma \sim \text{Uniform}(0, 50)\): prior(uniform(0, 50), class = sigma)

  • height is not affected by any other variable, so its mean corresponds to the intercept (Intercept).
  • brms returns the parameters \((\mu, \sigma)\), i.e., Intercept and \(\sigma\).

20.8.6.2 case 2

The mathematical model and the corresponding brms syntax:

  • \(h_i \sim \text{Normal}(\mu_i, \sigma)\): family = gaussian
  • \(\mu_i = \alpha + \beta x_i\): height ~ 1 + weight
  • \(\alpha \sim \text{Normal}(166, 100)\): prior(normal(166, 100), class = Intercept)
  • \(\beta \sim \text{Normal}(0, 10)\): prior(normal(0, 10), class = b)
  • \(\sigma \sim \text{Uniform}(0, 50)\): prior(uniform(0, 50), class = sigma)

  • height is affected by another variable (weight), so its mean is \(\alpha + \beta x_i\) rather than a single intercept.
  • brms returns the parameters \((\alpha, \beta, \sigma)\), i.e., Intercept, \(\beta\), and \(\sigma\).

20.9 The Effect of Sample Size on the Posterior

I wrote an animation (the code is here) that can help you understand how the sample size of the data affects the posterior.
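A small static sketch of the same idea, with all numbers made up for illustration: with a normal likelihood, a known \(\sigma\), and a normal prior on \(\mu\), the posterior for \(\mu\) is also normal, and as n grows its standard deviation shrinks while its mean moves from the prior mean toward the sample mean.

# Conjugate normal-normal updating of the mean, with sigma assumed known
prior_mean <- 160; prior_sd <- 20          # a vague prior on mu
sigma      <- 7                            # data sd, assumed known
true_mean  <- 165

posterior_for_n <- function(n) {
  set.seed(n)
  y <- rnorm(n, true_mean, sigma)
  post_var  <- 1 / (1 / prior_sd^2 + n / sigma^2)
  post_mean <- post_var * (prior_mean / prior_sd^2 + sum(y) / sigma^2)
  c(n = n, post_mean = post_mean, post_sd = sqrt(post_var))
}

t(sapply(c(5, 50, 500), posterior_for_n))  # post_sd shrinks roughly like 1 / sqrt(n)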

20.10 The Effect of the Prior on the Posterior

coming soon