# Chapter 2 Bayesian Data Analysis

“If you want to master something, teach it.”

— Yogi Bhajan

• Bayes' formula
• Where Bayesian analysis applies
• Differences between frequentist and Bayesian data analysis
• Advantages of Bayesian data analysis
• What is bootstrapping?
• What is Markov chain Monte Carlo?
• Posterior, prior, likelihood, and sample size
• Steps of a Bayesian data analysis
• Grid approximation and its limits
• Stan and brms
• The effect of sample size on the posterior
• The effect of the prior on the posterior
```r
library(tidyverse)
library(brms)
library(tidybayes)
library(patchwork)
```

## 2.1 Where Bayesian Analysis Applies

Suppose you are choosing between two online sellers:

1. one rated 5 stars over 5 historical transactions
2. one rated 4.68 stars over 1000 historical transactions
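One way to make the intuition concrete is a minimal sketch (not part of the original example): treat each transaction as a satisfied/unsatisfied trial, map the star rating to a fraction of satisfied buyers (a simplifying assumption for illustration), and score each seller by the posterior mean of the satisfaction rate under a uniform Beta(1, 1) prior.

```r
# Score a seller by the posterior mean of the satisfaction rate theta
# under a Beta(a, b) prior. Mapping rating / 5 to the fraction of
# satisfied buyers is an illustrative assumption.
posterior_mean <- function(rating, n_sales, a = 1, b = 1) {
  successes <- rating / 5 * n_sales
  (a + successes) / (a + b + n_sales)
}

posterior_mean(5,    5)     # seller 1: few sales, shrunk toward 0.5 -> ~0.857
posterior_mean(4.68, 1000)  # seller 2: many sales, little shrinkage -> ~0.935
```

Despite the lower raw rating, seller 2 comes out ahead once the small sample behind seller 1's perfect score is taken into account.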

## 2.2 Bayes' Formula

\begin{align*} p(\theta|D) & = \frac{p(D|\theta)p(\theta)}{p(D)} \;\; \text{and since} \\ p(D) & = \sum\limits_{\theta^*}p(D|\theta^*)p(\theta^*) \;\; \text{it's also the case that} \\ p(\theta|D) & = \frac{p(D|\theta)p(\theta)}{\sum\limits_{\theta^*}p(D|\theta^*)p(\theta^*)}. \end{align*}

The “prior,” $$p(\theta)$$, is the credibility of the $$\theta$$ values without the data $$D$$. The “posterior,” $$p(\theta|D)$$, is the credibility of $$\theta$$ values with the data $$D$$ taken into account. The “likelihood,” $$p(D|\theta)$$, is the probability that the data could be generated by the model with parameter value $$\theta$$. The “evidence” for the model, $$p(D)$$, is the overall probability of the data according to the model, determined by averaging across all possible parameter values weighted by the strength of belief in those parameter values. (pp. 106–107)
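The four quantities in the formula can be checked numerically on a discrete grid of $$\theta$$ values. As a hypothetical example, take 6 heads in 9 coin flips with a uniform prior:

```r
# Discrete-grid version of Bayes' formula: 6 heads in 9 flips.
theta      <- seq(0, 1, length.out = 101)
prior      <- rep(1 / length(theta), length(theta))  # p(theta)
likelihood <- dbinom(6, size = 9, prob = theta)      # p(D | theta)
evidence   <- sum(likelihood * prior)                # p(D), the denominator
posterior  <- likelihood * prior / evidence          # p(theta | D)

sum(posterior)               # 1: a proper probability distribution
theta[which.max(posterior)]  # posterior mode, close to 6/9
```

With a flat prior the posterior is just the normalized likelihood, so its mode sits near the sample proportion 6/9.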

## 2.3 Frequentists vs Bayesians

Frequentists assume parameters are fixed and the data are random; Bayesians assume parameters are random and the data are fixed.
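A small illustration of the two viewpoints on the same hypothetical data (6 successes in 9 trials):

```r
# Frequentist: theta is fixed; the 95% confidence interval is a property
# of the repeated-sampling procedure, not a probability statement about theta.
binom.test(6, 9)$conf.int

# Bayesian: theta is random; with a Beta(1, 1) prior the posterior is
# Beta(1 + 6, 1 + 3), and the 95% credible interval IS a direct
# probability statement about theta.
qbeta(c(.025, .975), 7, 4)
```

The numeric intervals are similar here, but their interpretations differ: only the Bayesian interval lets you say "theta lies in this range with 95% probability."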

## 2.4 Markov Chain Monte Carlo

Two core algorithms behind MCMC:

### 2.4.1 Metropolis-Hastings

• Start from an initial state and draw a candidate from a proposal distribution (a stochastic perturbation of the current state)
• Calculate the acceptance probability of the candidate
• Accept the candidate with that probability; otherwise reject it and keep the current state (this correction keeps the chain from wandering too far from the typical set)

### 2.4.2 Hamiltonian Monte Carlo

• Augments each parameter with a momentum variable and uses the gradient of the log posterior to simulate Hamiltonian dynamics, proposing distant states that are accepted with high probability (this is the algorithm behind Stan; sampling each parameter in turn from its conditional distribution with the others held fixed is Gibbs sampling, the other classic multivariate strategy)
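For intuition, here is a minimal one-parameter HMC sketch (a toy illustration, not Stan's adaptive implementation). The target is a standard normal, so the potential energy is $$U(q) = q^2/2$$ and the leapfrog integrator only needs its gradient:

```r
# Minimal one-parameter HMC for a standard normal target.
set.seed(42)
U      <- function(q) q^2 / 2  # potential energy = -log target (up to a constant)
grad_U <- function(q) q

hmc_step <- function(q, eps = 0.1, L = 20) {
  p  <- rnorm(1)                          # fresh momentum
  q0 <- q; p0 <- p
  p  <- p - eps / 2 * grad_U(q)           # half step for momentum
  for (l in seq_len(L)) {
    q <- q + eps * p                      # full step for position
    if (l < L) p <- p - eps * grad_U(q)   # full step for momentum
  }
  p <- p - eps / 2 * grad_U(q)            # final half step
  # accept/reject on the change in total energy (Hamiltonian)
  if (runif(1) < exp(U(q0) - U(q) + p0^2 / 2 - p^2 / 2)) q else q0
}

draws <- numeric(2000)
q <- 0
for (i in seq_along(draws)) draws[i] <- q <- hmc_step(q)
mean(draws); sd(draws)  # close to 0 and 1
```

Because the gradient guides each trajectory, successive draws are far less correlated than in a random-walk sampler, which is why Stan scales to high-dimensional posteriors.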

## 2.5 Steps of Bayesian Data Analysis

• Identify the data relevant to the research questions. What are the measurement scales of the data? Which data variables are to be predicted, and which data variables are supposed to act as predictors?
• Define a descriptive model for the relevant data. The mathematical form and its parameters should be meaningful and appropriate to the theoretical purposes of the analysis.
• Specify a prior distribution on the parameters. The prior must pass muster with the audience of the analysis, such as skeptical scientists.
• Use Bayesian inference to re-allocate credibility across parameter values. Interpret the posterior distribution with respect to theoretically meaningful issues (assuming that the model is a reasonable description of the data; see next step).
• Check that the posterior predictions mimic the data with reasonable accuracy (i.e., conduct a “posterior predictive check”). If not, then consider a different descriptive model.

## 2.6 Real Data Example

We first simulate heights (cm) and weights (kg) for 100 boys and 100 girls from bivariate normal distributions.

```r
set.seed(1024)

# boy:
HtMmu  <- 168.52
HtMsd  <- 6.87
WtMmu  <- 58.14
WtMsd  <- 3.17
Mrho   <- 0.42
Mmean  <- c(HtMmu, WtMmu)
Msigma <- matrix(c(HtMsd^2, Mrho * HtMsd * WtMsd,
                   Mrho * HtMsd * WtMsd, WtMsd^2), nrow = 2)

d1 <-
  MASS::mvrnorm(n = 100, mu = Mmean, Sigma = Msigma) %>%
  data.frame() %>%
  purrr::set_names("height", "weight") %>%
  mutate(sex = "boy") %>%
  select(sex, everything())

d1 %>% head()

# girl:
HtFmu  <- 161.11
HtFsd  <- 5.76
WtFmu  <- 48.06
WtFsd  <- 4.24
Frho   <- 0.41
prop   <- 0.46
Fmean  <- c(HtFmu, WtFmu)
Fsigma <- matrix(c(HtFsd^2, Frho * HtFsd * WtFsd,
                   Frho * HtFsd * WtFsd, WtFsd^2), nrow = 2)

d2 <-
  MASS::mvrnorm(n = 100, mu = Fmean, Sigma = Fsigma) %>%
  data.frame() %>%
  purrr::set_names("height", "weight") %>%
  mutate(sex = "girl") %>%
  select(sex, everything())

d2 %>% head()

d <- bind_rows(d1, d2)

d %>% head()

d %>%
  summarise(
    min = min(height),
    max = max(height),
    sd  = sd(height)
  )

d %>%
  ggplot(aes(x = height)) +
  geom_density()
```

```r
# priors:
# mu    ~ Normal(167, 20)  ->  dnorm(mu,    mean = 167, sd  = 20, log = TRUE)
# sigma ~ Uniform(0, 50)   ->  dunif(sigma, min  = 0,   max = 50, log = TRUE)

p1 <-
  ggplot(data = tibble(x = seq(from = 100, to = 230, by = .1)),
         aes(x = x, y = dnorm(x, mean = 167, sd = 20))) +
  geom_line() +
  xlab("height_mean") +
  ylab("density")

p2 <-
  ggplot(data = tibble(x = seq(from = -10, to = 55, by = .1)),
         aes(x = x, y = dunif(x, min = 0, max = 50))) +
  geom_line() +
  xlab("height_sd") +
  ylab("density")

p1 + p2
```

### 2.6.1 grid

```r
n <- 1000

d_grid <-
  tibble(mu    = seq(from = 150, to = 190, length.out = n),
         sigma = seq(from = 4,   to = 9,   length.out = n)) %>%
  # all combinations of mu and sigma; equivalent to tidyr::expand(mu, sigma)
  modelr::data_grid(mu, sigma)
```

### 2.6.2 likelihood

```r
grid_function <- function(mu, sigma) {
  dnorm(d2$height, mean = mu, sd = sigma, log = TRUE) %>%
    sum()
}

d_grid %>%
  mutate(log_likelihood = map2_dbl(mu, sigma, grid_function))
```

### 2.6.3 prior

```r
d_grid %>%
  mutate(prior_mu    = dnorm(mu,    mean = 168, sd  = 20, log = TRUE),
         prior_sigma = dunif(sigma, min  = 0,   max = 50, log = TRUE))
```

### 2.6.4 posterior

```r
d_grid <-
  d_grid %>%
  mutate(log_likelihood = map2_dbl(mu, sigma, grid_function)) %>%
  mutate(prior_mu    = dnorm(mu,    mean = 168, sd  = 20, log = TRUE),
         prior_sigma = dunif(sigma, min  = 0,   max = 50, log = TRUE)) %>%
  mutate(product = log_likelihood + prior_mu + prior_sigma) %>%
  mutate(probability = exp(product - max(product)))

head(d_grid)

d_grid %>%
  ggplot(aes(x = mu, y = sigma, z = probability)) +
  geom_contour() +
  labs(x = expression(mu),
       y = expression(sigma)) +
  coord_cartesian(xlim = range(d_grid$mu),
                  ylim = range(d_grid$sigma)) +
  theme(panel.grid = element_blank())
```

```r
d_grid %>%
  ggplot(aes(x = mu, y = sigma)) +
  geom_raster(aes(fill = probability),
              interpolate = TRUE) +
  scale_fill_viridis_c(option = "A") +
  labs(x = expression(mu),
       y = expression(sigma)) +
  theme(panel.grid = element_blank())
```

### 2.6.5 sampling from posterior

```r
d_grid_samples <-
  d_grid %>%
  sample_n(size = 1e4, replace = TRUE, weight = probability)

d_grid_samples %>%
  ggplot(aes(x = mu, y = sigma)) +
  geom_point(size = .9, alpha = 1/15) +
  scale_fill_viridis_c() +
  labs(x = expression(mu[samples]),
       y = expression(sigma[samples])) +
  theme(panel.grid = element_blank())
```

```r
d_grid_samples %>%
  select(mu, sigma) %>%
  pivot_longer(
    cols      = everything(),
    names_to  = "key",
    values_to = "value"
  ) %>%
  ggplot(aes(x = value)) +
  geom_density(fill = "grey33", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab(NULL) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~key, scales = "free")
```

```r
d_grid_samples %>%
  select(mu, sigma) %>%
  pivot_longer(
    cols      = everything(),
    names_to  = "key",
    values_to = "value"
  ) %>%
  group_by(key) %>%
  mode_hdi(value)
```

## 2.7 Using brms

brms provides a convenient framework for Bayesian data analysis; below we redo the work above with brms.

```r
b20.1 <-
  brm(data = d, family = gaussian,
      height ~ 1,
      prior = c(prior(normal(168, 20), class = Intercept),
                prior(uniform(0, 50), class = sigma)),
      iter = 31000, warmup = 30000, chains = 4, cores = 4,
      seed = 4)

plot(b20.1)
```

```r
post <- posterior_samples(b20.1)
head(post)

# the following three ways of summarising are equivalent
post %>%
  select(-lp__) %>%
  pivot_longer(
    cols      = everything(),
    names_to  = "parameter",
    values_to = "value"
  ) %>%
  group_by(parameter) %>%
  summarise(mean = mean(value),
            SD   = sd(value),
            `2.5_percentile`  = quantile(value, probs = .025),
            `97.5_percentile` = quantile(value, probs = .975)) %>%
  mutate_if(is.numeric, round, digits = 2)

# or
post %>%
  select(-lp__) %>%
  pivot_longer(
    cols      = everything(),
    names_to  = "parameter",
    values_to = "value"
  ) %>%
  group_by(parameter) %>%
  tidybayes::mean_qi(value)

# equivalent to the above
posterior_summary(b20.1)
##             Estimate Est.Error     Q2.5    Q97.5
## b_Intercept  164.886    0.5183  163.817  165.889
## sigma          7.355    0.3835    6.652    8.155
## lp__        -687.910    1.0589 -690.699 -686.905
```

## 2.8 Adding a Predictor

brms is powerful enough to fit Bayesian versions of nearly every regression model.

### 2.8.1 Choosing a Model

• $$h_i \sim \text{Normal}(\mu_i, \sigma)$$: family = gaussian
• $$\mu_i = \alpha + \beta x_i$$: height ~ 1 + weight
• $$\alpha \sim \text{Normal}(166, 100)$$: prior(normal(166, 100), class = Intercept)
• $$\beta \sim \text{Normal}(0, 10)$$: prior(normal(0, 10), class = b)
• $$\sigma \sim \text{Uniform}(0, 50)$$: prior(uniform(0, 50), class = sigma)

### 2.8.2 Specifying Priors

```r
ggplot(data = tibble(x = seq(from = 100, to = 230, by = .1)),
       aes(x = x, y = dnorm(x, mean = 166, sd = 20))) +
  geom_line() +
  ylab("density")

tibble(x = seq(from = -10, to = 60, by = .1)) %>%
  ggplot(aes(x = x, y = dunif(x, min = 0, max = 50))) +
  geom_line() +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())

tibble(x = seq(from = -10, to = 10, by = .1)) %>%
  ggplot(aes(x = x, y = dcauchy(x))) +
  geom_line() +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())

# Cauchy vs Normal(0, 2): the Cauchy has much heavier tails
tibble(x = seq(from = -10, to = 10, by = .1)) %>%
  ggplot() +
  geom_line(aes(x = x, y = dcauchy(x))) +
  geom_line(aes(x = x, y = dnorm(x, mean = 0, sd = 2))) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())
```

### 2.8.3 Obtain the Posterior Distributions

```r
b20.2 <-
  brm(data = d,
      family = gaussian,
      weight ~ 1 + height,
      prior = c(prior(normal(0, 100), class = Intercept),
                prior(normal(0, 100), class = b),
                prior(cauchy(0, 10),  class = sigma)),
      chains = 4, cores = 4, iter = 4000, warmup = 2000,
      seed = 2)

plot(b20.2)
```

### 2.8.4 Posterior Distribution

```r
# extract the posterior draws
post <- posterior_samples(b20.2)

# this will streamline some of the code below
n_lines <- 150

# plot!
d %>%
  ggplot(aes(x = height, y = weight)) +
  geom_abline(intercept = post[1:n_lines, 1],
              slope     = post[1:n_lines, 2],
              color = "grey50", size = 1/4, alpha = .3) +
  geom_point(alpha = 2/3) +
  labs(subtitle = glue::glue("Data with {n_lines} credible regression lines"),
       x = "Height in cm",
       y = "Weight in kg") +
  theme(panel.grid = element_blank())
```

```r
post %>%
  ggplot(aes(x = b_height)) +
  geom_histogram(color = "grey92", fill = "grey67",
                 binwidth = .02, size = .2) +
  stat_pointinterval(aes(y = 0),
                     point_interval = mode_hdi, .width = .95) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "The posterior distribution",
       subtitle = "The mode and 95% HPD intervals are\nthe dot and horizontal line at the bottom.",
       x = expression(paste(beta[1], " (slope)"))) +
  theme(panel.grid = element_blank())
```

```r
nd <- tibble(height = seq(from = 145, to = 190, length.out = 60))

b20.2 %>%
  predict(newdata = nd) %>%
  as_tibble() %>%
  bind_cols(nd) %>%
  ggplot(aes(x = height)) +
  geom_pointrange(aes(y = Estimate,
                      ymin = Q2.5,
                      ymax = Q97.5),
                  color = "grey67",
                  shape = 20) +
  geom_point(data = d,
             aes(y = weight),
             alpha = 2/3) +
  labs(subtitle = "Data with the percentile-based 95% intervals and\nthe means of the posterior predictions",
       x = "Height in cm",
       y = "Weight in kg") +
  theme(panel.grid = element_blank())
```

```r
b20.2 %>%
  predict(newdata = nd) %>%
  as_tibble() %>%
  bind_cols(nd) %>%
  ggplot(aes(x = height)) +
  geom_ribbon(aes(ymin = Q2.5,
                  ymax = Q97.5),
              fill = "grey75") +
  geom_line(aes(y = Estimate),
            color = "grey92") +
  geom_point(data = d,
             aes(y = weight),
             alpha = 2/3) +
  labs(subtitle = "Data with the percentile-based 95% intervals and\nthe means of the posterior predictions",
       x = "Height in cm",
       y = "Weight in kg") +
  theme(panel.grid = element_blank())
```

### 2.8.5 posterior predictive check

```r
pp_check(b20.2) + ggtitle("Outcome")
```

### 2.8.6 Summary

#### 2.8.6.1 case 1

| math | brms |
|------|------|
| $$h_i \sim \text{Normal}(\mu, \sigma)$$ | family = gaussian |
| $$\mu = \alpha$$ | height ~ 1 |
| $$\mu \sim \text{Normal}(168, 20)$$ | prior(normal(168, 20), class = Intercept) |
| $$\sigma \sim \text{Uniform}(0, 50)$$ | prior(uniform(0, 50), class = sigma) |

• height is not modeled as depending on any other variable, so its mean corresponds to the intercept (Intercept)
• brms returns the parameters $$(\mu, \sigma)$$, i.e. Intercept and $$\sigma$$

#### 2.8.6.2 case 2

| math | brms |
|------|------|
| $$h_i \sim \text{Normal}(\mu_i, \sigma)$$ | family = gaussian |
| $$\mu_i = \alpha + \beta x_i$$ | height ~ 1 + weight |
| $$\alpha \sim \text{Normal}(166, 100)$$ | prior(normal(166, 100), class = Intercept) |
| $$\beta \sim \text{Normal}(0, 10)$$ | prior(normal(0, 10), class = b) |
| $$\sigma \sim \text{Uniform}(0, 50)$$ | prior(uniform(0, 50), class = sigma) |

• height is modeled as depending on another variable, so its mean is $$\alpha + \beta x_i$$ rather than a single intercept
• brms returns the parameters $$(\alpha, \beta, \sigma)$$, i.e. Intercept, $$\beta$$, and $$\sigma$$

coming soon