Chapter 4 Data Asymptotics
Suppose \(p(\theta\mid y)\) is unimodal and roughly symmetric. A Taylor series expansion of the logarithm of the posterior around the posterior mode \(\hat\theta\) is \[ \log p(\theta \mid y)=\log p(\hat{\theta} \mid y)-\frac{1}{2}(\theta-\hat{\theta})^{\top}\left[-\frac{d^{2}}{d \theta^{2}} \log p(\theta \mid y)\right]_{\theta=\hat{\theta}}(\theta-\hat{\theta})+\cdots \] Discarding the higher-order terms, this expansion provides a normal approximation to the posterior, i.e. \(p(\theta\mid y) \stackrel{d}{\approx} N(\hat\theta, I(\hat\theta)^{-1})\), where \(I(\hat\theta) = \left[-\frac{d^{2}}{d \theta^{2}} \log p(\theta \mid y)\right]_{\theta=\hat{\theta}}\) is the observed information evaluated at the mode.
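To make the approximation concrete, here is a minimal sketch in Python. The Poisson likelihood with a Gamma(2, 1) prior is an illustrative assumption, not from the notes: the posterior mode is found numerically and the observed information is estimated by a finite difference.

```python
# A minimal sketch of the normal (Laplace) approximation, assuming a
# hypothetical Poisson likelihood with a Gamma(2, 1) prior.
import numpy as np
from scipy.optimize import minimize_scalar

def log_post(theta, y):
    # Unnormalized log posterior: Gamma(2, 1) prior, Poisson(theta) likelihood.
    return (2 - 1 + y.sum()) * np.log(theta) - (1 + y.size) * theta

y = np.random.default_rng(1).poisson(3.0, size=50)

# Posterior mode: maximize the log posterior.
res = minimize_scalar(lambda t: -log_post(t, y), bounds=(1e-6, 20), method="bounded")
theta_hat = res.x

# Observed information: negative second derivative at the mode (central difference).
h = 1e-4
info = -(log_post(theta_hat + h, y) - 2 * log_post(theta_hat, y) + log_post(theta_hat - h, y)) / h**2

print(f"Normal approximation: N({theta_hat:.3f}, {1/info:.5f})")
```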
Theorem 1: If the parameter space \(\Theta\) is discrete and the true value \(\theta_0\) has prior probability \(P(\theta = \theta_0) > 0\), then \(P(\theta = \theta_0\mid y) \rightarrow 1\) as \(n \rightarrow \infty\).
Theorem 2: If the parameter space \(\Theta\) is continuous and \(A\) is a neighborhood of the true value \(\theta_0\) with prior probability \(P(\theta\in A) > 0\), then \(P(\theta \in A \mid y) \rightarrow 1\) as \(n \rightarrow \infty\).
An estimator is consistent, i.e. \(\hat\theta \stackrel{p}{\rightarrow} \theta_0\), if \(\lim_{n \rightarrow \infty} P(|\hat\theta - \theta_0| < \epsilon) = 1\) for every \(\epsilon > 0\). Under regularity conditions, \(\hat\theta_{MLE} \stackrel{p}{\rightarrow} \theta_0\).
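The following is a minimal sketch of Theorem 1 under assumed choices (a three-point parameter space, a uniform prior, and Bernoulli data with true value 0.5, none of which come from the notes): the posterior mass at the true value approaches 1 as \(n\) grows.

```python
# A minimal sketch of posterior concentration on a discrete parameter space
# Theta = {0.2, 0.5, 0.8} with a uniform prior; true value 0.5 (all assumed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
thetas = np.array([0.2, 0.5, 0.8])
prior = np.array([1/3, 1/3, 1/3])
for n in [10, 100, 1000]:
    y = rng.binomial(1, 0.5, size=n)       # iid Bernoulli(0.5) data
    loglik = np.array([stats.bernoulli(t).logpmf(y).sum() for t in thetas])
    post = prior * np.exp(loglik - loglik.max())
    post /= post.sum()
    print(n, np.round(post, 4))            # mass at theta = 0.5 tends to 1
```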
Example 1: Binomial example
Let \(y \sim Bin(n, \theta)\) and \(\theta \sim Be(a, b)\); then \(\theta\mid y \sim Be(a + y, b + n - y)\) and the posterior mode is \(\hat\theta = \frac{y'}{n'} = \frac{a + y - 1}{a + b + n - 2}\), where \(y' = a + y - 1\) and \(n' = a + b + n - 2\). Thus \(I(\hat\theta) = \frac{n'}{\hat\theta(1 - \hat\theta)}\) and \(p(\theta\mid y) \stackrel{d}{\approx } N\left(\hat\theta, \frac{\hat\theta(1 - \hat\theta)}{n'} \right)\).
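A minimal numerical check of this approximation, with illustrative values of \(a\), \(b\), \(n\), and \(y\) chosen here rather than taken from the notes, compares the exact \(Be(a+y,\, b+n-y)\) density with its normal approximation on a small grid.

```python
# A minimal sketch comparing the exact Beta posterior with its normal
# approximation in the binomial example; parameter values are assumptions.
import numpy as np
from scipy import stats

a, b, n, y = 2, 2, 40, 13
y_prime, n_prime = a + y - 1, a + b + n - 2
theta_hat = y_prime / n_prime

exact = stats.beta(a + y, b + n - y)
approx = stats.norm(theta_hat, np.sqrt(theta_hat * (1 - theta_hat) / n_prime))

grid = np.linspace(0.05, 0.7, 5)
print(np.round(exact.pdf(grid), 3))    # exact posterior density
print(np.round(approx.pdf(grid), 3))   # normal approximation
```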
Recall that \(\hat\theta_{MLE} = y/n\). The following estimators are all consistent
- Posterior mean: \(\frac{a + y}{a + b + n}\)
- Posterior median: \(\approx \frac{a + y - 1/3}{a + b + n - 2/3}\) for \(a, b > 1\)
- Posterior mode: \(\frac{a + y - 1}{a + b + n - 2}\)
since as \(n \rightarrow \infty\), these all converge to \(\hat\theta_{MLE} = y/n\).
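A short sketch of this convergence, with an assumed prior \(Be(2, 3)\) and true success probability 0.3 (illustrative choices only), prints the MLE and the three posterior summaries for increasing \(n\).

```python
# A minimal sketch showing the posterior mean, median, and mode approaching
# the MLE y/n as n grows; prior Be(2, 3) and true theta = 0.3 are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, theta0 = 2, 3, 0.3
for n in [10, 100, 1000, 10000]:
    y = rng.binomial(n, theta0)
    mean = (a + y) / (a + b + n)
    median = stats.beta(a + y, b + n - y).median()
    mode = (a + y - 1) / (a + b + n - 2)
    print(n, y / n, round(mean, 4), round(median, 4), round(mode, 4))
```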
Example 2: Normal example
Consider \(Y_i \stackrel{iid}{\sim} N(\theta, 1)\) with known variance and prior \(\theta \sim N(c, 1)\); then \(\theta\mid y \sim N\left(\frac{1}{n+1}c + \frac{n}{n+1}\bar y, \frac{1}{n+1} \right)\).
Recall that \(\hat\theta_{MLE} = \bar y\); the posterior mean converges to the MLE.
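A small sketch of this example, with assumed values \(c = 5\) and true mean 1, shows the posterior mean moving from near the prior mean toward \(\bar y\) as \(n\) grows, while the posterior variance \(1/(n+1)\) shrinks.

```python
# A minimal sketch of the normal example: the posterior mean shrinks toward
# the prior mean c for small n and approaches the sample mean as n grows.
import numpy as np

rng = np.random.default_rng(0)
c, theta0 = 5.0, 1.0                      # prior mean and assumed true mean
for n in [5, 50, 500, 5000]:
    y = rng.normal(theta0, 1.0, size=n)
    post_mean = c / (n + 1) + n * y.mean() / (n + 1)
    post_var = 1 / (n + 1)
    print(n, round(y.mean(), 4), round(post_mean, 4), round(post_var, 6))
```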
Asymptotic normality
For large \(n\), we have \[ \log p(\theta \mid y) \approx \log p(\hat{\theta} \mid y)-\frac{1}{2}(\theta-\hat{\theta})^{\top}\left[n I\left(\theta_{0}\right)\right](\theta-\hat{\theta}), \] where \(\hat\theta\) is the posterior mode and \(I(\theta_0)\) is the Fisher information for a single observation. Since \(\hat\theta \rightarrow \theta_0\) and \(I(\hat\theta) \rightarrow I(\theta_0)\) as \(n \rightarrow \infty\), we may replace \(I(\theta_0)\) with \(I(\hat\theta)\), giving \[ p(\theta\mid y )\propto \exp\left(-\frac{1}{2}(\theta - \hat\theta)^{\top}\left[n I(\hat\theta) \right] (\theta - \hat\theta) \right). \] Thus \(\theta\mid y \stackrel{d}{\approx} N\left(\hat\theta, \frac{1}{n} I(\hat\theta)^{-1} \right)\), i.e. the posterior distribution is asymptotically normal.
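One way to see the asymptotic normality numerically is to measure the total variation distance between an exact posterior and its normal approximation as \(n\) grows. The sketch below does this for the binomial example, with the prior and true value again being illustrative assumptions.

```python
# A minimal sketch: total variation distance between the exact Beta posterior
# and the N(theta_hat, [n I(theta_hat)]^{-1}) approximation shrinks with n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, theta0 = 2, 2, 0.3                  # assumed prior and true value
grid = np.linspace(1e-4, 1 - 1e-4, 20000)
dx = grid[1] - grid[0]
for n in [10, 100, 1000, 10000]:
    y = rng.binomial(n, theta0)
    theta_hat = (a + y - 1) / (a + b + n - 2)
    exact = stats.beta(a + y, b + n - y).pdf(grid)
    approx = stats.norm(theta_hat, np.sqrt(theta_hat * (1 - theta_hat) / n)).pdf(grid)
    tv = 0.5 * np.abs(exact - approx).sum() * dx
    print(n, round(tv, 4))
```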
Suppose that \(f(y)\), the true sampling distribution, does not correspond to \(p(y\mid \theta)\) for any \(\theta\). Then the posterior \(p(\theta\mid y)\) concentrates at the value \(\theta_0\) that minimizes the Kullback-Leibler divergence of \(p(y\mid\theta)\) from the true \(f(y)\), where \[ K L(f(y) \| p(y \mid \theta))=E\left[\log \left(\frac{f(y)}{p(y \mid \theta)}\right)\right]=\int \log \left(\frac{f(y)}{p(y \mid \theta)}\right) f(y)\, d y. \] That is, we do about the best that we can given that we have assumed the wrong sampling distribution \(p(y \mid \theta)\).
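As a concrete (and assumed) illustration: if the true \(f(y)\) is Exponential(1) but the assumed model is \(N(\theta, 1)\), the KL-minimizing value is \(\theta_0 = E_f[y] = 1\). The sketch below recovers this by numerical integration and optimization.

```python
# A minimal sketch of KL minimization under misspecification: the assumed
# model is N(theta, 1) while the true f(y) is Exponential(1), so the
# KL-minimizing theta_0 is E_f[y] = 1 (choices here are illustrative).
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

f = stats.expon()                          # true sampling distribution f(y)
y_grid = np.linspace(1e-6, 30, 20001)
dy = y_grid[1] - y_grid[0]
fy = f.pdf(y_grid)

def kl(theta):
    # KL(f || N(theta, 1)) by numerical integration on the grid.
    log_ratio = f.logpdf(y_grid) - stats.norm(theta, 1).logpdf(y_grid)
    return np.sum(fy * log_ratio) * dy

theta0 = minimize_scalar(kl, bounds=(-5, 5), method="bounded").x
print(round(theta0, 3))                    # close to E_f[y] = 1
```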