Delta Method, Sufficiency Principle (Lecture on 01/14/2020)
Starting from the limiting distribution of a standardized random variable, we now consider the distribution of a function of a random variable. The Delta Method is a technique for approximating the mean and variance of a function of a random variable, based on a Taylor series approximation.
Definition 3.1 (Taylor Polynomial) If a function g(x) has derivatives of order r, that is, g^{(r)}(x)=\frac{d^r}{dx^r}g(x) exists, then for any constant a, the Taylor polynomial of order r about a is
\begin{equation}
T_r(x)=\sum_{i=0}^r\frac{g^{(i)}(a)}{i!}(x-a)^i
\tag{3.1}
\end{equation}
Taylor’s major theorem states that the remainder from the approximation, g(x)-T_r(x), always tends to 0 faster than the highest-order explicit term. That is,
\begin{equation}
\lim_{x\to a}\frac{g(x)-T_r(x)}{(x-a)^r}=0
\tag{3.2}
\end{equation}
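To make this concrete, here is a minimal numerical sketch (our own illustration, not part of the lecture; it uses only the Python standard library). It evaluates the order-3 Taylor polynomial of g(x)=e^x about a=0 and checks that the ratio in (3.2) shrinks as x approaches a.

```python
from math import exp, factorial

# g(x) = exp(x); every derivative of g at a equals exp(a), so the Taylor
# polynomial of order r about a is T_r(x) = sum_{i=0}^r exp(a) (x - a)^i / i!.
def taylor_poly_exp(x, r, a=0.0):
    return sum(exp(a) * (x - a) ** i / factorial(i) for i in range(r + 1))

r, a = 3, 0.0
for x in [0.5, 0.1, 0.02, 0.004]:
    ratio = (exp(x) - taylor_poly_exp(x, r, a)) / (x - a) ** r
    print(f"x = {x:6.3f}   [g(x) - T_r(x)] / (x - a)^r = {ratio:.8f}")
# The ratio tends to 0 as x -> a, which is the content of (3.2).
```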
For the multivariate case, let T_1,\cdots,T_k be random variables with means \theta_1,\cdots,\theta_k, and define \mathbf{T}=(T_1,\cdots,T_k) and \mathbf{\theta}=(\theta_1,\cdots,\theta_k). Suppose there is a differentiable function g(\mathbf{T}) for which we want an approximate estimate of the variance. Define
\begin{equation}
g'_i(\mathbf{\theta})=\frac{\partial}{\partial t_i}g(\mathbf{t})|_{t_1=\theta_1,\cdots,t_k=\theta_k}
\tag{3.3}
\end{equation}
The first-order Taylor series expansion of g about \mathbf{\theta} is
\begin{equation}
g(\mathbf{t})=g(\mathbf{\theta})+\sum_{i=1}^kg'_i(\mathbf{\theta})(t_i-\theta_i)+Remainder
\tag{3.4}
\end{equation}
For statistical approximation, the remainder can be ignored and we end up with
\begin{equation}
g(\mathbf{t})\approx g(\mathbf{\theta})+\sum_{i=1}^kg'_i(\mathbf{\theta})(t_i-\theta_i)
\tag{3.5}
\end{equation}
Now take expectation on both sides of (3.5)
\begin{equation}
E_{\mathbf{\theta}}g(\mathbf{T})\approx g(\mathbf{\theta})+\sum_{i=1}^kg'_i(\mathbf{\theta})E_{\mathbf{T}}(T_i-\theta_i)=g(\mathbf{\theta})
\tag{3.6}
\end{equation}
Now approximate the variance of g(\mathbf{T}) by
\begin{equation}
\begin{split}
Var_{\mathbf{T}}(g(\mathbf{T}))&\approx E_{\mathbf{T}}([g(\mathbf{T})-g(\mathbf{\theta})]^2)\\
&\approx E_{\mathbf{T}}((\sum_{i=1}^kg'_i(\mathbf{\theta})(T_i-\theta_i))^2)\\
&=\sum_{i=1}^k[g'_i(\mathbf{\theta})]^2Var_{\mathbf{T}}(T_i)+2\sum_{i>j}g'_i(\mathbf{\theta})g'_j(\mathbf{\theta})Cov_{\mathbf{T}}(T_i,T_j)
\end{split}
\tag{3.7}
\end{equation}
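As a sanity check on (3.6) and (3.7), the following Monte Carlo sketch is our own illustration (assuming NumPy; the choice g(t_1,t_2)=t_1t_2 and the bivariate normal data are arbitrary). Here T_1 and T_2 are sample means of correlated observations, so Var(T_i)=\sigma_{ii}/n and Cov(T_1,T_2)=\sigma_{12}/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 20000
theta = np.array([2.0, 3.0])                # (theta_1, theta_2)
cov = np.array([[1.0, 0.6], [0.6, 2.0]])    # per-observation covariance matrix

# g(t1, t2) = t1 * t2, so g'_1(theta) = theta_2 and g'_2(theta) = theta_1.
grad = np.array([theta[1], theta[0]])

# Approximations (3.6) and (3.7); (3.7) is the quadratic form grad' Cov(T) grad.
approx_mean = theta[0] * theta[1]
approx_var = grad @ (cov / n) @ grad

# Monte Carlo: T = (T_1, T_2) are the two sample means of n correlated pairs.
T = rng.multivariate_normal(theta, cov, size=(reps, n)).mean(axis=1)
g_vals = T[:, 0] * T[:, 1]

print("E g(T):   delta", approx_mean, " Monte Carlo", round(g_vals.mean(), 4))
print("Var g(T): delta", round(approx_var, 5), " Monte Carlo", round(g_vals.var(), 5))
```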
Theorem 3.1 (Delta Method) Let Y_n be a sequence of random variables that satisfies \sqrt{n}(Y_n-\theta)\to N(0,\sigma^2) in distribution. For a given function g and a specific value of \theta, suppose that g^{\prime}(\theta) exists and is not 0. Then
\begin{equation}
\sqrt{n}[g(Y_n)-g(\theta)]\to N(0,\sigma^2[g^{\prime}(\theta)]^2)
\tag{3.8}
\end{equation}
in distribution.
Proof. The Taylor expansion of g(Y_n) around Y_n=\theta is
\begin{equation}
g(Y_n)=g(\theta)+g^{\prime}(\theta)(Y_n-\theta)+Remainder
\tag{3.9}
\end{equation}
where the remainder \to 0 as Y_n\to\theta. We first show that \sqrt{n}(Y_n-\theta)\to N(0,\sigma^2) in distribution implies Y_n\to\theta in probability. Since P(|Y_n-\theta|<\epsilon)=P(|\sqrt{n}(Y_n-\theta)|<\sqrt{n}\epsilon), we have
\begin{equation}
\lim_{n\to\infty}P(|Y_n-\theta|<\epsilon)=\lim_{n\to\infty}P(|\sqrt{n}(Y_n-\theta)|<\sqrt{n}\epsilon)=P(|Z|<\infty)=1
\tag{3.10}
\end{equation}
where Z\sim N(0,\sigma^2). Thus Y_n\to\theta in probability, and hence the remainder converges to 0 in probability. Rearranging (3.9) gives \sqrt{n}[g(Y_n)-g(\theta)]=g^{\prime}(\theta)\sqrt{n}(Y_n-\theta)+\sqrt{n}\cdot\text{Remainder}, and applying part (b) of Slutsky's Theorem (Theorem 2.9) we get the result.
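Below is a minimal Monte Carlo illustration of Theorem 3.1 (ours, not from the lecture; it assumes NumPy). With X_i i.i.d. Exponential(1) we have \theta=1 and \sigma^2=1, and taking g(y)=\log y gives g'(1)=1, so \sqrt{n}[g(\bar{X})-g(1)] should be approximately N(0,1) for large n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20000

# X_i ~ Exponential(1): theta = E[X_1] = 1 and sigma^2 = Var(X_1) = 1.
ybar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# g(y) = log(y); Theorem 3.1 predicts sqrt(n)(log(ybar) - log(1)) ~ N(0, 1).
z = np.sqrt(n) * (np.log(ybar) - np.log(1.0))

print("sample mean of z:", z.mean())   # close to 0
print("sample var  of z:", z.var())    # close to sigma^2 [g'(theta)]^2 = 1
```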
Example 3.1 (Approximate mean and variance) Suppose X is a random variable with EX=\mu\neq0. If we want to estimate a function g(\mu), a first-order approximation gives g(X)\approx g(\mu)+g^{\prime}(\mu)(X-\mu). If we use g(X) as an estimator of g(\mu), then approximately we have
\begin{align}
&Eg(X)\approx g(\mu) \tag{3.11}\\
&Var(g(X))\approx [g^{\prime}(\mu)]^2Var(X) \tag{3.12}
\end{align}
For a specific example, take g(\mu)=1/\mu. We estimate 1/\mu with 1/X, which approximately has mean and variance
\begin{align}
&E\frac{1}{X}\approx \frac{1}{\mu} \tag{3.13}\\
&Var(\frac{1}{X})\approx [\frac{1}{\mu^4}]Var(X) \tag{3.14}
\end{align}
Suppose now we have the mean of a random sample, \bar{X}. For \mu\neq0, by Theorem 3.1, \sqrt{n}(\frac{1}{\bar{X}}-\frac{1}{\mu})\to N(0,\frac{1}{\mu^4}Var(X_1)) in distribution. Usually \mu and Var(X_1) are both unknown, and they can be estimated by \bar{X} and S^2, respectively. Thus, we have the approximate variance estimate \hat{Var}(\frac{1}{\bar{X}})\approx (\frac{1}{\bar{X}})^4S^2. Since \bar{X} and S^2 are consistent estimators, applying Slutsky's theorem again, we conclude that for \mu\neq 0,
\begin{equation}
\frac{\sqrt{n}(\frac{1}{\bar{X}}-\frac{1}{\mu})}{(\frac{1}{\bar{X}})^2S}\to N(0,1)
\tag{3.15}
\end{equation}
in distribution.
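The statement (3.15) can be checked numerically. The sketch below is our own illustration (assuming NumPy; the Gamma population is an arbitrary choice with \mu=6 and Var(X_1)=12): the simulated studentized statistic should have mean close to 0 and variance close to 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 20000
shape, scale = 3.0, 2.0          # X_i ~ Gamma(3, scale=2): mu = 6, Var(X_1) = 12
mu = shape * scale

x = rng.gamma(shape, scale, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)        # sample standard deviation S

# Studentized statistic from (3.15).
t = np.sqrt(n) * (1 / xbar - 1 / mu) / ((1 / xbar) ** 2 * s)

print("mean:", t.mean(), "  var:", t.var())   # approximately 0 and 1
```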
There are two extensions of the basic Delta Method. The first deals with the case g^{\prime}(\theta)=0. If this happens, we take one more term in the Taylor expansion to get
\begin{equation}
g(Y_n)=g(\theta)+g^{\prime}(\theta)(Y_n-\theta)+\frac{g^{\prime\prime}(\theta)}{2}(Y_n-\theta)^2+Remainder
\tag{3.16}
\end{equation}
Since g^{\prime}(\theta)=0, rearranging terms gives
\begin{equation}
g(Y_n)-g(\theta)=\frac{g^{\prime\prime}(\theta)}{2}(Y_n-\theta)^2+Remainder
\tag{3.17}
\end{equation}
Notice that the square of a N(0,1) random variable is a \chi_1^2 random variable, which implies \frac{n(Y_n-\theta)^2}{\sigma^2}\to\chi_1^2 in distribution. Therefore, we have the first extension of the Delta Method.
Theorem 3.2 (Second-order Delta Method) Let Y_n be a sequence of random variables that satisfies \sqrt{n}(Y_n-\theta)\to N(0,\sigma^2) in distribution. For a given function g and a specific value of \theta, suppose that g^{\prime}(\theta)=0, g^{\prime\prime}(\theta) exists and is not 0. Then
\begin{equation}
n[g(Y_n)-g(\theta)]\to \sigma^2\frac{g^{\prime\prime}(\theta)}{2}\chi_1^2
\tag{3.18}
\end{equation}
in distribution.
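A quick numerical check of Theorem 3.2 (our own sketch, assuming NumPy): take X_i i.i.d. N(0,\sigma^2) with \sigma^2=4, Y_n=\bar{X}, \theta=0 and g(y)=y^2, so g'(0)=0 and g''(0)=2. The theorem predicts n[g(Y_n)-g(0)]\to\sigma^2\chi^2_1, a limit with mean \sigma^2=4 and variance 2\sigma^4=32.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma2 = 500, 20000, 4.0

# Y_n = sample mean of N(0, sigma^2) data; g(y) = y^2, so g'(0) = 0, g''(0) = 2.
ybar = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
w = n * ybar ** 2                       # n [g(Y_n) - g(theta)]

# Predicted limit: sigma^2 * (g''(0)/2) * chi^2_1 = sigma^2 * chi^2_1.
print("mean of w:", w.mean(), "(limit:", sigma2, ")")
print("var  of w:", w.var(),  "(limit:", 2 * sigma2 ** 2, ")")
```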
The second extension deals with the multivariate case. First consider the following example dealing with the moments of a ratio estimator.
Example 3.2 Suppose X and Y are random variables with nonzero means \mu_X and \mu_Y, respectively. The parametric function to be estimated is g(\mu_X,\mu_Y)=\frac{\mu_X}{\mu_Y}. It is straightforward to calculate \frac{\partial g}{\partial\mu_X}=\frac{1}{\mu_Y} and \frac{\partial g}{\partial\mu_Y}=\frac{-\mu_X}{\mu^2_Y}. Then by (3.6) and (3.7) we have
\begin{equation}
E(\frac{X}{Y})\approx \frac{\mu_X}{\mu_Y}
\tag{3.19}
\end{equation}
and
\begin{equation}
\begin{split}
Var(\frac{X}{Y})&\approx \frac{1}{\mu_Y^2}Var(X)+\frac{\mu_X^2}{\mu_Y^4}Var(Y)-2\frac{\mu_X}{\mu_Y^3}Cov(X,Y)\\
&=(\frac{\mu_X}{\mu_Y})^2(\frac{Var(X)}{\mu_X^2}+\frac{Var(Y)}{\mu_Y^2}-2\frac{Cov(X,Y)}{\mu_X\mu_Y})
\end{split}
\tag{3.20}
\end{equation}
Thus, we obtain an approximation that uses only the means, variances, and covariance of X and Y, while the exact calculation is hopeless.
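The approximations (3.19) and (3.20) can be checked by simulation. The sketch below is our own illustration (assuming NumPy); it uses a bivariate normal pair whose second coordinate stays far from 0, where the first-order approximation is expected to work well.

```python
import numpy as np

rng = np.random.default_rng(4)
mu_x, mu_y = 5.0, 10.0
cov = np.array([[1.0, 0.5], [0.5, 2.0]])   # [[Var(X), Cov(X,Y)], [Cov(X,Y), Var(Y)]]

# Delta-method approximations (3.19) and (3.20).
approx_mean = mu_x / mu_y
approx_var = approx_mean ** 2 * (cov[0, 0] / mu_x ** 2
                                 + cov[1, 1] / mu_y ** 2
                                 - 2 * cov[0, 1] / (mu_x * mu_y))

# Monte Carlo estimates of E(X/Y) and Var(X/Y).
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=200000)
ratio = xy[:, 0] / xy[:, 1]

print("E(X/Y):   delta", approx_mean, "  Monte Carlo", round(ratio.mean(), 4))
print("Var(X/Y): delta", round(approx_var, 5), "  Monte Carlo", round(ratio.var(), 5))
```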
Motivated by the need to estimate a function of several parameters using several statistics, we have the following multivariate Delta Method.
Theorem 3.3 (Multivariate Delta Method)
Let \mathbf{X}_1,\cdots,\mathbf{X}_n be a random sample with E(X_{ij})=\mu_i and Cov(X_{ik},X_{jk})=\sigma_{ij}. For a given function g with continuous first partial derivatives and a specific value of \mathbf{\mu}=(\mu_1,\cdots,\mu_p) for which \tau^2=\sum_{i=1}^p\sum_{j=1}^p\sigma_{ij}\frac{\partial g(\mathbf{\mu})}{\partial\mu_i}\cdot\frac{\partial g(\mathbf{\mu})}{\partial\mu_j}>0,
\begin{equation}
\sqrt{n}[g(\bar{X}_1,\cdots,\bar{X}_p)-g(\mu_1,\cdots,\mu_p)]\to N(0,\tau^2)
\tag{3.21}
\end{equation}
in distribution.
We now turn to the sufficiency principle. A sufficient statistic for a parameter \theta is a statistic that, in a certain sense, captures all the information about \theta contained in the sample. Any additional information in the sample, besides the value of the sufficient statistic, does not contain any more information about \theta.
Definition 3.2 (Sufficiency Principle) If T(\mathbf{X}) is a sufficient statistic for \theta, then any inference about \theta should depend on the sample \mathbf{X} only through the value T(\mathbf{X}). That is, if \mathbf{x} and \mathbf{y} are two sample points such that T(\mathbf{x})=T(\mathbf{y}), then the inference about \theta should be the same whether \mathbf{X}=\mathbf{x} or \mathbf{X}=\mathbf{y} is observed.
Definition 3.3 (Sufficient Statistic) A statistic T(\mathbf{X}) is a sufficient statistic for \theta if the conditional distribution of the sample \mathbf{X} given the value of T(\mathbf{X}) does not depend on \theta.
To use Definition 3.3 to verify that a statistic T(\mathbf{X}) is a sufficient statistic for \theta, we must verify that for any fixed \mathbf{x} and t, the conditional probability P_{\theta}(\mathbf{X}=\mathbf{x}|T(\mathbf{X})=t) is the same for all values of \theta. (Here P_{\theta} means \theta is a parameter of the distribution.) Since this probability is 0 for every \theta whenever T(\mathbf{x})\neq t, we only need to verify that P_{\theta}(\mathbf{X}=\mathbf{x}|T(\mathbf{X})=T(\mathbf{x})) does not depend on \theta. Since S_1:=\{s\in S: \mathbf{X}(s)=\mathbf{x}\} is a subset of S_2:=\{s\in S: T(\mathbf{X}(s))=T(\mathbf{x})\},
\begin{equation}
\begin{split}
P_{\theta}(\mathbf{X}=\mathbf{x}|T(\mathbf{X})=T(\mathbf{x}))&=\frac{P_{\theta}(\mathbf{X}=\mathbf{x}\ \text{and}\ T(\mathbf{X})=T(\mathbf{x}))}{P_{\theta}(T(\mathbf{X})=T(\mathbf{x}))}\\
&=\frac{P_{\theta}(\mathbf{X}=\mathbf{x})}{P_{\theta}(T(\mathbf{X})=T(\mathbf{x}))}\\
&=\frac{p(\mathbf{x}|\theta)}{q(T(\mathbf{x})|\theta)}
\end{split}
\tag{3.22}
\end{equation}
Here p(\mathbf{x}|\theta) is the joint pmf of the sample \mathbf{X} and q(t|\theta) is the pmf of T(\mathbf{X}). Thus, T(\mathbf{X}) is a sufficient statistic for \theta iff \forall \mathbf{x}, the above ratio of pmfs is constant as a function of \theta.
Theorem 3.4 If p(\mathbf{x}|\theta) is the joint pdf or pmf of the sample \mathbf{X} and q(t|\theta) is the pdf or pmf of T(\mathbf{X}), then T(\mathbf{X}) is a sufficient statistic for \theta if, for every \mathbf{x} in the sample space, the ratio \frac{p(\mathbf{x}|\theta)}{q(T(\mathbf{x})|\theta)} is constant as a function of \theta.
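As a quick numerical illustration of Theorem 3.4 (our own example, assuming SciPy; the Bernoulli model is not discussed in the lecture but is a standard case): for X_1,\cdots,X_n i.i.d. Bernoulli(\theta) and T(\mathbf{X})=\sum_i X_i\sim Binomial(n,\theta), the ratio p(\mathbf{x}|\theta)/q(T(\mathbf{x})|\theta) should be the same for every \theta.

```python
import numpy as np
from scipy.stats import binom

# A fixed Bernoulli sample x and its statistic T(x) = sum of the x_i.
x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
n, t = len(x), int(x.sum())

for theta in [0.2, 0.5, 0.8]:
    p_x = theta ** t * (1 - theta) ** (n - t)   # joint pmf of the sample
    q_t = binom.pmf(t, n, theta)                # pmf of T(X) ~ Binomial(n, theta)
    print(f"theta = {theta}: ratio = {p_x / q_t:.6f}")
# Each ratio equals 1 / C(n, t), free of theta, so T is sufficient for theta.
```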
Example 3.3 (Normal Sufficient Statistic) Let X_1,\cdots,X_n be i.i.d. N(\mu,\sigma^2) where \sigma^2 is known. We wish to show that the sample mean T(\mathbf{X})=\bar{X} is a sufficient statistic for \mu. The joint pdf of the sample \mathbf{X} is
\begin{equation}
\begin{split}
f(\mathbf{x}|\mu)&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\sum_{i=1}^n(x_i-\mu)^2/(2\sigma^2)\right)\\
&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\sum_{i=1}^n(x_i-\bar{x}+\bar{x}-\mu)^2/(2\sigma^2)\right)\\
&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\left(\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\mu)^2\right)/(2\sigma^2)\right)
\end{split}
\tag{3.23}
\end{equation}
Since \bar{X}\sim N(\mu,\sigma^2/n), the ratio of pdfs is
\begin{equation}
\begin{split}
\frac{p(\mathbf{x}|\mu)}{q(T(\mathbf{x})|\mu)}&=\frac{(2\pi\sigma^2)^{-n/2}\exp\left(-\left(\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\mu)^2\right)/(2\sigma^2)\right)}{(2\pi\sigma^2/n)^{-1/2}\exp\left(-n(\bar{x}-\mu)^2/(2\sigma^2)\right)}\\
&=n^{-1/2}(2\pi\sigma^2)^{-(n-1)/2}\exp\left(-\sum_{i=1}^n(x_i-\bar{x})^2/(2\sigma^2)\right)
\end{split}
\tag{3.24}
\end{equation}
which does not depend on \mu. Hence, by Theorem 3.4, the sample mean is a sufficient statistic for the population mean \mu.
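Finally, a numerical check of (3.24) (our own sketch, assuming SciPy): for one fixed sample, evaluate the ratio of the joint pdf to the pdf of \bar{X} at several values of \mu and confirm that it does not change.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
sigma, n = 2.0, 8
x = rng.normal(loc=1.0, scale=sigma, size=n)    # one fixed sample
xbar = x.mean()

for mu in [-1.0, 0.0, 1.0, 3.0]:
    joint = norm.pdf(x, loc=mu, scale=sigma).prod()          # f(x | mu)
    q = norm.pdf(xbar, loc=mu, scale=sigma / np.sqrt(n))     # pdf of Xbar at xbar
    print(f"mu = {mu:4.1f}: ratio = {joint / q:.6e}")
# The ratio is identical for every mu, matching (3.24).
```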