Chapter 3 Delta Method, Sufficiency Principle (Lecture on 01/14/2020)

Starting from the limiting distribution of a standardized random variable, we now consider the distribution of a function of a random variable. The delta method approximates the mean and variance of such a function using a Taylor series approximation.

Definition 3.1 (Taylor Polynomial) If a function \(g(x)\) has derivatives of order r, that is, \(g^{(r)}(x)=\frac{d^r}{dx^r}g(x)\) exists, then for any constant a, the Taylor polynomial of order r about a is \[\begin{equation} T_r(x)=\sum_{i=0}^{r}g^{(i)}(a)\frac{(x-a)^i}{i!} \tag{3.1} \end{equation}\]

Taylor’s major theorem states that the remainder from the approximation \(g(x)-T_r(x)\) always tends to 0 faster than the highest-order explicit term. That is \[\begin{equation} \lim_{x\to a}\frac{g(x)-T_r(x)}{(x-a)^r}=0 \tag{3.2} \end{equation}\]
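
A minimal Python sketch of (3.1) and (3.2), taking \(g(x)=e^x\) and \(a=0\) purely for illustration: the scaled remainder \((g(x)-T_r(x))/(x-a)^r\) shrinks as \(x\to a\).

```python
import math

def taylor_poly_exp(x, a, r):
    """Taylor polynomial of order r of g(x) = exp(x) about a.
    Every derivative of exp is exp, so g^{(i)}(a) = exp(a)."""
    return sum(math.exp(a) * (x - a) ** i / math.factorial(i) for i in range(r + 1))

a, r = 0.0, 3
for x in [1.0, 0.5, 0.1, 0.01]:
    remainder = math.exp(x) - taylor_poly_exp(x, a, r)
    # (3.2): the scaled remainder (g(x) - T_r(x)) / (x - a)^r tends to 0 as x -> a
    print(f"x = {x:5.2f}   remainder = {remainder: .3e}   scaled = {remainder / (x - a) ** r: .3e}")
```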

For the multivariate case, let \(T_1,\cdots,T_k\) be random variables with means \(\theta_1,\cdots,\theta_k\), and define \(\mathbf{T}=(T_1,\cdots,T_k)\) and \(\mathbf{\theta}=(\theta_1,\cdots,\theta_k)\). Suppose there is a differentiable function \(g(\mathbf{T})\) for which we want an approximate mean and variance. Define \[\begin{equation} g'_i(\mathbf{\theta})=\frac{\partial}{\partial t_i}g(\mathbf{t})\Big|_{t_1=\theta_1,\cdots,t_k=\theta_k} \tag{3.3} \end{equation}\] The first-order Taylor series expansion of g about \(\mathbf{\theta}\) is \[\begin{equation} g(\mathbf{t})=g(\mathbf{\theta})+\sum_{i=1}^kg'_i(\mathbf{\theta})(t_i-\theta_i)+Remainder \tag{3.4} \end{equation}\] For statistical approximation purposes the remainder is ignored, leaving \[\begin{equation} g(\mathbf{t})\approx g(\mathbf{\theta})+\sum_{i=1}^kg'_i(\mathbf{\theta})(t_i-\theta_i) \tag{3.5} \end{equation}\] Now take expectations on both sides of (3.5): \[\begin{equation} E_{\mathbf{\theta}}g(\mathbf{T})\approx g(\mathbf{\theta})+\sum_{i=1}^kg'_i(\mathbf{\theta})E_{\mathbf{\theta}}(T_i-\theta_i)=g(\mathbf{\theta}) \tag{3.6} \end{equation}\] Now approximate the variance of \(g(\mathbf{T})\) by \[\begin{equation} \begin{split} Var_{\mathbf{\theta}}(g(\mathbf{T}))&\approx E_{\mathbf{\theta}}([g(\mathbf{T})-g(\mathbf{\theta})]^2)\\ &\approx E_{\mathbf{\theta}}\Big(\big(\sum_{i=1}^kg'_i(\mathbf{\theta})(T_i-\theta_i)\big)^2\Big)\\ &=\sum_{i=1}^k[g'_i(\mathbf{\theta})]^2Var_{\mathbf{\theta}}(T_i)+2\sum_{i>j}g'_i(\mathbf{\theta})g'_j(\mathbf{\theta})Cov_{\mathbf{\theta}}(T_i,T_j) \end{split} \tag{3.7} \end{equation}\]
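
A Monte Carlo sketch of (3.6) and (3.7), taking \((T_1,T_2)\) bivariate normal and \(g(t_1,t_2)=t_1t_2\) purely for illustration; the gradient of g at \(\mathbf{\theta}\) and the covariance matrix are plugged into the approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: (T1, T2) bivariate normal with known means and covariance
theta = np.array([2.0, 3.0])
cov = np.array([[1.0, 0.3],
                [0.3, 0.5]])
T = rng.multivariate_normal(theta, cov, size=200_000)

g = lambda t: t[:, 0] * t[:, 1]            # g(t1, t2) = t1 * t2
grad = np.array([theta[1], theta[0]])      # (dg/dt1, dg/dt2) evaluated at theta

# Delta-method approximations (3.6) and (3.7)
mean_approx = theta[0] * theta[1]
var_approx = grad @ cov @ grad             # sum_i sum_j g'_i g'_j sigma_ij

print("mean: simulated", g(T).mean(), "approx", mean_approx)
print("var : simulated", g(T).var(),  "approx", var_approx)
```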

Theorem 3.1 (Delta Method) Let \(Y_n\) be a sequence of random variables that satisfies \(\sqrt{n}(Y_n-\theta)\to N(0,\sigma^2)\) in distribution. For a given function g and a specific value of \(\theta\), suppose that \(g^{\prime}(\theta)\) exists and is not 0. Then \[\begin{equation} \sqrt{n}[g(Y_n)-g(\theta)]\to N(0,\sigma^2[g^{\prime}(\theta)]^2) \tag{3.8} \end{equation}\] in distribution.
Proof. The Taylor expansion of \(g(Y_n)\) around \(Y_n=\theta\) is \[\begin{equation} g(Y_n)=g(\theta)+g^{\prime}(\theta)(Y_n-\theta)+Remainder \tag{3.9} \end{equation}\] where the remainder \(\to0\) as \(Y_n\to\theta\). We first show that \(\sqrt{n}(Y_n-\theta)\to N(0,\sigma^2)\) in distribution implies \(Y_n\to\theta\) in probability. Since \(P(|Y_n-\theta|<\epsilon)=P(|\sqrt{n}(Y_n-\theta)|<\sqrt{n}\epsilon)\), we have \[\begin{equation} \lim_{n\to\infty}P(|Y_n-\theta|<\epsilon)=\lim_{n\to\infty}P(|\sqrt{n}(Y_n-\theta)|<\sqrt{n}\epsilon)=P(|Z|<\infty)=1 \tag{3.10} \end{equation}\] where \(Z\sim N(0,\sigma^2)\). Thus the remainder converges to 0 in probability. Rearranging (3.9) gives \(\sqrt{n}[g(Y_n)-g(\theta)]=g^{\prime}(\theta)\sqrt{n}(Y_n-\theta)+\sqrt{n}\cdot Remainder\), where the last term converges to 0 in probability, so applying part (b) of Slutsky's Theorem (Theorem 2.9) gives the result.
Example 3.1 (Approximate mean and variance) Suppose \(X\) is a random variable with \(EX=\mu\neq0\). If we want to estimate a function \(g(\mu)\), a first-order approximation gives \(g(X)\approx g(\mu)+g^{\prime}(\mu)(X-\mu)\). If we use \(g(X)\) as an estimator of \(g(\mu)\), then approximately we have \[\begin{align} &Eg(X)\approx g(\mu) \tag{3.11}\\ &Var(g(X))\approx [g^{\prime}(\mu)]^2Var(X) \tag{3.12}\\ \end{align}\] For a specific example, take \(g(\mu)=1/\mu\). We estimate \(1/\mu\) with \(1/X\), which approximately has mean and variance \[\begin{align} &E\frac{1}{X}\approx \frac{1}{\mu} \tag{3.13}\\ &Var(\frac{1}{X})\approx \frac{1}{\mu^4}Var(X) \tag{3.14}\\ \end{align}\] Suppose now we have the mean of a random sample, \(\bar{X}\). For \(\mu\neq0\), by Theorem 3.1, \(\sqrt{n}(\frac{1}{\bar{X}}-\frac{1}{\mu})\to N(0,\frac{1}{\mu^4}Var(X_1))\) in distribution. Usually \(\mu\) and \(Var(X_1)\) are both unknown; they can be estimated by \(\bar{X}\) and \(S^2\) respectively, giving the estimated asymptotic variance \(\frac{1}{\bar{X}^4}S^2\). Since \(\bar{X}\) and \(S^2\) are consistent estimators, applying Slutsky's Theorem again we conclude that for \(\mu\neq 0\), \[\begin{equation} \frac{\sqrt{n}(\frac{1}{\bar{X}}-\frac{1}{\mu})}{(\frac{1}{\bar{X}})^2S}\to N(0,1) \tag{3.15} \end{equation}\] in distribution.
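
A simulation sketch of (3.15), taking \(X_i\) exponential with mean \(\mu=2\) purely for illustration (any positive-mean population with finite variance would do); the studentized statistic should be approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 2.0, 200, 20_000

# X_i ~ Exponential with mean mu, so Var(X_1) = mu^2
X = rng.exponential(scale=mu, size=(reps, n))
xbar = X.mean(axis=1)
S = X.std(axis=1, ddof=1)

# Studentized statistic from (3.15): sqrt(n)(1/xbar - 1/mu) / ((1/xbar)^2 * S)
Z = np.sqrt(n) * (1 / xbar - 1 / mu) / (S / xbar**2)

print("mean ≈ 0:", Z.mean())
print("var  ≈ 1:", Z.var())
```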

There are two extensions of the basic Delta Method. The first deals with the case \(g^{\prime}(\theta)=0\). If this happens, we take one more term in the Taylor expansion to get \[\begin{equation} g(Y_n)=g(\theta)+g^{\prime}(\theta)(Y_n-\theta)+\frac{g^{\prime\prime}(\theta)}{2}(Y_n-\theta)^2+Remainder \tag{3.16} \end{equation}\] Since \(g^{\prime}(\theta)=0\), rearranging terms gives \[\begin{equation} g(Y_n)-g(\theta)=\frac{g^{\prime\prime}(\theta)}{2}(Y_n-\theta)^2+Remainder \tag{3.17} \end{equation}\] Notice that the square of a \(N(0,1)\) random variable is a \(\chi_1^2\) random variable, which implies \(\frac{n(Y_n-\theta)^2}{\sigma^2}\to\chi_1^2\) in distribution. Therefore, we have the first extension of the Delta Method.

Theorem 3.2 (Second-order Delta Method) Let \(Y_n\) be a sequence of random variables that satisfies \(\sqrt{n}(Y_n-\theta)\to N(0,\sigma^2)\) in distribution. For a given function g and a specific value of \(\theta\), suppose that \(g^{\prime}(\theta)=0\), \(g^{\prime\prime}(\theta)\) exists and is not 0. Then \[\begin{equation} n[g(Y_n)-g(\theta)]\to \sigma^2\frac{g^{\prime\prime}(\theta)}{2}\chi_1^2 \tag{3.18} \end{equation}\] in distribution.
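
A minimal simulation sketch of Theorem 3.2, taking \(X_i\sim N(0,1)\), \(Y_n=\bar{X}\), \(\theta=0\), and \(g(y)=y^2\) purely for illustration, so that \(g^{\prime}(0)=0\), \(g^{\prime\prime}(0)=2\), and the limit in (3.18) is \(\sigma^2\frac{g^{\prime\prime}(0)}{2}\chi_1^2=\chi_1^2\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 20_000

# Y_n = sample mean of N(0, 1) observations; theta = 0, g(y) = y^2
Yn = rng.normal(loc=0.0, scale=1.0, size=(reps, n)).mean(axis=1)
W = n * (Yn**2 - 0.0)                 # n [g(Y_n) - g(theta)]

# Theorem 3.2 limit here is chi^2_1, which has mean 1 and variance 2
print("mean ≈ 1:", W.mean())
print("var  ≈ 2:", W.var())
```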

The second extension deals with the multivariate case. First consider the following example, which deals with the moments of a ratio estimator.

Example 3.2 Suppose X and Y are random variables with nonzero means \(\mu_X\) and \(\mu_Y\), respectively. The parametric function to be estimated is \(g(\mu_X,\mu_Y)=\frac{\mu_X}{\mu_Y}\). It is straightforward to calculate \(\frac{\partial g}{\partial\mu_X}=\frac{1}{\mu_Y}\) and \(\frac{\partial g}{\partial\mu_Y}=\frac{-\mu_X}{\mu^2_Y}\). Then by (3.6) and (3.7) we have \[\begin{equation} E(\frac{X}{Y})\approx \frac{\mu_X}{\mu_Y} \tag{3.19} \end{equation}\] and \[\begin{equation} \begin{split} Var(\frac{X}{Y})&\approx \frac{1}{\mu_Y^2}Var(X)+\frac{\mu_X^2}{\mu_Y^4}Var(Y)-2\frac{\mu_X}{\mu_Y^3}Cov(X,Y)\\ &=(\frac{\mu_X}{\mu_Y})^2(\frac{Var(X)}{\mu_X^2}+\frac{Var(Y)}{\mu_Y^2}-2\frac{Cov(X,Y)}{\mu_X\mu_Y}) \end{split} \tag{3.20} \end{equation}\]

Thus, we obtain an approximation that uses only the means, variances, and covariance of \(X\) and \(Y\), whereas the exact calculation is hopeless.
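
A Monte Carlo sketch of (3.19) and (3.20), taking \((X,Y)\) bivariate normal with means \((2,5)\) and a mild covariance, chosen only for illustration (the denominator mean is kept well away from 0 so the simulation is stable).

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy setup: (X, Y) bivariate normal, mu_Y far from 0
mu_x, mu_y = 2.0, 5.0
var_x, var_y, cov_xy = 1.0, 0.5, 0.2
XY = rng.multivariate_normal([mu_x, mu_y],
                             [[var_x, cov_xy], [cov_xy, var_y]],
                             size=500_000)
ratio = XY[:, 0] / XY[:, 1]

# Approximations (3.19) and (3.20)
mean_approx = mu_x / mu_y
var_approx = (mu_x / mu_y) ** 2 * (var_x / mu_x**2 + var_y / mu_y**2
                                   - 2 * cov_xy / (mu_x * mu_y))

print("E(X/Y)  : simulated", ratio.mean(), "approx", mean_approx)
print("Var(X/Y): simulated", ratio.var(),  "approx", var_approx)
```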

Motivated by the need to estimate a function of more than one parameter using more than one sample mean, we have the following multivariate version of the Delta Method.

Theorem 3.3 (Multivariate Delta Method) Let \(\mathbf{X}_1,\cdots,\mathbf{X}_n\) be a random sample with \(E(X_{ij})=\mu_i\) and \(Cov(X_{ik},X_{jk})=\sigma_{ij}\). For a given function g with continuous first partial derivatives and a specific value of \(\mathbf{\mu}=(\mu_1,\cdots,\mu_p)\) for which \(\tau^2=\sum_{i=1}^p\sum_{j=1}^p\sigma_{ij}\frac{\partial g(\mathbf{\mu})}{\partial\mu_i}\cdot\frac{\partial g(\mathbf{\mu})}{\partial\mu_j}>0\), \[\begin{equation} \sqrt{n}[g(\bar{X}_1,\cdots,\bar{X}_p)-g(\mu_1,\cdots,\mu_p)]\to N(0,\tau^2) \tag{3.21} \end{equation}\]
in distribution.
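
A simulation sketch of Theorem 3.3, again for the ratio \(g(\mu_1,\mu_2)=\mu_1/\mu_2\) with assumed bivariate normal observations (\(p=2\)); the means and covariance matrix below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 300, 20_000

mu = np.array([2.0, 5.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])

# g(mu1, mu2) = mu1/mu2 and its gradient evaluated at mu
grad = np.array([1 / mu[1], -mu[0] / mu[1] ** 2])
tau2 = grad @ Sigma @ grad                      # tau^2 from Theorem 3.3

# reps independent samples of size n; compute sqrt(n)[g(xbar1, xbar2) - g(mu1, mu2)]
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
xbar = X.mean(axis=1)
Z = np.sqrt(n) * (xbar[:, 0] / xbar[:, 1] - mu[0] / mu[1])

print("tau^2 from the formula  :", tau2)
print("variance from simulation:", Z.var())
```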

A sufficient statistic for a parameter \(\theta\) is a statistic that, in a certain sense, captures all the information about \(\theta\) contained in the sample. Any additional information in the sample, besides the value of the sufficient statistic, does not contain any more information about \(\theta\).

Definition 3.2 (SUFFICIENCY PRINCIPLE) If \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\), then any inference about \(\theta\) should depend on the sample \(\mathbf{X}\) only through the value \(T(\mathbf{X})\). That is, if \(\mathbf{x}\) and \(\mathbf{y}\) are two sample points such that \(T(\mathbf{x})=T(\mathbf{y})\), then the inference about \(\theta\) should be the same whether \(\mathbf{X}=\mathbf{x}\) or \(\mathbf{X}=\mathbf{y}\) is observed.
Definition 3.3 (Sufficient Statistic) A statistic \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\) if the conditional distribution of the sample \(\mathbf{X}\) given the value of \(T(\mathbf{X})\) does not depend on \(\theta\).

To use Definition 3.3 to verify that a statistic \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\), we must verify that for any fixed \(\mathbf{x}\) and \(t\), the conditional probability \(P_{\theta}(\mathbf{X}=\mathbf{x}|T(\mathbf{X})=t)\) is the same for all values of \(\theta\). (Here the subscript in \(P_{\theta}\) indicates that \(\theta\) is the parameter of the distribution.) If \(T(\mathbf{x})\neq t\), this probability is 0 regardless of \(\theta\), so we only need to verify that \(P_{\theta}(\mathbf{X}=\mathbf{x}|T(\mathbf{X})=T(\mathbf{x}))\) does not depend on \(\theta\). Since \(S_1:=\{s\in S: \mathbf{X}(s)=\mathbf{x}\}\) is a subset of \(S_2:=\{s\in S: T(\mathbf{X}(s))=T(\mathbf{x})\}\), \[\begin{equation} \begin{split} P_{\theta}(\mathbf{X}=\mathbf{x}|T(\mathbf{X})=T(\mathbf{x}))&=\frac{P_{\theta} (\mathbf{X}=\mathbf{x}\ \text{and}\ T(\mathbf{X})=T(\mathbf{x}))}{P_{\theta}(T(\mathbf{X})=T(\mathbf{x}))}\\ &=\frac{P_{\theta}(\mathbf{X}=\mathbf{x})}{P_{\theta}(T(\mathbf{X})=T(\mathbf{x}))}\\ &=\frac{p(\mathbf{x}|\theta)}{q(T(\mathbf{x})|\theta)} \end{split} \tag{3.22} \end{equation}\] Here \(p(\mathbf{x}|\theta)\) is the joint pmf of the sample \(\mathbf{X}\) and \(q(t|\theta)\) is the pmf of \(T(\mathbf{X})\). Thus, \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\) if and only if, for every \(\mathbf{x}\), the above ratio of pmfs is constant as a function of \(\theta\).
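
To see the ratio criterion (3.22) in a simple discrete case, the following sketch assumes an i.i.d. Bernoulli(\(\theta\)) sample with \(T(\mathbf{X})=\sum_i X_i\) purely as an illustration and evaluates the ratio for several values of \(\theta\); it is constant, equal to \(1/\binom{n}{t}\).

```python
from math import comb

def joint_pmf(x, theta):
    """Joint pmf of an i.i.d. Bernoulli(theta) sample x: theta^t (1-theta)^(n-t)."""
    t, n = sum(x), len(x)
    return theta**t * (1 - theta) ** (n - t)

def T_pmf(t, n, theta):
    """pmf of T = sum(X_i) ~ Binomial(n, theta)."""
    return comb(n, t) * theta**t * (1 - theta) ** (n - t)

x = [1, 0, 1, 1, 0]                      # a fixed sample point with T(x) = 3
for theta in [0.2, 0.5, 0.8]:
    ratio = joint_pmf(x, theta) / T_pmf(sum(x), len(x), theta)
    print(f"theta = {theta}: ratio = {ratio:.6f}")   # always 1 / C(5, 3) = 0.1
```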

Theorem 3.4 If \(p(\mathbf{x}|\theta)\) is the joint pdf or pmf of the sample \(\mathbf{X}\) and \(q(t|\theta)\) is the pdf or pmf of \(T(\mathbf{X})\), then \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\) if, for every \(\mathbf{x}\) in the sample space, the ratio \(\frac{p(\mathbf{x}|\theta)}{q(T(\mathbf{x})|\theta)}\) is constant as a function of \(\theta\).
Example 3.3 (Normal Sufficient Statistic) Let \(X_1,\cdots,X_n\) be i.i.d. \(N(\mu,\sigma^2)\) where \(\sigma^2\) is known. We wish to show that the sample mean \(T(\mathbf{X})=\bar{X}\) is a sufficient statistic for \(\mu\). The joint pdf of the sample \(\mathbf{X}\) is \[\begin{equation} \begin{split} f(\mathbf{x}|\mu)&=\frac{1}{(2\pi\sigma^2)^{n/2}}exp(-\sum_{i=1}^n(x_i-\mu)^2/(2\sigma^2))\\ &=\frac{1}{(2\pi\sigma^2)^{n/2}}exp(-\sum_{i=1}^n(x_i-\bar{x}+\bar{x}-\mu)^2/(2\sigma^2))\\ &=\frac{1}{(2\pi\sigma^2)^{n/2}}exp(-(\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\mu)^2)/(2\sigma^2)) \end{split} \tag{3.23} \end{equation}\] Since \(\bar{X}\sim N(\mu,\sigma^2/n)\), the ratio of pdfs is \[\begin{equation} \begin{split} \frac{f(\mathbf{x}|\mu)}{q(\bar{x}|\mu)}&=\frac{(2\pi\sigma^2)^{-n/2}exp(-(\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\mu)^2)/(2\sigma^2))}{(2\pi\sigma^2/n)^{-1/2}exp(-n(\bar{x}-\mu)^2/(2\sigma^2))}\\ &=n^{-1/2}(2\pi\sigma^2)^{-(n-1)/2}exp(-\sum_{i=1}^n(x_i-\bar{x})^2/(2\sigma^2)) \end{split} \tag{3.24} \end{equation}\] which does not depend on \(\mu\). Hence by Theorem 3.4, the sample mean is a sufficient statistic for the population mean \(\mu\).
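
A small numeric sketch of Example 3.3: for an arbitrary assumed sample (with \(\sigma^2=1\) taken for illustration), the ratio (3.24) evaluates to the same number no matter which \(\mu\) is plugged in.

```python
import numpy as np

sigma2 = 1.0
x = np.array([0.3, -1.2, 0.8, 2.1, -0.5])   # an arbitrary sample, n = 5
n, xbar = len(x), x.mean()

def joint_pdf(x, mu):
    # joint N(mu, sigma^2) pdf of the sample, equation (3.23)
    return (2 * np.pi * sigma2) ** (-len(x) / 2) * np.exp(-np.sum((x - mu) ** 2) / (2 * sigma2))

def xbar_pdf(xbar, mu):
    # pdf of X-bar ~ N(mu, sigma^2 / n)
    return (2 * np.pi * sigma2 / n) ** (-1 / 2) * np.exp(-n * (xbar - mu) ** 2 / (2 * sigma2))

for mu in [-2.0, 0.0, 1.5]:
    print(f"mu = {mu:4.1f}: ratio = {joint_pdf(x, mu) / xbar_pdf(xbar, mu):.6e}")
```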