7.3 Asymptotic Properties of Estimators

Estimator bias and precision are finite-sample properties. That is, they are properties that hold for a fixed sample size \(T\). Very often we are also interested in properties of estimators when the sample size \(T\) gets very large. For example, analytic calculations may show that the bias and mse of an estimator \(\hat{\theta}\) decrease as \(T\) increases. That is, as \(T\) gets very large the bias and mse approach zero, so for a very large sample \(\hat{\theta}\) is effectively unbiased with high precision. In this case we say that \(\hat{\theta}\) is a consistent estimator of \(\theta\). In addition, for large samples the Central Limit Theorem (CLT) says that \(f(\hat{\theta})\) can often be well approximated by a normal distribution. In this case, we say that \(\hat{\theta}\) is asymptotically normally distributed. The word “asymptotic” means “in an infinitely large sample” or “as the sample size \(T\) goes to infinity”. Of course, in the real world we don’t have an infinitely large sample, so the asymptotic results are only approximations. How good these approximations are for a given sample size \(T\) depends on the context. Monte Carlo simulations (see section 7.6) can often be used to evaluate asymptotic approximations in a given context.

7.3.1 Consistency

Let \(\hat{\theta}\) be an estimator of \(\theta\) based on the random returns \(\{R_{t}\}_{t=1}^{T}\).

Definition 7.4 \(\hat{\theta}\) is consistent for \(\theta\) (converges in probability to \(\theta\)) if for any \(\varepsilon>0\):

\[ \lim_{T\rightarrow\infty}\Pr(|\hat{\theta}-\theta|>\varepsilon)=0. \]

Intuitively, consistency says that as we collect more and more data, \(\hat{\theta}\) will eventually get arbitrarily close to \(\theta\) with probability approaching one. In other words, if we have enough data then we know the truth.

Theorems in probability theory known as Laws of Large Numbers are used to determine if an estimator is consistent or not. In general, we have the following result.

Proposition 7.1 An estimator \(\hat{\theta}\) is consistent for \(\theta\) if:

  • \(\mathrm{bias}(\hat{\theta},\theta)\rightarrow0\) as \(T\rightarrow\infty\).
  • \(\mathrm{se}(\hat{\theta})\rightarrow0\) as \(T\rightarrow\infty\).

Equivalently, \(\hat{\theta}\) is consistent for \(\theta\) if \(\mathrm{mse}(\hat{\theta},\theta)\rightarrow0\) as \(T\rightarrow\infty\).

Intuitively, if \(f(\hat{\theta})\) collapses to \(\theta\) as \(T\rightarrow\infty\) then \(\hat{\theta}\) is consistent for \(\theta\).
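Consistency can be illustrated by Monte Carlo simulation (see section 7.6). Below is a minimal Python sketch, not code from the text: the GWN-style values \(\mu=0.05\) and \(\sigma=0.10\), the tolerance \(\varepsilon=0.01\), and the grid of sample sizes are all illustrative assumptions. It approximates \(\Pr(|\bar{X}-\mu|>\varepsilon)\) for increasing \(T\) and shows the probability shrinking toward zero, as Definition 7.4 requires for the sample mean.

```python
import numpy as np

rng = np.random.default_rng(123)

# Illustrative GWN-style parameters (assumed values, not from the text)
mu, sigma = 0.05, 0.10
eps = 0.01      # tolerance epsilon in Definition 7.4
n_sim = 2_000   # Monte Carlo replications per sample size

for T in [10, 100, 1_000, 10_000]:
    # n_sim samples of size T; each row mean is one draw of the estimator X-bar
    xbar = rng.normal(mu, sigma, size=(n_sim, T)).mean(axis=1)
    # Monte Carlo estimate of Pr(|X-bar - mu| > eps)
    p = np.mean(np.abs(xbar - mu) > eps)
    print(f"T = {T:>6}: Pr(|xbar - mu| > {eps}) ≈ {p:.4f}")
```

The probability is large for small \(T\) and essentially zero at the largest \(T\), which is the convergence in probability of Definition 7.4 in action.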

7.3.2 Asymptotic Normality

Let \(\hat{\theta}\) be an estimator of \(\theta\) based on the random returns \(\{R_{t}\}_{t=1}^{T}\).

Definition 7.5 An estimator \(\hat{\theta}\) is asymptotically normally distributed if: \[\begin{equation} \hat{\theta}\sim N(\theta,\mathrm{se}(\hat{\theta})^{2})\tag{7.7} \end{equation}\] for large enough \(T\).

Asymptotic normality means that \(f(\hat{\theta})\) is well approximated by a normal distribution with mean \(\theta\) and variance \(\mathrm{se}(\hat{\theta})^{2}\). The justification for asymptotic normality comes from the famous Central Limit Theorem.

In the definition of an asymptotically normal estimator, the variance of the normal distribution, \(\mathrm{se}(\hat{\theta})^{2}\), often depends on unknown GWN model parameters, so the result (7.7) cannot be used directly in practice. Fortunately, we can create a practically useful result if we replace the unknown parameters in \(\mathrm{se}(\hat{\theta})^{2}\) with consistent estimates.

Proposition 7.2 (Practically useful asymptotic normality) Suppose an estimator \(\hat{\theta}\) is asymptotically normally distributed with variance \(\mathrm{se}(\hat{\theta})^{2}\) that depends on unknown GWN model parameters. If we replace these unknown parameters with consistent estimates, creating the estimated variance \(\widehat{\mathrm{se}}(\hat{\theta})^{2}\), then \[\begin{equation} \hat{\theta}\sim N(\theta,\widehat{\mathrm{se}}(\hat{\theta})^{2})\tag{7.8} \end{equation}\] for large enough \(T\).
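To make Proposition 7.2 concrete for the sample mean: with \(\theta=\mu\), \(\hat{\theta}=\bar{X}\), and \(\mathrm{se}(\bar{X})=\sigma/\sqrt{T}\), the plug-in estimate is \(\widehat{\mathrm{se}}(\bar{X})=\hat{\sigma}/\sqrt{T}\), where \(\hat{\sigma}\) is the sample standard deviation. A minimal Python sketch of (7.8) under assumed illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)

# One simulated sample of T returns (illustrative parameter values)
mu, sigma, T = 0.05, 0.10, 250
R = rng.normal(mu, sigma, size=T)

theta_hat = R.mean()              # estimate of theta = mu
sigma_hat = R.std(ddof=1)         # consistent estimate of sigma
se_hat = sigma_hat / np.sqrt(T)   # estimated standard error of theta_hat

# (7.8): theta_hat ~ N(theta, se_hat^2) for large enough T
print(f"theta_hat = {theta_hat:.4f}, se_hat = {se_hat:.4f}")
```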

7.3.3 Central Limit Theorem

There are actually many versions of the CLT with different assumptions.¹ In its simplest form, the CLT says that the sample average of a collection of iid random variables \(X_{1},\ldots,X_{T}\) with \(E[X_{i}]=\mu\) and \(\mathrm{var}(X_{i})=\sigma^{2}\) is asymptotically normal with mean \(\mu\) and variance \(\sigma^{2}/T\). In particular, the CDF of the standardized sample mean: \[ \frac{\bar{X}-\mu}{\mathrm{se}(\bar{X})}=\frac{\bar{X}-\mu}{\sigma/\sqrt{T}}=\sqrt{T}\left(\frac{\bar{X}-\mu}{\sigma}\right), \] converges to the CDF of a standard normal random variable \(Z\) as \(T\rightarrow\infty\). This result can be expressed as: \[ \sqrt{T}\left(\frac{\bar{X}-\mu}{\sigma}\right)\sim Z\sim N(0,1), \] for large enough \(T\). Equivalently, \[ \bar{X}\sim\mu+\frac{\sigma}{\sqrt{T}}\times Z\sim N\left(\mu,\frac{\sigma^{2}}{T}\right)=N\left(\mu,\mathrm{se}(\bar{X})^{2}\right), \] for large enough \(T\). This form shows that \(\bar{X}\) is asymptotically normal with mean \(\mu\) and variance \(\mathrm{se}(\bar{X})^{2} = \sigma^{2}/T\).
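The quality of the CLT approximation for finite \(T\) can be checked by simulation with a deliberately non-normal parent distribution. In the minimal Python sketch below, the chi-square parent with 4 degrees of freedom (so \(\mu=4\) and \(\sigma^{2}=8\)) is an illustrative assumption; the sketch compares a tail probability of the standardized sample means against the corresponding standard normal value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Chi-square parent: mean = df, variance = 2*df (a skewed, non-normal choice)
df = 4
mu, sigma = df, np.sqrt(2 * df)
n_sim = 20_000

for T in [5, 50, 500]:
    X = rng.chisquare(df, size=(n_sim, T))
    z = np.sqrt(T) * (X.mean(axis=1) - mu) / sigma   # standardized sample means
    # Compare Pr(z <= -1.96) with the standard normal value ~0.025
    print(f"T = {T:>3}: Pr(z <= -1.96) ≈ {np.mean(z <= -1.96):.4f} "
          f"(normal: {norm.cdf(-1.96):.4f})")
```

The tail probability is noticeably off for \(T=5\) but close to the normal value by \(T=500\), illustrating that “large enough \(T\)” depends on how non-normal the parent distribution is.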

We make the following remarks.

  • The CLT result is truly remarkable because the iid rvs \(X_i\) can come from any distribution that has mean \(\mu\) and variance \(\sigma^2\). For example, \(X_i\) could be binomial, Student’s t with df > 2, chi-square, etc.

  • While the basic CLT shows that the sample average \(\bar{X}\) is asymptotically normally distributed, it can be extended to general estimators of parameters of the GWN model because, as we shall see, these estimators are essentially averages of iid random variables.

  • The iid assumption of the basic CLT can be relaxed to allow \(\{X_i\}\) to be covariance stationary and ergodic. This allows the asymptotic normality result to be extended to estimators computed from time series data that could be serially correlated.

  • As stated, the CLT result depends on \(\sigma^2\), which we almost never know in practice. Fortunately, the CLT result continues to hold if we replace \(\sigma^2\) with a consistent estimate \(\hat{\sigma}^2\) (see the sketch following this list).
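The last remark can be checked directly by re-running the previous experiment with \(\sigma\) replaced by the sample standard deviation \(\hat{\sigma}\) computed from each simulated sample (same illustrative chi-square parent as before):

```python
import numpy as np

rng = np.random.default_rng(42)

df = 4
mu = df          # true mean of the chi-square(4) parent
n_sim = 20_000
T = 500

X = rng.chisquare(df, size=(n_sim, T))
sigma_hat = X.std(axis=1, ddof=1)   # consistent estimate of sigma, per sample
# Studentized sample means: sigma replaced by sigma-hat
t = np.sqrt(T) * (X.mean(axis=1) - mu) / sigma_hat

print(f"Pr(|t| > 1.96) ≈ {np.mean(np.abs(t) > 1.96):.4f} (normal value: 0.05)")
```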

7.3.4 Asymptotic Confidence Intervals

For an asymptotically normal estimator \(\hat{\theta}\) of \(\theta\), the precision of \(\hat{\theta}\) is measured by \(\widehat{\mathrm{se}}(\hat{\theta})\) but is best communicated by computing an (asymptotic) confidence interval for the unknown value of \(\theta\). A confidence interval is an interval estimate of \(\theta\) such that we can make an explicit probability statement about the likelihood that the interval covers \(\theta\).

The construction of an asymptotic confidence interval for \(\theta\) uses the asymptotic normality result: \[\begin{equation} \frac{\hat{\theta}-\theta}{\widehat{\mathrm{se}}(\hat{\theta})}\sim Z\sim N(0,1).\tag{7.9} \end{equation}\] Then, for \(\alpha\in(0,1)\), we compute a \((1-\alpha)\cdot100\%\) confidence interval for \(\theta\) using (7.9) and the \(1-\alpha/2\) standard normal quantile (critical value) \(q_{(1-\alpha/2)}^{Z}\) to give: \[ \Pr\left(-q_{(1-\alpha/2)}^{Z}\leq\frac{\hat{\theta}-\theta}{\widehat{\mathrm{se}}(\hat{\theta})}\leq q_{(1-\alpha/2)}^{Z}\right)=1-\alpha, \] which can be rearranged as, \[ \Pr\left(\hat{\theta}-q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})\leq\theta\leq\hat{\theta}+q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})\right)=1-\alpha. \] Hence, the random interval, \[\begin{equation} [\hat{\theta}-q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta}),~\hat{\theta}+q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})]=\hat{\theta}\pm q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})\tag{7.10} \end{equation}\] covers the true unknown value of \(\theta\) with probability \(1-\alpha\).

In practice, typical values for \(\alpha\) are 0.05 and 0.01 for which \(q_{(0.975)}^{Z}=1.96\) and \(q_{(0.995)}^{Z}=2.58\). Then, approximate 95% and 99% asymptotic confidence intervals for \(\theta\) have the form \(\hat{\theta}\pm2\cdot\widehat{\mathrm{se}}(\hat{\theta})\) and \(\hat{\theta}\pm2.5\cdot\widehat{\mathrm{se}}(\hat{\theta})\), respectively.
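A minimal Python sketch (GWN parameter values are illustrative assumptions) that computes the 95% asymptotic confidence interval (7.10) for \(\mu\) from a single sample, and then verifies by Monte Carlo that the random interval covers the true \(\mu\) in roughly 95% of repeated samples:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

mu, sigma, T = 0.05, 0.10, 250   # illustrative GWN parameter values
q = norm.ppf(0.975)              # 1 - alpha/2 quantile, approximately 1.96

# One sample: point estimate and 95% asymptotic confidence interval (7.10)
R = rng.normal(mu, sigma, size=T)
xbar, se_hat = R.mean(), R.std(ddof=1) / np.sqrt(T)
print(f"95% CI for mu: [{xbar - q * se_hat:.4f}, {xbar + q * se_hat:.4f}]")

# Coverage check: the random interval should cover mu in ~95% of samples
n_sim = 10_000
X = rng.normal(mu, sigma, size=(n_sim, T))
xbars = X.mean(axis=1)
se_hats = X.std(axis=1, ddof=1) / np.sqrt(T)
covered = (xbars - q * se_hats <= mu) & (mu <= xbars + q * se_hats)
print(f"Monte Carlo coverage: {covered.mean():.4f}")
```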


  1. White (1984) gives a comprehensive discussion of CLTs useful in econometrics.