## 7.3 Asymptotic Properties of Estimators

Estimator bias and precision are finite sample properties. That is,
they are properties that hold for a fixed sample size \(T\). Very often
we are also interested in properties of estimators when the sample
size \(T\) gets very large. For example, analytic calculations may
show that the bias and mse of an estimator \(\hat{\theta}\) depend
on \(T\) in a decreasing way. That is, as \(T\) gets very large the
bias and mse approach zero. So for a very large sample, \(\hat{\theta}\)
is effectively unbiased with high precision. In this case we say that
\(\hat{\theta}\) is a *consistent* estimator of \(\theta\). In
addition, for large samples the *Central Limit Theorem* (CLT) says that
\(f(\hat{\theta})\) can often be well approximated by a normal distribution.
In this case, we say that \(\hat{\theta}\) is *asymptotically normally distributed*. The word “asymptotic” means “in an infinitely large sample” or “as the sample
size \(T\) goes to infinity”. Of course, in the real
world we don’t have an infinitely large sample and so the asymptotic
results are only approximations. How good these approximations are
for a given sample size \(T\) depends on the context. Monte Carlo simulations
(see section 7.6) can often be
used to evaluate asymptotic approximations in a given context.

### 7.3.1 Consistency

Let \(\hat{\theta}\) be an estimator of \(\theta\) based on the random returns \(\{R_{t}\}_{t=1}^{T}\).

**Definition 7.4** \(\hat{\theta}\) is consistent for \(\theta\) (converges
in probability to \(\theta\)) if for any \(\varepsilon>0\):

\[ \lim_{T\rightarrow\infty}\Pr(|\hat{\theta}-\theta|>\varepsilon)=0. \]

Intuitively, consistency says that as the sample size \(T\) grows, \(\hat{\theta}\) gets arbitrarily close to \(\theta\) with probability approaching one. In other words, with enough data we learn the truth.
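Consistency can be illustrated with a small Monte Carlo experiment (anticipating Section 7.6). The following Python sketch estimates \(\Pr(|\hat{\theta}-\theta|>\varepsilon)\) for the sample mean of simulated GWN returns at several sample sizes; the values chosen for the true mean, volatility, and \(\varepsilon\) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
theta = 0.05   # hypothetical true mean return
sigma = 0.10   # hypothetical true volatility
eps = 0.01     # tolerance in the consistency definition
n_sim = 5000   # Monte Carlo replications

def prob_outside(T):
    """Estimate Pr(|theta_hat - theta| > eps) for sample size T."""
    samples = rng.normal(theta, sigma, size=(n_sim, T))
    theta_hat = samples.mean(axis=1)          # one sample mean per replication
    return np.mean(np.abs(theta_hat - theta) > eps)

probs = {T: prob_outside(T) for T in (50, 500, 5000)}
```

As \(T\) grows, the estimated probability of a deviation larger than \(\varepsilon\) shrinks toward zero, matching Definition 7.4.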

Theorems in probability theory known as *Laws of Large Numbers*
are used to determine if an estimator is consistent or not. In general,
we have the following result.

**Proposition 7.1 **
An estimator \(\hat{\theta}\) is consistent for \(\theta\) if:

- \(\mathrm{bias}(\hat{\theta},\theta)\rightarrow0\) as \(T\rightarrow\infty\).
- \(\mathrm{se}(\hat{\theta})\rightarrow0\) as \(T\rightarrow\infty\).

Equivalently, \(\hat{\theta}\) is consistent for \(\theta\) if \(\mathrm{mse}(\hat{\theta},\theta)\rightarrow0\) as \(T\rightarrow\infty\).

Intuitively, if \(f(\hat{\theta})\) collapses to \(\theta\) as \(T\rightarrow\infty\) then \(\hat{\theta}\) is consistent for \(\theta\).
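Proposition 7.1 can be checked by simulation for the sample mean: its bias and standard error both shrink toward zero as \(T\) grows. A minimal sketch, with hypothetical values for \(\mu\) and \(\sigma\):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n_sim = 0.05, 0.10, 20000   # hypothetical GWN parameters

def bias_and_se(T):
    """Monte Carlo estimates of bias and se of the sample mean for sample size T."""
    theta_hat = rng.normal(mu, sigma, size=(n_sim, T)).mean(axis=1)
    return theta_hat.mean() - mu, theta_hat.std()

results = {T: bias_and_se(T) for T in (10, 100, 1000)}
```

The estimated bias is near zero at every \(T\), and the estimated standard error falls roughly as \(\sigma/\sqrt{T}\), so \(\mathrm{mse}\rightarrow0\).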

### 7.3.2 Asymptotic Normality

Let \(\hat{\theta}\) be an estimator of \(\theta\) based on the random returns \(\{R_{t}\}_{t=1}^{T}\).

**Definition 7.5** An estimator \(\hat{\theta}\) is *asymptotically normally distributed* if:
\[\begin{equation}
\hat{\theta}\sim N(\theta,\mathrm{se}(\hat{\theta})^{2})\tag{7.7}
\end{equation}\]
for large enough \(T\).

Asymptotic normality means that \(f(\hat{\theta})\) is well approximated
by a normal distribution with mean \(\theta\) and variance \(\mathrm{se}(\hat{\theta})^{2}\).
The justification for asymptotic normality comes from the famous *Central Limit Theorem*.

In the definition of an asymptotically normal estimator, the variance of the normal distribution, \(\mathrm{se}(\hat{\theta})^{2}\), often depends on unknown GWN model parameters and so is practically useless. Fortunately, we can create a practically useful result if we replace the unknown parameters in \(\mathrm{se}(\hat{\theta})^{2}\) with consistent estimates.

**Proposition 7.2 (Practically useful asymptotic normality)** Suppose an estimator \(\hat{\theta}\) is asymptotically normally distributed with variance \(\mathrm{se}(\hat{\theta})^{2}\) that depends on unknown GWN model parameters. If we replace these unknown parameters with consistent estimates, creating the estimated variance \(\widehat{\mathrm{se}}(\hat{\theta})^{2}\), then \[\begin{equation} \hat{\theta}\sim N(\theta,\widehat{\mathrm{se}}(\hat{\theta})^{2})\tag{7.8} \end{equation}\] for large enough \(T\).
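Proposition 7.2 in practice: for the sample mean, \(\mathrm{se}(\hat{\mu})=\sigma/\sqrt{T}\) depends on the unknown \(\sigma\), which we replace with the sample standard deviation. A sketch with simulated returns and hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, T = 0.05, 0.10, 500          # hypothetical GWN parameters
returns = rng.normal(mu, sigma, size=T)  # one simulated sample of returns

theta_hat = returns.mean()               # estimate of mu
sigma_hat = returns.std(ddof=1)          # consistent estimate of sigma
se_hat = sigma_hat / np.sqrt(T)          # estimated (practically useful) standard error
```

The estimated \(\widehat{\mathrm{se}}(\hat{\mu})\) can be computed from the data alone, while the exact \(\mathrm{se}(\hat{\mu})=\sigma/\sqrt{T}\) cannot.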

### 7.3.3 Central Limit Theorem

There are actually many versions of the CLT with different assumptions.^{35}
In its simplest form, the CLT says that the sample average of a collection
of iid random variables \(X_{1},\ldots,X_{T}\) with \(E[X_{i}]=\mu\)
and \(var(X_{i})=\sigma^{2}\) is asymptotically normal with mean \(\mu\)
and variance \(\sigma^{2}/T\). In particular, the CDF of the standardized
sample mean:
\[
\frac{\bar{X}-\mu}{\mathrm{se}(\bar{X})}=\frac{\bar{X}-\mu}{\sigma/\sqrt{T}}=\sqrt{T}\left(\frac{\bar{X}-\mu}{\sigma}\right),
\]
converges to the CDF of a standard normal random variable \(Z\) as
\(T\rightarrow\infty\). This result can be expressed as:
\[
\sqrt{T}\left(\frac{\bar{X}-\mu}{\sigma}\right)\sim Z\sim N(0,1),
\]
for large enough \(T\). Equivalently,
\[
\bar{X}\sim\mu+\frac{\sigma}{\sqrt{T}}\times Z\sim N\left(\mu,\frac{\sigma^{2}}{T}\right)=N\left(\mu,\mathrm{se}(\bar{X})^{2}\right),
\]
for large enough \(T\). This form shows that \(\bar{X}\) is asymptotically
normal with mean \(\mu\) and variance \(\mathrm{se}(\bar{X})^{2} = \sigma^{2}/T\).

We make the following remarks:

- The CLT result is truly remarkable because the iid rvs \(X_i\) can come from *any distribution* that has mean \(\mu\) and variance \(\sigma^2\). For example, \(X_i\) could be binomial, Student's t with df > 2, chi-square, etc.
- While the basic CLT shows that the sample average \(\bar{X}\) is asymptotically normally distributed, it can be extended to general estimators of parameters of the GWN model because, as we shall see, these estimators are essentially averages of iid random variables.
- The iid assumption of the basic CLT can be relaxed to allow \(\{X_i\}\) to be covariance stationary and ergodic. This allows the asymptotic normality result to be extended to estimators computed from time series data that could be serially correlated.
- The CLT result as stated depends on knowing \(\sigma^2\), which we almost never do in practice. However, the CLT result continues to hold if we replace \(\sigma^2\) with a consistent estimate \(\hat{\sigma}^2\).
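The distribution-free nature of the CLT can be illustrated directly: even for draws from a skewed chi-square distribution, the standardized sample mean is approximately standard normal for moderate \(T\). A minimal sketch (the degrees of freedom and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(123)
df = 3                             # chi-square degrees of freedom (a skewed distribution)
mu, sigma = df, np.sqrt(2 * df)    # mean and sd of a chi-square(df) rv
T, n_sim = 200, 10000

X = rng.chisquare(df, size=(n_sim, T))
Z = np.sqrt(T) * (X.mean(axis=1) - mu) / sigma  # standardized sample means
```

The simulated values of \(Z\) have mean near 0 and standard deviation near 1, as the CLT predicts, despite the skewness of the underlying chi-square draws.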

### 7.3.4 Asymptotic Confidence Intervals

For an asymptotically normal estimator \(\hat{\theta}\) of \(\theta\),
the precision of \(\hat{\theta}\) is measured by \(\widehat{\mathrm{se}}(\hat{\theta})\)
but is best communicated by computing an (asymptotic) *confidence
interval* for the unknown value of \(\theta\). A confidence interval
is an *interval estimate* of \(\theta\) that comes with an explicit
probability statement about how likely the interval is to cover \(\theta\).

The construction of an asymptotic confidence interval for \(\theta\) uses the asymptotic normality result: \[\begin{equation} \frac{\hat{\theta}-\theta}{\widehat{\mathrm{se}}(\hat{\theta})}=Z\sim N(0,1).\tag{7.9} \end{equation}\] Then, for \(\alpha\in(0,1)\), we compute a \((1-\alpha)\cdot100\%\) confidence interval for \(\theta\) using (7.9) and the \(1-\alpha/2\) standard normal quantile (critical value) \(q_{(1-\alpha/2)}^{Z}\) to give: \[ \Pr\left(-q_{(1-\alpha/2)}^{Z}\leq\frac{\hat{\theta}-\theta}{\widehat{\mathrm{se}}(\hat{\theta})}\leq q_{(1-\alpha/2)}^{Z}\right)=1-\alpha, \] which can be rearranged as, \[ \Pr\left(\hat{\theta}-q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})\leq\theta\leq\hat{\theta}+q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})\right)=1-\alpha. \] Hence, the random interval, \[\begin{equation} [\hat{\theta}-q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta}),~\hat{\theta}+q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})]=\hat{\theta}\pm q_{(1-\alpha/2)}^{Z}\cdot\widehat{\mathrm{se}}(\hat{\theta})\tag{7.10} \end{equation}\] covers the true unknown value of \(\theta\) with probability \(1-\alpha\).

In practice, typical values for \(\alpha\) are 0.05 and 0.01 for which \(q_{(0.975)}^{Z}=1.96\) and \(q_{(0.995)}^{Z}=2.58\). Then, approximate 95% and 99% asymptotic confidence intervals for \(\theta\) have the form \(\hat{\theta}\pm2\cdot\widehat{\mathrm{se}}(\hat{\theta})\) and \(\hat{\theta}\pm2.5\cdot\widehat{\mathrm{se}}(\hat{\theta})\), respectively.
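The coverage probability of the asymptotic 95% confidence interval (7.10) can be checked by simulation. A sketch for the GWN mean, with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, T, n_sim = 0.05, 0.10, 100, 10000  # hypothetical GWN parameters
q = 1.96                                       # 97.5% standard normal quantile

samples = rng.normal(mu, sigma, size=(n_sim, T))
theta_hat = samples.mean(axis=1)                       # sample means
se_hat = samples.std(axis=1, ddof=1) / np.sqrt(T)      # estimated standard errors
lower = theta_hat - q * se_hat
upper = theta_hat + q * se_hat
coverage = np.mean((lower <= mu) & (mu <= upper))      # fraction of intervals covering mu
```

The empirical coverage should be close to the nominal 0.95; how close depends on \(T\), which is exactly the kind of question Monte Carlo simulation (Section 7.6) is suited to answer.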

White (1984) gives a comprehensive discussion of CLTs useful in econometrics.