1.6 Nonparametric inference
The aim of statistical inference is to use data to infer an unknown quantity. In the game of inference, there is usually a trade-off between efficiency and generality, and this trade-off is controlled by the strength of assumptions that are made on the data generating process.
Parametric inference favors efficiency. Given a model (a strong assumption on the data generating process), it provides a set of inferential methods (point estimation, confidence intervals, hypothesis testing, etc.). These are the most efficient inferential procedures if the model matches reality, in other words, if the data generating process truly satisfies the assumptions. Otherwise, the methods may become inconsistent.
On the other hand, nonparametric inference favors generality. Given a set of minimal and weak assumptions (e.g., a certain smoothness of a density), it provides inferential methods that are consistent in broad situations, in exchange for a loss of efficiency for small or moderate sample sizes.
Hence, for any specific data generating process there is a parametric method that dominates its nonparametric counterpart in efficiency. But knowledge of the data generating process is rarely available in practice. That is the appeal of a nonparametric method: it will perform adequately no matter what the data generating process is. For that reason, nonparametric methods are useful\(\ldots\)
- \(\ldots\) when we have no clue about what a good parametric model could be.
- \(\ldots\) for creating goodness-of-fit tests employed for validating parametric models.
Example 1.6 Assume we have a sample \(X_1,\ldots,X_n\) from a random variable \(X\) and we want to estimate its distribution function \(F.\) Without any assumption, we know that the ecdf in (1.1) is an estimate for \(F(x)=\mathbb{P}[X\leq x].\) It is indeed a nonparametric estimate for \(F.\) Since \(nF_n(x)\sim\mathrm{Bin}(n,F(x)),\) its expectation and variance are
\[\begin{align*} \mathbb{E}[F_n(x)]=F(x),\quad \mathbb{V}\mathrm{ar}[F_n(x)]=\frac{F(x)(1-F(x))}{n}. \end{align*}\]
From the squared bias (zero, since the ecdf is unbiased) and the variance, we can get the MSE:
\[\begin{align*} \mathrm{MSE}[F_n(x)]=\frac{F(x)(1-F(x))}{n}. \end{align*}\]
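The exactness of this formula can be checked by simulation. A minimal R sketch, assuming for illustration an \(\mathrm{Exp}(2)\) sample of size \(n=50\) and the evaluation point \(x=0.5\) (all arbitrary choices), could be:

```r
# Monte Carlo check of MSE[F_n(x)] = F(x) * (1 - F(x)) / n
# Illustrative (arbitrary) choices: Exp(2) data, n = 50, evaluation point x0 = 0.5
set.seed(42)
lambda <- 2; n <- 50; x0 <- 0.5; M <- 1e4
Fn_x0 <- replicate(M, {
  x <- rexp(n, rate = lambda)
  mean(x <= x0)  # ecdf of the sample evaluated at x0
})
mean((Fn_x0 - pexp(x0, rate = lambda))^2)                    # Monte Carlo MSE
pexp(x0, rate = lambda) * (1 - pexp(x0, rate = lambda)) / n  # Exact MSE
```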
Assume now that \(X\sim\mathrm{Exp}(\lambda).\) By maximum likelihood, it is possible to estimate \(\lambda\) as \(\hat \lambda_\mathrm{ML}={\bar X}^{-1}.\) Then, we have the following estimate for \(F(x)\):
\[\begin{align} F(x;\hat\lambda_\mathrm{ML})=1-e^{-\hat\lambda_\mathrm{ML} x}. \tag{1.2} \end{align}\]
It is not so simple to obtain the exact MSE of (1.2), even though it is easy to prove that \(\hat\lambda_\mathrm{ML}\sim \mathrm{IG}(\lambda^{-1},n).\) Approximations are possible using Exercise 1.2. However, the MSE can be easily approximated by Monte Carlo.
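A minimal R sketch of such a Monte Carlo approximation, again with the arbitrary illustrative choices \(\lambda=2,\) \(n=50,\) and \(x=0.5,\) could be:

```r
# Monte Carlo approximation of MSE[F(x0; lambda_hat_ML)], with lambda_hat_ML = 1 / mean(x)
set.seed(42)
lambda <- 2; n <- 50; x0 <- 0.5; M <- 1e4
F_par_x0 <- replicate(M, {
  x <- rexp(n, rate = lambda)
  1 - exp(-x0 / mean(x))  # parametric estimate (1.2)
})
mean((F_par_x0 - pexp(x0, rate = lambda))^2)  # Approximated MSE of (1.2)
```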
What happens when the data are generated from an \(\mathrm{Exp}(\lambda)\)? Then (1.2) uniformly dominates (1.1) in performance. But, even for small deviations from \(\mathrm{Exp}(\lambda)\) given by \(\Gamma(\lambda, p),\) \(p=1+\delta,\) (1.2) shows major problems in terms of bias, while (1.1) retains the same performance. The animation in Figure 1.1 illustrates precisely this behavior.
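The same behavior can be glimpsed with a minimal R sketch, assuming for illustration \(\lambda=2,\) \(\delta=0.25,\) \(n=50,\) and \(x=0.5\) (all arbitrary choices):

```r
# Biases of (1.1) and (1.2) when the data come from a Gamma(rate = lambda, shape = 1 + delta)
set.seed(42)
lambda <- 2; delta <- 0.25; n <- 50; x0 <- 0.5; M <- 1e4
F_true <- pgamma(x0, shape = 1 + delta, rate = lambda)
est <- replicate(M, {
  x <- rgamma(n, shape = 1 + delta, rate = lambda)
  c(ecdf = mean(x <= x0),            # nonparametric estimate (1.1)
    param = 1 - exp(-x0 / mean(x)))  # parametric estimate (1.2)
})
rowMeans(est) - F_true  # biases: negligible for the ecdf, clearly nonzero for (1.2)
```

Increasing \(n\) reduces the variance of both estimates, but the bias of the parametric estimate does not vanish, since the assumed exponential model is misspecified.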