## 1.6 Nonparametric inference

The aim of statistical inference is to use data to *infer* an unknown quantity. In the game of inference, there is usually a trade-off between *efficiency* and *generality*, and this trade-off is controlled by the strength of assumptions that are made on the data generating process.

Parametric inference favors **efficiency**. Given a model (a strong assumption on the data generating process), it provides a set of inferential methods (point estimation, confidence intervals, hypothesis testing, etc.). All of them are the most efficient inferential procedures *if* the model matches reality, in other words, *if* **the data generating process truly satisfies the assumptions. Otherwise the methods may be inconsistent.**

On the other hand, nonparametric inference favors **generality**. Given a set of *minimal and weak* assumptions (e.g., a certain smoothness of a density), it provides inferential methods that are consistent in broad situations, in exchange for a loss of efficiency for small or moderate sample sizes.

Hence, for any specific data generating process, there is a parametric method that dominates its nonparametric counterpart in efficiency. But the data generating process is rarely known in practice. That is the appeal of a nonparametric method: it will **perform adequately no matter what the data generating process is**. For that reason, nonparametric methods are useful\(\ldots\)

- \(\ldots\) when we have no clue about what a good parametric model could be.
- \(\ldots\) for creating goodness-of-fit tests employed for validating parametric models.

**Example 1.6** Assume we have a sample \(X_1,\ldots,X_n\) from a random variable \(X\) and we want to estimate its distribution function \(F\). Without any assumption, we know that the ecdf in (1.1) is an estimate for \(F(x)=\mathbb{P}[X\leq x]\). It is indeed a *nonparametric estimate* for \(F\). Its expectation and variance are

\[\begin{align*} \mathbb{E}[F_n(x)]=F(x),\quad \mathbb{V}\mathrm{ar}[F_n(x)]=\frac{F(x)(1-F(x))}{n}. \end{align*}\]
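These expressions follow directly from observing that \(nF_n(x)\) is a binomial count:

\[\begin{align*} nF_n(x)=\sum_{i=1}^n\mathbf{1}_{\{X_i\leq x\}}\sim\mathrm{Bin}(n,F(x)) \implies \mathbb{E}[F_n(x)]=\frac{nF(x)}{n}=F(x),\quad \mathbb{V}\mathrm{ar}[F_n(x)]=\frac{nF(x)(1-F(x))}{n^2}. \end{align*}\]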

Since \(F_n(x)\) is unbiased, the squared bias vanishes and the MSE reduces to the variance:

\[\begin{align*} \mathrm{MSE}[F_n(x)]=\frac{F(x)(1-F(x))}{n}. \end{align*}\]
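This MSE formula can be checked numerically. Below is a minimal Monte Carlo sketch in Python, assuming illustrative choices (\(\lambda=2\), \(n=50\), \(x=0.5\), and \(10^4\) replicates, none of which come from the text) and data generated from an \(\mathrm{Exp}(\lambda)\):

```python
import numpy as np

# Monte Carlo check of MSE[F_n(x)] = F(x) * (1 - F(x)) / n for the ecdf.
# All numerical choices below are illustrative, not prescribed by the text.
rng = np.random.default_rng(42)
lam, n, M = 2.0, 50, 10_000     # rate, sample size, Monte Carlo replicates
x = 0.5                         # evaluation point
F_x = 1 - np.exp(-lam * x)      # true F(x) under Exp(lambda)

# ecdf at x: proportion of sample points <= x, computed for each replicate
samples = rng.exponential(scale=1 / lam, size=(M, n))
Fn_x = np.mean(samples <= x, axis=1)

mse_mc = np.mean((Fn_x - F_x) ** 2)   # Monte Carlo MSE
mse_theory = F_x * (1 - F_x) / n      # theoretical MSE
print(f"Monte Carlo MSE: {mse_mc:.5f} | theoretical MSE: {mse_theory:.5f}")
```

Both numbers should agree up to Monte Carlo error, since the ecdf is unbiased and its variance is exactly \(F(x)(1-F(x))/n\).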

Assume now that \(X\sim\mathrm{Exp}(\lambda)\). By maximum likelihood, \(\lambda\) can be estimated as \(\hat\lambda_\mathrm{ML}={\bar X}^{-1}\). Then, we have the following estimate for \(F(x)\):

\[\begin{align} F(x;\hat\lambda_\mathrm{ML})=1-e^{-\hat\lambda_\mathrm{ML} x}. \tag{1.2} \end{align}\]

It is not so simple to obtain the exact MSE for (1.2), even though it is easy to prove that \(\hat\lambda_\mathrm{ML}\sim \mathrm{IG}(\lambda^{-1},n)\). Approximations are possible using Exercise 1.2. However, the MSE can be easily approximated by Monte Carlo.
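For instance, the following sketch (Python, reusing the illustrative settings above) approximates the MSE of (1.2) by Monte Carlo and compares it with the MSE of the ecdf (1.1) when the model is correctly specified:

```python
import numpy as np

# Monte Carlo MSE of the parametric estimate (1.2) vs. the ecdf (1.1)
# when the data truly come from Exp(lambda). Illustrative settings.
rng = np.random.default_rng(42)
lam, n, M = 2.0, 50, 10_000
x = 0.5
F_x = 1 - np.exp(-lam * x)                 # true F(x)

samples = rng.exponential(scale=1 / lam, size=(M, n))
lam_ml = 1 / samples.mean(axis=1)          # hat(lambda)_ML = 1 / Xbar per replicate
F_par = 1 - np.exp(-lam_ml * x)            # parametric estimate (1.2)
F_ecdf = np.mean(samples <= x, axis=1)     # nonparametric estimate (1.1)

print(f"MSE of (1.2): {np.mean((F_par - F_x) ** 2):.5f}")
print(f"MSE of (1.1): {np.mean((F_ecdf - F_x) ** 2):.5f}")
```

Under the correct model, the parametric MSE should come out smaller, illustrating the efficiency gain.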

What happens when the data are generated from an \(\mathrm{Exp}(\lambda)\)? Then (1.2) uniformly dominates (1.1) in performance. But, even for small deviations from \(\mathrm{Exp}(\lambda)\) given by \(\Gamma(\lambda, p)\), \(p=1+\delta\), (1.2) shows major problems in terms of bias, while (1.1) retains the same performance. The animation in Figure 1.1 illustrates precisely this behavior.
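A numerical hint of this behavior, assuming that \(\Gamma(\lambda,p)\) denotes rate \(\lambda\) and shape \(p\) (so \(p=1\) recovers \(\mathrm{Exp}(\lambda)\)) and taking an illustrative \(\delta=0.25\):

```python
import numpy as np
from scipy import stats

# Bias of (1.2) vs. (1.1) under a mild misspecification: data from
# Gamma(lambda, p), p = 1 + delta. Assumes Gamma(lambda, p) means
# rate lambda and shape p; delta = 0.25 is an illustrative choice.
rng = np.random.default_rng(42)
lam, p, n, M = 2.0, 1.25, 50, 10_000
x = 0.5
F_x = stats.gamma.cdf(x, a=p, scale=1 / lam)   # true F(x) under Gamma(lambda, p)

samples = rng.gamma(shape=p, scale=1 / lam, size=(M, n))
F_par = 1 - np.exp(-x / samples.mean(axis=1))  # (1.2), built on a wrong model
F_ecdf = np.mean(samples <= x, axis=1)         # (1.1), model-free

print(f"bias of (1.2): {np.mean(F_par) - F_x:+.5f}")
print(f"bias of (1.1): {np.mean(F_ecdf) - F_x:+.5f}")
```

The bias of (1.2) does not vanish as \(n\) grows, whereas the ecdf remains unbiased whatever the true distribution.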