1.6 Why Nonparametric Statistics?

The aim of statistical inference is to use data to infer an unknown quantity. In the game of inference, there is usually a trade-off between efficiency and generality, and this trade-off is controlled by the strength of assumptions that are made on the data generating process.

Parametric inference favors efficiency. Given a model (a strong assumption on the data generating process), parametric inference delivers a set of methods (point estimation, confidence intervals, hypothesis testing, etc.) tailored to that model. These methods are the most efficient inferential procedures if the model matches reality, in other words, if the data generating process truly satisfies the assumptions; otherwise, they may be inconsistent.

Nonparametric inference favors generality. Given a set of minimal and weak assumptions (e.g., a certain smoothness of a density or the existence of moments of a random variable), it provides inferential methods that are consistent in a broad range of situations, in exchange for losing efficiency for small or moderate sample sizes. Broadly speaking, a statistical technique qualifies as “nonparametric” if it does not rely on parametric assumptions, which typically have a finite-dimensional nature.10

Hence, for any specific data generating process there is a parametric method that dominates its nonparametric counterpart in efficiency. But the data generating process is rarely known in practice. That is the appeal of a nonparametric method: it will perform adequately no matter what the data generating process is. For that reason, nonparametric methods are useful:

  1. When we have no clue about what a good parametric model could be.
  2. For creating goodness-of-fit tests used to validate parametric models.

The following example illustrates the first advantage, which is the most useful in practice.

Example 1.7 Assume we have a sample \(X_1,\ldots,X_n\) from a random variable \(X\) and we want to estimate its distribution function \(F.\) Without any assumption, we know that the ecdf in (1.1) is an estimate for \(F(x)=\mathbb{P}[X\leq x].\) It is indeed a nonparametric estimate for \(F.\) Its expectation and variance are

\[\begin{align*} \mathbb{E}[F_n(x)]=F(x),\quad \mathbb{V}\mathrm{ar}[F_n(x)]=\frac{F(x)(1-F(x))}{n}. \end{align*}\]

Since \(F_n(x)\) is unbiased, the squared bias vanishes and the MSE reduces to the variance:

\[\begin{align*} \mathrm{MSE}[F_n(x)]=\frac{F(x)(1-F(x))}{n}. \end{align*}\]
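This formula can be checked empirically. The following minimal sketch (not part of the original example) approximates the MSE of the ecdf at a single point by Monte Carlo; the choice of exponential data with rate \(\lambda=2,\) sample size \(n=50,\) and evaluation point \(x=0.5\) is purely illustrative.

```python
# Monte Carlo check of MSE[F_n(x)] = F(x)(1 - F(x)) / n, assuming X ~ Exp(lambda).
import numpy as np

rng = np.random.default_rng(42)
lam, n, M, x = 2.0, 50, 10_000, 0.5      # rate, sample size, MC replicates, evaluation point

F_true = 1 - np.exp(-lam * x)            # true F(x) for Exp(lambda)

samples = rng.exponential(scale=1 / lam, size=(M, n))
Fn_x = (samples <= x).mean(axis=1)       # ecdf evaluated at x for each replicate

mse_mc = np.mean((Fn_x - F_true) ** 2)   # Monte Carlo MSE of the ecdf at x
mse_theory = F_true * (1 - F_true) / n   # F(x)(1 - F(x)) / n
print(mse_mc, mse_theory)                # both values should be close
```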

Assume now that \(X\sim\mathrm{Exp}(\lambda).\) By maximum likelihood, it is possible to estimate \(\lambda\) as \(\hat \lambda_\mathrm{ML}={\bar X}^{-1}.\) Then, we have the following estimate for \(F(x)\):

\[\begin{align} F(x;\hat\lambda_\mathrm{ML})=1-e^{-\hat\lambda_\mathrm{ML} x}. \tag{1.8} \end{align}\]

Obtaining the exact MSE of (1.8) is not straightforward, even though it is easy to prove that \(\hat\lambda_\mathrm{ML}\sim \mathrm{IG}(\lambda^{-1},n).\) Approximations are possible using Exercise 1.2. However, the MSE can be easily approximated by Monte Carlo.
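One possible Monte Carlo approximation of the MSE of (1.8) is sketched below, under the same illustrative settings as before (exponential data with \(\lambda=2,\) \(n=50,\) \(x=0.5\)); this is an assumed setup, not the book's implementation.

```python
# Monte Carlo approximation of the MSE of the parametric estimate (1.8),
# assuming the data truly come from Exp(lambda).
import numpy as np

rng = np.random.default_rng(42)
lam, n, M, x = 2.0, 50, 10_000, 0.5
F_true = 1 - np.exp(-lam * x)                    # true F(x) for Exp(lambda)

samples = rng.exponential(scale=1 / lam, size=(M, n))
lam_ml = 1 / samples.mean(axis=1)                # MLE of lambda: inverse of the sample mean
F_par = 1 - np.exp(-lam_ml * x)                  # parametric estimate (1.8) evaluated at x

mse_par = np.mean((F_par - F_true) ** 2)         # Monte Carlo MSE of (1.8) at x
print(mse_par)                                   # typically smaller than the ecdf's MSE here
```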

What happens when the data are generated from an \(\mathrm{Exp}(\lambda)\)? Then (1.8) uniformly dominates (1.1) in performance. But even for small deviations from \(\mathrm{Exp}(\lambda)\) given by \(\Gamma(\lambda, p)\) with \(p\neq 1,\) the parametric estimator (1.8) shows major problems in terms of bias, while the performance of the nonparametric estimator (1.1) is completely unaltered. The animation in Figure 1.4 illustrates precisely this behavior.

Figure 1.4: A simplified example of parametric and nonparametric estimation. The objective is to estimate the distribution function \(F\) of a random variable. The data is generated from a \(\Gamma(\lambda,p).\) The parametric method assumes that \(p=1,\) that is, that the data come from an \(\mathrm{Exp}(\lambda).\) The nonparametric method does not assume anything about the data generating process. The left plot shows the true distribution function and ten estimates of each method from samples of size \(n.\) The right plot shows the MSE of each method when estimating \(F(x).\) Application available here.
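A simplified, non-interactive version of this comparison can be sketched as follows. It compares the Monte Carlo MSEs of both estimators at a single point when the data come from a gamma distribution; the parametrization of \(\Gamma(\lambda,p)\) as shape \(p\) and rate \(\lambda,\) and the values \(p=2,\) \(\lambda=2,\) \(n=50,\) \(x=0.5\) are assumptions made for illustration.

```python
# Misspecification example: the data come from a gamma with shape p != 1,
# but the parametric estimate (1.8) wrongly assumes an exponential model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
lam, p, n, M, x = 2.0, 2.0, 50, 10_000, 0.5      # shape p != 1 breaks the Exp(lambda) assumption

F_true = stats.gamma.cdf(x, a=p, scale=1 / lam)  # true F(x) under the gamma model

samples = rng.gamma(p, scale=1 / lam, size=(M, n))
Fn_x = (samples <= x).mean(axis=1)               # nonparametric: ecdf at x
F_par = 1 - np.exp(-x / samples.mean(axis=1))    # parametric (1.8), wrongly assuming Exp(lambda)

print(np.mean((Fn_x - F_true) ** 2))             # ecdf: unbiased, MSE is pure variance
print(np.mean((F_par - F_true) ** 2))            # parametric: dominated by squared bias
```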


  10. For example, the exact two-sided \(t\)-test for the mean of a random variable \(X,\) i.e., the test \(H_0:\mu=\mu_0\) vs. \(H_1:\mu\neq\mu_0,\) assumes that \(X\sim\mathcal{N}(\mu,\sigma^2).\) This is an assumption indexed by the two parameters \((\mu,\sigma^2)\in\mathbb{R}\times\mathbb{R}^+.\)↩︎