Chapter 6 Nonparametric regression

The models we saw in the previous chapters share a common root: all of them are parametric. This means that they assume a certain structure on the regression function \(m,\) which is controlled by parameters¹. If this assumption truly holds, then parametric methods are the best approach for estimating \(m.\) But in practice it is rarely the case that parametric methods work out-of-the-box, and several tricks are needed in order to expand their flexibility on a case-by-case basis. Avoiding this nuisance is the strongest point of nonparametric methods: they do not impose major hard-to-satisfy hypotheses on the regression function, only minimal assumptions, which makes them directly employable. Their weak points are that they are usually more computationally demanding and harder to interpret.

We consider first the simplest situation²: a single continuous predictor \(X\) for predicting a response \(Y.\) In this case, recall that complete knowledge of \(Y\) given \(X=x\) is provided by the conditional pdf \(f_{Y| X=x}(y)=\frac{f(x,y)}{f_X(x)}.\) While this pdf gives full knowledge about \(Y| X=x,\) estimating it is a challenging task: for each \(x\) we have to estimate a different curve! A simpler, yet still challenging, approach is to estimate the conditional mean (a scalar) for each \(x\) through the regression function

\[\begin{align*} m(x)=\mathbb{E}[Y| X=x]=\int yf_{Y| X=x}(y)\,\mathrm{d}y. \end{align*}\]

As we will see, this density-based view of the regression function is useful for motivating estimators.
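To make this view concrete, the following is a minimal numerical sketch (in Python; not part of the original text) of how \(m(x)\) arises from integrating \(y\) against the conditional pdf. It assumes a hypothetical bivariate normal joint density for \((X,Y),\) for which the conditional mean is known in closed form, so the numerical integral can be checked against it.

```python
# A minimal sketch (assumed bivariate normal example) of the density-based view
# of the regression function: m(x) = E[Y | X = x] is obtained by integrating
# y against the conditional pdf f_{Y|X=x}(y) = f(x, y) / f_X(x).
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import trapezoid

# Hypothetical parameters for the joint density f of (X, Y)
mu_x, mu_y = 0.0, 1.0
sigma_x, sigma_y, rho = 1.0, 2.0, 0.7
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
joint = multivariate_normal(mean=[mu_x, mu_y], cov=cov)
marg_x = norm(loc=mu_x, scale=sigma_x)  # marginal density of X

def m_numerical(x, y_grid=np.linspace(-15, 15, 2001)):
    """Approximate m(x) = E[Y | X = x] by integrating y * f_{Y|X=x}(y) on a grid."""
    f_xy = joint.pdf(np.column_stack([np.full_like(y_grid, x), y_grid]))
    f_cond = f_xy / marg_x.pdf(x)              # conditional pdf f_{Y|X=x}(y)
    return trapezoid(y_grid * f_cond, y_grid)  # numerical integral of y * f_cond

def m_closed_form(x):
    """Exact conditional mean of the bivariate normal."""
    return mu_y + rho * sigma_y / sigma_x * (x - mu_x)

for x in (-1.0, 0.0, 2.0):
    print(f"x = {x:+.1f}: numerical {m_numerical(x):.4f} vs exact {m_closed_form(x):.4f}")
```

Under these assumed parameters, the numerical integral should match the closed-form conditional mean \(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\) to several decimal places at each evaluated \(x.\)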


  1. For example, linear models assume that \(m\) is of the form \(m(\mathbf{x})=\beta_0+\beta_1x_1+\cdots+\beta_px_p\) for some unknown coefficients \(\boldsymbol{\beta}.\)↩︎

  2. For the sake of introducing the main concepts in the simplest setting; the fully general situation is presented in Section 6.3.↩︎