Chapter 4 Kernel regression estimation I

The relation between two random variables $X$ and $Y$ can be completely characterized by their joint cdf $F$ or, equivalently, by their joint pdf $f$ if $(X,Y)$ is continuous. In the regression setting, we are interested in predicting/explaining the response $Y$ by means of the predictor $X$ from a sample $(X_1,Y_1),\ldots,(X_n,Y_n).$ The role of the variables is not symmetric: $X$ is used to predict/explain $Y.$ See Section B.1 for a quick review of the relevant concepts of linear regression used in this chapter.

We first consider the simplest situation:¹¹⁹ a single continuous predictor $X$ to predict a response $Y.$¹²⁰ In this case, recall that the complete knowledge about $Y$ when $X=x$ is given by the conditional pdf $f_{Y| X=x}(y)=\frac{f(x,y)}{f_X(x)}.$ While this pdf provides full knowledge about $Y| X=x,$ it is also challenging to estimate: for each $x$ we have to estimate a different curve! A simpler, yet still challenging, approach is to estimate the conditional mean (a scalar) for each $x$ through the so-called regression function

$\begin{align} m(x):=\mathbb{E}[Y\vert X=x]=\int y\,\mathrm{d}F_{Y\vert X=x}(y)=\int yf_{Y\vert X=x}(y)\,\mathrm{d}y.\tag{4.1} \end{align}$
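To make (4.1) concrete, the sketch below evaluates $m(x)$ numerically when the joint pdf $f$ is known. It computes $f_X(x)=\int f(x,y)\,\mathrm{d}y$ and $\int y f(x,y)\,\mathrm{d}y$ with a Riemann sum on a $y$ grid. We assume a bivariate normal with standard marginals and correlation $\rho=0.5$, for which $m(x)=\rho x$ is known in closed form. The helper names `bvn_pdf` and `regression_function` are hypothetical, chosen only for illustration.

```python
import numpy as np

def bvn_pdf(x, y, mu_x=0.0, mu_y=0.0, s_x=1.0, s_y=1.0, rho=0.5):
    """Joint pdf f(x, y) of a bivariate normal (illustrative assumption)."""
    zx = (x - mu_x) / s_x
    zy = (y - mu_y) / s_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * s_x * s_y * np.sqrt(1 - rho**2))

def regression_function(x, joint_pdf, y_grid):
    """m(x) = int y f(x, y) dy / f_X(x), both integrals as Riemann sums over y_grid."""
    f_xy = joint_pdf(x, y_grid)              # f(x, y) evaluated along the y grid
    dy = y_grid[1] - y_grid[0]
    f_x = np.sum(f_xy) * dy                  # marginal f_X(x) = int f(x, y) dy
    return np.sum(y_grid * f_xy) * dy / f_x  # int y f_{Y|X=x}(y) dy

y_grid = np.linspace(-10, 10, 20001)
for x in (-1.0, 0.0, 2.0):
    m_num = regression_function(x, bvn_pdf, y_grid)
    m_exact = 0.5 * x  # mu_y + rho * s_y / s_x * (x - mu_x) with the defaults above
    print(f"x={x:+.1f}  numerical m(x)={m_num:.6f}  exact={m_exact:.6f}")
```

The conditional mean is a single number for each $x$. The full conditional pdf $f_{Y|X=x}$ would be a whole curve over the grid, which is why estimating $m$ is the simpler target.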

As we will see, this density-based view of the regression function is very useful to motivate estimators.
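As a hedged illustration of how (4.1) can motivate an estimator, suppose we replace $f(x,y)$ by a product-kernel density estimate $\hat f(x,y)=\frac{1}{n}\sum_{i=1}^n K_{h_x}(x-X_i)K_{h_y}(y-Y_i)$. Here we assume a Gaussian kernel, and the bandwidths $h_x,h_y$ are hypothetical choices. Since $\int y K_{h_y}(y-Y_i)\,\mathrm{d}y=Y_i$ and $\int K_{h_y}(y-Y_i)\,\mathrm{d}y=1$, the plug-in reduces to the kernel-weighted mean $\sum_i K_{h_x}(x-X_i)Y_i/\sum_i K_{h_x}(x-X_i)$, known as the Nadaraya–Watson estimator, and $h_y$ drops out. The sketch below checks this numerically on simulated data. The function names and the data-generating model are ours, not the chapter's.

```python
import numpy as np

def gauss(u, h):
    """Gaussian kernel K_h(u) = phi(u / h) / h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))

def plugin_m(x, X, Y, h_x, h_y, y_grid):
    """Plug a product-kernel KDE of f(x, y) into (4.1), integrating over y numerically."""
    # hat f(x, y) on the y grid: average over i of K_{h_x}(x - X_i) K_{h_y}(y - Y_i)
    f_xy = np.mean(gauss(x - X, h_x)[:, None] * gauss(y_grid[None, :] - Y[:, None], h_y), axis=0)
    dy = y_grid[1] - y_grid[0]
    return np.sum(y_grid * f_xy) * dy / (np.sum(f_xy) * dy)

def weighted_average(x, X, Y, h_x):
    """Closed form of the same plug-in: kernel-weighted mean of the Y_i."""
    w = gauss(x - X, h_x)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(42)
X = rng.normal(size=100)
Y = np.sin(2 * X) + 0.3 * rng.normal(size=100)  # hypothetical regression model
y_grid = np.linspace(-6, 6, 6001)
for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  plug-in={plugin_m(x, X, Y, 0.3, 0.2, y_grid):.6f}  "
          f"weighted mean={weighted_average(x, X, Y, 0.3):.6f}")
```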

¹¹⁹ This simplification is for the sake of introducing the main concepts in kernel regression estimation. In Chapter 5 we will see more general situations with several predictors, possibly non-continuous.
¹²⁰ Formally, the response $Y$ does not need to be continuous. We implicitly assume $Y$ is continuous to use (4.2) as a motivation for (4.5), but the subsequent derivations in the chapter are also valid for non-continuous responses.