# Chapter 4 Kernel regression estimation I

The relation between two random variables \(X\) and \(Y\) can be completely characterized by their joint cdf \(F\) or, equivalently, by their joint pdf \(f\) if \((X,Y)\) is continuous. In the regression setting, we are interested in predicting/explaining the *response* \(Y\) by means of the *predictor* \(X\) from a sample \((X_1,Y_1),\ldots,(X_n,Y_n)\). The role of the variables is not symmetric: \(X\) is *used* to predict/explain \(Y\).

We first consider the simplest situation^{115}: a single *continuous* predictor \(X\) to predict a response \(Y\).^{116} In this case, recall that the complete knowledge about \(Y\) when \(X=x\) is given by the conditional pdf \(f_{Y| X=x}(y)=\frac{f(x,y)}{f_X(x)}\). While this pdf provides full knowledge about \(Y| X=x\), estimating it is also challenging: for each \(x\) we have to estimate a *different curve*! A simpler approach, yet still challenging, is to estimate the conditional mean (a scalar) for each \(x\) through the so-called *regression function*

\[\begin{align} m(x):=\mathbb{E}[Y\vert X=x]=\int y\mathrm{d}F_{Y\vert X=x}(y)=\int yf_{Y\vert X=x}(y)\mathrm{d}y.\tag{4.1} \end{align}\]

As we will see, this density-based view of the regression function is very useful to motivate estimators.
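To make this density-based view concrete, the following sketch builds \(m(x)\) numerically from the joint pdf, exactly as in (4.1): it forms the conditional pdf as the ratio \(f(x,y)/f_X(x)\) and then integrates \(y\,f_{Y\vert X=x}(y)\). The bivariate normal is a hypothetical choice of \((X,Y)\) made here only because its regression function is known in closed form, \(m(x)=\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\), so the numerical construction can be checked against it; all the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

# Illustrative (X, Y) ~ bivariate normal; m(x) is linear in this case
mu_x, mu_y, s_x, s_y, rho = 0.0, 1.0, 1.0, 2.0, 0.7
joint = multivariate_normal(
    mean=[mu_x, mu_y],
    cov=[[s_x**2, rho * s_x * s_y], [rho * s_x * s_y, s_y**2]],
)

def f_cond(y, x):
    # Conditional pdf: f_{Y|X=x}(y) = f(x, y) / f_X(x)
    return joint.pdf([x, y]) / norm.pdf(x, loc=mu_x, scale=s_x)

def m(x):
    # Regression function (4.1): m(x) = \int y f_{Y|X=x}(y) dy
    return quad(lambda y: y * f_cond(y, x), -np.inf, np.inf)[0]

x0 = 0.5
m_closed = mu_y + rho * s_y / s_x * (x0 - mu_x)  # closed form for this example
print(m(x0), m_closed)  # both approximately 1.7
```

Of course, in practice the joint pdf is unknown and this construction is not available; the point of the chapter is precisely to replace \(f\) by an estimate computed from the sample.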

We first restrict ourselves to this setting for the sake of introducing the main concepts in kernel regression estimation. In Chapter 5 we will see more general situations with several predictors, possibly non-continuous.↩︎

Formally, the response \(Y\) does not need to be continuous. We implicitly assume \(Y\) is continuous to use (4.2) as a motivation for (4.5), but the subsequent derivations in the chapter are also valid for non-continuous responses.↩︎