Chapter 3 Regression estimation

The relation between two random variables \(X\) and \(Y\) can be completely characterized by their joint cdf \(F\) or, equivalently, by the joint pdf \(f\) if \((X,Y)\) is continuous, which is the case we will address. In the regression setting, we are interested in predicting/explaining the response \(Y\) by means of the predictor \(X\) from a sample \((X_1,Y_1),\ldots,(X_n,Y_n)\). The role of the variables is not symmetric: \(X\) is used to predict/explain \(Y\).

The complete knowledge of \(Y\) when \(X=x\) is given by the conditional pdf \(f_{Y\vert X=x}(y)=\frac{f(x,y)}{f_X(x)}\). While this pdf provides full knowledge about \(Y\vert X=x\), estimating it is a challenging task: for each \(x\) we have to estimate a whole curve! A simpler, yet still challenging, approach is to estimate the conditional mean (a scalar) for each \(x\). This is the so-called regression function

\[\begin{align*} m(x):=\mathbb{E}[Y\vert X=x]=\int y\,\mathrm{d}F_{Y\vert X=x}(y)=\int yf_{Y\vert X=x}(y)\,\mathrm{d}y. \end{align*}\]
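The definition of \(m(x)\) as a conditional expectation can be checked empirically by averaging the responses whose predictors fall close to \(x\). The sketch below (all names and the choice \(m(x)=\sin(x)\) are illustrative, not from the text) simulates a sample and approximates \(m\) at a point by such a local average:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup: true regression function m(x) = sin(x),
# so Y = sin(X) + noise and E[Y | X = x] = sin(x).
n = 100_000
X = rng.uniform(-3, 3, n)
Y = np.sin(X) + rng.normal(0, 0.5, n)

def local_mean(x0, h=0.1):
    """Approximate m(x0) = E[Y | X = x0] by averaging the responses
    whose predictor lies within a window of half-width h about x0."""
    mask = np.abs(X - x0) < h
    return Y[mask].mean()

print(local_mean(1.0))  # should be close to sin(1) ≈ 0.841
```

Shrinking the window \(h\) reduces the bias of this approximation but leaves fewer observations to average over, a trade-off that is central to the nonparametric estimators of \(m\).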

Thus we aim to provide information about the conditional expectation of \(Y\), rather than its full conditional distribution, by means of \(X\).

Finally, recall that \(Y\) can be expressed in terms of \(m\) by means of the location-scale model:

\[\begin{align*} Y=m(X)+\sigma(X)\varepsilon, \end{align*}\]

where \(\sigma^2(x):=\mathbb{V}\mathrm{ar}[Y\vert X=x]\) and \(\varepsilon\) is independent of \(X\) and such that \(\mathbb{E}[\varepsilon]=0\) and \(\mathbb{V}\mathrm{ar}[\varepsilon]=1\).
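A simulation makes the location-scale model concrete. The sketch below (the particular \(m\) and \(\sigma\) are arbitrary choices for illustration) generates \(Y=m(X)+\sigma(X)\varepsilon\) and checks that, locally about a point \(x_0\), the sample mean and standard deviation of \(Y\) recover \(m(x_0)\) and \(\sigma(x_0)\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choices of conditional mean and standard deviation
m = lambda x: 1 + 2 * x              # m(x) = E[Y | X = x]
sigma = lambda x: 0.5 + 0.5 * x**2   # sigma(x) = sd[Y | X = x]

# Y = m(X) + sigma(X) * eps, with eps independent of X,
# E[eps] = 0 and Var[eps] = 1 (here eps is standard normal)
n = 200_000
X = rng.uniform(0, 1, n)
eps = rng.normal(0, 1, n)
Y = m(X) + sigma(X) * eps

# Locally about x0, mean(Y) ≈ m(x0) and std(Y) ≈ sigma(x0)
x0, h = 0.5, 0.05
near = np.abs(X - x0) < h
print(Y[near].mean(), Y[near].std())
```

Note that \(\varepsilon\) need not be normal: any standardized error independent of \(X\) yields the same conditional mean and variance structure.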


  1. Recall that we assume that \((X,Y)\) is continuous.