# Chapter 3 Regression estimation

The relation between two random variables \(X\) and \(Y\) can be completely characterized by their joint cdf \(F\), or, equivalently, by the joint pdf \(f\) if \((X,Y)\) is continuous, which is the case we will address. In the regression setting, we are interested in predicting/explaining the *response* \(Y\) by means of the *predictor* \(X\) from a sample \((X_1,Y_1),\ldots,(X_n,Y_n)\). The roles of the variables are not symmetric: \(X\) is *used* to predict/explain \(Y\).

The complete knowledge of \(Y\) when \(X=x\) is given by the conditional pdf: \(f_{Y\vert X=x}(y)=\frac{f(x,y)}{f_X(x)}\). While this pdf provides full knowledge about \(Y\vert X=x\), estimating it is a challenging task: for each \(x\) we have to estimate a *curve*! A simpler, yet still challenging, approach is to estimate the conditional mean (a scalar) for each \(x\). This is the so-called *regression function*^{8}

\[\begin{align*} m(x):=\mathbb{E}[Y\vert X=x]=\int y\,\mathrm{d}F_{Y\vert X=x}(y)=\int yf_{Y\vert X=x}(y)\,\mathrm{d}y. \end{align*}\]

Thus we aim to provide information about \(Y\)’s expectation, not distribution, by \(X\).
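As a quick numerical sanity check of the definition of \(m(x)\), the following sketch (not from the text; all parameter values are illustrative choices) uses a bivariate normal \((X,Y)\), for which the regression function is known in closed form, \(m(x)=\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\), and compares it with a crude Monte Carlo estimate of \(\mathbb{E}[Y\vert X=x]\) obtained by averaging the \(Y_i\)'s whose \(X_i\)'s fall close to \(x\):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters (assumptions, not from the text)
mu_x, mu_y, sd_x, sd_y, rho = 0.0, 1.0, 1.0, 2.0, 0.7

# Simulate (X, Y) bivariate normal via the conditional distribution of Y | X
n = 500_000
x = rng.normal(mu_x, sd_x, n)
y = (mu_y + rho * sd_y / sd_x * (x - mu_x)
     + np.sqrt(1 - rho**2) * sd_y * rng.normal(size=n))

# Closed-form regression function for the bivariate normal
x0 = 0.5
m_true = mu_y + rho * sd_y / sd_x * (x0 - mu_x)  # = 1.7

# Naive estimate of E[Y | X = x0]: average Y over observations with X near x0
window = np.abs(x - x0) < 0.05
m_hat = y[window].mean()
print(m_true, m_hat)  # the two values should be close
```

This "local averaging" idea of approximating \(\mathbb{E}[Y\vert X=x]\) by averaging responses with nearby predictors is precisely the intuition behind the nonparametric estimators of \(m\).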

Finally, recall that \(Y\) can be expressed in terms of \(m\) by means of the *location-scale model*:

\[\begin{align*} Y=m(X)+\sigma(X)\varepsilon, \end{align*}\]

where \(\sigma^2(x):=\mathbb{V}\mathrm{ar}[Y\vert X=x]\) and \(\varepsilon\) is independent of \(X\) and such that \(\mathbb{E}[\varepsilon]=0\) and \(\mathbb{V}\mathrm{ar}[\varepsilon]=1\).
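The location-scale model can be simulated directly once \(m\), \(\sigma\), and the distribution of \(\varepsilon\) are fixed. The sketch below uses hypothetical choices (not from the text): \(m(x)=\sin(x)\), \(\sigma(x)=0.5+0.25x^2\), and \(\varepsilon\sim\mathcal{N}(0,1)\), which satisfies \(\mathbb{E}[\varepsilon]=0\) and \(\mathbb{V}\mathrm{ar}[\varepsilon]=1\), and checks that near a point \(x_0\) the conditional mean and variance of \(Y\) match \(m(x_0)\) and \(\sigma^2(x_0)\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model ingredients (illustrative assumptions)
m = lambda x: np.sin(x)              # regression function m(x)
sigma = lambda x: 0.5 + 0.25 * x**2  # conditional sd, sigma(x) > 0

n = 400_000
x = rng.uniform(-2, 2, n)
eps = rng.normal(size=n)             # E[eps] = 0, Var[eps] = 1, independent of X
y = m(x) + sigma(x) * eps            # the location-scale model

# Empirical conditional mean and variance near x0 vs. their model values
x0 = 1.0
w = np.abs(x - x0) < 0.05
print(y[w].mean(), m(x0))            # approx equal: conditional mean
print(y[w].var(), sigma(x0) ** 2)    # approx equal: conditional variance
```

Note that the independence of \(\varepsilon\) from \(X\) lets heteroskedasticity enter only through \(\sigma(\cdot)\): the same standardized noise is rescaled differently at each \(x\).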

Recall that we assume that \((X,Y)\) is continuous.↩︎