Chapter 3 Regression estimation

The relationship between two random variables \(X\) and \(Y\) is completely characterized by their joint cdf \(F\) or, equivalently, by their joint pdf \(f\) if \((X,Y)\) is continuous, the case we will address. In the regression setting, we are interested in predicting/explaining the response \(Y\) by means of the predictor \(X\) from a sample \((X_1,Y_1),\ldots,(X_n,Y_n).\) The roles of the variables are not symmetric: \(X\) is used to predict/explain \(Y.\)

Complete knowledge of \(Y\) when \(X=x\) is given by the conditional pdf \(f_{Y\vert X=x}(y)=\frac{f(x,y)}{f_X(x)}.\) While this pdf fully describes \(Y\vert X=x,\) estimating it is challenging: for each \(x\) we have to estimate a whole curve! A simpler, yet still challenging, approach is to estimate the conditional mean (a scalar) for each \(x.\) This is the so-called regression function\(^{8}\):

\[\begin{align*} m(x):=\mathbb{E}[Y\vert X=x]=\int y\,\mathrm{d}F_{Y\vert X=x}(y)=\int yf_{Y\vert X=x}(y)\,\mathrm{d}y. \end{align*}\]

Thus, we aim to describe the conditional expectation of \(Y\) given \(X,\) not its full conditional distribution.
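As a small numerical illustration of the two definitions above, the following sketch computes \(f_{Y\vert X=x}\) and \(m(x)\) from a known joint pdf by numerical integration over \(y.\) The joint pdf (a standard bivariate normal with correlation \(\rho=0.7\)), the integration grid, and the function names `joint_pdf`, `conditional_pdf`, and `regression_function` are assumptions made only for this example. For this particular pdf it is known that \(m(x)=\rho x,\) which gives a simple check.

```python
import numpy as np

# Integration grid for y (an assumption: wide enough for the pdf used below).
Y_GRID = np.linspace(-10.0, 10.0, 4001)
DY = Y_GRID[1] - Y_GRID[0]

def joint_pdf(x, y, rho=0.7):
    """Joint pdf of a standard bivariate normal with correlation rho."""
    norm_const = 2.0 * np.pi * np.sqrt(1.0 - rho**2)
    quad = (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)
    return np.exp(-0.5 * quad) / norm_const

def conditional_pdf(x, pdf=joint_pdf):
    """Values of f_{Y|X=x}(y) = f(x, y) / f_X(x) on Y_GRID."""
    f_xy = pdf(x, Y_GRID)
    f_x = np.sum(f_xy) * DY  # marginal f_X(x), Riemann sum over y
    return f_xy / f_x

def regression_function(x, pdf=joint_pdf):
    """m(x) = integral of y * f_{Y|X=x}(y) dy, approximated on Y_GRID."""
    return np.sum(Y_GRID * conditional_pdf(x, pdf)) * DY

# For this pdf, m(x) = rho * x, so m(1) should be close to 0.7.
m_at_1 = regression_function(1.0)
print(m_at_1)
```

Note how the conditional pdf is a whole curve in \(y\) for each fixed \(x,\) whereas \(m(x)\) collapses that curve to a single number.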

Finally, recall that \(Y\) can be expressed in terms of \(m\) by means of the location-scale model:

\[\begin{align*} Y=m(X)+\sigma(X)\varepsilon, \end{align*}\]

where \(\sigma^2(x):=\mathbb{V}\mathrm{ar}[Y\vert X=x]\) is the conditional variance and \(\varepsilon\) is independent of \(X\) and such that \(\mathbb{E}[\varepsilon]=0\) and \(\mathbb{V}\mathrm{ar}[\varepsilon]=1.\)
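The location-scale model can be explored by simulation. The sketch below draws a sample \((X_i,Y_i)\) from the model and checks that, among observations with \(X_i\) close to a point \(x_0,\) the mean and variance of \(Y_i\) are close to \(m(x_0)\) and \(\sigma^2(x_0).\) The choices \(m(x)=\sin(2x),\) \(\sigma(x)=0.5+0.25\vert x\vert,\) \(X\sim\mathcal{U}(-2,2),\) a standard normal \(\varepsilon,\) and the window half-width `h` are all hypothetical and serve only as an illustration.

```python
import numpy as np

def m(x):
    """Regression function (hypothetical choice for this example)."""
    return np.sin(2.0 * x)

def sigma(x):
    """Conditional standard deviation (hypothetical choice)."""
    return 0.5 + 0.25 * np.abs(x)

def simulate(n, rng):
    """Draw (X_i, Y_i), i = 1, ..., n, from Y = m(X) + sigma(X) * eps."""
    x = rng.uniform(-2.0, 2.0, size=n)
    eps = rng.standard_normal(n)  # independent of x, mean 0, variance 1
    y = m(x) + sigma(x) * eps
    return x, y

def local_moments(x, y, x0, h=0.05):
    """Mean and variance of the Y_i whose X_i lies within h of x0."""
    near = np.abs(x - x0) < h
    return y[near].mean(), y[near].var()

rng = np.random.default_rng(42)
x, y = simulate(200_000, rng)
mean_hat, var_hat = local_moments(x, y, x0=1.0)

print(mean_hat, m(1.0))          # local mean vs. m(1)
print(var_hat, sigma(1.0) ** 2)  # local variance vs. sigma^2(1)
```

Averaging the responses in a small window around \(x_0\) is only a crude device used here for checking the model; it is not presented as the estimator developed in this chapter.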


  8. Recall that we assume that \((X,Y)\) is continuous.