Chapter 2 Density estimation

A random variable \(X\) is completely characterized by its cdf. Hence, an estimate of the cdf yields, as a by-product, estimates of different characteristics of \(X,\) obtained by plugging in the ecdf \(F_n\) in place of \(F.\) For example, the mean \(\mu=\mathbb{E}[X]=\int x \,\mathrm{d}F(x)\) can be estimated by \(\int x \,\mathrm{d}F_n(x)=\frac{1}{n}\sum_{i=1}^n X_i=\bar X,\) since \(F_n\) assigns probability \(1/n\) to each observation \(X_1,\ldots,X_n.\) Despite their usefulness, cdfs are hard to visualize and interpret.
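The plug-in idea can be made concrete with a short numerical sketch. It is written in Python with NumPy purely for illustration; the helper names `make_ecdf` and `plug_in_mean` are hypothetical and not part of any library. Because \(F_n\) is a step function, the integral \(\int x \,\mathrm{d}F_n(x)\) reduces to a sum of each distinct observed value times the jump of \(F_n\) at that value, which recovers \(\bar X.\)

```python
import numpy as np


def make_ecdf(sample):
    """Return the empirical cdf F_n(t) = #{i : X_i <= t} / n of `sample` (hypothetical helper)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size

    def F_n(t):
        # Number of observations <= t, divided by n (works for scalars and arrays)
        return np.searchsorted(x, t, side="right") / n

    return F_n


def plug_in_mean(sample):
    """Plug-in estimate of E[X]: the integral of x dF_n(x) (hypothetical helper).

    F_n is a step function, so the Stieltjes integral reduces to a sum of
    each distinct observed value times the jump of F_n at that value.
    """
    F_n = make_ecdf(sample)
    support = np.unique(sample)           # distinct observed values, sorted
    cum = F_n(support)                    # F_n evaluated at each jump point
    jumps = np.diff(cum, prepend=0.0)     # jump sizes: 1/n per observation, ties add up
    return float(np.sum(support * jumps))
```

For example, `plug_in_mean([1, 2, 2, 5])` returns `2.5`, the sample mean: \(F_n\) jumps by \(1/4,\) \(1/2\) (a tie) and \(1/4\) at \(1,\) \(2\) and \(5.\)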

Densities, on the other hand, are easy to visualize and interpret, which makes them ideal tools for data exploration. They give immediate graphical information about the regions where \(X\) is most likely to fall, its modes, and its spread. A continuous random variable is also completely characterized by its pdf \(f=F'.\) However, density estimation does not follow trivially from the ecdf \(F_n,\) since \(F_n\) is a step function: it is not even continuous, let alone differentiable. Hence the need for the specific procedures we will see in this chapter.
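Why differentiating \(F_n\) fails can be seen numerically. The sketch below (Python with NumPy; the toy sample and the helper `central_difference` are hypothetical, chosen only for illustration) approximates \(F_n'(t)\) by the central difference \((F_n(t+h)-F_n(t-h))/(2h).\) Between observations the quotient is exactly \(0,\) while at an observation it equals \((1/n)/(2h)\) and diverges as \(h\to0,\) so no useful density estimate comes out of it.

```python
import numpy as np

# Hypothetical toy sample (n = 4), chosen only for illustration
sample = np.array([0.3, 1.1, 1.7, 2.4])
x_sorted = np.sort(sample)
n = x_sorted.size


def F_n(t):
    """Empirical cdf: proportion of observations <= t."""
    return np.searchsorted(x_sorted, t, side="right") / n


def central_difference(t, h):
    """Finite-difference approximation (F_n(t + h) - F_n(t - h)) / (2h) to F_n'(t)."""
    return (F_n(t + h) - F_n(t - h)) / (2 * h)


for h in (1e-1, 1e-3, 1e-6):
    # While h is smaller than the gap to the neighbouring observations, the
    # quotient is 0 between observations (t = 1.4) and equals (1/n) / (2h)
    # at an observation (t = 1.1), which diverges as h -> 0.
    print(f"h={h:g}: at t=1.4 -> {central_difference(1.4, h):g}, "
          f"at t=1.1 -> {central_difference(1.1, h):g}")
```

The difference quotient is therefore either \(0\) or unboundedly large, which is the numerical counterpart of \(F_n\) not being differentiable.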