## 2.2 Kernel density estimation

The moving histogram (2.2) can be equivalently written as

\[\begin{align} \hat f_\mathrm{N}(x;h)&=\frac{1}{nh}\sum_{i=1}^n\frac{1}{2}1_{\left\{-1<\frac{x-X_i}{h}<1\right\}}\nonumber\\ &=\frac{1}{nh}\sum_{i=1}^nK\left(\frac{x-X_i}{h}\right), \tag{2.3} \end{align}\]

with \(K(z)=\frac{1}{2}1_{\{-1<z<1\}}\). Interestingly, \(K\) is a uniform *density* in \((-1,1)\). This means that, when approximating

\[\begin{align*} \mathbb{P}[x-h<X<x+h]=\mathbb{P}\left[-1<\frac{x-X}{h}<1\right] \end{align*}\]

by (2.3), we give equal weight to all the points \(X_1,\ldots,X_n\). The generalization of (2.3) is now obvious: replace \(K\) by an arbitrary density. Then \(K\) is known as a *kernel*, a density that is typically symmetric and unimodal at \(0\). This generalization provides the definition of the **kernel density estimator** (kde)^{5}:

\[\begin{align} \hat f(x;h):=\frac{1}{nh}\sum_{i=1}^nK\left(\frac{x-X_i}{h}\right). \tag{2.4} \end{align}\]

A common notation is \(K_h(z):=\frac{1}{h}K\left(\frac{z}{h}\right)\), so \(\hat f(x;h)=\frac{1}{n}\sum_{i=1}^nK_h(x-X_i)\).
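Before turning to R's built-in tools, it is illustrative to code (2.4) directly from its definition. The sketch below uses the normal kernel via `dnorm`; the function name `kde_manual` is just an illustrative choice, not a standard function:

```
# Naive kde implementing (2.4) with the normal kernel K = dnorm
# (illustrative only; density() is R's optimized implementation)
kde_manual <- function(x, samp, h) {
  # For each evaluation point x0, average K((x0 - X_i) / h) and divide by h
  sapply(x, function(x0) mean(dnorm((x0 - samp) / h)) / h)
}
set.seed(1234567)
samp <- rnorm(n = 100)
x_grid <- seq(-4, 4, length.out = 200)
plot(x_grid, kde_manual(x = x_grid, samp = samp, h = 0.5), type = "l")
```

Since each \(K_h(\cdot-X_i)\) integrates to one, so does the kde, which can be checked numerically by summing the estimate over a fine grid.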

Several types of kernels are possible. The most popular is the *normal kernel* \(K(z)=\phi(z)\), although the *Epanechnikov kernel*, \(K(z)=\frac{3}{4}(1-z^2)1_{\{|z|<1\}}\), is the most efficient. The *rectangular kernel* \(K(z)=\frac{1}{2}1_{\{|z|<1\}}\) yields the moving histogram as a particular case. The kde inherits the smoothness properties of the kernel. That means, for example, that (2.4) with a normal kernel is infinitely differentiable; with an Epanechnikov kernel it is continuous but not differentiable; and with a rectangular kernel it is not even continuous. However, if a certain smoothness is guaranteed (continuity at least), the *choice of the kernel has little importance in practice* (at least compared with the choice of \(h\)). Figure 2.2 illustrates the construction of the kde, and the bandwidth and kernel effects.
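The limited practical effect of the kernel choice can be checked quickly with `density`, which accepts a `kernel` argument; R scales each kernel so that `bw` equals its standard deviation, which makes bandwidths comparable across kernels. A quick sketch:

```
# Same data and bandwidth, three kernels: the estimates almost overlap
set.seed(1234567)
samp <- rnorm(n = 100)
plot(density(x = samp, bw = 0.5, kernel = "gaussian"))
lines(density(x = samp, bw = 0.5, kernel = "epanechnikov"), col = 2)
lines(density(x = samp, bw = 0.5, kernel = "rectangular"), col = 3)
```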

It is useful to particularize the kde to the normal kernel. In that case, \(K_h(x-X_i)=\phi_h(x-X_i)\), which is the density of a \(\mathcal{N}(X_i,h^2)\) evaluated at \(x\). Thus the kde with normal kernel is an equal-weight mixture of \(n\) normal densities centered at the sample points, and the bandwidth \(h\) can be thought of as the standard deviation of these densities.
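This mixture interpretation is easy to verify numerically: evaluating the kde at a point through the \(\mathcal{N}(X_i,h^2)\) densities gives the same value as the definition (2.4). A sketch (the evaluation point \(x_0=1.3\) is arbitrary):

```
# With the normal kernel, the kde at x0 is the average of the
# N(X_i, h^2) densities evaluated at x0
set.seed(1234567)
samp <- rnorm(n = 100)
h <- 0.5
x0 <- 1.3  # arbitrary evaluation point
kde_mixture <- mean(dnorm(x0, mean = samp, sd = h))
kde_definition <- sum(dnorm((x0 - samp) / h)) / (length(samp) * h)
all.equal(kde_mixture, kde_definition)  # TRUE
```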

Implementation of the kde in R is built-in through the `density` function. The function automatically chooses the bandwidth \(h\) using a *bandwidth selector*, which will be studied in detail in Section 2.4.

```
# Sample 100 points from a N(0, 1)
set.seed(1234567)
samp <- rnorm(n = 100, mean = 0, sd = 1)
# Quickly compute a kde and plot the density object
# Automatically chooses bandwidth and uses normal kernel
plot(density(x = samp))
# Select a particular bandwidth (0.5) and kernel (Epanechnikov)
lines(density(x = samp, bw = 0.5, kernel = "epanechnikov"), col = 2)
```

```
# density automatically chooses the interval for plotting the kde
# (observe that the black line extends roughly from -3 to 3)
# This can be tuned using "from" and "to"
plot(density(x = samp, from = -4, to = 4), xlim = c(-5, 5))
```

```
# The density object is a list
kde <- density(x = samp, from = -5, to = 5, n = 1024)
str(kde)
## List of 7
## $ x : num [1:1024] -5 -4.99 -4.98 -4.97 -4.96 ...
## $ y : num [1:1024] 5.98e-17 3.46e-17 2.56e-17 3.84e-17 4.50e-17 ...
## $ bw : num 0.315
## $ n : int 100
## $ call : language density.default(x = samp, n = 1024, from = -5, to = 5)
## $ data.name: chr "samp"
## $ has.na : logi FALSE
## - attr(*, "class")= chr "density"
# Note that the evaluation grid "x" is not directly controlled, only through
# "from", "to", and "n" (better use powers of 2)
plot(kde$x, kde$y, type = "l")
curve(dnorm(x), col = 2, add = TRUE) # True density
rug(samp)
```

**Exercise 2.1** Load the dataset `faithful`. Then:

- Estimate and plot the density of `faithful$eruptions`.
- Create a new plot and superimpose different density estimations with bandwidths equal to \(0.1\), \(0.5\), and \(1\).
- Get the density estimate at *exactly* the point \(x=3.1\) using \(h=0.15\) and the Epanechnikov kernel.