2.2 Kernel density estimation

The moving histogram (2.2) can be equivalently written as

\[\begin{align} \hat f_\mathrm{N}(x;h)&=\frac{1}{nh}\sum_{i=1}^n\frac{1}{2}1_{\left\{-1<\frac{x-X_i}{h}<1\right\}}\nonumber\\ &=\frac{1}{nh}\sum_{i=1}^nK\left(\frac{x-X_i}{h}\right), \tag{2.3} \end{align}\]

with \(K(z)=\frac{1}{2}1_{\{-1<z<1\}}\). Interestingly, \(K\) is the uniform density on \((-1,1)\). This means that, when approximating

\[\begin{align*} \mathbb{P}[x-h<X<x+h]=\mathbb{P}\left[-1<\frac{x-X}{h}<1\right] \end{align*}\]

by (2.3), we give equal weight to all the points \(X_1,\ldots,X_n\). The generalization of (2.3) is now obvious: replace \(K\) with an arbitrary density. Such a \(K\) is known as a kernel, a density that is typically symmetric and unimodal about \(0\). This generalization yields the definition of the kernel density estimator (kde)5:

\[\begin{align} \hat f(x;h):=\frac{1}{nh}\sum_{i=1}^nK\left(\frac{x-X_i}{h}\right). \tag{2.4} \end{align}\]

A common notation is \(K_h(z):=\frac{1}{h}K\left(\frac{z}{h}\right)\), so \(\hat f(x;h)=\frac{1}{n}\sum_{i=1}^nK_h(x-X_i)\).
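Expression (2.4) can be coded directly. The following sketch (the function name kde_manual and its arguments are illustrative, not part of base R) averages the rescaled kernels \(K_h(x-X_i)\) and compares the result with R's density function:

```r
# Direct implementation of (2.4); "kde_manual" is an illustrative name
kde_manual <- function(x, samp, h, K = dnorm) {
  # For each evaluation point x0, average K_h(x0 - X_i) = K((x0 - X_i) / h) / h
  sapply(x, function(x0) mean(K((x0 - samp) / h)) / h)
}

# Compare with density() on its own evaluation grid
set.seed(42)
samp <- rnorm(n = 100)
kde <- density(x = samp, bw = 0.5)
max(abs(kde_manual(kde$x, samp = samp, h = 0.5) - kde$y))
# Small, but not exactly zero: density() bins the sample for speed
```

The tiny discrepancy arises because density employs a binned approximation for efficiency, whereas kde_manual evaluates (2.4) exactly.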

Several types of kernels are possible. The most popular is the normal kernel \(K(z)=\phi(z)\), although the Epanechnikov kernel, \(K(z)=\frac{3}{4}(1-z^2)1_{\{|z|<1\}}\), is the most efficient. The rectangular kernel \(K(z)=\frac{1}{2}1_{\{|z|<1\}}\) yields the moving histogram as a particular case. The kde inherits the smoothness properties of the kernel: for example, (2.4) with a normal kernel is infinitely differentiable, whereas with an Epanechnikov kernel it is not differentiable, and with a rectangular kernel it is not even continuous. However, provided a certain smoothness is guaranteed (continuity at least), the choice of the kernel has little importance in practice (at least compared with the choice of \(h\)). Figure 2.2 illustrates the construction of the kde, and the bandwidth and kernel effects.
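The three kernels just mentioned are all valid densities, which can be checked numerically (the function names below are illustrative):

```r
# Normal, Epanechnikov, and rectangular kernels as R functions
# (names are illustrative)
K_norm <- function(z) dnorm(z)
K_epa <- function(z) 3 / 4 * (1 - z^2) * (abs(z) < 1)
K_rect <- function(z) 1 / 2 * (abs(z) < 1)

# Each is nonnegative and integrates to one, as a density must
integrate(K_norm, lower = -Inf, upper = Inf)$value
integrate(K_epa, lower = -1, upper = 1)$value
integrate(K_rect, lower = -1, upper = 1)$value
```

A subtlety to keep in mind: density rescales its kernels so that the bw argument is the standard deviation of the smoothing kernel (see ?density), so the "epanechnikov" and "rectangular" options do not use the raw forms above directly.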

It is useful to examine the kde with a normal kernel. In that case, \(K_h(x-X_i)=\phi_h(x-X_i)\), the density of a \(\mathcal{N}(X_i,h^2)\). Thus the bandwidth \(h\) can be thought of as the standard deviation of the normal densities centered at the data points.
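This viewpoint is easy to verify numerically: with a normal kernel, the kde coincides with an equal-weight mixture of \(\mathcal{N}(X_i,h^2)\) densities. A small sketch (variable names are illustrative):

```r
set.seed(1)
samp <- rnorm(n = 5)
h <- 0.5
x <- seq(-3, 3, length.out = 201)

# Mixture form: average of normal densities centered at the data points
kde_mixture <- rowMeans(sapply(samp, function(Xi) dnorm(x, mean = Xi, sd = h)))

# Direct form (2.4) with K = dnorm
kde_direct <- sapply(x, function(x0) mean(dnorm((x0 - samp) / h)) / h)

# The two coincide up to numerical rounding
max(abs(kde_mixture - kde_direct))
```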

Figure 2.2: Construction of the kernel density estimator. The animation shows how the bandwidth and kernel affect the density estimate, and how the kernels are rescaled densities with modes at the data points.

Implementation of the kde in R is built-in through the density function. The function automatically chooses the bandwidth \(h\) using a bandwidth selector, which will be studied in detail in Section 2.4.

# Sample 100 points from a N(0, 1)
samp <- rnorm(n = 100, mean = 0, sd = 1)

# Quickly compute a kde and plot the density object
# Automatically chooses bandwidth and uses normal kernel
plot(density(x = samp))

# Select a particular bandwidth (0.5) and kernel (Epanechnikov)
lines(density(x = samp, bw = 0.5, kernel = "epanechnikov"), col = 2)

# density automatically chooses the interval for plotting the kde
# (observe that the black line goes to roughly between -3 and 3)
# This can be tuned using "from" and "to"
plot(density(x = samp, from = -4, to = 4), xlim = c(-5, 5))

# The density object is a list
kde <- density(x = samp, from = -5, to = 5, n = 1024)
str(kde)
## List of 7
##  $ x        : num [1:1024] -5 -4.99 -4.98 -4.97 -4.96 ...
##  $ y        : num [1:1024] 5.98e-17 3.46e-17 2.56e-17 3.84e-17 4.50e-17 ...
##  $ bw       : num 0.315
##  $ n        : int 100
##  $ call     : language density.default(x = samp, n = 1024, from = -5, to = 5)
##  $ data.name: chr "samp"
##  $ has.na   : logi FALSE
##  - attr(*, "class")= chr "density"
# Note that the evaluation grid "x" is not directly controlled, only through
# "from", "to", and "n" (better use powers of 2)
plot(kde$x, kde$y, type = "l")
curve(dnorm(x), col = 2, add = TRUE) # True density

Exercise 2.1 Load the dataset faithful. Then:

  • Estimate and plot the density of faithful$eruptions.
  • Create a new plot and superimpose different density estimates with bandwidths equal to \(0.1\), \(0.5\), and \(1\).
  • Get the density estimate at exactly the point \(x=3.1\) using \(h=0.15\) and the Epanechnikov kernel.

  1. Also known as the Parzen–Rosenblatt estimator to honor the proposals by Parzen (1962) and Rosenblatt (1956).↩︎