2.2 Kernel density estimation
The moving histogram (2.3) can be equivalently written as
with Interestingly, is a uniform density in ! This means that, when approximating
by (2.6), we are giving equal weight to all the points The generalization of (2.6) to non-uniform weighting is now obvious: replace with an arbitrary density! Then is known as a kernel. As it is commonly23 assumed, we consider to be a density that is symmetric and unimodal at zero. This generalization provides the definition of the kernel density estimator (kde):24
A common notation is the so-called scaled kernel, so that the kde is written as
Figure 2.6: Construction of the kernel density estimator. The animation shows how bandwidth and kernel affect the density estimate, and how the kernels are rescaled densities with modes at the data points. Application available here.
It is useful to recall (2.7) with the normal kernel. If that is the case, then and the kernel is the density of a Thus the bandwidth can be thought of as the standard deviation of a normal density with mean and the kde (2.7) as a data-driven mixture of those densities. Figure 2.6 illustrates the construction of the kde and the bandwidth and kernel effects.

Figure 2.7: Several scaled kernels from R’s density
function. See (2.9) in the remark below for the different parametrization.
Several types of kernels are possible. Figure 2.7 illustrates the form of several implemented in R’s density
function. The most popular is the normal kernel although the Epanechnikov kernel, is the most efficient.25 The rectangular kernel yields the moving histogram as a particular case. Importantly, the kde inherits the smoothness properties of the kernel. That means, for example, that (2.7) with a normal kernel is infinitely differentiable. But with an Epanechnikov kernel, (2.7) is not differentiable, and with a rectangular kernel is not even continuous. However, if a certain smoothness is guaranteed (continuity at least), the choice of the kernel has little importance in practice, at least in comparison with the much more critical choice of the bandwidth
The computation of the kde in R is done through the density
function. The function automatically chooses the bandwidth using a data-driven criterion26 referred to as a bandwidth selector. Bandwidth selectors will be studied in detail in Section 2.4.
# Sample 100 points from a N(0, 1)
set.seed(1234567)
samp <- rnorm(n = 100, mean = 0, sd = 1)
# Quickly compute a kde and plot the density object
# Automatically chooses bandwidth and uses normal kernel
plot(density(x = samp))
# Select a particular bandwidth (0.5) and kernel (Epanechnikov)
lines(density(x = samp, bw = 0.5, kernel = "epanechnikov"), col = 2)
# density() automatically chooses the interval for plotting the kde
# (observe that the black line goes to roughly between -3 and 3)
# This can be tuned using "from" and "to"
plot(density(x = samp, from = -4, to = 4), xlim = c(-5, 5))
# The density object is a list
kde <- density(x = samp, from = -5, to = 5, n = 1024)
str(kde)
## List of 7
## $ x : num [1:1024] -5 -4.99 -4.98 -4.97 -4.96 ...
## $ y : num [1:1024] 4.91e-17 2.97e-17 1.86e-17 2.53e-17 3.43e-17 ...
## $ bw : num 0.315
## $ n : int 100
## $ call : language density.default(x = samp, n = 1024, from = -5, to = 5)
## $ data.name: chr "samp"
## $ has.na : logi FALSE
## - attr(*, "class")= chr "density"
# Note that the evaluation grid "x" is not directly controlled, only through
# "from, "to", and "n" (better use powers of 2)
plot(kde$x, kde$y, type = "l")
curve(dnorm(x), col = 2, add = TRUE) # True density
rug(samp)
Exercise 2.6 Employing the normal kernel:
- Estimate and plot the density of
faithful$eruptions
. - Create a new plot and superimpose different density estimations with bandwidths equal to and
- Get the density estimate at exactly the point using
We next introduce an important remark about the use of the density
function. Before that, we need some notation. From now on, we consider the integrals to be over if not stated otherwise. In addition, we denote
to the second-order moment27 of
Remark. The kernel, say employed in density
uses a parametrization that guarantees that This implies that the variance of the scaled kernel is that is, that These normalized kernels may be different from the ones we have considered in the exposition. For example, the uniform kernel for which is implemented in R as (observe in Figure 2.7 that the rectangular kernel takes the value ).
The density
’s normalized kernel can be obtained from and vice versa, with a straightforward derivation. On the one hand, since
by making the change of variables in the second equality. Now, based on (2.8), define
and this is a kernel that satisfies and is a density unimodal about It is indeed the kernel employed by density
.
Therefore, if we consider a bandwidth for the normalized kernel (resulting ), we have that
where
As a consequence, a normalized kernel with a given bandwidth (for example, the one considered in density
’s bw
) has an associated scaled bandwidth for the unnormalized kernel
The following code numerically illustrates the difference between and for the Epanechnikov kernel.
# Implementation of the Epanechnikov based on the theory
K_Epa <- function(z, h = 1) 3 / (4 * h) * (1 - (z / h)^2) * (abs(z) < h)
mu2_K_Epa <- integrate(function(z) z^2 * K_Epa(z), lower = -1, upper = 1)$value
# Epanechnikov kernel by R
h <- 0.5
plot(density(0, kernel = "epanechnikov", bw = h))
# Build the equivalent bandwidth
h_tilde <- h / sqrt(mu2_K_Epa)
curve(K_Epa(x, h = h_tilde), add = TRUE, col = 2)
# The other way around
h_tilde <- 2
h <- h_tilde * sqrt(mu2_K_Epa)
curve(K_Epa(x, h = h_tilde), from = -3, to = 3, col = 2)
lines(density(0, kernel = "epanechnikov", bw = h))
Exercise 2.7 Obtain the kernel for:
- the normal kernel
- the Epanechnikov kernel
- the kernel
Exercise 2.8 Given obtain such that for:
- the uniform kernel
- the kernel
- the kernel
Exercise 2.9 Repeat Exercise 2.6, but now considering the Epanechnikov kernel in its unnormalized form. To do so, find the adequate bandwidths to input in density
’s bw
.
References
In greater generality, the kernel might only be assumed to be an integrable function with unit integral.↩︎
Also known as the Parzen–Rosemblatt estimator to honor the proposals by Parzen (1962) and Rosenblatt (1956).↩︎
Although the efficiency of the normal kernel, with respect to the Epanechnikov kernel, is roughly ↩︎
Precisely, the rule-of-thumb given by
bw.nrd
.↩︎The variance, since because the kernel is symmetric with respect to zero.↩︎