Module 6 Introduction to Kernel Density Estimation

In this tutorial, we will review the kernel density estimation and how to perform KDE in R.

Learning Objectives

  • Understanding what is kernel density estimation.

  • Understand what determines the smoothness of the KDE surface.

  • Be able to plot KDE values.

  • How to use KDE plots as a comparison tool.

Additional Resources

Kernel Density Plots

Intro to KDE

Spatial KDE Cran

Kernel Density Estimation by Matthew Conlen

6.1 Kernel Density Estimation (KDE)

Point Patterns are a collections of geographical points (i.e. breaches) assumed to have been generated by a random (true) underlying process. Points consists of a set of observed \((x,y)\) coordinates. We may want to assess if locations of points are related (i.e. if the points refer to locations of cases of a contagious disease, then it is likely they occur near each other and therefore not independent). How can we measure the likelihood of a point given the observed point patterns? We can use surfaces showing where events are more likely to happen based on probabilities.

Intuition behind KDE is simple- the more observed data points in a sample that occur around a location, the higher the likelihood of an observation occurring at that location.

Let \(f(x)\) denote the intensity function (expected number of cases per unit distance at location \(x\)). We estimate intensity \(f(x)\) via kernel estimation. What does that mean?

  • Place a small “kernel” at each observed data point.
  • Spreads each observation around usually with something that looks like a normal distribution.
  • Sum kernel values to give intensity estimate at each location \(x\).
  • KDE averages a series of small “bumps” (2 dimensional probability distributions) centered on each observed point.
  • Figure: initial points (left), bump centered on each point (middle), average of bumps giving estimate of probability density function (right).

A probability density function (PDF) is

What is probability density function? A function that gives the probability (likelihood) of an event occurring at each location in your space. PDF’s gives a surface across space showing where events are more likely to happen, i.e., the probability of an event occurring at location \(s\).

What is intensity? The expected number of events occurring at location \(s\). Note: Intensity \(=\) number of events across all locations \(\times\) Density.

6.2 Probability Distribution Estimation using KDE

We calculate the KDE by using weighted distances of all observed data points from various locations on a linearly spaces set of points (e.g., all locations in which we want to estimate the likelihood). The concept of weighting the distances of our observations from a particular point in \(x\) can be expressed in simple mathematical terms: \[\hat{f}(x) = \sum_{observations} K\left(\frac{x - observation}{bandwidth}\right)\]

In algebraic terms, the approximation \(f(\mathbf{x})\) for a given location \(\mathbf{x} = (x,y)\) is given by: \[\hat{f}(x) = \hat{f}(x,y) = \frac{1}{n \cdot b_x \cdot b_y} \sum_i K\left(\frac{x-x_i}{b_x}, \frac{y-y_i}{b_y}\right)\]

6.2.1 Kernel Function

The Kernel Function \(K\left(\frac{x-x_i}{b_x}, \frac{y-y_i}{b_y}\right)\) creates the bumps. This means it specifies how to compute the probability density given the distance between location \((x,y)\) and the observation \((x_i, y_i)\). It must be a non-negative function, i.e., probability is greater than or equal to 0. We define the function \(K()\) using a probability distribution, i.e., a normal distribution. The entire equation describes the “bump averaging process”.

Let’s consider a simple example:

Observed data: \(x_1 = 30\)

Bandwidth \((b_x)\): 5

Linear support: \(X = \{20, 21, 22, 23, 24,..., 40\}\)

Kernel Function: Gaussian (normal) \(K(x) = \frac{1}{2\pi} e^{\frac{-1}{2} x^2}\)

Looking at point \(x_1 = 30\) and \(X_i = 21\), we have \(\frac{X - x_i}{b_x} = \frac{21-30}{5} = -1.8\).

Plugging this into the Gaussian (normal) function \(K(x) = \frac{1}{2\pi} e^{\frac{-1}{2} (-1.8^2)} = 0.0315\). This value represents the probability of a location at \(X=30\) given the observed value of \(x_1 = 20\). We can calculate probabilities (the kernel density) at a range of values as shown below. We use the normal kernel density function to calculate each probability.

#Observed point location
x1 <- 30
bandwidth = 5
# Range of values to calculate density
X <- seq(20, 40, by = 1)

#Calculation of KDF across range of values
kernel_func <- 1/(2*pi) * exp(-0.5 * ((X - x1)/bandwidth)^2)

plot(X, kernel_func, type = "l", col = "red", xlab = "X", ylab = "PDF")

What we get is the probability density function (PDF) centered around the observation at \(x_1 = 30\). The x-axis gives the range of \(X\) values (20 to 40), and the y-axis gives the probability of an observed point at that value of \(X\). We see that there are higher probabilities at x-values closer to \(x_1=30\). The \(x\) values farther away have a lower probability (likelihood). Now let’s look at multiple observed locations.

#Observed point location
x1 <- 30
x2 <- 32
x3 <- 35

bandwidth = 5
# Range of values to calculate density
X <- seq(20, 40, by = 1)

#Calculation of KDF across range of values
kernel_func_1 <- 1/(2*pi) * exp(-0.5 * ((X - x1)/bandwidth)^2)
kernel_func_2 <- 1/(2*pi) * exp(-0.5 * ((X - x2)/bandwidth)^2)
kernel_func_3 <- 1/(2*pi) * exp(-0.5 * ((X - x3)/bandwidth)^2)

plot(X, kernel_func_1, type = "l", col = "red", ylab = "KDF")
lines(X, kernel_func_2, type = "l", col = "blue")
lines(X, kernel_func_3, type = "l", col = "green")
points(25, kernel_func_1[6])
points(25, kernel_func_2[6])
points(25, kernel_func_3[6])
legend("topleft", col = c("red", "blue", "green"), lty = c(1, 1, 1), c("x1=30", "x2=32", "x3=35"))

When we move across all observations we get an individual kernel function (pdf) for each observation across the range of \(X\) values. Coming back to how to compute the KDE \[\hat{f}(x) = \hat{f}(x,y) = \frac{1}{n \cdot b_x \cdot b_y} \sum_i k\left(\frac{x-x_i}{b_x}, \frac{y-y_i}{b_y}\right)\]

We sum the kernel functions at each point. For example, to calculate the KDE for \(X=25\) (the likelihood or probability that \(X=25\)), we sum the probabilities across the multiple kernel functions.

#Observed point location

# Set X to 25
X <- 25

#Calculation of KDF across range of values
kernel_func_1 <- 1/(2*pi) * exp(-0.5 * ((X - x1)/bandwidth)^2)
kernel_func_2 <- 1/(2*pi) * exp(-0.5 * ((X - x2)/bandwidth)^2)
kernel_func_3 <- 1/(2*pi) * exp(-0.5 * ((X - x3)/bandwidth)^2)

KDE <- kernel_func_1 + kernel_func_2 + kernel_func_3
[1] 0.1778042

6.2.2 Bandwidth

What is bandwidth The selected bandwidth controls how smooth the estimated density curve is. There are some convenient things to remember:

  • \(b\) controls the smoothness of the density estimate (Larger \(b\) gives smoother surface)

  • As we increase \(b\), points further from our location are included resulting in a smoother distribution. This is related to the variance of the kernel (how wide it is?)

  • \(b_y, b_x\) refer to bandwidths that control how much you want to smooth the probability density surface. Lower values of \(b's\) give spiky distributions and high values flatten the distribution.

Figure: too small \(b's\) (left). appropriate \(b's\) (middle), too high (right)

So how do we choose bandwidth given the data \(\mathbf{x}_i\)? The general rule of thumb: \[b_x = s_x \left(\frac{2}{3n}\right) ^{1/6}\] \[b_y = s_y \left(\frac{2}{3n}\right) ^{1/6}\]

where \(s_x, s_y\) is the standard deviation of the \(x_i\) and \(y_i\) observed values.

6.3 Let’s look again at the Breach data…

Steps to estimate the KDE:

  1. Install the spatialKDE package from CRAN.

  2. Call the library SpatialKDE.

  3. Calculate the bandwidth of the points data. Using the breach data we calculate the standard deviation of x coordinates \(s_x =6318\) and the number of observations \(n= 180\), and plug into the formula.

\[b_x = s_x\left(\frac{2}{3n}\right) ^{1/6} = 6318 \left(\frac{2}{3 \cdot 180}\right) ^{1/6} \approx 2485 \]

  1. Create a grid of equally spaced points using the \(create\_grid\_rectangular()\) function where distance is set by \(cell\_size\). Here we use a cell size of 1000.

  2. Calculate the KDE using the \(kde()\) function. Inputs include the points (breaches) \(sf\) data frame, bandwidth, and grid to calculate the densities.


#This gives a data frame of x,y coordinates from breach locations
coordinates <- data.frame(st_coordinates(breach_sf))

#Calculate the number of observations (N)
N <- dim(coordinates)[1]

#Calculate the sd of x's
s_x <- sd(coordinates$X)

#The bandwidth can be the same for x and y for simplicity.
bandwidth <- s_x * (2/(3 *N))^(1/6)
[1] 2485.437
#Create grid of equally spaced rectangles. This is a grid where we are going to estimate probabilities

#Inputs = sf point data frame and cell size (distance for equally spaced centers)
grid <- create_grid_rectangular(blocks_sf, cell_size = 1000)

kde_output <- SpatialKDE::kde(points = breach_sf, band_width = bandwidth, grid = grid)
Simple feature collection with 6 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 553731.9 ymin: 147854 xmax: 558731.9 ymax: 149854
                        geometry kde_value
1 POLYGON ((554731.9 147854, ...         0
2 POLYGON ((555731.9 147854, ...         0
3 POLYGON ((556731.9 147854, ...         0
4 POLYGON ((557731.9 147854, ...         0
5 POLYGON ((553731.9 148854, ...         0
6 POLYGON ((554731.9 148854, ...         0

Plot the KDE outcomes

#Plot the KDE
  tm_shape(kde_output) +
  tm_polygons(col = "kde_value") +
  tm_shape(breach_sf) +
  tm_dots(col = "blue", size = 0.1, alpha = 0.3) 
Note the count caption indicates probability densities have been re-scaled to represent intensities- by multiplying the KDE by number of cases. Importantly, clusters of observed locations are associated with higher KDE values. This means there are higher probabilities of breaches of peace in the hot spot areas.

Let’s plot the same plot with different bandwidths. One smaller bandwidth (1000) and one larger bandwidth (4000), and compare the plots. We can see the larger bandwidth gives a smoother surface where neighboring cells have shared attributes. Conversely, the smaller bandwidth results in increased randomness of hotspots, so less smoothed surface.

kde_smallbandwidth <- SpatialKDE::kde(points = breach_sf, band_width = 1000, grid = grid)
kde_largebandwidth <- SpatialKDE::kde(points = breach_sf, band_width = 4000, grid = grid)

  tm_shape(kde_smallbandwidth) +
  tm_polygons(col = "kde_value") +
  tm_shape(breach_sf) +
  tm_dots(col = "blue", size = 0.1, alpha = 0.3) +
  tm_layout(title= 'Bandwidth 1000')
  tm_shape(kde_largebandwidth) +
  tm_polygons(col = "kde_value") +
  tm_shape(breach_sf) +
  tm_dots(col = "blue", size = 0.1, alpha = 0.3) +
  tm_layout(title= 'Bandwidth 4000')

6.4 Using KDE for comparisons

We can use KDE to compare spatial distributions across groups. In newhaven data we have data relating to (1) break-ins using forced entry, and (2) break-ins that do not. We may be interested in comparing the spatial distribution of the two groups across the area. Important: To compare spatial distributions, we need to specify a set of levels for intensities to be equal.

Here we calculate the KDEs for both groups: (1) the non-forced break-ins (burgres.n_sf), and (2) the forced break-ins (burgres.f_sf). Importantly, we use the same bandwidth and grid for both calculations AND use the same break points in the plotting to make accurate comparisons.

#calculate KDEs for both groups
kde_breakins_nonf <- kde(points = burgres.n_sf, band_width = 3000, grid = grid)
kde_breakins_f <- kde(points = burgres.f_sf, band_width = 3000, grid = grid)

#Create the maps and store them in variables
plot1 <- tm_shape(kde_breakins_nonf) +
  tm_polygons(col = "kde_value", breaks = c(0,2,4,6,8,10, 12, 14, 16, 18), palette = "Reds",
               title = "non-forced break-ins") +
  tm_dots(col = "green", alpha = 0.5, size = 0.1) 

plot2 <- tm_shape(kde_breakins_f) +
  tm_polygons(col = "kde_value", breaks = c(0,2,4,6,8,10, 12, 14, 16, 18),
              palette = "Reds", title = "forced break-ins") +
  tm_dots(col = "green", alpha = 0.5, size = 0.1) 

 tmap_arrange(plot1, plot2)
Comparing the two KDE plots, we see there are higher densities of forced break-ins in New Haven. Additionally, the areas with higher densities are similar for both forced and non-forced break-ins. The clusters of higher density are located in central New Haven with higher population sizes. Comparing maps 1 and 2 gives us a measure of the difference between two point processes that capture different kinds of events or incidents.

6.5 Using KDE in our applications

Using KDE, we can make inference on the likelihood/probabilities of clusters of locations/points. For example, if our point data consists of cases of disease. KDE gives us the probability of seeing a “new” case for a given location based on the observed set of cases. This means, KDE gives us an analysis method to measure hotspots/clusters of cases across a surface.

In contrast, if our point data is locations of clinics, then KDE gives us the probability of finding a clinic at a given location, based on the observed set of clinics. In that aspect, it gives us a method to measure accessibility or coldspots/access deserts across a surface.

Lab 6 Activity

  1. Read in your project data in \(sf\) format. Call the libraries needed (namely that \(sf\), \(tmap\) and \(spatialKDE\)).

  2. Write code to calculate the bandwidth of your point data. You can copy and paste the code above and revise it to fit your data. Report the “preferred” bandwidth for your data.

  3. Create a grid of equally spaced points to estimate the KDE. Note, you can use the polygon dataset to create a rectangular grid of equally distant points as shown above.

  4. Calculate the KDE using the \(kde()\) function. The inputs to the function are the point data, the bandwidth, and the grid.

  5. Report your min and max KDE values from the output and explain what these values represent?

  6. Plot the KDE output in combination with your point data using \(tmap()\).

  7. Write a paragraph about the map you created and describe:

  1. Where are the higher KDE values in relation to the observed points?

  2. Where are the lower KDE values?

  3. Based on (a) and (b) what conclusions/big picture descriptions can you make about the underlying point process from these KDE values? In other words, write a paragraph about the take-aways based on your plot.

  1. Change the bandwidth of your KDE function to be (1) much smaller, and (2) much higher than your original bandwidth. Then calculate KDEs for both the small/larger bandwidths and plot them together.

  2. Explain in 2-3 sentences how the change in bandwidth has changed your KDE estimates AND how it may change your interpretation of the data based on which bandwidth you use.