
This map shows the GPS location density for a population of Grey Wolves in northeastern Alberta’s Athabasca Oil Sands Region from early 2012 to late 2014 and highlights the location data for one wolf of interest in blue. Clearly, the wolf of interest has unique movement patterns, but how do we gain insights about this wolf’s pattern of life? This tutorial demonstrates a method to extract reliable loiter information from large spatial-temporal GPS data sets.

Introduction

Key challenge: correctly infer whether an observation reflects moving or loitering
- Main common assumption: observations that are close together signal potential loitering
Common clustering techniques:
- Time-based (Hariharan and Toyama 2004) and (Liu, Wolfson, and Yin 2006)
- Centroid-based k-means (Ashbrook and Starner 2002)
- Density-based:
  - DBSCAN (Ester, Kriegel, and Xu, n.d.)
  - ST-DBSCAN (Birant and Kut 2007)
  - T-DBSCAN (Chen, Ji, and Wang 2014)
  - HDBSCAN (Campello et al. 2015)
Proposed Method: Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)
- Data informed parameters
- Spatial then temporal clusters

Case Study Data

Overview
- Locations and collection times for 46 wolves between March 17, 2012 and September 13, 2014
- Derived from a Movebank study on grey wolves in northeastern Alberta’s Athabasca Oil Sands Region, available here
Original primary study areas:
- Habitat use and selection (Boutin et al. 2015)
- Predator-prey dynamics (Neilson and Boutin 2017)
- Effects of human activity (Boutin et al. 2015); (Neilson and Boutin 2017)
- Responses to snow conditions (Droghini and Boutin 2018)
Exploratory Data Analysis
- Gain an understanding of the data limitations
- Investigate collection bias
- Define relative parameters with the data
  - Define “close” in terms of space and time comparatively

Extraction

The computation below uses data.table’s fread function to read the CSV file and answer some preliminary exploratory questions about the data.

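# Load the packages needed by the code below
library(data.table)  # fread, uniqueN, setnames, setorder
library(magrittr)    # %>% pipe
library(formattable) # formatted tables

# Leverage fread function from package data.table to quickly read in csv data as an R object.
initData <- fread("data/ABoVE_ Boutin Alberta Grey Wolf.csv")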

# How many wolves?
initData[# all rows
         ,
         # data.table fast unique count for the column 
         # labeling the different wolves
         uniqueN(`tag-local-identifier`)]

## [1] 43

# How big is the data set
utils::object.size(initData)

## 32539288 bytes

# What are the dimensions
dim(initData)

## [1] 239194     18

# View the first three rows 
head(initData,3) %>% formattable(align="l") %>%
  formattable::as.datatable(rownames=FALSE,
                            options=list(
                              searching = FALSE,
                              scrollX = TRUE,
                              columnDefs = list(list(className = 'dt-left', targets = '_all'))))

Transformation

The raw data set has 239,194 observations on 46 wolves. We only need the following columns for our study:

study-local-timestamp
tag-local-identifier
location-long
location-lat

The following computation extracts the data relevant to this study, and customizes the column labels for convenience.

# Load only the relevant columns
data <- fread("data/ABoVE_ Boutin Alberta Grey Wolf.csv",
             select=c("study-local-timestamp",
                     "tag-local-identifier",
                     "location-long",
                     "location-lat"),
            
             # Make sure data.table does not automatically generate factor columns
             stringsAsFactors = FALSE) %>% 
  
             # Omit NAs in the data. Familiarization with how the data was collected is 
             # necessary to consider retaining these values and making them useful
             na.omit()

# Set the column names for convenience
setnames(data, 
         
         # Vector of old names that we want to change
         old=c("study-local-timestamp",
                     "tag-local-identifier",
                     "location-long",
                     "location-lat"),
         
         # Vector of new more convenient names
         new=c("dtg", # date time group
               "cid", # component ID
               "lon", # longitude
               "lat")) # latitude

Load

This study focuses on making inferences about one individual, so we use one wolf as a surrogate. The computation below identifies a wolf of interest and subsets the data to only include this wolf’s observations.

# Use data.table indexing to determine the wolf with the most data
wolf.of.Interest <- data[,.(count=.N),by="cid"][count==max(count)]$cid 

# Subset to focus on one wolf:
dt_init <- data[cid==wolf.of.Interest]
m <- nrow(dt_init)

# Create datetime objects from dtg character strings
dt_init[,"dtg" := dtg %>% as.POSIXct(format="%Y-%m-%d %H:%M:%S")]

# Order the data sequentially by date time group
setorder(dt_init,dtg)

# Set inter-obs time interval column (seconds); intervals longer than 24 hours become NA
dt_init[,"timeInt" := dtg %>% difftime(shift(dtg),units="secs")]
dt_init[,"timeInt" := ifelse(timeInt <= 24*3600,timeInt,NA)]

# Use lubridate package to get the weekday from the date objects
dt_init[,"Weekday" := dtg %>% lubridate::wday(label=TRUE)]

# Get the hour from the date objects
dt_init[,"Hour" := lubridate::hour(dtg)]
# military_time() is a helper that is not shown in this tutorial
dt_init$Hour <- dt_init$Hour %>% Vectorize(military_time)() %>% factor(levels=Vectorize(military_time)(0:23))

# Get the time of day for each dtg
dt_init[,"time" := as.ITime(dtg)]

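# Set group time label (for plotting)
# right=FALSE makes each bin [start, end), so an observation at exactly
# midnight falls in the first bin instead of becoming NA
dt_init$group <- cut(dt_init$time,
                breaks=c(0,6*3600,9*3600,12*3600,
                         15*3600,18*3600,21*3600,24*3600),
                labels=c("0000 - 0600","0600 - 0900","0900 - 1200",
                         "1200 - 1500","1500 - 1800","1800 - 2100",
                         "2100 - 2400"),
                right=FALSE)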

save(dt_init,file="products/dt_init.RData")

Collection Analysis

Measure the degree of temporal collection bias.
- Month and year
- Weekday and hour
- Time between observations
Findings:
- There is significant evidence of month and year collection bias
- There is no significant evidence of weekday or time-of-day collection bias
- Most observations occur every 10 minutes

Volume by Date

Next we check the collection volume over the time range of the data to judge the collection bias with respect to each day of the study.

Figure 2: Collection Volume over Time

Take aways:

Three spikes in collection volume in April 2012, March 2013, and January to March 2014
No observations between May 2012 and March 2013, which can drastically skew the inter-observation time distribution
Inferred loiter locations are likely year/season dependent
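
The gap in collection described above can be checked by tallying observations per calendar month. The sketch below is a minimal, standard-library Python illustration under stated assumptions: the timestamps are made-up placeholders rather than values from the wolf data, and monthly_volume and missing_months are hypothetical helper names that do not appear in the original analysis.

```python
from collections import Counter
from datetime import datetime

def monthly_volume(timestamps):
    """Count observations per (year, month)."""
    return Counter((t.year, t.month) for t in timestamps)

def missing_months(volume):
    """List the (year, month) pairs with no observations between the
    first and last observed month."""
    first, last = min(volume), max(volume)
    gaps = []
    year, month = first
    while (year, month) <= last:
        if (year, month) not in volume:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps

# Illustrative timestamps only (not taken from the study data)
obs = [datetime(2012, 4, 1, 0, 0), datetime(2012, 4, 1, 0, 10),
       datetime(2012, 5, 2, 6, 0), datetime(2012, 8, 9, 12, 0)]
volume = monthly_volume(obs)
print(volume[(2012, 4)])       # observations recorded in April 2012
print(missing_months(volume))  # months with no observations at all
```

Months returned by missing_months are candidates for the kind of collection gap that inflates the inter-observation time distribution.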

Volume by Weekday and Hour

We manipulate the data and generate a bivariate heatmap to study the collection volume on every weekday, for each hour of the day.

Figure 1: Collection Volume by Hour and Weekday

Take aways:

Generally uniform across all weekdays and times of day
Further analysis required to determine the statistical significance of how far this data departs from a bivariate uniform distribution
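
One way to begin the further analysis mentioned above is a Pearson chi-square statistic that compares the weekday-by-hour counts with a uniform expectation. The sketch below is an assumption-laden illustration in standard-library Python, not part of the original study: uniformity_chi_square is a hypothetical helper, the input timestamps are synthetic, and the test treats observations as independent, which consecutive GPS fixes generally are not.

```python
from collections import Counter
from datetime import datetime, timedelta

def uniformity_chi_square(timestamps):
    """Pearson chi-square statistic of weekday-by-hour counts against a
    uniform expectation, plus the degrees of freedom."""
    counts = Counter((t.weekday(), t.hour) for t in timestamps)
    n_cells = 7 * 24
    expected = len(timestamps) / n_cells
    stat = sum((counts.get((d, h), 0) - expected) ** 2 / expected
               for d in range(7) for h in range(24))
    return stat, n_cells - 1

# Synthetic input: exactly one observation in every weekday/hour cell
start = datetime(2014, 1, 6)
even = [start + timedelta(hours=k) for k in range(7 * 24)]
stat, dof = uniformity_chi_square(even)
print(stat, dof)  # perfectly uniform counts give a statistic of zero
```

A larger statistic indicates a stronger departure from uniform collection. Turning it into a p-value requires the chi-square distribution with dof degrees of freedom, for example via scipy.stats.chi2.sf.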

Inter-observation Time

The wolf’s activity between his observations is unknown, which can skew our results. We need to study the time between recorded observations to judge how well the data represents the wolf’s locations.

Figure 3: Inter-observation Time Histogram

Take aways:

95% of the inter-observation times are less than 15 minutes
- loiter locations with a dwell time of one hour may have as few as four observations
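
The inter-observation times behind this histogram can be computed with a few lines of code. The sketch below mirrors the earlier R step that discards intervals longer than 24 hours, but it is only an illustration in standard-library Python: the function names are hypothetical and the timestamps are synthetic rather than taken from the wolf data.

```python
from datetime import datetime, timedelta

def inter_observation_seconds(timestamps, max_gap=24 * 3600):
    """Seconds between consecutive observations; gaps longer than
    max_gap seconds are dropped."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return [g for g in gaps if g <= max_gap]

def share_below(gaps, threshold):
    """Fraction of gaps strictly below threshold (seconds)."""
    return sum(g < threshold for g in gaps) / len(gaps)

# Synthetic timestamps: 10-minute fixes followed by a three-day outage
t0 = datetime(2013, 3, 1)
obs = [t0 + timedelta(minutes=10 * k) for k in range(6)]
obs.append(obs[-1] + timedelta(days=3))

gaps = inter_observation_seconds(obs)
print(gaps)                         # the three-day gap is excluded
print(share_below(gaps, 15 * 60))   # share of gaps under 15 minutes

# A one-hour dwell sampled every 15 minutes spans 60 / 15 = 4 intervals
print(3600 // (15 * 60))
```

The last line reproduces the arithmetic behind the takeaway: at a 15-minute sampling interval, an hour of loitering is represented by only a handful of observations.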

Spatial-Temporal Clustering

Analytic method:
- Identify spatial clusters with HDBSCAN
- Identify temporal clusters within each spatial cluster
- Process and disseminate results

Optimal Parameter Values

Allow data to inform spatial and temporal \(cluster\_selection\_epsilon\) parameters for HDBSCAN
- The minimum distance considered in evaluating cluster densities
- Allows us to group together clusters within this distance (Campello et al. 2015)
- This parameter prevents the algorithm from incorrectly labeling cluster boundary points as outliers
- More information about this parameter is available here
Find the distance that separates low from high distances
- Compute the distance to each observation’s nearest neighbor
- Sort these distances
- Find the distance where the increase in nearest neighbor distance is the most dramatic (Rahmah and Sitanggang 2016)
- More details about this method are discussed here
- Tutorial for how to implement this method in Python is available here
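
The first two steps of this procedure (nearest neighbor distances, then sorting) can be sketched as follows. This is a brute-force illustration in standard-library Python, assuming a haversine distance on a sphere of radius 6,371 km. The helper names and coordinates are hypothetical, and the quadratic loop is only practical for small samples.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def sorted_nn_distances(points):
    """Distance from each (lat, lon) point to its nearest neighbor,
    sorted in ascending order."""
    nearest = []
    for i, (lat_i, lon_i) in enumerate(points):
        nearest.append(min(haversine_m(lat_i, lon_i, lat_j, lon_j)
                           for j, (lat_j, lon_j) in enumerate(points)
                           if j != i))
    return sorted(nearest)

# Made-up coordinates: three nearby fixes and one distant fix
pts = [(57.000, -111.000), (57.001, -111.000),
       (57.002, -111.000), (57.100, -111.000)]
dists = sorted_nn_distances(pts)
order = list(range(len(dists)))  # x-axis for the knee search
print([round(d) for d in dists])
```

The order and dists lists play the roles of the x-axis and y-axis of the sorted nearest neighbor curve. For a data set of this tutorial's size, a spatial index would be a more realistic choice than the double loop.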

Find kneedle function

We define the Python function below to locate the point of maximum curvature (the knee) of a line plot with the kneedle algorithm (Satopaa et al. 2011). We use it by setting the x-axis as the order vector and the y-axis as the distances vector, which produces a convex increasing curve. The y-value at the knee of the curve is the optimal parameter value for \(cluster\_selection\_epsilon\).

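from kneed import KneeLocator

def findKnee(order,distances,smoothParam,interpMethod='polynomial'):
    # order: x-values (rank of each sorted distance)
    # distances: y-values (sorted nearest neighbor distances)
    # smoothParam: kneedle sensitivity S; larger values are more conservative
    kneedle = KneeLocator(order,
                          distances,
                          S=smoothParam,
                          interp_method=interpMethod,
                          curve='convex',
                          direction='increasing',
                          online=True)
    # Return the y-value (distance) at the detected knee
    return kneedle.knee_y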

Epsilon Spatial Distance

As described here, we first calculate the spatial distance to the nearest neighbor for each observation and sort these values. We then find the maximum curvature, and plot the results.

Figure 4: Sorted Nearest Neighbor Spatial Distance (meters)

Take aways:

~200 meters is the distance just before the nearest neighbor distances rapidly increase
Store this value as epsSpat for use as an HDBSCAN parameter

Epsilon Temporal Distance

We now apply the same method to get the optimal temporal epsilon distance.

Figure 5: Sorted Nearest Neighbor Temporal Distance (seconds)

Take aways:

Optimal temporal epsilon distance is 12 seconds
Even though the inter-observation times are mostly between 10 and 15 minutes, the differences in the time of day between observations are much smaller. We store this value as epsTime for later use

Application

Define a Python function to implement HDBSCAN, using the hdbscan Python package from scikit-learn-contrib (McInnes 2020)
Spatial cluster using the haversine distance
Temporal cluster with a pre-computed time distance matrix
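
The tutorial does not show how the pre-computed time distance matrix is built. The sketch below is one plausible construction, offered purely as an assumption: it measures the absolute difference in time of day (in seconds) between every pair of observations and wraps around midnight so that 23:55 and 00:05 are 10 minutes apart. The function names, the wrap parameter, and the example times are hypothetical.

```python
from datetime import time

SECONDS_PER_DAY = 24 * 3600

def seconds_of_day(t):
    """Seconds elapsed since midnight for a datetime.time value."""
    return t.hour * 3600 + t.minute * 60 + t.second

def time_of_day_distance_matrix(times, wrap=True):
    """Pairwise time-of-day distances in seconds as a list of rows.
    With wrap=True the distance is measured around the 24-hour clock."""
    secs = [seconds_of_day(t) for t in times]
    matrix = []
    for a in secs:
        row = []
        for b in secs:
            d = abs(a - b)
            if wrap:
                d = min(d, SECONDS_PER_DAY - d)
            row.append(float(d))
        matrix.append(row)
    return matrix

# Made-up times of day
times = [time(23, 55), time(0, 5), time(12, 0)]
mat = time_of_day_distance_matrix(times)
print(mat[0][1])  # distance across midnight
print(mat[0][2])
```

A square, symmetric matrix with a zero diagonal such as this one is the form that a "precomputed" metric expects. Its memory use grows quadratically with the number of observations, which is one reason to build it per spatial cluster rather than for the full data set.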

Python Function

# Imports
import numpy as np
import pandas as pd
import hdbscan

# Functions
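def runHDB(points,minClusterSize,epsDist,minSampleSize=None,distMatrix=None,verbose=False):
    # points: (lat, lon) coordinates in degrees; ignored when distMatrix is given
    # epsDist: cluster_selection_epsilon, in meters for points or in distMatrix units
    # Returns labels shifted by one: 0 marks noise, clusters are numbered from 1
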
    # Organize parameters
    if minSampleSize is None:
        minSampleSize = minClusterSize # restore default

    # Define cluster object
    if distMatrix is not None:
        clusterFinder = hdbscan.HDBSCAN(min_cluster_size=int(minClusterSize),
                                        min_samples=int(minSampleSize),
                                        metric="precomputed",
                                        cluster_selection_epsilon=epsDist,
                                        cluster_selection_method="eom")
        X = distMatrix

    else:
        clusterFinder = hdbscan.HDBSCAN(min_cluster_size=int(minClusterSize),
                                        min_samples=int(minSampleSize),
                                        metric="haversine",
                                        cluster_selection_epsilon=(epsDist/1000)/6371,
                                        cluster_selection_method="eom")
        X = np.radians(points)
    if verbose:
        print("Running HDBSCAN on {} observations".format(len(X)))

    res = clusterFinder.fit(X)
    y = res.labels_
    if verbose:
        print("Found {} clusters".format(pd.Series(y).max() + 1))

    return y + 1
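
The haversine branch of runHDB divides epsDist by 1000 and then by 6371 because the haversine metric works on coordinates in radians and returns distances in radians, so an epsilon in meters has to be expressed as an angle on a sphere of radius 6,371 km. The short sketch below works through that arithmetic for the 200-meter spatial epsilon; the two helper names are hypothetical.

```python
EARTH_RADIUS_KM = 6371  # same mean Earth radius used in runHDB

def meters_to_radians(meters):
    """Convert a ground distance in meters to an angle in radians."""
    return (meters / 1000) / EARTH_RADIUS_KM

def radians_to_meters(radians):
    """Convert an angle in radians back to a ground distance in meters."""
    return radians * EARTH_RADIUS_KM * 1000

eps_rad = meters_to_radians(200)
print(eps_rad)                     # roughly 3.14e-05 radians
print(radians_to_meters(eps_rad))  # back to about 200 meters
```

The same conversion applies to any distance reported by the haversine metric, for example when reading cluster diameters back in meters.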

Cluster Workflow

Apply HDBSCAN with the data-informed parameters for \(cluster\_selection\_epsilon\) to find spatial clusters
Apply HDBSCAN again on each spatial cluster to find space-time clusters
Use the minimum allowable cluster size of 2 to minimize how “smooth” the cluster density estimate is between nearby points.
- Mitigates HDBSCAN’s tendency to identify overlapping clusters (“Outlier Detection - Help to Understand the Right Parameter Configuration. · Issue #116 · Scikit-Learn-Contrib/Hdbscan. GitHub” n.d.)
Use a high minimum sample size to discriminate noisy points from cluster boundary points
This method for using minimum cluster size and minimum sample size is discussed in greater detail here (“Outlier Detection - Help to Understand the Right Parameter Configuration. · Issue #116 · Scikit-Learn-Contrib/Hdbscan. GitHub” n.d.)

# Cluster spatially (the haversine metric expects latitude first, then longitude)
Y <- runHDB(points=dt[,c("lat","lon")],
            minClusterSize=2,
            minSampleSize=100,
            epsDist=epsSpat)
dt[,"spatClus" := Y]

# Cluster temporally
tempClus <- rep(0,m)
for (i in unique(dt[spatClus != 0, spatClus])) {
  
  Y <- runHDB(points=NULL,
              distMatrix=timeDistMat[dt$spatClus==i,dt$spatClus==i],
              minClusterSize=2,
              minSampleSize=100,
              epsDist=epsTime)
  
  # label
  tempClus[dt$spatClus==i] <- ifelse(Y!=0,paste0(i,".",Y),0)
 
}

# Set cluster column
dt[,"clus" := tempClus]
save(dt,file="products/dt.RData")

Take aways:

Some clusters only include two observations, others have above 100 observations
- This is a result of HDBSCAN identifying clusters of varying densities
Next steps:
- Screen out clusters that do not include consecutive observations

Screening

Define the following R function to find the indices of the consecutive observations (values), and the lengths of each segment of consecutive observations (lengths)
We use a modified version of the R function available here (“Cran/Cgwtools. GitHub” n.d.)

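# Function to find runs of consecutive integers and the length of each run
seqle <- function(x,incr=1) { 
  if(!is.numeric(x)) x <- as.numeric(x) 
  n <- length(x)  
  # TRUE wherever the next value does not continue the current run
  y <- x[-1L] != x[-n] + incr 
  # Position of the last element of each run
  i <- c(which(y|is.na(y)),n) 
  temp <- list(lengths = diff(c(0L,i)),
               values = x[head(c(0L,i)+1L,-1L)]) 
  # Keep runs longer than one element; report each run's first value
  # and its number of steps (elements minus one)
  return(list(lengths=temp$lengths[temp$lengths > 1] - 1,
              values=temp$values[temp$lengths > 1]))
} 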

# Quick look
seqle(which(dt$clus==1.1)) %>% as.data.table() %>% formattable(align="l") %>% as.htmlwidget()

Take aways:

Table above is the result of our seqle function on one cluster
Values column indicates the index of the start of a consecutive segment of observations
Corresponding lengths column indicates the number of consecutive observations that begin at the value index
Cluster 1.1 contains consecutive segments with over 70 observations
Next steps:
- Apply the seqle function to each cluster
- Filter out the segments consisting of only one observation (no evidence of loitering)
- Calculate the loiter time for each consecutive segment in each cluster
- Filter out the clusters that the wolf visited fewer than 10 times
- Filter out the clusters where the wolf spent an average of less than 30 minutes
- Full code available here
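
The screening steps listed above can be sketched in a few lines. The following standard-library Python illustration is not the linked full code: consecutive_runs and loiter_summary are hypothetical names, the loiter time of a segment is assumed to be the time between its first and last observation, and the example data are synthetic.

```python
from datetime import datetime, timedelta

def consecutive_runs(indices):
    """Split sorted row indices into runs of consecutive integers."""
    if not indices:
        return []
    runs, current = [], [indices[0]]
    for idx in indices[1:]:
        if idx == current[-1] + 1:
            current.append(idx)
        else:
            runs.append(current)
            current = [idx]
    runs.append(current)
    return runs

def loiter_summary(indices, timestamps, min_visits=10, min_mean_minutes=30):
    """Summarize one cluster; return None if it fails either screen."""
    # A visit is a run of at least two consecutive observations
    visits = [r for r in consecutive_runs(indices) if len(r) > 1]
    if not visits:
        return None
    # Assumed loiter time: last minus first timestamp of each visit
    minutes = [(timestamps[r[-1]] - timestamps[r[0]]).total_seconds() / 60
               for r in visits]
    mean_minutes = sum(minutes) / len(minutes)
    if len(visits) < min_visits or mean_minutes < min_mean_minutes:
        return None
    return {"visits": len(visits), "mean_minutes": mean_minutes}

# Synthetic data: fixes every 10 minutes, and a cluster made of ten
# visits of five consecutive observations each
t0 = datetime(2014, 1, 1)
timestamps = [t0 + timedelta(minutes=10 * k) for k in range(100)]
indices = [v * 10 + k for v in range(10) for k in range(5)]
print(loiter_summary(indices, timestamps))
```

In this example each visit lasts 40 minutes and there are ten visits, so the cluster passes both screens. A cluster with fewer visits or a shorter average dwell returns None and would be dropped.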

Now that we have stored the results in our data and a list object, we can use this information to plot using leaflet. The functions available here provide a comprehensive method to convey the results of our analysis. We omit the functions in this tutorial for brevity.

Results

The map below conveys the results of clustering spatially and temporally using HDBSCAN. The loiter information for each cluster is shown when hovering over each cluster, and a radial heatmap of time of day by weekday is shown when clicking each cluster. We group the clusters by the time of day and include interactive toggles to allow the analyst to study the wolf’s pattern of life.

Conclusion

This tutorial demonstrates a method to extract actionable information from GPS data.
A heatmap or hexagon-bin density map can provide initial intuition about the subject’s most frequented locations, but more granular information is typically necessary.
With spatial and temporal clustering, we can elicit reliable information about when, where, and for how long the subject (here, the wolf of interest) loiters.

Works Cited

Adolfsson, A., M. Ackerman, and N. C. Brownstein. 2019. “To Cluster, or Not to Cluster: An Analysis of Clusterability Methods.” Pattern Recognition 88 (April): 13–26. https://doi.org/10.1016/j.patcog.2018.10.026.

Appelhans, Tim, and Florian Detsch. 2019. “Leafpop: Include Tables, Images and Graphs in Leaflet Pop-Ups.” https://CRAN.R-project.org/package=leafpop.

Appelhans, Tim, Christoph Reudenbach, Kenton Russell, Jochen Darley, Daniel Montague (Leaflet EasyButton plugin), Lorenzo Busetto, Luigi Ranghetti, Miles McBain, and Sebastian Gatscha. 2020. “Leafem: ’Leaflet’ Extensions for ’Mapview’.” https://CRAN.R-project.org/package=leafem.

Arvai, Kevin. 2020a. Kneed: Knee-Point Detection in Python (version 0.6.0). https://github.com/arvkevi/kneed.

———. 2020b. Arvkevi/Kneed. https://github.com/arvkevi/kneed.

Ashbrook, D., and T. Starner. 2002. “Learning Significant Locations and Predicting User Movement with GPS.” In Proceedings. Sixth International Symposium on Wearable Computers, 101–8. Seattle, WA, USA: IEEE. https://doi.org/10.1109/ISWC.2002.1167224.

“Basic Usage of HDBSCAN* for Clustering — Hdbscan 0.8.1 Documentation.” n.d. Accessed April 30, 2020. https://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html.

Bengtsson, Henrik. 2019. “R.utils: Various Programming Utilities.” https://CRAN.R-project.org/package=R.utils.

Birant, Derya, and Alp Kut. 2007. “ST-DBSCAN: An Algorithm for Clustering Spatial–Temporal Data.” Data & Knowledge Engineering, Intelligent data mining, 60 (1): 208–21. https://doi.org/10.1016/j.datak.2006.01.013.

Bivand, Roger, Tim Keitt, Barry Rowlingson, Edzer Pebesma, Michael Sumner, Robert Hijmans, Even Rouault, Frank Warmerdam, Jeroen Ooms, and Colin Rundel. 2019. “Rgdal: Bindings for the ’Geospatial’ Data Abstraction Library.” https://CRAN.R-project.org/package=rgdal.

Bivand, Roger, Colin Rundel, Edzer Pebesma, Rainer Stuetz, Karl Ove Hufthammer, Patrick Giraudoux, Martin Davis, and Sandro Santilli. 2019. “Rgeos: Interface to Geometry Engine - Open Source (’GEOS’).” https://CRAN.R-project.org/package=rgeos.

Boutin, Stan, Holger Bohm, Eric Neilson, Amanda Droghini, and Corey Mare. 2015. Wildlife Habitat Effectiveness and Connectivity Final Report August 2015. https://doi.org/10.13140/RG.2.2.35281.38240.

Campello, Ricardo J. G. B., Davoud Moulavi, Arthur Zimek, and Jörg Sander. 2015. “Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection.” TKDD. https://doi.org/10.1145/2733381.

Chen, Wen, Minhe Ji, and Jianmei Wang. 2014. “T-DBSCAN: A Spatiotemporal Density Clustering for GPS Trajectory Segmentation.” Int. J. Onl. Eng. 10 (6): 19. https://doi.org/10.3991/ijoe.v10i6.3881.

Cheng, Joe, Bhaskar Karambelkar, Yihui Xie, Hadley Wickham, Kenton Russell, Kent Johnson, Barret Schloerke, et al. 2019. “Leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library.” https://CRAN.R-project.org/package=leaflet.

Chirico, Michael, and Dmitry Shkolnik. 2019. “geohashTools: Tools for Working with Geohashes.” https://CRAN.R-project.org/package=geohashTools.

“Combining HDBSCAN* with DBSCAN — Hdbscan 0.8.1 Documentation.” n.d. Accessed April 30, 2020. https://hdbscan.readthedocs.io/en/latest/how_to_use_epsilon.html.

“Cran/Cgwtools. GitHub.” n.d. Accessed April 30, 2020. https://github.com/cran/cgwtools.

Dowle, Matt, Arun Srinivasan, Jan Gorecki, Michael Chirico, Pasha Stetsenko, Tom Short, Steve Lianoglou, et al. 2019. “Data.table: Extension of ’Data.frame’.” https://CRAN.R-project.org/package=data.table.

Droghini, Amanda, and Stan Boutin. 2018. “The Calm During the Storm: Snowfall Events Decrease the Movement Rates of Grey Wolves (Canis Lupus).” PLOS ONE 13 (10): e0205742. https://doi.org/10.1371/journal.pone.0205742.

Emerson, John W, and Michael J Kane. n.d. “The R Package Bigmemory: Supporting Efficient Computation and Concurrent Programming with Large Data Sets.” Journal of Statistical Software, 16.

Ester, Martin, Hans-Peter Kriegel, and Xiaowei Xu. n.d. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” 6.

Han, Jiawei, Micheline Kamber, and Jian Pei, eds. 2012. “Front Matter.” In Data Mining (Third Edition), i–v. The Morgan Kaufmann Series in Data Management Systems. Boston: Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-381479-1.00016-2.

Hariharan, Ramaswamy, and Kentaro Toyama. 2004. “Project Lachesis: Parsing and Modeling Location Histories.” In Geographic Information Science, edited by Max J. Egenhofer, Christian Freksa, and Harvey J. Miller, 3234:106–24. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-30231-5_8.

Hijmans, Robert J., Ed Williams, and Chris Vennes. 2019. “Geosphere: Spherical Trigonometry.” https://CRAN.R-project.org/package=geosphere.

Hopkins, Brian, and J. G. Skellam. 1954. “A New Method for Determining the Type of Distribution of Plant Individuals.” Annals of Botany 18 (70): 213–27. https://www.jstor.org/stable/42907238.

Karambelkar, Bhaskar, Barret Schloerke, Bangyou Zheng, Robin Cura, Markus Voge, Markus Dumke, Mapbox, et al. 2018. “Leaflet.extras: Extra Functionality for ’Leaflet’ Package.” https://CRAN.R-project.org/package=leaflet.extras.

Lawson, Richard G., and Peter C. Jurs. 1990. “New Index for Clustering Tendency and Its Application to Chemical Problems.” J. Chem. Inf. Comput. Sci. 30 (1): 36–41. https://doi.org/10.1021/ci00065a010.

Liu, Juhong, Ouri Wolfson, and Huabei Yin. 2006. “Extracting Semantic Location from Outdoor Positioning Systems.” In Proceedings of the 7th International Conference on Mobile Data Management (MDM’06). IEEE Computer Society.

Maklin, Cory. 2019. “DBSCAN Python Example: The Optimal Value for Epsilon (EPS). Medium.” July 14, 2019. https://towardsdatascience.com/machine-learning-clustering-dbscan-determine-the-optimal-value-for-epsilon-eps-python-example-3100091cfbc.

McInnes, Leland. 2020. Hdbscan: Clustering Based on Density with Variable Density Clusters (version 0.8.26). http://github.com/scikit-learn-contrib/hdbscan.

“Movebank.” n.d. Accessed April 30, 2020. https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study492444603.

Neilson, Eric W., and Stan Boutin. 2017. “Human Disturbance Alters the Predation Rate of Moose in the Athabasca Oil Sands.” Ecosphere 8 (8): e01913. https://doi.org/10.1002/ecs2.1913.

“NumPy — NumPy.” 2020. May 4, 2020. https://numpy.org/.

“Outlier Detection - Help to Understand the Right Parameter Configuration. · Issue #116 · Scikit-Learn-Contrib/Hdbscan. GitHub.” n.d. Accessed April 30, 2020. https://github.com/scikit-learn-contrib/hdbscan/issues/116.

“Pandas - Python Data Analysis Library.” 2020. March 18, 2020. https://pandas.pydata.org/.

Pebesma, Edzer, Roger Bivand, Barry Rowlingson, Virgilio Gomez-Rubio, Robert Hijmans, Michael Sumner, Don MacQueen, Jim Lemon, Josh O’Brien, and Joseph O’Rourke. 2020. “Sp: Classes and Methods for Spatial Data.” https://CRAN.R-project.org/package=sp.

Rahmah, Nadia, and Imas Sukaesih Sitanggang. 2016. “Determination of Optimal Epsilon (Eps) Value on DBSCAN Algorithm to Clustering Data on Peatland Hotspots in Sumatra.” IOP Conf. Ser.: Earth Environ. Sci. 31 (January): 012012. https://doi.org/10.1088/1755-1315/31/1/012012.

Ren, Kun, and Kenton Russell. 2016. “Formattable: Create ’Formattable’ Data Structures.” https://CRAN.R-project.org/package=formattable.

RStudio, Inc. 2019. “Htmltools: Tools for HTML.” https://CRAN.R-project.org/package=htmltools.

Satopaa, Ville, Jeannie Albrecht, David Irwin, and Barath Raghavan. 2011. “Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior.” In 2011 31st International Conference on Distributed Computing Systems Workshops, 166–71. Minneapolis, MN, USA: IEEE. https://doi.org/10.1109/ICDCSW.2011.20.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, Pedro Despouy, and Plotly Technologies Inc. 2020. “Plotly: Create Interactive Web Graphics via ’Plotly.js’.” https://CRAN.R-project.org/package=plotly.

Spinu, Vitalie, Garrett Grolemund, Hadley Wickham, Ian Lyttle, Imanuel Constigan, Jason Law, Doug Mitarotonda, Joseph Larmarange, Jonathan Boiser, and Chel Hee Lee. 2020. “Lubridate: Make Dealing with Dates a Little Easier.” https://CRAN.R-project.org/package=lubridate.

Ushey, Kevin, J. J. Allaire, RStudio, Yuan Tang, Dirk Eddelbuettel, Bryan Lewis, et al. 2020. “Reticulate: Interface to ’Python’.” https://CRAN.R-project.org/package=reticulate.

Vaidyanathan, Ramnath, Yihui Xie, J. J. Allaire, Joe Cheng, Kenton Russell, and RStudio. 2019. “Htmlwidgets: HTML Widgets for R.” https://CRAN.R-project.org/package=htmlwidgets.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, and RStudio. 2020. “Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics.” https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and RStudio. 2020. “Dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.

Xie, Yihui, Joe Cheng, Xianying Tan, J. J. Allaire, Maximilian Girlich, Greg Freedman Ellis, Johannes Rauh, et al. 2020. “DT: A Wrapper of the JavaScript Library ’DataTables’.” https://CRAN.R-project.org/package=DT.

Loiter Inference with GPS Data

MAJ Gabe Samudio

14 September 2021
