5 Describing Distances
Since we are thinking of bundling goods in new hubs anywhere in the country or region, it pays to exclude all transactions over short distances of, say, 20 km or 50 km.
Let’s make a histogram of the “beeline distances,” the straight-line distances between two postal codes.
hist(fashlogNum$beeline)
Simple, but effective.
Use ggplot2 for fancier graphs.
library(ggplot2)
ggplot(fashlogNum, aes(x=beeline)) +
  geom_histogram(binwidth=1, colour="black", fill="white") +
  # mark the 20 km and 50 km thresholds with dashed red lines
  # (ggplot2 >= 3.4.0 uses linewidth for lines; older versions use size)
  geom_vline(xintercept=c(20, 50),
             linetype="dashed", linewidth=1, colour="red")
Instead of graphs, we can use numbers, which make it easier to see how many rides are below 20 km, below 50 km, and so on.
One option is to create a new variable with the dplyr package. We make a variable with three categories (up to 20 km; over 20 km and up to 50 km; over 50 km).
# install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
fashlogNum$beelineCat <- case_when((fashlogNum$beeline <= 20) ~ 1,
                                   (fashlogNum$beeline > 20) & (fashlogNum$beeline <= 50) ~ 2,
                                   (fashlogNum$beeline > 50) ~ 3)
table(fashlogNum$beelineCat)
##
##       1       2       3
##  100217  271116 2016108
The table gives absolute counts; percentages would be easier to read.
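One quick way to turn a table of counts into percentages is base R's `prop.table()`. The following is a minimal sketch on invented data: the vector `toyCat` is an assumption for illustration and merely stands in for `fashlogNum$beelineCat`.

```r
# Invented category codes standing in for fashlogNum$beelineCat
toyCat <- c(1, 2, 2, 3, 3, 3, 3, 3, 3, 3)

counts <- table(toyCat)                       # absolute frequencies per category
pct    <- round(100 * prop.table(counts), 1)  # share of each category in percent

pct
```

Applied to the real data, the same two calls would be run on `table(fashlogNum$beelineCat)`.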
Below we show absolute, relative, and cumulative frequency tables.
The relative and cumulative frequencies are given as percentages.
distance = fashlogNum$beeline
breaks = c(0,20,50,100,150,200,250); breaks
## [1] 0 20 50 100 150 200 250
distance.cut = cut(distance, breaks, right=FALSE)
distance.freq = table(distance.cut)
distance.relfreq = 100*round(distance.freq / nrow(fashlogNum),4)
distance.cumfreq = cumsum(distance.relfreq)
cbind(distance.freq,distance.relfreq,distance.cumfreq)
##           distance.freq distance.relfreq distance.cumfreq
## [0,20)           100217             4.20             4.20
## [20,50)          271116            11.36            15.56
## [50,100)         793051            33.22            48.78
## [100,150)        820510            34.37            83.15
## [150,200)        370765            15.53            98.68
## [200,250)         28280             1.18            99.86
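The cumulative frequency stops at 99.86% rather than 100% because the breaks end at 250: `cut()` returns `NA` for values outside the breaks, and `table()` leaves those out by default, so the remaining roughly 0.14% appear to be distances of 250 km or more. The sketch below illustrates this on invented distances and shows one possible remedy, adding `Inf` as a final break. The names `toyDist`, `binsClosed`, and `binsOpen` are assumptions for illustration only.

```r
# Invented distances in km; 300 lies beyond the last finite break
toyDist <- c(5, 20, 49, 50, 120, 300)

# With 250 as the last break, 300 falls outside every bin and becomes NA
binsClosed <- cut(toyDist, c(0, 20, 50, 100, 150, 200, 250), right = FALSE)
sum(is.na(binsClosed))   # number of values that no bin covers

# Adding Inf as a final break creates an open-ended last bin
binsOpen <- cut(toyDist, c(0, 20, 50, 100, 150, 200, 250, Inf), right = FALSE)
freq     <- table(binsOpen)
cumfreq  <- cumsum(100 * freq / length(toyDist))

cumfreq                  # the last entry now reaches 100
```

Whether an open-ended bin is wanted depends on the analysis; keeping the breaks finite, as above, makes very long rides easy to spot through the missing percentage.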