Chapter 4 Merge and Aggregate the data
4.1 Merge the data
Remember the column maxspeed
from the OSM_sf
table ? We are going to combine the DT.Uber
table with the OSM_sf
to compare the Uber speed_kph_mean
with the OSM maxspeed
:
first we select only from the OSM_sf
the fields of interest :
then we set this new table as a data.table
, convert the column osm_id
as numeric (because the osm_way_id
from DT.Uber
is already a numeric value, and finally define the index column :
And now, we merge this DT.max.speed
table with the DT.Uber
:
with this new table we add a new column is.over
that gives 1 if the mean_speed_kph is greater than the maxspeed allowed and 0 otherwise :
Out of sheer curiosity, how often is the speed limit not respected ?
## [1] 24.93969
OK.
Now, we check how many ways have a speed limit defined :
## [1] 99.91157
How does it looks like per maxspeed value :
DT.max.speed[, .(count = .N), by = .(maxspeed)][, percentage := count / sum(count) *
100][order(maxspeed)]
## maxspeed count percentage
## 1: 0 2112 0.088425696
## 2: 10 4350 0.182126789
## 3: 20 236 0.009880902
## 4: 30 349198 14.620301234
## 5: 40 1430 0.059871565
## 6: 50 1846718 77.318808966
## 7: 60 36972 1.547952099
## 8: 70 188 0.007871227
## 9: 80 146229 6.122349009
## 10: 100 972 0.040695917
## 11: 120 41 0.001716597
Fine, there are few ways without speed limit. (less than 0.09%).
4.2 Aggregate the data
Hang on, we are close the end, we still want to aggregate the data per way and hour of the day. For this, we are going to use the group by functionality via the by
keyword, and create new variables :
(take note that we filter the table on the ways that have an OSM maximum speed defined)
DT.max.speed.per.way.hour <-
DT.max.speed[maxspeed > 0][, .(
total_rows = .N,
ratio.over = sum(is.over) / (.N) * 100,
speed_kph_mean = speed_kph_mean <-
mean(speed_kph_mean),
maxspeed = maxspeed <-
mean(maxspeed),
speed_minus_max = speed_kph_mean - maxspeed,
name = first(name)
), by = .(osm_id, hour)]
How does this new aggregated table looks like ? (The empty square brackets are a neat data.table
trick to print it)
## osm_id hour total_rows ratio.over speed_kph_mean maxspeed
## 1: 4045243 0 60 11.66667 47.80583 50
## 2: 4045243 1 60 30.00000 49.01758 50
## 3: 4045243 2 46 30.43478 49.14030 50
## 4: 4045243 3 58 48.27586 49.89152 50
## 5: 4045243 4 40 42.50000 49.50612 50
## ---
## 126416: 699295574 2 3 0.00000 35.91000 50
## 126417: 699295574 3 3 0.00000 37.82367 50
## 126418: 699295574 16 1 0.00000 27.88300 50
## 126419: 699295574 4 1 0.00000 38.22100 50
## 126420: 699531029 8 1 0.00000 21.04500 50
## speed_minus_max name
## 1: -2.1941667 Frankfurter Allee
## 2: -0.9824167 Frankfurter Allee
## 3: -0.8596957 Frankfurter Allee
## 4: -0.1084828 Frankfurter Allee
## 5: -0.4938750 Frankfurter Allee
## ---
## 126416: -14.0900000 An der Wuhlheide
## 126417: -12.1763333 An der Wuhlheide
## 126418: -22.1170000 An der Wuhlheide
## 126419: -11.7790000 An der Wuhlheide
## 126420: -28.9550000 Pasewalker Straße
For each OSM way, we have all hours of the day, the percentage of times the median speed was over the maximum speed allowed, and the delta between both.