Maps

At the end of the last chapter, we opened the door to dynamic visualization by converting our static ggplot2 plots to dynamic plotly plots. We will continue moving in that direction from now on. One thing that fucking fascinated me at the beginning was mapping. I did not understand how it was done and it seemed extremely complicated. I honestly thought that you had to write code to display every single element of a map. Well, you do need to write code to display things, but it is way easier than I thought. I would walk around the office drooling over the people who could do mapping, thinking that it was out of reach for me at the moment. That lasted until I reached the topic and saw that it is not as complicated as the stuff that we have covered so far.

Maps are extremely important though. If you decide to continue with the rest of the series, you will see that we will be building a huge chunk of our progress around mapping. Obviously, some things in mapping are very complex. For those, you will have to get a dedicated book or a course. But, as usual, what we cover here will be more than enough for your day-to-day work and will definitely take you to the next level.

0.19 Maps, Different Libraries

There are a few libraries that can draw you a map. I will mention three of them and will show you how to use only one. Only one, but the best one and the only one that you will need. I will explain why here. Just like plots, maps can be static (just a picture) or dynamic (interactive). They can even be 3D, but we do not need that. You have already seen the limitations of static plots. My question is then: why the fuck do we even need static maps? Someone who knew what they were talking about could say: they are easier to save and overall lighter than dynamic maps. The first one is bullshit, because if you want a PNG of a dynamic map, you can just focus on the area of interest and take a snapshot. That inconvenience is nothing compared to the flexibility that a dynamic map can provide. The argument about the size difference is valid. If you have a heavy (CPU- or RAM-demanding) web application or script, then downgrading to a static map is one of the options for optimizing performance. However, as I mentioned before, first we will learn to get things done no matter what. If you are at the point where you need to start optimizing things, then congratulations, you do not need my books anymore. Trust me, at that point, you will be able to use both static and dynamic maps. Here, I will show you the most useful and exciting way to work with maps. If you get interested in the topic, please, go ahead and start looking deeper into it elsewhere.

0.19.1 ggplot2

The library ggplot2 can also draw static maps. For the reasons outlined above, I will not go deep into generating maps using ggplot2. However, I do want to show you an example of how easy it is to create a map and what a static map looks like. Let’s draw an empty map of the US. Before we begin, open a new R file and type:

We will not need to load it though.

First, we need to get the data that ggplot2 will use to draw the map. The ggplot2 library comes with the function map_data(). This function turns the outline of a shape into a dataframe where the latitude and longitude of each point become columns. We can then plot those points using the standard ggplot2 syntax from before.

Get the data to create a map of the USA. Open it to see what the columns that I just talked about look like.
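Here is a minimal sketch of this step. It assumes that ggplot2 and maps are both installed (map_data() relies on maps behind the scenes), and the variable name usa is just my pick for illustration.

```r
# One-time setup, if needed (assumption: these are the packages involved):
# install.packages(c("ggplot2", "maps"))
library(ggplot2)

# map_data() reads the outline from the maps package and returns it
# as a regular dataframe, one row per point of the outline.
usa <- map_data("usa")

# Peek at the first rows: 'long' and 'lat' hold the coordinates,
# 'group' tells ggplot2 which points belong to the same shape.
head(usa)
```

You can also click on usa in the Environment pane to browse the whole dataframe.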

Standard ggplot2 syntax:

Just like with the line and bar plots that we generated in the previous chapter, we are using the ggplot() function, the ‘+’ sign and, then, specifying the ‘geom_…’ command. The difference here is that we used the geom_polygon() to draw a map instead of the geom_line() or geom_bar() to draw plots.
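A sketch of what that can look like, under a couple of assumptions: the names usa and usa_map are mine, and coord_fixed(1.3) is an optional extra I added so the country does not look stretched.

```r
library(ggplot2)

# Outline of the USA as a dataframe of points (needs the maps package installed).
usa <- map_data("usa")

# Same recipe as before: ggplot(), a '+', then a geom_ layer.
# group = group keeps the mainland and the islands as separate shapes.
usa_map <- ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group)) +
  coord_fixed(1.3)  # assumption: a fixed aspect ratio, purely cosmetic

usa_map
```

If you drop the coord_fixed() line, the map still draws. It will just stretch to fit whatever shape your plot window has.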

As you can see, it is not too complicated. In total, we wrote five lines of code and got a map. There is no doubt that if you master ggplot2, you can draw some amazing plots and maps with it. Mastering ggplot2 is not our goal here, which is why we are going to move on.

0.19.2 ggmap

The ggmap library is very similar to ggplot2. The syntax is nearly the same, and so is the whole process of gradually layering things on top of each other. The output is static as well. The difference is that ggmap allows you to work with basemaps (backgrounds). A basemap is like an atlas. When we generated the shape of the United States, there was nothing behind the shape, just a blank canvas. If we did the same using ggmap, we would see the water, states, and many other things depending on the basemap we selected. It is better if I demonstrate. Before we proceed, type:

Then, load the library.

It is pretty much the same as with the ggplot(), even easier. Instead of five lines, we will do it in four.

Figure 1: Here is a nice figure!

Nice. Not really though. It is the same shitty static map. It can be useful here and there, but for the reasons that I talked about in the beginning of this section, we will not be using ggmap either. Nevertheless, now you are familiar with both main libraries for generating static maps. I, actually, wanted to skip ggmap library demonstration altogether, because Google now requires us to create an account with them in order to use their basemaps. That is a big annoyance, especially when you are just learning and do not really know how to use ‘api keys’ and other shit that they want you to do. When you are learning, you just want shit to work. Adding that extra layer of complexity is just not worth it, considering that, in my opinion, ggmap is not even that great. Let’s move on to what we came here for.

0.19.3 Leaflet

Taken from leaflet’s documentation page on GitHub:

“Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.”

Leaflet is extremely popular. It is a JavaScript library, but it has been adapted for R so well that it is actually much easier to use in R than in JavaScript. I did both. I do not want you to think that I am bullshitting you right now, so let me shock you with how awesome and simple it is right away. Type:

After it is installed, let’s load the library and render our first leaflet.

Just one line this time.
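A minimal sketch of those two lines, assuming the leaflet package is already installed. I store the map in a variable called m (my name, not a requirement) so that it is easy to reuse; typing leaflet() %>% addTiles() on its own does the same thing.

```r
# install.packages("leaflet")  # one-time setup, if you have not done it yet
library(leaflet)

# leaflet() creates the map widget, addTiles() paints the default basemap on it.
m <- leaflet() %>% addTiles()

# Printing the object shows the interactive map in the RStudio Viewer.
m
```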

Figure 2: Here is a nice figure!

How cool is this? With just two lines of code you got a fully dynamic map that you can interact with. Go ahead and be amazed at how much better it is compared to that static crap, and how much easier it was to render. The first time I saw it, I knew that, unless there was a major compromise associated with using it, I would not be going back to static maps. Now that you have stopped drooling over that map, let’s talk about leaflet a bit more. After that, I will show you a lot of cool stuff that you can do with it. As always, not too much, but just enough to get you started.

0.20 Leaflet, Deeper Dive

I know that I praised leaflet a lot already, but I am not done yet. Leaflet, along with plotly and shiny (which we will cover later), gave me a huge boost in terms of my drive to learn R and code in general. Going through dry tutorials and hundreds and hundreds of dry lines of code gets very boring at some point. Some people do enjoy coding just for the coding part, others enjoy crunching numbers and solving problems. However, the majority of us want to be entertained from time to time. Crunching numbers and solving problems can be entertaining too, but when you are just starting, in most cases, you need that tangible progress. Not many things in programming are more tangible than making things move on the screen. That is why giving yourself a boost by learning how to inject interactivity into your code will get you further than just sitting and learning dry code. It definitely worked for me.

Before we move to our routine of pulling data from a database, messing with it, and outputting it on a fancy map, I want to go through some leaflet basics. Basically, I want to show you how to place a dot on a map and play with a few parameters. You do not actually have to follow my tutorial on this if you hate me already. Leaflet has an amazing and simple tutorial on their GitHub page, just google ‘leaflet R’. But if you still tolerate me, keep reading.

Let’s see what each part does and what kind of things we can layer on top. First, we will call the leaflet() function without anything else to see if it does anything.
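A quick sketch of that experiment, assuming leaflet is installed. The variable name canvas is only for illustration.

```r
library(leaflet)

# leaflet() on its own: a map widget with no layers added yet.
canvas <- leaflet()

# Print it to see what you get in the Viewer.
canvas
```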

Figure 3: Here is a nice figure!

Apparently, it provides an empty canvas with some basic functionality. The difference between this and the one before is the addTiles() function. So, the addTiles() function must be the one that actually paints the map on top of the canvas. Now that we know that, let’s add a few things.
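One way to do that is with setView(), which takes a longitude, a latitude, and a zoom level. The coordinates below are my assumption (roughly downtown Boston), and so is the zoom of 12; play with both.

```r
library(leaflet)

# Canvas + default basemap + a starting view.
# Assumption: lng/lat point at downtown Boston, zoom 12 is city level.
boston <- leaflet() %>%
  addTiles() %>%
  setView(lng = -71.0589, lat = 42.3601, zoom = 12)

boston
```

A bigger zoom number means a closer view. Note that longitude goes into lng and latitude into lat; mixing them up is the classic way to end up in the middle of an ocean.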

Figure 4: Here is a nice figure!

We are now centered and zoomed in on Boston.

If you are not familiar with coordinates, every point on Earth has its own set of coordinates that consists of a latitude and a longitude. The place where you are sitting right now has its own pair as well. The more decimals, the more precise the location is. Test it: find your set of coordinates on Google Maps, insert them instead of the ones that I gave you, and increase the zoom to see your location.

If we leave the addTiles() function empty, it will just give us the default OpenStreetMap. If you want something else (and you do), you should use the function addProviderTiles() to get the third-party maps. There are a lot of them, you can see most of them here: http://leaflet-extras.github.io/leaflet-providers/preview/index.html. Let’s change the basemap to something different. I really like neutral grey colors.
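Here is a sketch of swapping the basemap. It uses the CartoDB.Positron provider, which is a light grey map; the Boston coordinates and zoom are the same assumed values as before.

```r
library(leaflet)

# addProviderTiles() replaces addTiles() when you want a third-party basemap.
grey_map <- leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = -71.0589, lat = 42.3601, zoom = 12)  # assumed Boston view

grey_map
```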

Figure 5: Here is a nice figure!

Go through those maps, experiment with them, find your favorite. One thing you might be confused about from the previous chunk of code is ‘providers$CartoDB.Positron’. ‘providers’ is a named list that ships with leaflet and holds the available maps. ‘CartoDB.Positron’ is one of the entries in that list. Let’s double check.
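A small sketch of how you could check this yourself. Keep in mind that the exact entries and their order depend on your leaflet version, so your printout may not match the one here line for line.

```r
library(leaflet)

# 'providers' is a plain named list; head() shows the first few entries.
head(providers, 5)

# Each entry is just the provider name as text.
providers$CartoDB.Positron
```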

## $OpenStreetMap
## [1] "OpenStreetMap"
## 
## $OpenStreetMap.Mapnik
## [1] "OpenStreetMap.Mapnik"
## 
## $OpenStreetMap.BlackAndWhite
## [1] "OpenStreetMap.BlackAndWhite"
## 
## $OpenStreetMap.DE
## [1] "OpenStreetMap.DE"
## 
## $OpenStreetMap.CH
## [1] "OpenStreetMap.CH"

Now that we have that down, let’s add the last two things to our map. The only thing missing right now is some sort of marker. Let’s add one, together with a popup.
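A sketch of the full stack so far with a marker on top. The coordinates and the popup text are placeholders of mine; use whatever location and label you like.

```r
library(leaflet)

marked <- leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = -71.0589, lat = 42.3601, zoom = 12) %>%
  # One marker at the (assumed) Boston coordinates; popup is the text
  # that shows up when you click on the marker.
  addMarkers(lng = -71.0589, lat = 42.3601, popup = "Boston")

marked
```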

Figure 6: Here is a nice figure!

Just like that, we added a marker and a popup. If you did not see the popup, just click on the marker. Obviously, this is far from everything leaflet can do, but I think you get the idea. You just layer more and more things and add more bells and whistles. In the next part, we will use real data to create some real shit that will be good enough to even publish online.

0.21 Leaflet with Real Data

In this section, we will, again, pull data from a database, work with it a little bit, and map it in different ways using leaflet maps. This will be a nice and entertaining project for you. We will still repeat the same work of pulling, munging, and outputting, but a much smaller amount compared to the previous chapters. The exciting part will be mapping the data and making everything look pretty. I can almost guarantee that you will love it.

We got an extremely interesting dataset of car crashes provided by the New York City Police Department (NYPD). That dataset has every car crash registered by the NYPD since 2012. There are three particular things that we are interested in: the number of crashes, the number of injuries, and the number of deaths. First, we will retrieve the data from the database, then work with it a bit to get it into the right shape, and then, using the leaflet flow that we learned, we will map the crashes. Taking it a step further, I will teach you how to map data using polygons from a shapefile. That should solidify your knowledge and interest in the topic.

You can follow this however you want, but I think it is a good time to create a new project and keep everything there. I will show you how to do it, but I am not insisting because I myself did not start using projects until after my first nine months learning R. Now that I do use them, I know that it is the way to go. Let me refresh your memory on projects in R and how they differ from just opening an R file. To put it simply, a project will create a folder for you where you will be storing everything related to that project. These things will include the project file itself, an R file, any csv, excel, shape, plot, map, etc. files that you are using in that project. The major benefit of it is that you do not need to look for all these files all over your computer and you do not need to specify their paths when loading them in, your project will know that they are in the same folder. Additionally, if you ever decide to move your project to another computer, you will just need to drag that project folder over and that is it.

Sage Tip: Use projects, they are very convenient.

Here is how to do it:

Go to ‘File’ -> ‘New Project…’ -> ‘New Directory’ -> ‘New Project’ -> Give it a name and leave the checkboxes empty. Click ‘Create Project’. You should have an empty RStudio now. Click ‘File’ -> ‘New File’ -> ‘R Script’.

There you go, everything will be contained in that folder now. For the rest of the book, we will be working out of this project folder. It is not really mandatory, as long as you can reference all the files that we will be using. Before we proceed, I want you to install four packages that deal with shapefiles and geolocation. We will not need them all, but you should have them installed just in case. So,

That was a new syntax for you, but this is how you install multiple packages without repeating install.packages() four times. You just feed a vector of package names to the function, that is it.
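To make the vector idea concrete, here is a sketch. The package names are placeholders that I picked for illustration, not the actual list from this chapter, so swap in the four names you need. The install line is commented out so that the chunk runs without an internet connection.

```r
# Placeholder package names (an assumption, not the chapter's actual list).
pkgs <- c("sf", "sp", "geosphere", "tigris")

# A character vector is all install.packages() needs to install
# several packages in one call. Uncomment to actually install.
# install.packages(pkgs)

# The same thing written inline, without the helper variable:
# install.packages(c("sf", "sp", "geosphere", "tigris"))
```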

Let’s pull just one day of data and see what it looks like if we map it.

Let’s select a day of data from the ‘pdData’ table:
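The query itself is one line of SQL. Since your connection details depend on how you set up the database earlier, the sketch below fakes the database: it builds a tiny in-memory SQLite table (an assumption, requiring the DBI and RSQLite packages) from three of the rows printed in this section, and then runs the kind of query you would run against the real ‘pdData’ table. The names con and day_crashes are mine.

```r
library(DBI)

# Stand-in for the real database (assumption): an in-memory SQLite table
# named pdData, filled with three rows copied from the printed output.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "pdData", data.frame(
  date    = rep("2019-10-01", 3),
  zip     = c(NA, 11214, 10462),
  lat     = c(40.62616, 40.60693, 40.85241),
  lon     = c(-74.15742, -73.99941, -73.86777),
  injured = c(0, 0, 0),
  killed  = c(0, 0, 0),
  reason  = c("Driver Inattention/Distraction",
              "Failure to Yield Right-of-Way",
              "Traffic Control Disregarded")
))

# The actual step: select every column for one day.
day_crashes <- dbGetQuery(con, "SELECT * FROM pdData WHERE date = '2019-10-01'")

dbDisconnect(con)

head(day_crashes)
```

With your real connection, only the dbGetQuery() line matters; everything above it is scaffolding to make the sketch run on its own.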

##   row_names       date   zip      lat       lon injured killed
## 1     86987 2019-10-01    NA       NA        NA       2      0
## 2     87151 2019-10-01    NA       NA        NA       0      0
## 3    114890 2019-10-01 10451       NA        NA       1      0
## 4    114937 2019-10-01    NA 40.62616 -74.15742       0      0
## 5    114940 2019-10-01 11214 40.60693 -73.99941       0      0
## 6    114947 2019-10-01 10462 40.85241 -73.86777       0      0
##                           reason
## 1          Following Too Closely
## 2                    Unspecified
## 3                    Unspecified
## 4 Driver Inattention/Distraction
## 5  Failure to Yield Right-of-Way
## 6    Traffic Control Disregarded

There are quite a few crashes that happened that day. Upon inspection of the first few records, we can see that there are missing coordinates. We cannot map data with missing coordinates; therefore, we are not interested in the entries with missing either lat or lon or both. Let’s get rid of them and map the rest. As a reminder, the sign ‘|’ means ‘or’ and the sign ‘&’ means ‘and’ in R and in most programming languages. In terms of the map, only a couple of things will be different compared to our practice map. First, we will center the view on NYC instead of Boston and zoom out a little. Second, the coordinates in the addMarkers() function will not be the actual two coordinates but a column of coordinates for both ‘lng’ and ‘lat’. Finally, the popup will represent a crash reason for each location.
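Here is a sketch of both steps. To keep it runnable on its own, it uses four rows copied from the printout above instead of the full query result; the dataframe name crashes, the NYC center coordinates, and the zoom level are my assumptions.

```r
library(leaflet)

# Four rows from the printed output; the first one has no coordinates.
crashes <- data.frame(
  lat    = c(NA, 40.62616, 40.60693, 40.85241),
  lon    = c(NA, -74.15742, -73.99941, -73.86777),
  reason = c("Unspecified",
             "Driver Inattention/Distraction",
             "Failure to Yield Right-of-Way",
             "Traffic Control Disregarded")
)

# Drop rows where lat OR lon is missing ('|' means 'or', '!' flips the result).
crashes <- crashes[!(is.na(crashes$lat) | is.na(crashes$lon)), ]

# The '~' tells leaflet to take the values from a column of 'crashes'.
nyc_map <- leaflet(crashes) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = -74.0060, lat = 40.7128, zoom = 10) %>%  # assumed NYC view
  addMarkers(lng = ~lon, lat = ~lat, popup = ~reason)

nyc_map
```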

Figure 7: Here is a nice figure!

This is crap, really. You can still use it and some idiots would, but this is not a proper way of mapping things. It is way too crowded and confusing. We only mapped one day, imagine what it would look like if we mapped one month or one year. Not only would it be crowded but also heavy in terms of processing power. Each element on the map takes a little bit of processing power. If you have thousands and thousands of dynamic elements, your computer might just freeze.

Sage Tip: Let’s see what we can do without changing the whole thing completely. The function addMarkers() has an amazing option called ‘clusterOptions = markerClusterOptions()’. This thing will group all markers that are close to each other and instead of displaying them from the beginning, will do that only when you are zoomed in enough to see them properly. Until then, it will just display how many markers are in each cluster. Win-win!
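In code, it is one extra argument. The sketch below reuses three rows from the printout so that it runs on its own; with so few points you will not see much clustering, but the call is identical for a full day of data. The names and the NYC view are assumptions again.

```r
library(leaflet)

# Three mappable rows copied from the printed output.
crashes <- data.frame(
  lat    = c(40.62616, 40.60693, 40.85241),
  lon    = c(-74.15742, -73.99941, -73.86777),
  reason = c("Driver Inattention/Distraction",
             "Failure to Yield Right-of-Way",
             "Traffic Control Disregarded")
)

clustered <- leaflet(crashes) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = -74.0060, lat = 40.7128, zoom = 10) %>%  # assumed NYC view
  addMarkers(lng = ~lon, lat = ~lat, popup = ~reason,
             # Group nearby markers until the user zooms in.
             clusterOptions = markerClusterOptions())

clustered
```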

Figure 8: Here is a nice figure!

This is amazing, right? With one parameter, we took an overcrowded fucked up map and made a proper one, while adding some cool animation as well. If somebody showed it to me before I knew how it is done, I would pee my pants thinking how cool that is. Anyway, let’s see if we can get away with scaling our data to a month and a year using the same trick.

We are going to select a whole year this time so we will not have to do it again:

Now that we have the data again, let’s map it and see what it looks like.

Figure 9: Here is a nice figure!

This is even better. Let’s scale up to a year.