This Quarto book collects my personal notes, trials, and exercises for Geographic Data Science With R: Visualizing and Analyzing Environmental Change by Michael Wimberly (Wimberly, 2023a). The book is also available online. The dataset that accompanies the book can be downloaded from figshare.
I have some experience with the R language but none working with geodata. I want to use maps to show how statistical results differ by country. My focus is therefore on geodata: for example, I have skipped chapters 1-4 because I am already familiar with their content, an introduction to R. I have added some additional material and experiments as separate annex chapters.
The book has twelve chapters with 267 pages (appendix and bibliography not included). I skipped Chapters 1 to 4 (83 pages) because I am already familiar with their topics. The progress percentage is therefore calculated from my notes on Chapters 5 to 12 (184 pages).
Resource 1 : Resources used for Geographic Data Science with R (GDSWR)
Wimberly, M. C. (2023). Geographic Data Science With R: Visualizing and Analyzing Environmental Change (1st ed.). Chapman & Hall/CRC.
Wimberly, M. C. (2023). Geographic Data Science with R: Visualizing and Analyzing Environmental Change. https://bookdown.org/mcwimberly/gdswr-book/
Wimberly, M. C. (2022). Geographic Data Science with R [Dataset]. figshare. https://doi.org/10.6084/m9.figshare.21301212.v3
Dyba, K. (2024, June 26). How to load and save vector data in R. R-Spatial. https://r-spatial.org/r/2024/06/26/sf-load-save.html (additionally used for chapter 5)
To refer to an R package I have used the format {<package-name>}, e.g., {glossary}. At the beginning of each file I have listed all packages used in the current chapter, as the next box demonstrates for this preface.
The package name is followed by the reference to the package author and a link to a short profile of the package; these profiles are collected in the annex (Appendix E: Packages used).
I am writing this book as a text for others to read because that forces me to become explicit and explain all my learning outcomes more carefully. Please keep in mind that this text is not written by an expert but by a learner.
Text passages with content I am already familiar with I have skipped. In this case I have started with chapter 5, because chapters 1 to 4 are an introduction to using R, which I am already comfortable with.
Sections of the original text where I needed more in-depth knowledge I have elaborated on and supplemented with my own comments resulting from my personal research.
All mistakes are my own responsibility
This is my personal learning material and is therefore neither an accurate replication nor an authoritative textbook.
Although it replicates most of the content, this Quarto book may contain many mistakes. All misapprehensions and errors are of course my own responsibility.
Glossary
I am using the {glossary} package to create links to glossary entries. If you hover with your mouse over the double-underlined links, a window opens with the appropriate glossary text. Try this example: geoJSON. Additionally, at the end of each chapter, immediately in front of the session info, you will find a table of all glossary entries used in that chapter.
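For illustration, a minimal sketch of how this works in the Quarto source (the inline call below is taken verbatim from this book's source files; the per-chapter table is produced by the glossary_table() function):

```r
## inline in the Quarto text a term is wrapped in an inline R call,
## which renders as a double-underlined link:
##   `r glossary("geoJSON")`

## at the end of each chapter the table of all glossary entries used
## in that chapter is printed with:
glossary::glossary_table()
```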
I added many of the glossary entries while working through other books, either taking the text passage from the book I was reading or drawing on other resources found via an internet search. I have noted the source of each glossary entry. Sometimes I have used abbreviations, but I still need to provide a key explaining what these short references mean.
R Code 1 : Load glossary
Listing / Output 1: Install and load the glossary package with the appropriate glossary.yml file
```r
## 1. Install the glossary package:
##    https://debruine.github.io/glossary/
library(glossary)

## If you want to use my glossary.yml file:
## 1. Fork my repo:
##    https://github.com/petzi53/glossary-pb
## 2. Download the `glossary.yml` file from
##    https://github.com/petzi53/glossary-pb/blob/master/glossary.yml
## 3. Store the file on your hard disk
##    and change the following path accordingly

glossary::glossary_path("../glossary-pb/glossary.yml")
```
If you fork the repository of my Geographic Data Science with R notes (https://github.com/petzi53/GDSWR), the glossary will not work out of the box. Download the glossary.yml file from my glossary-pb GitHub repo, store it on your hard disk, and change the path in the code chunk Listing / Output 1.
Glossary is my private learning vehicle
What I have said about my personal notes in general also applies to the glossary. The entries represent the state of my personal knowledge and are neither authoritative nor complete.
The data files used in each chapter have to be downloaded from https://doi.org/10.6084/m9.figshare.21301212. Run the following code chunk only once (manually).
R Code 2 : Download Data Files
```r
base::source(file = "R/helper.R")

## create data folder (only once, e.g., only in this chapter)
baseURL <- here::here()
my_create_folder(base::paste0(baseURL, "/data"))

url <- "https://figshare.com/ndownloader/files/39733921"
utils::download.file(url, base::paste0(baseURL, "/data/gdswr_data.zip"))
utils::unzip(base::paste0(baseURL, "/data/gdswr_data.zip"),
             exdir = base::paste0(baseURL, "/data"))

## delete .zip file
fn <- base::paste0(baseURL, "/data/gdswr_data.zip")
if (file.exists(fn)) {file.remove(fn)}
```
(No output is available for this R code chunk.)
One data file exceeds GitHub’s file size limit
After I tried to commit my changes, I learned that there is a file mesodata_large.csv in “data/Chapter3” of 151.28 MB. This exceeds GitHub’s file size limit of 100.00 MB, so the file can’t be stored on GitHub the standard way.
One option would be Git Large File Storage (https://git-lfs.github.com), but I decided against this additional complexity. I did not actually need the files of Chapter1 to Chapter4 because I skipped the starting chapters. Instead of deleting the file from my data subfolder, however, I changed the content of .gitignore so that this large file is not transferred to GitHub.
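As a sketch, this can also be done from R with the {usethis} package (an assumption on my part; editing .gitignore by hand works just as well):

```r
## add the oversized file to .gitignore so it is not pushed to GitHub
## (assumes the {usethis} package is installed)
usethis::use_git_ignore("data/Chapter3/mesodata_large.csv")

## the resulting entry in .gitignore is simply the relative path:
## data/Chapter3/mesodata_large.csv
```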
Style guides
Generally I am using the Tidyverse Style Guide (https://style.tidyverse.org/) for code chunks. I am going to use underscores (_), or snake case, to replace spaces, as studies have shown that this is easier to read (Sharif, 2010). Additionally I will apply some modifications from the Google style guide (https://google.github.io/styleguide/Rguide.html) on top of the tidyverse style guide:
Start the names of private functions with my_ (not with a dot, as recommended in the Google style guide).
Don't use base::attach().
No right-hand assignments.
Use explicit returns.
Qualify namespace.
Qualifying namespace
Especially the last point (qualifying the namespace) is important for my learning. Besides preventing conflicts between functions with identical names from different packages, it helps me learn (or remember) which function belongs to which package. I think this justifies the small overhead and helps to make R code chunks self-sufficient (no previous package loading or library() calls in a setup chunk). To foster learning the relation between function and package, I enclose the package name in curly braces and format it in bold.
I am also using the package name for functions from the default installation of base R. This wouldn't be necessary, but it helps me to understand where the base R functions come from. What follows is a list of the base R packages in the system library that are included in every installation and attached (opened) by default:
{base}: The R Base Package
{datasets}: The R Datasets Package
{graphics}: The R Graphics Package
{grDevices}: The R Graphics Devices and Support for Colours and Fonts
{methods}: Formal Methods and Classes
{stats}: The R Stats Package
{utils}: The R Utils Package
When it is clear from the context, I will follow the advice from Hadley Wickham:
When you call a function, you typically omit the names of data arguments, because they are used so commonly. If you override the default value of an argument, use the full name (tidyverse style guide).
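A minimal example of both conventions together (qualified namespace, unnamed data argument, named non-default argument), using the built-in mtcars data set:

```r
## {utils} is named explicitly, the data argument (mtcars) stays unnamed,
## and the overridden default n = 6 is written with its full argument name
utils::head(mtcars, n = 3)
```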
Native pipe
I am using the native pipe introduced with R 4.1.0. It is important to know that there are small but significant differences between the native pipe and the {magrittr} pipe. One of the consequences is that the code provided in the book with the {magrittr} pipe sometimes does not work with the native pipe.
This does not pose a general problem because you can mix both pipes. But wherever possible I have tried to stay with the native pipe.
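A small sketch of two of these differences (my own example, not code from the book):

```r
library(magrittr)

## {magrittr} pipe: the dot placeholder can be used for any argument
mtcars %>% lm(mpg ~ wt, data = .)

## native pipe: the underscore placeholder (R >= 4.2) must be bound
## to a named argument
mtcars |> lm(mpg ~ wt, data = _)

## the native pipe also requires an explicit function call with parentheses
mtcars |> head()   # works
## mtcars |> head  # error: the right-hand side must be a call like head()
```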
The differences between the native and the {magrittr} pipe are complex and I will not go into the details here. But the following reading list provides you with material to study the discrepancies.
Differences between the native pipe and the {magrittr} pipe
Hadley Wickham in the Tidyverse (“Differences between the base R and magrittr pipes”),
Kathie Press (“Replacing the Magrittr Pipe With the Native R Pipe”)
GeeksforGeeks (“What are the differences between R’s native pipe |> and the {magrittr} pipe %>%?”)
Isabella Velásquez (“Understanding the native R pipe |>”)
StackOverflow (“What are the differences between R’s native pipe |> and the {magrittr} pipe %>%?”)
Yihui Xie (“Substitute the magrittr Pipe %>% with R’s Native Pipe Operator |>”)
R Bloggers (“The new R pipe”)
Statistik Dresden (“R 4.1.0: Base R Pipe! |>”), in German
Code linking
Code linking does not work together with code annotation. I am therefore using standard comments for line numbering and explaining them in normal numbered lists after the code chunk. This is not optimal, but for learning purposes it is important to have links to the original documentation of the packages' functions.
Code snippets
I am not always using the exact code snippets for my replications, because I am not only replicating the code to see how it works but also changing the values of parameters to observe their influence.
Functions starting with my_ are private functions developed by me to facilitate repetitive tasks. To understand what these functions do, inspect the “R/helper.R” file.
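For example, my guess at the shape of such a helper (a sketch only; the actual definition of my_create_folder() is in “R/helper.R”):

```r
## hypothetical sketch of a my_ helper:
## create a folder only if it does not already exist
my_create_folder <- function(path) {
    if (!base::dir.exists(path)) {
        base::dir.create(path, recursive = TRUE)
    }
}
```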
There is one exception to the my_ convention: Geospatial data files are special and use a column type not covered by base R. To get data summaries with the skimr::skim() function I had to define a skim function list (sfl) specific to the geometry column.
Skimmers for the sfc data class
For the {sf} data classes in the geometry column, no skimmers are available. Possible data types are:
sfc_POINT,
sfc_LINESTRING,
sfc_POLYGON,
sfc_MULTIPOINT,
sfc_MULTILINESTRING,
sfc_MULTIPOLYGON, and
sfc_GEOMETRY.
(sfc stands for “simple feature list column”.)
The geometry column is a combination of one of the above data types with the general sfc type. It is possible to adapt {skimr} to work with user-defined data types, either with skimr::skim_with() or, more generally, by providing a method for the skimr::get_skimmers() generic, which adds default summaries for new data types to the standard skimr::skim() function.
With skimr::get_skimmers.sfc() I have applied a skim function list (sfl()) to get summary data for the sfc data type of the geometry column. You can inspect the details of this function in the already mentioned “R/helper.R” file.
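The following is a minimal sketch of that approach, assuming {skimr}, {sf}, and {dplyr} are installed; the chosen summary statistics are my own illustration, not the exact code from “R/helper.R”:

```r
library(skimr)

## define default skimmers for columns of class "sfc" (the geometry column)
## by providing an S3 method for the skimr::get_skimmers() generic
get_skimmers.sfc <- function(column) {
    skimr::sfl(
        skim_type = "sfc",
        n_unique  = ~dplyr::n_distinct(.),
        valid     = ~sum(sf::st_is_valid(.)),
        geom_type = ~as.character(unique(sf::st_geometry_type(.)))[1]
    )
}

## usage: skimr::skim() now also summarises the geometry column, e.g.
## nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
## skimr::skim(nc)
```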
Resource 3 : Defining sfl for sfc data types
Resources I have drawn upon to learn how to develop skimmers for the geometry column:
Defining sfl’s for a package: General article that explains how to generate and use skimmers for user-defined data types. sfl stands for “skimr function list”. It is a list-like data structure used to define custom summary statistics for specific data types.
skim of {sf} objects: Discussion specific to the data types in the {sf} package.
Glossary table
GeoJSON: GeoJSON is an open standard geospatial data interchange format based on JavaScript Object Notation (JSON). It was first published in June 2008 and became a standard specification with the publication of RFC 7946 in August 2016. GeoJSON is used to represent simple geographic features and their non-spatial attributes, such as points, lines, and polygons, and it uses the World Geodetic System 1984 as its geographic coordinate reference system. GeoJSON is widely used in web mapping applications due to its simplicity and compatibility with JavaScript. It supports various geometry types, including Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. GeoJSON files are typically saved with a .geojson extension and have a MIME type of application/geo+json.