14 Building a Package on GitHub for Data and Code
These are some notes on building packages, based mostly on two great sources that you should use to learn the process:
R Packages https://r-pkgs.org/ (Wickham and Bryan (n.d.))
- Includes a quick run through in chapter 2 for building a package, with a function
- Chapter 14 “External data” goes over how to build data into the package
- Various chapter references below refer to this source
Happy Git and GitHub for the useR https://happygitwithr.com/ (Bryan (n.d.))
- All about how to get Git and GitHub working
14.1 Git and GitHub
You’ll need to learn about using Git methods to host a package on GitHub (https://github.com), where you’ll also need at least a free account. See Happy Git and GitHub for the useR https://happygitwithr.com/ (Bryan (n.d.)) to learn everything you’ll need for this. Some key takeaways:
The purpose of Git is version control, and https://github.com is a place to host package repositories (repos) that use it. Git keeps track of all changes in your package, including all files in its folders.
You’ll need to use GitHub to create an account and connect with collaborators. And you can use it for setting everything up in your package repository.
You’ll need to have the same folders on your local computer, and a method for committing and pushing up changes to GitHub.
- You can use RStudio for most of this, and you’ll be making all of your edits on this anyway. You’ll see any changes showing up in the Git tab, where you can stage, commit, and push them to your repo.
- GitHub Desktop can also do commits, pushes and pulls, and is very handy for updating changes to GitHub resulting from any package editing done in RStudio (or from any other changes to files and folders in the package folder, such as external data added to extdata). You might also find GitHub Desktop better for dealing with moving external files and folders manually stored in the extdata folder.
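If you would rather stay in the R console, the usethis package has helpers that cover the same setup. The sketch below is only one possible route, and it assumes Git is installed, a GitHub personal access token is configured (as Happy Git describes), and the working directory is your package folder. The wrapper name connect_to_github() is made up for illustration.

```r
# Hypothetical wrapper: nothing happens until you call it from inside your package project.
connect_to_github <- function() {
  usethis::use_git()     # initialise a local Git repo and offer to make a first commit
  usethis::use_github()  # create the matching GitHub repo, set it as the remote, and push
}
```

After that, the stage, commit, and push steps in the RStudio Git tab or GitHub Desktop work as described above.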
14.2 Some notes on the RStudio process
See Chapter 2 in R Packages (Wickham and Bryan (n.d.)) for a simple process. Some observations:
You need the devtools and usethis packages.
You’ll need to specify a folder with create_package, and you might want to put it in a GitHub folder in Documents. It doesn’t matter where, but it’s useful to keep GitHub things together. Refer to Chapter 2 https://r-pkgs.org/whole-game.html.
This will create some essential files and folders:
- DESCRIPTION: very important for naming and package information, which you’ll be editing carefully
- NAMESPACE: where functions are declared, generated by roxygen2
- R: folder where your functions will go.
Ultimately, other folders and files are created, and there’s a diagram in https://r-pkgs.org/package-structure-state.html that illustrates the structure.
- man: documentation on all of your functions and data sets
- inst: various things go in here, but I’ve just used it for extdata, where you can store .csv and other files
- data: data (as .rda files) that can be called up like co2 in the datasets package that comes with base R.
You might start by following the process in Chapter 2 for creating a simple package containing a simple function, using various devtools and usethis methods. This chapter also describes editing the DESCRIPTION file, using roxygen2 to create the man file, how to document the function, and getting things ready to post to GitHub.
On getting things ready for GitHub, it’s very useful to use check() from time to time to see if there are any errors in the various setup files, like DESCRIPTION, file & folder structure, and documentation.
The process can be confusing, but the various RStudio methods and functions provided in devtools and usethis make it at least very easy to do if you follow directions, which are spelled out pretty clearly in the R Packages book. We’ll look at creating another function later in this appendix, following the method provided in R Packages.
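To make the sequence concrete, here is a minimal sketch that builds a throwaway package in a temporary folder. The package name demopkg and the function hello() are made up, and the function file is written with writeLines() rather than usethis::use_r() only so that the sketch runs without an open RStudio project. It does nothing if usethis and devtools aren’t installed.

```r
# Sketch only: "demopkg" and hello() are made-up names for illustration.
if (requireNamespace("usethis", quietly = TRUE) &&
    requireNamespace("devtools", quietly = TRUE)) {

  pkg_path <- file.path(tempdir(), "demopkg")

  # Creates DESCRIPTION, NAMESPACE and the R folder
  usethis::create_package(pkg_path, open = FALSE)

  # A one-line function with roxygen comments, saved in the R folder
  writeLines(c(
    "#' Say hello",
    "#'",
    "#' @param name who to greet",
    "#' @return a greeting string",
    "#' @export",
    "hello <- function(name) paste('Hello,', name)"
  ), file.path(pkg_path, "R", "hello.R"))

  # Turns the roxygen comments into man/hello.Rd and updates NAMESPACE
  devtools::document(pkg_path)

  # devtools::check(pkg_path)  # slower; run it from time to time

  print(list.files(pkg_path))
}
```

In a real package you would run these from the package’s own RStudio project, where document() and check() default to the current folder.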
14.3 Data
For our package, igisci, we provided data in two ways:
- raw data as CSVs, shapefiles and TIFFs (useful since rasters don’t seem to be supported in rda)
- rda files: normal external data that are ready to use as data frames and simple feature (sf) data
14.3.1 Raw data in extdata
Raw data (e.g. CSVs, shapefiles, and rasters) can be simply stored in the inst/extdata folder. Just create those folders and put the files there. Make sure to include all the files (like the multiple files that go with a shapefile). Then, to access the data once the data package is installed, the user just needs to use the system.file() function to provide the path and then use that with the appropriate read function; e.g. for a CSV, something like:
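# package = "igisci" tells system.file() which installed package to search
csvPath <- system.file("extdata", "TRI/TRI_2017_CA.csv", package = "igisci")
TRI_2017_CA <- readr::read_csv(csvPath)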
… or, by using the igisci::ex() function, this can be written as
TRI_2017_CA <- read.csv(ex("TRI/TRI_2017_CA.csv"))
… which isn’t much different from reading data in your RStudio project folder, so it seemed a useful method to include in this book. Adding data to the data package by simply putting them in the extdata folder is very easy, and it also works for data files like TIFFs that aren’t supported as rda files, which we’ll look at next.
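One thing worth knowing about system.file() is that it returns an empty string rather than an error when it can’t find a file, which is what happens if the package argument is left out or the path is mistyped. A small sketch using the stats package (chosen only because it is always installed; the file name no_such_file.csv is made up):

```r
# A file that ships with every R installation: the DESCRIPTION of the stats package
found <- system.file("DESCRIPTION", package = "stats")
nzchar(found)        # TRUE: a real path came back

# A file that does not exist: system.file() quietly returns ""
not_found <- system.file("extdata", "no_such_file.csv", package = "stats")
not_found == ""      # TRUE

# mustWork = TRUE turns the silent "" into an error, which is easier to debug
result <- tryCatch(
  system.file("extdata", "no_such_file.csv", package = "stats", mustWork = TRUE),
  error = function(e) "error raised"
)
result               # "error raised"
```

So if a read function complains about an empty or missing path, checking what system.file() returned is a good first step.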
14.3.2 Binary data as rda files
For data needed frequently, similar to co2 or other provided data in base R, you’ll want to create binary data stored as rda files. These files need to be prepared from data in R and go in the data folder. The process is made very easy by using usethis::use_data() to add data as rda files to the data folder. These data can be data frames, simple features, and I’m sure other things.
I used a script addData.R that I put in the inst folder which built the data from some of the same files included in the extdata folder, with usethis::use_data() to store it in the data folder. Here’s a simple example with just a csv converted directly, and it takes care of storing the result in the data folder as an .rda file:
sierraFeb <- readr::read_csv("inst/extdata/sierra/sierraFeb.csv")
usethis::use_data(sierraFeb)
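An rda file is just an R object written with base R’s save(), which (to my understanding) is what usethis::use_data() relies on. The sketch below shows the round trip with a made-up two-row data frame and a temporary folder standing in for the package’s data folder:

```r
# Made-up stand-in data; the real sierraFeb comes from the csv in extdata
demoData <- data.frame(STATION = c("A", "B"), ELEVATION = c(52, 394))

# Write it as an .rda file in a temporary folder standing in for data/
rdaPath <- file.path(tempdir(), "demoData.rda")
save(demoData, file = rdaPath)

# Remove the object, then load() it back from the file under its original name
rm(demoData)
loaded <- load(rdaPath)   # returns the names of the restored objects
loaded                    # "demoData"
str(demoData)
```

This is why a data set stored this way shows up under the same name it had when use_data() was called.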
14.3.2.1 devtools::document()
This creates documentation on the data sets, using the file R/data.R, which will need to have lines of code similar to the following to document each data set. Note that the name of the data set goes last, in quotes. The formatting of the field names and descriptions is a bit tricky and doesn’t follow normal R rules. As a result, sometimes my field names don’t exactly match the actual field names. Maybe I’ll get around to changing the original field names with rename. Note that the organization is important, with the title of the data first, a blank line, then a description, etc.:
#' Sierra February climate data
#'
#' Selection from SierraData to only include February data
#'
#' @format A data frame with 82 entries and 7 variables selected and renamed \describe{
#' \item{STATION_NAME}{Station name}
#' \item{COUNTY}{County Name}
#' \item{ELEVATION}{Elevation in meters}
#' \item{LATITUDE}{Latitude in decimal degrees}
#' \item{LONGITUDE}{Longitude in decimal degrees}
#' \item{PRECIPITATION}{February Average Precipitation in mm}
#' \item{TEMPERATURE}{February Average Temperature in degrees C}
#' }
#' @source \url{https://www.ncdc.noaa.gov/}
"sierraFeb"
Once these are on GitHub, a user can simply install the package with devtools::install_github("iGISc/igisci") to use the igisci package we created. Then, to access the data just like built-in data, the user just needs to load that library with library(igisci).
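Since the field names documented in R/data.R can drift from the actual column names, a quick check can help before running devtools::document(). This is a minimal sketch in base R; the three documentation lines are shortened from the example above, and the data frame demoData is made up to stand in for the real data set:

```r
# Roxygen lines like those in R/data.R (shortened), and a made-up data frame to check against
docLines <- c(
  "#' \\item{STATION_NAME}{Station name}",
  "#' \\item{COUNTY}{County Name}",
  "#' \\item{ELEVATION}{Elevation in meters}"
)
demoData <- data.frame(STATION = "OROVILLE", COUNTY = "Butte", ELEVATION = 52)

# Pull out the text inside the first pair of braces after \item
documented <- sub(".*\\\\item\\{([^}]*)\\}.*", "\\1", docLines)

setdiff(documented, names(demoData))   # documented but not in the data: "STATION_NAME"
setdiff(names(demoData), documented)   # in the data but not documented: "STATION"
```

Any name that appears in either result is a mismatch to fix, either in the documentation or with rename.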
14.4 Code (functions)
You can build pretty extensive functions in a package, and you’ve been using these from a variety of packages. We’ll just look at the simple example of a function we’ll name ex() for making it easier to read data from the extdata folder of the igisci package. The methods shown here are based on https://r-pkgs.org/ which you should refer to for more thorough documentation.
Here’s the function we’ll create in the package. However, one caveat is that you want to add this after you’ve already installed the package with data, or you’ll get an error.
We want to use it wherever our code reads a file from extdata, with code about as compact as reading the file from your own workspace. There’s a csv file in the sierra folder of extdata named Sierra2LassenData.csv that we could then read with:
read.csv(ex("sierra/Sierra2LassenData.csv"))
##        STATION    COUNTY ELEVATION LATITUDE LONGITUDE PRECIPITATION TEMPERATURE
## 1     OROVILLE     Butte        52    39.52   -121.55           124        10.7
## 2       AUBURN    Placer       394    38.91   -121.08           160         9.7
## 3       SONORA  Tuolumne       511    37.97   -120.39           148         7.7
## 4  PLACERVILLE El Dorado       564    38.70   -120.82           171         9.2
## 5       COLFAX    Placer       725    39.09   -120.95           207         7.3
## 6  NEVADA CITY    Nevada       848    39.25   -121.00           268         6.7
## 7       QUINCY    Plumas      1042    39.94   -120.95           182         4.0
## 8     YOSEMITE  Mariposa      1225    37.75   -119.59           169         5.0
## 9      PORTOLA    Plumas      1478    39.81   -120.47            98         0.5
## 10     TRUCKEE    Nevada      1775    39.33   -120.17           126        -1.1
## 11  BRIDGEPORT      Mono      1972    38.26   -119.23            41        -2.2
## 12  LEE VINING      Mono      2072    37.96   -119.12            72         0.4
## 13       BODIE      Mono      2551    38.21   -119.01            40        -4.4
… and that doesn’t take up much more code real estate than if you had a "sierra/sierraFebShort.csv" in your workspace/RStudio project folder. You just need to remember to also include library(igisci) in your code.
As you can see, this function is already in the igisci package, but how did it get there? Here are the steps for this simple function.
- Use usethis::use_r() to create a script to hold your function(s). We’ll just create one function, so we’ll give the script the same name as the function, but you might want to have a set of related functions in your script.
usethis::use_r("ex")
- Add the function code to the ex.R script created.
- Create the documentation skeleton while editing the script by using Insert Roxygen Skeleton from the RStudio Code menu.
- Then fill in the various elements of the documentation (a title, parameters, what it returns, and examples), which shows what you want the user to see with ?ex or ?exfiles, similar to this, and save the script. If you have more than one function, the documentation will go just above each function definition. See https://r-pkgs.org/man.html.
#' Access external data in package
#'
#' @param dta filename to access from extdata, including folders
#'
#' @return path to the file
#' @export
#'
#' @examples
#' read.csv(ex("sierra/sierraStations.csv"))
ex <- function(dta){
  system.file("extdata", dta, package = "igisci")
}

#' Listing contents of extdata from package
#'
#' @return tibble
#' @export
#'
#' @examples
#' view(exfiles())
exfiles <- function(){
  # dplyr:: prefixes are used instead of calling library(dplyr) inside package code
  exfilesDF <- dplyr::tribble(~dir,~file,~path,~type)
  for(d in list.dirs(ex(""), recursive = FALSE)){
    # the folder name is the last element of the full path
    dsplit <- strsplit(d, "/")
    len <- length(dsplit[[1]])
    dir <- dsplit[[1]][len]
    # patterns are regular expressions, so the dot is escaped and the end anchored with $
    for(shp in list.files(d, pattern = "\\.shp$")){
      exfilesDF <- dplyr::bind_rows(exfilesDF, dplyr::tribble(~dir,~file,~path,~type, dir, shp, paste0("ex('",dir,"/",shp,"')"), "shapefile"))
    }
    for(tif in list.files(d, pattern = "\\.tif$")){
      exfilesDF <- dplyr::bind_rows(exfilesDF, dplyr::tribble(~dir,~file,~path,~type, dir, tif, paste0("ex('",dir,"/",tif,"')"), "TIFF"))
    }
    for(csv in list.files(d, pattern = "\\.csv$")){
      exfilesDF <- dplyr::bind_rows(exfilesDF, dplyr::tribble(~dir,~file,~path,~type, dir, csv, paste0("ex('",dir,"/",csv,"')"), "CSV"))
    }
    # matches both .xls and .xlsx
    for(xls in list.files(d, pattern = "\\.xlsx?$")){
      exfilesDF <- dplyr::bind_rows(exfilesDF, dplyr::tribble(~dir,~file,~path,~type, dir, xls, paste0("ex('",dir,"/",xls,"')"), "xls"))
    }
  }
  exfilesDF
}
- Use devtools::document() to convert the comments to .Rd files.
- Then preview the documentation with ?ex, and maybe use Check in the Build tab.
- Then when you commit these changes and push them to GitHub, the package will contain the function, ready to use as long as the user loads the library. You should see the function as ex.Rd in the man folder of your package, similar to what you see for the data described above.
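The repeated loops in exfiles() could also be collapsed by looking up each file’s extension in a small table. This is a minimal sketch in base R, not the version in igisci: the helper name listDataFiles() is made up, a temporary folder stands in for the installed extdata, and unlike exfiles() it searches subfolders recursively and leaves out the ex('...') path column.

```r
# Hypothetical helper: list files under a root folder and label them by extension
listDataFiles <- function(root) {
  types <- c(shp = "shapefile", tif = "TIFF", csv = "CSV", xls = "xls", xlsx = "xls")
  files <- list.files(root, recursive = TRUE)   # paths relative to root
  ext <- tolower(tools::file_ext(files))
  keep <- ext %in% names(types)                 # drop extensions we don't label
  data.frame(
    dir = dirname(files[keep]),
    file = basename(files[keep]),
    type = unname(types[ext[keep]]),
    stringsAsFactors = FALSE
  )
}

# Try it on a made-up folder tree in a temporary directory
root <- file.path(tempdir(), "extdata_demo")
dir.create(file.path(root, "sierra"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "sierra", c("stations.csv", "dem.tif", "notes.txt")))
listDataFiles(root)
```

Adding another file type then only means adding one entry to the types vector.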
The ex() example was pretty simple, with no dependencies other than what’s in base R, but you’ll probably want to test your functions thoroughly, and add more documentation, to do anything more complicated.