A Appendix
A.1 Git
This appendix provides a concise reference to essential Git concepts and commands, tailored for data analysts and researchers managing code and collaboration. For extended learning, explore the following resources:
- Git Cheat Sheet (PDF)
- Git Cheat Sheets in Other Languages
- Interactive Git Tutorial
- Visual Git Cheat Sheet
- Happy Git with R (for R Users)
A.1.1 Basic Setup
Configure your Git environment using the git config command:
- Set your name and email (used in commits):
- Set your preferred text editor (e.g., for writing commit messages):
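The commands for these two settings are not shown above. A minimal sketch, assuming you want the settings to apply to every repository for your user account (`--global`); the name, email address, and editor (`nano`) are placeholders to replace with your own:

```shell
# Placeholder identity; replace with your own name and email
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Placeholder editor; any editor command available on your system works
git config --global core.editor "nano"

# Review what was stored
git config --global --get user.name
git config --global --get user.email
git config --global --get core.editor
```

Dropping `--global` inside a repository stores the setting for that repository only.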
A.1.2 Creating a Repository
To create a new Git repository in your project directory:
This creates a .git directory where Git stores all version control information.
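As a small illustration, the sketch below initializes a repository in a throwaway directory. The directory name `my-project` is hypothetical; in practice you would `cd` into your existing project folder instead.

```shell
# Hypothetical project directory created in a temporary location
project="$(mktemp -d)/my-project"
mkdir -p "$project"
cd "$project"

# Turn the directory into a Git repository
git init

# The listing now includes the hidden .git directory
ls -a
```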
A.1.3 Tracking Changes
Git tracks changes through a three-tier structure:
- Working Directory: your local folder with files.
- Staging Area: where you prepare changes before committing.
- Local Repository: stores committed snapshots of your code.
Common commands:
- Check status:
- Add files to the staging area:
- Commit staged changes:
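The three steps can be sketched end to end as follows. This is an illustration in a temporary repository; the file name `analysis.R`, the commit message, and the identity values are made up for the example.

```shell
# Throwaway repository with a local (repository-only) identity
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file in the working directory
echo "x <- 1" > analysis.R

git status                 # analysis.R is listed as untracked
git add analysis.R         # move the file to the staging area
git status                 # analysis.R is now listed under changes to be committed
git commit -m "Add analysis script"   # record the staged snapshot in the local repository
```

`git add .` would stage every change under the current directory instead of a single file.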
A.1.4 Viewing History and Changes
- Show changes not yet staged:
- Show committed changes:
- Restore previous versions of files:
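The original commands are not shown here, so the following is one plausible sketch rather than the author's exact choice: `git diff` for unstaged changes, `git log` and `git show` for committed changes, and `git checkout <commit> -- <file>` to restore an earlier version. File names and messages are hypothetical.

```shell
# Throwaway repository with two commits to the same file
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -m "Add analysis script"

echo "x <- 2" > analysis.R
git diff                          # changes in the working directory that are not yet staged
git commit -am "Change x to 2"    # -a stages modified tracked files before committing

git log --oneline                 # committed history, one line per commit
git show HEAD                     # the changes introduced by the latest commit

# Restore the file as it was one commit earlier
git checkout HEAD~1 -- analysis.R
cat analysis.R
```

Newer Git versions also offer `git restore` for the last step; `git checkout` is used here because it works on older installations too.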
A.1.5 Ignoring Files
To prevent certain files from being tracked by Git, create a .gitignore file. For example:
- View contents using:
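The example file is missing above. A minimal sketch, assuming a typical R project; the patterns listed are illustrative, not a required set:

```shell
# Throwaway repository
repo="$(mktemp -d)"
cd "$repo"
git init

# Hypothetical ignore patterns for an R project
cat > .gitignore <<'EOF'
.Rhistory
.RData
*.log
data/raw/
EOF

# View the contents of the file
cat .gitignore

# A file matching a pattern no longer shows up as untracked
touch debug.log
git status --short
git check-ignore -v debug.log     # reports which pattern matched
```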
A.1.6 Remote Repositories
Git supports linking local and remote repositories (e.g., GitHub):
- Add a remote:
- Push changes to remote:
- Pull changes from remote:
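To keep the illustration runnable without a network connection, the sketch below uses a local bare repository as a stand-in for a hosted remote such as GitHub. With a real remote, the path would be replaced by the repository URL. The remote name `origin` and branch name `main` follow common convention and are assumptions here.

```shell
work="$(mktemp -d)"

# Stand-in for a hosted remote repository
git init --bare "$work/remote.git"

# Local repository with one commit
git init "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -m "Add analysis script"
git branch -M main                       # make sure the branch is called main

git remote add origin "$work/remote.git" # link the local repository to the remote
git remote -v                            # list configured remotes
git push -u origin main                  # upload commits and set the upstream branch
git pull origin main                     # fetch and merge remote changes (nothing new here)
```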
A.1.7 Collaboration
- Clone a remote repository:
This creates a local copy and sets up a remote named origin.
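A small sketch of cloning, using a local repository as a stand-in for a remote URL so that it runs offline. With a hosted repository the first argument to `git clone` would be its URL; the directory names `source` and `copy` are hypothetical.

```shell
work="$(mktemp -d)"

# Build a small repository to clone from (stand-in for a hosted repository)
git init "$work/source"
cd "$work/source"
git config user.name "Example User"
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -m "Add analysis script"

# Clone it into a new directory
cd "$work"
git clone "$work/source" "$work/copy"

cd "$work/copy"
git remote -v          # the remote named origin points back to the source
git log --oneline      # the full history came along with the files
```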
A.1.8 Branching and Merging
- Create and switch to a new branch:
- Switch back to main branch:
- Merge another branch into the current one:
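The branching workflow can be sketched as below. The branch name `feature` and the file contents are made up for the example, and `git checkout` is used rather than the newer `git switch` so that the commands also work on older Git versions.

```shell
# Throwaway repository with one commit on main
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -m "Add analysis script"
git branch -M main

git checkout -b feature        # create a new branch and switch to it
echo "y <- 2" >> analysis.R
git commit -am "Add y"

git checkout main              # switch back to the main branch
git merge feature              # merge feature into the current branch (main)
git log --oneline
```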
A.1.9 Handling Conflicts
Merge conflicts occur when multiple changes affect the same lines of a file. Git will:
- Mark the conflict in the file.
- Require manual resolution before committing.
Always review and test code after resolving conflicts.
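To make the conflict markers concrete, the sketch below deliberately creates a conflict by changing the same line on two branches, then resolves it. The resolution shown (keeping the value from `main`) is arbitrary; in practice you would edit the file by hand and decide what the merged code should be.

```shell
# Throwaway repository with one commit on main
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -m "Add analysis script"
git branch -M main

# Change the same line on two branches
git checkout -b feature
echo "x <- 2" > analysis.R
git commit -am "Set x to 2 on feature"
git checkout main
echo "x <- 3" > analysis.R
git commit -am "Set x to 3 on main"

# The merge stops and reports a conflict; || true keeps a script running
git merge feature || true
cat analysis.R                 # conflict markers: <<<<<<<, =======, >>>>>>>

# Resolve manually (here: keep the value from main), then stage and commit
echo "x <- 3" > analysis.R
git add analysis.R
git commit -m "Merge feature, keeping x <- 3"
```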
A.1.10 Licensing
Understanding software licensing is essential in open-source collaboration:
- GPL (General Public License): Requires derivative works that are distributed to be released under the GPL as well.
- Creative Commons: Offers flexible combinations of attribution, sharing, and commercial use restrictions.
Choose licenses aligned with your intended use and contributions.
A.2 Short-cut
These are shortcuts worth remembering when working with R. Although it takes some time before they become second nature, they will save you a lot of time.
As with learning another language, the more you practice, the more comfortable you become.
| function | short-cut |
|---|---|
| navigate folders in console | " " + tab |
| pull up short-cut cheat sheet | ctrl + shift + k |
| go to file/function (everything in your project) | ctrl + . |
| search everything | cmd + shift + f |
| navigate between tabs | ctrl + shift + . |
| type function faster | snip + shift + tab |
| type faster | use tab for fuzzy match |
| | cmd + up |
| | ctrl + . |
Sometimes you can’t stage a folder because it is too large. In that case, open the Terminal pane in RStudio and run git add -A to stage all changes, then commit and push as usual.
A.3 Function short-cut
Apply one function to your data to create a new variable: mutate(mod = map(data, function))
Instead of i in 1:length(object), use for (i in seq_along(object))
Apply a function to each element and return a numeric vector: map_dbl
Apply a function to two inputs in parallel: map2
Plot time series data: autoplot(data)
Fit an lm model: mod_tidy = linear_reg() %>% set_engine('lm') %>% fit(price ~ ., data = data). It could also fit other models (stan, spark, glmnet, keras)
- Sometimes data masking cannot tell whether you are referring to an environment variable or a data variable. To disambiguate, use .data$variable or .env$variable. For example: data %>% mutate(x = .env$variable / .data$variable)
- Problems with data masking:
  - Unexpected masking by a data variable: use .data and .env to disambiguate.
  - A data variable can't get through: tunnel it with {{ }}, or subset .data with [[ ]].
- Passing data variables through arguments:
library("dplyr")
# Tunnel data variables through function arguments with {{ }}
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{ var }}" := mean({{ var }})) # "{{ var }}" inside the string names the new column after the tunneled variable
}
# Variant with single braces in the name
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{var}" := mean({{ var }})) # single {} glues the value of var into the name, which is harder to reuse inside functions
}
- Trouble with selection:
library("purrr") # map_dbl(); select(), all_of(), and starwars come from dplyr
name <- c("mass", "height")
starwars %>% select(name) # Data-var. Here you are referring to the column named "name"
starwars %>% select(all_of(name)) # use all_of() to select the columns listed in the character vector name
averages <- function(data, vars) { # take character vectors with all_of()
  data %>%
    select(all_of(vars)) %>%
    map_dbl(mean, na.rm = TRUE)
}
x <- c("Sepal.Length", "Petal.Length")
iris %>% averages(x)
# Another way
averages <- function(data, vars) { # tunnel selections with {{ }}
  data %>%
    select({{ vars }}) %>%
    map_dbl(mean, na.rm = TRUE)
}
iris %>% averages(c(Sepal.Length, Petal.Length)) # pass bare column names
A.4 Citation
To cite the R packages used during this session, the following code prints BibTeX-formatted citations:
# List packages that are loaded via namespace but not attached in this session;
# use names(sessionInfo()$otherPkgs) for attached non-base packages
packages <- names(sessionInfo()$loadedOnly)
# Print BibTeX citations for each package
for (pkg in packages) {
  print(toBibtex(citation(pkg)))
}
You may wish to redirect this output to a .bib file for integration with LaTeX or R Markdown documents using writeLines().
A.5 Install All Necessary Packages on Your Local Machine
To replicate the environment used in this book or session on another machine, you can follow these steps.
A.5.1 Step 1: Export Installed Packages from Your Current Session
# Get all installed packages
installed <- as.data.frame(installed.packages())
# Preview the installed packages
head(installed)
#> Package LibPath Version Priority
#> abind abind C:/Program Files/R/R-4.4.3/library 1.4-8 <NA>
#> ade4 ade4 C:/Program Files/R/R-4.4.3/library 1.7-23 <NA>
#> ADGofTest ADGofTest C:/Program Files/R/R-4.4.3/library 0.3 <NA>
#> admisc admisc C:/Program Files/R/R-4.4.3/library 0.38 <NA>
#> AER AER C:/Program Files/R/R-4.4.3/library 1.2-14 <NA>
#> afex afex C:/Program Files/R/R-4.4.3/library 1.4-1 <NA>
#> Depends
#> abind R (>= 1.5.0)
#> ade4 R (>= 3.5.0)
#> ADGofTest <NA>
#> admisc R (>= 3.5.0)
#> AER R (>= 3.0.0), car (>= 2.0-19), lmtest, sandwich (>= 2.4-0),\nsurvival (>= 2.37-5), zoo
#> afex R (>= 3.5.0), lme4 (>= 1.1-8)
#> Imports
#> abind methods, utils
#> ade4 graphics, grDevices, methods, stats, utils, MASS, pixmap, sp,\nRcpp
#> ADGofTest <NA>
#> admisc methods
#> AER stats, Formula (>= 0.2-0)
#> afex pbkrtest (>= 0.4-1), lmerTest (>= 3.0-0), car, reshape2,\nstats, methods, utils
#> LinkingTo
#> abind <NA>
#> ade4 Rcpp, RcppArmadillo
#> ADGofTest <NA>
#> admisc <NA>
#> AER <NA>
#> afex <NA>
#> Suggests
#> abind <NA>
#> ade4 ade4TkGUI, adegraphics, adephylo, adespatial, ape, CircStats,\ndeldir, lattice, spdep, splancs, waveslim, progress, foreach,\nparallel, doParallel, iterators, knitr, rmarkdown
#> ADGofTest <NA>
#> admisc QCA (>= 3.7)
#> AER boot, dynlm, effects, fGarch, forecast, foreign, ineq,\nKernSmooth, lattice, longmemo, MASS, mlogit, nlme, nnet, np,\nplm, pscl, quantreg, rgl, ROCR, rugarch, sampleSelection,\nscatterplot3d, strucchange, systemfit (>= 1.1-20), truncreg,\ntseries, urca, vars
#> afex emmeans (>= 1.4), coin, xtable, parallel, plyr, optimx,\nnloptr, knitr, rmarkdown, R.rsp, lattice, latticeExtra,\nmultcomp, testthat, mlmRev, dplyr, tidyr, dfoptim, Matrix,\npsychTools, ggplot2, MEMSS, effects, carData, ggbeeswarm, nlme,\ncowplot, jtools, ggpubr, ggpol, MASS, glmmTMB, brms, rstanarm,\nstatmod, performance (>= 0.7.2), see (>= 0.6.4), ez,\nggResidpanel, grid, vdiffr
#> Enhances License License_is_FOSS License_restricts_use
#> abind <NA> MIT + file LICENSE <NA> <NA>
#> ade4 <NA> GPL (>= 2) <NA> <NA>
#> ADGofTest <NA> GPL <NA> <NA>
#> admisc <NA> GPL (>= 3) <NA> <NA>
#> AER <NA> GPL-2 | GPL-3 <NA> <NA>
#> afex <NA> GPL (>= 2) <NA> <NA>
#> OS_type MD5sum NeedsCompilation Built
#> abind <NA> <NA> no 4.4.1
#> ade4 <NA> <NA> yes 4.4.3
#> ADGofTest <NA> <NA> <NA> 4.4.0
#> admisc <NA> <NA> yes 4.4.3
#> AER <NA> <NA> no 4.4.3
#> afex <NA> <NA> no 4.4.3
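# Export the package names to a CSV file, keeping the "Package" column header
# so that read.csv(...)$Package works when the file is read back
write.csv(installed["Package"], file = file.path(getwd(), "installed.csv"), row.names = FALSE)
A.5.2 Step 2: Install Packages on a New Machine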
Once you have transferred the installed.csv file to the new machine, run the following code to install any missing packages.
# Read the list of required packages
required <- read.csv("installed.csv", stringsAsFactors = FALSE)$Package
# Get the list of already installed packages on the current machine
current <- installed.packages()[, "Package"]
# Identify packages that are not yet installed
missing <- setdiff(required, current)
# Install the missing packages
install.packages(missing)
⚠️ Note: This approach assumes that all packages are available from CRAN. For packages from GitHub or Bioconductor, use devtools::install_github() or BiocManager::install() as appropriate.
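This approach helps recreate the same set of packages on another machine, which supports reproducible data analysis and collaboration. Note that install.packages() installs the versions currently available from the repository, so package versions may differ from those on the original machine.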