A Appendix

A.1 Git

This appendix provides a concise reference to essential Git concepts and commands, tailored for data analysts and researchers managing code and collaboration. For extended learning, explore the following resources:


A.1.1 Basic Setup

Configure your Git environment using the git config command:

  • Set your name and email (used in commits):

    git config --global user.name "Your Name"
    git config --global user.email "your.email@example.com"
  • Set your preferred text editor (e.g., for writing commit messages):

    git config --global core.editor "code --wait"  # VS Code

A.1.2 Creating a Repository

To create a new Git repository in your project directory:

git init

This creates a .git directory where Git stores all version control information.


A.1.3 Tracking Changes

Git tracks changes through a three-tier structure:

  • Working Directory: your local folder with files.
  • Staging Area: where you prepare changes before committing.
  • Local Repository: stores committed snapshots of your code.

Common commands:

  • Check status:

    git status
  • Add files to the staging area:

    git add filename
    git add .  # Add all changes
  • Commit staged changes:

    git commit -m "A brief message describing the change"

A.1.4 Viewing History and Changes

  • Show changes not yet staged:

    git diff
  • Show committed changes:

    git log
  • Restore previous versions of files:

    git checkout HEAD filename  # Restore last committed version
    git checkout <commit-id> filename  # Restore from specific commit

A.1.5 Ignoring Files

To prevent certain files from being tracked by Git, create a .gitignore file. For example:

# .gitignore
*.dat
results/
  • View contents using:

    cat .gitignore

A.1.6 Remote Repositories

Git supports linking local and remote repositories (e.g., GitHub):

  • Add a remote:

    git remote add origin https://github.com/yourname/repo.git
  • Push changes to remote:

    git push origin main  # or 'master' depending on default branch
  • Pull changes from remote:

    git pull origin main

A.1.7 Collaboration

  • Clone a remote repository:

    git clone https://github.com/username/repository.git

    This creates a local copy and sets up a remote named origin.


A.1.8 Branching and Merging

  • Create and switch to a new branch:

    git checkout -b new-branch-name
  • Switch back to main branch:

    git checkout main
  • Merge another branch into the current one:

    git merge feature-branch

A.1.9 Handling Conflicts

Merge conflicts occur when multiple changes affect the same lines of a file. Git will:

  • Mark the conflict in the file.
  • Require manual resolution before committing.

Always review and test code after resolving conflicts.


A.1.10 Licensing

Understanding software licensing is essential in open-source collaboration:

  • GPL (General Public License): Requires derivative software to also be GPL-licensed.
  • Creative Commons: Offers flexible combinations of attribution, sharing, and commercial use restrictions.

Choose licenses aligned with your intended use and contributions.


A.1.11 Citing Repositories

To guide citation practices:

  • Include a CITATION file in your repository.
  • Provide preferred citation formats (e.g., BibTeX, DOI).

This helps others acknowledge your work in academic or professional settings.


A.2 Short-cut

These are shortcuts that you probably you remember when working with R. Even though it might take a bit of time to learn and use them as your second nature, but they will save you a lot of time.
Just like learning another language, the more you speak and practice it, the more comfortable you are speaking it.

function short-cut
navigate folders in console " " + tab
pull up short-cut cheat sheet ctrl + shift + k
go to file/function (everything in your project) ctrl + .
search everything cmd + shift + f
navigate between tabs Crtl + shift + .
type function faster snip + shift + tab
type faster use tab for fuzzy match
cmd + up
ctrl + .

Sometimes you can’t stage a folder because it’s too large. In such case, use Terminal pane in Rstudio then type git add -A to stage all changes then commit and push like usual.

A.3 Function short-cut

apply one function to your data to create a new variable: mutate(mod=map(data,function))
instead of using i in 1:length(object): for (i in seq_along(object))
apply multiple function: map_dbl
apply multiple function to multiple variables:map2
autoplot(data) plot times series data
mod_tidy = linear(reg) %>% set_engine('lm') %>% fit(price ~ ., data=data) fit lm model. It could also fit other models (stan, spark, glmnet, keras)

  • Sometimes, data-masking will not be able to recognize whether you’re calling from environment or data variables. To bypass this, we use .data$variable or .env$variable. For example data %>% mutate(x=.env$variable/.data$variable
  • Problems with data-masking:
    • Unexpected masking by data-var: Use .data and .env to disambiguate
    • Data-var cant get through:
    • Tunnel data-var with {{}} + Subset .data with [[]]
  • Passing Data-variables through arguments
library("dplyr")
mean_by <- function(data,by,var){
    data %>%
        group_by({{{by}}}) %>%
        summarise("{{var}}":=mean({{var}})) # new name for each var will be created by tunnel data-var inside strings
}

mean_by <- function(data,by,var){
    data %>%
        group_by({{{by}}}) %>%
        summarise("{var}":=mean({{var}})) # use single {} to glue the string, but hard to reuse code in functions
}
  • Trouble with selection:
library("purrr")
name <- c("mass","height")
starwars %>% select(name) # Data-var. Here you are referring to variable named "name"

starwars %>% select(all_of((name))) # use all_of() to disambiguate when 

averages <- function(data,vars){ # take character vectors with all_of()
    data %>%
        select(all_of(vars)) %>%
        map_dbl(mean,na.rm=TRUE)
} 

x = c("Sepal.Length","Petal.Length")
iris %>% averages(x)


# Another way
averages <- function(data,vars){ # Tunnel selectiosn with {{}}
    data %>%
        select({{vars}}) %>%
        map_dbl(mean,na.rm=TRUE)
} 

x = c("Sepal.Length","Petal.Length")
iris %>% averages(x)

A.4 Citation

To cite the R packages used during this session, the following code prints BibTeX-formatted citations:

# List all non-base packages loaded in the session
packages <- ls(sessionInfo()$loadedOnly)

# Print BibTeX citations for each package
for (pkg in packages) {
  print(toBibtex(citation(pkg)))
}

You may wish to redirect this output to a .bib file for integration with LaTeX or R Markdown documents using writeLines().


A.5 Install All Necessary Packages on Your Local Machine

To replicate the environment used in this book or session on another machine, you can follow these steps.

A.5.1 Step 1: Export Installed Packages from Your Current Session

# Get all installed packages
installed <- as.data.frame(installed.packages())

# Preview the installed packages
head(installed)
#>             Package                            LibPath Version Priority
#> abind         abind C:/Program Files/R/R-4.4.3/library   1.4-8     <NA>
#> ade4           ade4 C:/Program Files/R/R-4.4.3/library  1.7-23     <NA>
#> ADGofTest ADGofTest C:/Program Files/R/R-4.4.3/library     0.3     <NA>
#> admisc       admisc C:/Program Files/R/R-4.4.3/library    0.38     <NA>
#> AER             AER C:/Program Files/R/R-4.4.3/library  1.2-14     <NA>
#> afex           afex C:/Program Files/R/R-4.4.3/library   1.4-1     <NA>
#>                                                                                          Depends
#> abind                                                                               R (>= 1.5.0)
#> ade4                                                                                R (>= 3.5.0)
#> ADGofTest                                                                                   <NA>
#> admisc                                                                              R (>= 3.5.0)
#> AER       R (>= 3.0.0), car (>= 2.0-19), lmtest, sandwich (>= 2.4-0),\nsurvival (>= 2.37-5), zoo
#> afex                                                               R (>= 3.5.0), lme4 (>= 1.1-8)
#>                                                                                   Imports
#> abind                                                                      methods, utils
#> ade4                  graphics, grDevices, methods, stats, utils, MASS, pixmap, sp,\nRcpp
#> ADGofTest                                                                            <NA>
#> admisc                                                                            methods
#> AER                                                             stats, Formula (>= 0.2-0)
#> afex      pbkrtest (>= 0.4-1), lmerTest (>= 3.0-0), car, reshape2,\nstats, methods, utils
#>                     LinkingTo
#> abind                    <NA>
#> ade4      Rcpp, RcppArmadillo
#> ADGofTest                <NA>
#> admisc                   <NA>
#> AER                      <NA>
#> afex                     <NA>
#>                                                                                                                                                                                                                                                                                                                                                                                                  Suggests
#> abind                                                                                                                                                                                                                                                                                                                                                                                                <NA>
#> ade4                                                                                                                                                                                                                      ade4TkGUI, adegraphics, adephylo, adespatial, ape, CircStats,\ndeldir, lattice, spdep, splancs, waveslim, progress, foreach,\nparallel, doParallel, iterators, knitr, rmarkdown
#> ADGofTest                                                                                                                                                                                                                                                                                                                                                                                            <NA>
#> admisc                                                                                                                                                                                                                                                                                                                                                                                       QCA (>= 3.7)
#> AER                                                                                                                                    boot, dynlm, effects, fGarch, forecast, foreign, ineq,\nKernSmooth, lattice, longmemo, MASS, mlogit, nlme, nnet, np,\nplm, pscl, quantreg, rgl, ROCR, rugarch, sampleSelection,\nscatterplot3d, strucchange, systemfit (>= 1.1-20), truncreg,\ntseries, urca, vars
#> afex      emmeans (>= 1.4), coin, xtable, parallel, plyr, optimx,\nnloptr, knitr, rmarkdown, R.rsp, lattice, latticeExtra,\nmultcomp, testthat, mlmRev, dplyr, tidyr, dfoptim, Matrix,\npsychTools, ggplot2, MEMSS, effects, carData, ggbeeswarm, nlme,\ncowplot, jtools, ggpubr, ggpol, MASS, glmmTMB, brms, rstanarm,\nstatmod, performance (>= 0.7.2), see (>= 0.6.4), ez,\nggResidpanel, grid, vdiffr
#>           Enhances            License License_is_FOSS License_restricts_use
#> abind         <NA> MIT + file LICENSE            <NA>                  <NA>
#> ade4          <NA>         GPL (>= 2)            <NA>                  <NA>
#> ADGofTest     <NA>                GPL            <NA>                  <NA>
#> admisc        <NA>         GPL (>= 3)            <NA>                  <NA>
#> AER           <NA>      GPL-2 | GPL-3            <NA>                  <NA>
#> afex          <NA>         GPL (>= 2)            <NA>                  <NA>
#>           OS_type MD5sum NeedsCompilation Built
#> abind        <NA>   <NA>               no 4.4.1
#> ade4         <NA>   <NA>              yes 4.4.3
#> ADGofTest    <NA>   <NA>             <NA> 4.4.0
#> admisc       <NA>   <NA>              yes 4.4.3
#> AER          <NA>   <NA>               no 4.4.3
#> afex         <NA>   <NA>               no 4.4.3

# Export the list to a CSV file
write.csv(installed$Package, file = file.path(getwd(), "installed.csv"), row.names = FALSE)

A.5.2 Step 2: Install Packages on a New Machine

Once you have transferred the installed.csv file to the new machine, run the following code to install any missing packages.

# Read the list of required packages
required <- read.csv("installed.csv", stringsAsFactors = FALSE)$Package

# Get the list of already installed packages on the current machine
current <- installed.packages()[, "Package"]

# Identify packages that are not yet installed
missing <- setdiff(required, current)

# Install the missing packages
install.packages(missing)

⚠️ Note: This approach assumes that all packages are available from CRAN. For packages from GitHub or Bioconductor, use devtools::install_github() or BiocManager::install() as appropriate.

This approach ensures a reproducible computational environment, which is essential for robust data analysis and collaboration.