A Appendix

A.1 Git

Cheat Sheet

Cheat Sheet in different languages

Learn Git

Interactive Cheat Sheet

Ultimate Guide of Git and GitHub for R user

  • Setting up Git: git config with --global option to configure user name, email, editor, etc.

  • Creating a repository: git init to initialize a repo. Git stores all of its repo data in the .git directory.

  • Tracking changes:

    • git status shows the status of the repo

      • File are stored in the project’s working directory (which users see)

      • The staging area (where the next commit is being built)

      • local repo is where commits are permanently recorded

    • git add put files in the staging area

    • git commit saves the staged content as a new commit in the local repo.

      • git commit -m "your own message" to give a messages for the purpose of your commit.
  • History

    • git diff shows differences between commits

    • git checkout recovers old version of fields

      • git checkout HEAD to go to the last commit

      • git checkout <unique ID of your commit> to go to such commit

  • Ignoring

    • .gitignore file tells Git what files to ignore

    • cat . gitignore *.dat results/ ignore files ending with “dat” and folder “results”.

  • Remotes in GitHub

    • A local git repo can be connected to one or more remote repos.

    • Use the HTTPS protocol to connect to remote repos

    • git push copies changes from a local repo to a remote repo

    • git pull copies changes from a remote repo to a local repo

  • Collaborating

    • git clone copies remote repo to create a local repo with a remote called origin automatically set up
  • Branching

    • git check - b <new-branch-name

    • git checkout master to switch to master branch.

  • Conflicts

    • occur when 2 or more people change the same lines of the same file

    • the version control system does not allow to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.

  • Licensing

    • People who incorporate General Public License (GPL’d) software into their won software must make their software also open under the GPL license; most other open licenses do not require this.

    • The Creative Commons family of licenses allow people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing and commercialization.

  • Citation:

    • Add a CITATION file to a repo to explain how you want others to cite your work.
  • Hosting

    • Rules regarding intellectual property and storage of sensitive info apply no matter where code and data are hosted.

A.2 Short-cut

These are shortcuts that you probably you remember when working with R. Even though it might take a bit of time to learn and use them as your second nature, but they will save you a lot of time.
Just like learning another language, the more you speak and practice it, the more comfortable you are speaking it.

function short-cut
navigate folders in console " " + tab
pull up short-cut cheat sheet ctrl + shift + k
go to file/function (everything in your project) ctrl + .
search everything cmd + shift + f
navigate between tabs Crtl + shift + .
type function faster snip + shift + tab
type faster use tab for fuzzy match
cmd + up
ctrl + .

Sometimes you can’t stage a folder because it’s too large. In such case, use Terminal pane in Rstudio then type git add -A to stage all changes then commit and push like usual.

A.3 Function short-cut

apply one function to your data to create a new variable: mutate(mod=map(data,function))
instead of using i in 1:length(object): for (i in seq_along(object))
apply multiple function: map_dbl
apply multiple function to multiple variables:map2
autoplot(data) plot times series data
mod_tidy = linear(reg) %>% set_engine('lm') %>% fit(price ~ ., data=data) fit lm model. It could also fit other models (stan, spark, glmnet, keras)

  • Sometimes, data-masking will not be able to recognize whether you’re calling from environment or data variables. To bypass this, we use .data$variable or .env$variable. For example data %>% mutate(x=.env$variable/.data$variable
  • Problems with data-masking:
    • Unexpected masking by data-var: Use .data and .env to disambiguate
    • Data-var cant get through:
    • Tunnel data-var with {{}} + Subset .data with [[]]
  • Passing Data-variables through arguments
library("dplyr")
mean_by <- function(data,by,var){
    data %>%
        group_by({{{by}}}) %>%
        summarise("{{var}}":=mean({{var}})) # new name for each var will be created by tunnel data-var inside strings
}

mean_by <- function(data,by,var){
    data %>%
        group_by({{{by}}}) %>%
        summarise("{var}":=mean({{var}})) # use single {} to glue the string, but hard to reuse code in functions
}
  • Trouble with selection:
library("purrr")
name <- c("mass","height")
starwars %>% select(name) # Data-var. Here you are referring to variable named "name"

starwars %>% select(all_of((name))) # use all_of() to disambiguate when 

averages <- function(data,vars){ # take character vectors with all_of()
    data %>%
        select(all_of(vars)) %>%
        map_dbl(mean,na.rm=TRUE)
} 

x = c("Sepal.Length","Petal.Length")
iris %>% averages(x)


# Another way
averages <- function(data,vars){ # Tunnel selectiosn with {{}}
    data %>%
        select({{vars}}) %>%
        map_dbl(mean,na.rm=TRUE)
} 

x = c("Sepal.Length","Petal.Length")
iris %>% averages(x)

A.4 Citation

include a citation by [@Farjam_2015]

cite packages used in this session

package=ls(sessionInfo()$loadedOnly) for (i in package){print(toBibtex(citation(i)))}

package=ls(sessionInfo()$loadedOnly) 
for (i in package){
    print(toBibtex(citation(i)))
    }

A.5 Install all necessary packages/libaries on your local machine

Get a list of packages you need to install from this book (or your local device)

installed <- as.data.frame(installed.packages())

head(installed)
#>         Package                            LibPath Version Priority
#> abind     abind C:/Program Files/R/R-4.2.3/library   1.4-5     <NA>
#> ade4       ade4 C:/Program Files/R/R-4.2.3/library  1.7-22     <NA>
#> admisc   admisc C:/Program Files/R/R-4.2.3/library    0.33     <NA>
#> AER         AER C:/Program Files/R/R-4.2.3/library  1.2-10     <NA>
#> afex       afex C:/Program Files/R/R-4.2.3/library   1.3-0     <NA>
#> agridat agridat C:/Program Files/R/R-4.2.3/library    1.21     <NA>
#>                                                                                        Depends
#> abind                                                                             R (>= 1.5.0)
#> ade4                                                                               R (>= 2.10)
#> admisc                                                                            R (>= 3.5.0)
#> AER     R (>= 3.0.0), car (>= 2.0-19), lmtest, sandwich (>= 2.4-0),\nsurvival (>= 2.37-5), zoo
#> afex                                                             R (>= 3.5.0), lme4 (>= 1.1-8)
#> agridat                                                                                   <NA>
#>                                                                                 Imports
#> abind                                                                    methods, utils
#> ade4                graphics, grDevices, methods, stats, utils, MASS, pixmap, sp,\nRcpp
#> admisc                                                                          methods
#> AER                                                           stats, Formula (>= 0.2-0)
#> afex    pbkrtest (>= 0.4-1), lmerTest (>= 3.0-0), car, reshape2,\nstats, methods, utils
#> agridat                                                                            <NA>
#>                   LinkingTo
#> abind                  <NA>
#> ade4    Rcpp, RcppArmadillo
#> admisc                 <NA>
#> AER                    <NA>
#> afex                   <NA>
#> agridat                <NA>
#>                                                                                                                                                                                                                                                                                                                                                                                                Suggests
#> abind                                                                                                                                                                                                                                                                                                                                                                                              <NA>
#> ade4                                                                                                                                                                                                                                                  ade4TkGUI, adegraphics, adephylo, ape, CircStats, deldir,\nlattice, spdep, splancs, waveslim, progress, foreach, parallel,\ndoParallel, iterators
#> admisc                                                                                                                                                                                                                                                                                                                                                                                     QCA (>= 3.7)
#> AER                                                                                                                                  boot, dynlm, effects, fGarch, forecast, foreign, ineq,\nKernSmooth, lattice, longmemo, MASS, mlogit, nlme, nnet, np,\nplm, pscl, quantreg, rgl, ROCR, rugarch, sampleSelection,\nscatterplot3d, strucchange, systemfit (>= 1.1-20), truncreg,\ntseries, urca, vars
#> afex    emmeans (>= 1.4), coin, xtable, parallel, plyr, optimx,\nnloptr, knitr, rmarkdown, R.rsp, lattice, latticeExtra,\nmultcomp, testthat, mlmRev, dplyr, tidyr, dfoptim, Matrix,\npsychTools, ggplot2, MEMSS, effects, carData, ggbeeswarm, nlme,\ncowplot, jtools, ggpubr, ggpol, MASS, glmmTMB, brms, rstanarm,\nstatmod, performance (>= 0.7.2), see (>= 0.6.4), ez,\nggResidpanel, grid, vdiffr
#> agridat                    AER, agricolae, betareg, broom, car, coin, corrgram, desplot,\ndplyr, effects, equivalence, emmeans, FrF2, gam, gge, ggplot2,\ngnm, gstat, HH, knitr, lattice, latticeExtra, lme4, lucid,\nmapproj, maps, MASS, MCMCglmm, metafor, mgcv, NADA, nlme,\nnullabor, ordinal, pbkrtest, pls, pscl, reshape2, rgdal,\nrmarkdown, qicharts, qtl, sp, SpATS, survival, vcd, testthat
#>         Enhances       License License_is_FOSS License_restricts_use OS_type
#> abind       <NA>   LGPL (>= 2)            <NA>                  <NA>    <NA>
#> ade4        <NA>    GPL (>= 2)            <NA>                  <NA>    <NA>
#> admisc      <NA>    GPL (>= 3)            <NA>                  <NA>    <NA>
#> AER         <NA> GPL-2 | GPL-3            <NA>                  <NA>    <NA>
#> afex        <NA>    GPL (>= 2)            <NA>                  <NA>    <NA>
#> agridat     <NA>  CC BY-SA 4.0            <NA>                  <NA>    <NA>
#>         MD5sum NeedsCompilation Built
#> abind     <NA>               no 4.2.0
#> ade4      <NA>              yes 4.2.3
#> admisc    <NA>              yes 4.2.3
#> AER       <NA>               no 4.2.3
#> afex      <NA>               no 4.2.3
#> agridat   <NA>               no 4.2.3

write.csv(installed, file.path(getwd(),'installed.csv'))

After having the installed.csv file on your new or local machine, you can just install the list of packages

# import the list of packages
installed <- read.csv('installed.csv')

# get the list of packages that you have on your device
baseR <- as.data.frame(installed.packages())

# install only those that you don't have
install.packages(setdiff(installed, baseR))