A Appendix
A.1 Git
Cheat Sheet in different languages
Ultimate Guide of Git and GitHub for R user
Setting up Git:
- git config with the --global option configures your user name, email, editor, etc.

Creating a repository:
- git init initializes a repo. Git stores all of its repo data in the .git directory.
Tracking changes:
- git status shows the status of the repo. Files move through three places:
  - the project's working directory (the files users see)
  - the staging area (where the next commit is being built)
  - the local repo (where commits are permanently recorded)
- git add puts files in the staging area.
- git commit saves the staged content as a new commit in the local repo.
- git commit -m "your own message" attaches a message describing the purpose of your commit.
History
- git diff shows differences between commits.
- git checkout recovers old versions of files:
  - git checkout HEAD goes to the last commit.
  - git checkout <unique ID of your commit> goes to that commit.
Ignoring
- A .gitignore file tells Git which files to ignore. For example, if cat .gitignore shows
  *.dat
  results/
  then Git ignores files ending in .dat and the folder results.
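If you would rather create the .gitignore from the example above without leaving R, a minimal sketch uses base R's writeLines(). The folder name demo-project is hypothetical, and the file is written to a temporary directory so your real project is not touched; in practice you would write .gitignore at the root of your repo.

```r
# Hypothetical project folder inside a temporary directory
proj <- file.path(tempdir(), "demo-project")
dir.create(proj, showWarnings = FALSE)

# One pattern per line: ignore *.dat files and the results/ folder
gitignore <- file.path(proj, ".gitignore")
writeLines(c("*.dat", "results/"), gitignore)

# Equivalent of `cat .gitignore`
cat(readLines(gitignore), sep = "\n")
```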
Remotes in GitHub
- A local Git repo can be connected to one or more remote repos.
- Use the HTTPS protocol to connect to remote repos.
- git push copies changes from a local repo to a remote repo.
- git pull copies changes from a remote repo to a local repo.
Collaborating
- git clone copies a remote repo to create a local repo, with a remote called origin set up automatically.
Branching
- git checkout -b <new-branch-name> creates a new branch and switches to it.
- git checkout master switches to the master branch.
Conflicts
- Conflicts occur when two or more people change the same lines of the same file.
- The version control system does not let people overwrite each other's changes blindly; instead, it highlights conflicts so that they can be resolved.
Licensing
- People who incorporate General Public License (GPL'd) software into their own software must also release their software under the GPL; most other open licenses do not require this.
- The Creative Commons family of licenses allows people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing, and commercialization.
Citation:
- Add a CITATION file to a repo to explain how you want others to cite your work.
Hosting
- Rules regarding intellectual property and storage of sensitive info apply no matter where code and data are hosted.
A.2 Short-cut
These are shortcuts worth remembering when working with R in RStudio. It may take a bit of time before they become second nature, but they will save you a lot of time.
Just like learning another language, the more you practice, the more comfortable you become.
function | shortcut |
---|---|
navigate folders in the console | "" + tab |
pull up the keyboard shortcut cheat sheet | alt + shift + k |
go to file/function (everything in your project) | ctrl + . |
search everything | cmd + shift + f |
navigate between tabs | ctrl + shift + . |
type a function faster | snippet name + shift + tab |
type faster | tab for fuzzy matching |
pop up the command history in the console | cmd + up |
Sometimes you can't stage a folder in RStudio's Git pane because it is too large. In that case, open the Terminal pane in RStudio and type git add -A to stage all changes, then commit and push as usual.
A.3 Function short-cut
apply one function to each element of your data to create a new variable: mutate(mod = map(data, your_function))
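A minimal sketch of this pattern, assuming the tidyr nest() workflow in which data is a list-column holding one data frame per group. The grouping by cyl and the lm(mpg ~ wt) model on the built-in mtcars data are illustrative choices, not taken from the text above.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_cyl <- mtcars |>
  group_by(cyl) |>
  nest() |>  # one row per cyl; `data` holds that group's rows
  mutate(mod = map(data, function(d) lm(mpg ~ wt, data = d)))  # one lm per group

by_cyl
```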
instead of for (i in 1:length(object)), use for (i in seq_along(object)), which also handles empty objects correctly
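The reason is easy to see on an empty vector: 1:length(x) counts down to c(1, 0), so a loop would run twice over elements that do not exist, while seq_along(x) is empty. A small base R check:

```r
x <- numeric(0)            # an empty object

bad  <- 1:length(x)        # 1:0, i.e. c(1, 0)
good <- seq_along(x)       # integer(0)

n_iter <- 0
for (i in seq_along(x)) n_iter <- n_iter + 1  # body never runs
n_iter
```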
apply a function to each element and return a double vector: map_dbl()
apply a function to two inputs in parallel: map2()
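A short purrr sketch of both helpers on toy inputs (the list scores is made up for illustration): map_dbl() returns a named double vector, and map2() walks two vectors side by side and returns a list.

```r
library(purrr)

scores <- list(a = c(1, 2, 3), b = c(2, 4))
means <- map_dbl(scores, mean)    # named double vector: a = 2, b = 3

sums <- map2(1:3, 4:6, function(x, y) x + y)  # list of 5, 7, 9
unlist(sums)
```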
autoplot(data): plot time series data
mod_tidy = linear_reg() %>% set_engine('lm') %>% fit(price ~ ., data = data): fit a linear model with parsnip. Changing the engine lets the same specification fit other models (stan, spark, glmnet, keras).
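A runnable version of the call above, assuming the parsnip package is installed and swapping the placeholder data and price for the built-in mtcars data and its mpg column:

```r
library(parsnip)

# Model specification -> engine -> fit, on built-in data
mod_tidy <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ ., data = mtcars)

# The underlying lm object is kept in $fit
coef(mod_tidy$fit)
```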
- Sometimes, data-masking cannot tell whether you are referring to an environment variable or a data variable. To bypass this, use .data$variable for a data variable and .env$variable for an environment variable. For example: data %>% mutate(x = .env$variable / .data$variable)
- Problems with data-masking:
  - Unexpected masking by a data-variable: use .data and .env to disambiguate.
  - A data-variable can't get through (e.g. a column passed as a function argument): tunnel the data-variable with {{ }}, or subset .data with [[ ]].
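A small illustration of the masking problem, using a made-up data frame df whose column n has the same name as an environment variable n: without pronouns, data-masking picks the column both times, while .data and .env make each reference explicit.

```r
library(dplyr)

df <- data.frame(n = c(10, 20))  # data-variable n
n <- 2                           # env-variable with the same name

masked <- df |> mutate(ratio = n / n)              # both refer to the column: 1, 1
clear  <- df |> mutate(ratio = .data$n / .env$n)   # column / env-variable: 5, 10
```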
- Passing data-variables through arguments:
library("dplyr")
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{var}}" := mean({{ var }})) # tunnelling the data-var inside the string names the new column after var
}
mean_by2 <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{var}" := mean({{ var }})) # single {} glues the value of var, which only works when var holds a string; hard to reuse in functions
}
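To see what the {{ }} version produces, here is a self-contained call on the built-in mtcars data (the choice of cyl and mpg is just for illustration); the function body restates the first mean_by() definition above so the block runs on its own.

```r
library(dplyr)

mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{var}}" := mean({{ var }}))  # output column is named after `var`
}

res <- mtcars %>% mean_by(cyl, mpg)
res  # one row per cyl, with columns cyl and mpg
```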
- Trouble with selection:
library("purrr")
name <- c("mass", "height")
starwars %>% select(name) # data-var: selects the column called "name", not the vector above
starwars %>% select(all_of(name)) # use all_of() to select the columns listed in the env-variable when it shares a name with a column
averages <- function(data,vars){ # take character vectors with all_of()
data %>%
select(all_of(vars)) %>%
map_dbl(mean,na.rm=TRUE)
}
x = c("Sepal.Length","Petal.Length")
iris %>% averages(x)
# Another way
averages <- function(data, vars) { # tunnel selections with {{}}
  data %>%
    select({{ vars }}) %>%
    map_dbl(mean, na.rm = TRUE)
}
iris %>% averages(c(Sepal.Length, Petal.Length)) # pass bare column names
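The name clash in the starwars example can be reproduced on a tiny made-up data frame, which makes the difference easy to check: a bare name resolves to the column, while all_of() uses the character vector from the environment.

```r
library(dplyr)

df <- data.frame(name = c("a", "b"), mass = c(1, 2), height = c(3, 4))
name <- c("mass", "height")   # env-variable that shares a name with a column

by_column <- select(df, name)          # column "name"
by_vector <- select(df, all_of(name))  # columns "mass" and "height"
```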
A.4 Citation
include a citation with [@Farjam_2015]
cite the packages used in this session:
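# attached packages (otherPkgs) plus packages loaded only via a namespace (loadedOnly)
package <- c(names(sessionInfo()$otherPkgs), names(sessionInfo()$loadedOnly))
for (i in package) {
  print(toBibtex(citation(i)))
}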
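If you want the entries in a .bib file for your bibliography instead of printed to the console, one base R sketch is to collect the toBibtex() lines and write them with writeLines(). The package vector and the file name packages.bib are placeholders, and the file goes to a temporary directory here.

```r
pkgs <- "base"  # placeholder: replace with the packages you used

# citation() -> BibTeX lines for each package, combined into one vector
bib <- unlist(lapply(pkgs, function(p) as.character(toBibtex(citation(p)))))

bib_file <- file.path(tempdir(), "packages.bib")  # hypothetical output file
writeLines(bib, bib_file)
```

Entries produced this way may lack a citation key (the first line reads @Manual{,), so you may need to add one before citing them.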
A.5 Install all necessary packages/libraries on your local machine
Get a list of the packages installed on your current machine (for example, the ones used in this book) and save it to a CSV file:
installed <- as.data.frame(installed.packages())
head(installed)
#> Package LibPath Version Priority
#> abind abind C:/Program Files/R/R-4.2.3/library 1.4-5 <NA>
#> ade4 ade4 C:/Program Files/R/R-4.2.3/library 1.7-22 <NA>
#> admisc admisc C:/Program Files/R/R-4.2.3/library 0.33 <NA>
#> AER AER C:/Program Files/R/R-4.2.3/library 1.2-10 <NA>
#> afex afex C:/Program Files/R/R-4.2.3/library 1.3-0 <NA>
#> agridat agridat C:/Program Files/R/R-4.2.3/library 1.21 <NA>
#> Depends
#> abind R (>= 1.5.0)
#> ade4 R (>= 2.10)
#> admisc R (>= 3.5.0)
#> AER R (>= 3.0.0), car (>= 2.0-19), lmtest, sandwich (>= 2.4-0),\nsurvival (>= 2.37-5), zoo
#> afex R (>= 3.5.0), lme4 (>= 1.1-8)
#> agridat <NA>
#> Imports
#> abind methods, utils
#> ade4 graphics, grDevices, methods, stats, utils, MASS, pixmap, sp,\nRcpp
#> admisc methods
#> AER stats, Formula (>= 0.2-0)
#> afex pbkrtest (>= 0.4-1), lmerTest (>= 3.0-0), car, reshape2,\nstats, methods, utils
#> agridat <NA>
#> LinkingTo
#> abind <NA>
#> ade4 Rcpp, RcppArmadillo
#> admisc <NA>
#> AER <NA>
#> afex <NA>
#> agridat <NA>
#> Suggests
#> abind <NA>
#> ade4 ade4TkGUI, adegraphics, adephylo, ape, CircStats, deldir,\nlattice, spdep, splancs, waveslim, progress, foreach, parallel,\ndoParallel, iterators
#> admisc QCA (>= 3.7)
#> AER boot, dynlm, effects, fGarch, forecast, foreign, ineq,\nKernSmooth, lattice, longmemo, MASS, mlogit, nlme, nnet, np,\nplm, pscl, quantreg, rgl, ROCR, rugarch, sampleSelection,\nscatterplot3d, strucchange, systemfit (>= 1.1-20), truncreg,\ntseries, urca, vars
#> afex emmeans (>= 1.4), coin, xtable, parallel, plyr, optimx,\nnloptr, knitr, rmarkdown, R.rsp, lattice, latticeExtra,\nmultcomp, testthat, mlmRev, dplyr, tidyr, dfoptim, Matrix,\npsychTools, ggplot2, MEMSS, effects, carData, ggbeeswarm, nlme,\ncowplot, jtools, ggpubr, ggpol, MASS, glmmTMB, brms, rstanarm,\nstatmod, performance (>= 0.7.2), see (>= 0.6.4), ez,\nggResidpanel, grid, vdiffr
#> agridat AER, agricolae, betareg, broom, car, coin, corrgram, desplot,\ndplyr, effects, equivalence, emmeans, FrF2, gam, gge, ggplot2,\ngnm, gstat, HH, knitr, lattice, latticeExtra, lme4, lucid,\nmapproj, maps, MASS, MCMCglmm, metafor, mgcv, NADA, nlme,\nnullabor, ordinal, pbkrtest, pls, pscl, reshape2, rgdal,\nrmarkdown, qicharts, qtl, sp, SpATS, survival, vcd, testthat
#> Enhances License License_is_FOSS License_restricts_use OS_type
#> abind <NA> LGPL (>= 2) <NA> <NA> <NA>
#> ade4 <NA> GPL (>= 2) <NA> <NA> <NA>
#> admisc <NA> GPL (>= 3) <NA> <NA> <NA>
#> AER <NA> GPL-2 | GPL-3 <NA> <NA> <NA>
#> afex <NA> GPL (>= 2) <NA> <NA> <NA>
#> agridat <NA> CC BY-SA 4.0 <NA> <NA> <NA>
#> MD5sum NeedsCompilation Built
#> abind <NA> no 4.2.0
#> ade4 <NA> yes 4.2.3
#> admisc <NA> yes 4.2.3
#> AER <NA> no 4.2.3
#> afex <NA> no 4.2.3
#> agridat <NA> no 4.2.3
write.csv(installed, file.path(getwd(),'installed.csv'))
Once the installed.csv file is on your new machine, you can install just the packages from that list that you are missing:
# import the list of packages
installed <- read.csv("installed.csv")
# get the list of packages that you have on your device
baseR <- as.data.frame(installed.packages())
# install only those that you don't have (compare the Package columns, not whole data frames)
install.packages(setdiff(installed$Package, baseR$Package))
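Comparing the Package columns matters because setdiff() applied to the two data frames directly does not compare package names. A toy example with two made-up package lists shows the intended result:

```r
# Hypothetical package lists from the old and the new machine
old_machine <- data.frame(Package = c("abind", "ade4", "stats"))
new_machine <- data.frame(Package = c("stats", "abind"))

to_install <- setdiff(old_machine$Package, new_machine$Package)
to_install  # "ade4"
```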