Chapter 1 Introduction

Note: this book is a work in progress. All source code for this project is available on my GitHub, which is linked in Section 1.4.

This book is a collection of R Markdown files that aims to help users learn the practical syntax and usage of R. It mainly covers code snippets and workflows for everyday data science tasks, including data cleaning, data wrangling, iteration, machine learning with caret, data visualization, and web app design with Shiny. Each broad topic is split into its own chapters, though there is some overlap between them.

1.1 R syntax in this book

Code chunks are presented in typical Markdown format, with the output shown directly below each chunk and prefixed by ##:

runif(n = 20, min = 0, max = 100)
##  [1] 61.730378 55.742246 93.733909 73.541553
##  [5] 33.601481 58.174088 98.087900 69.854770
##  [9]  2.479445 33.739732 32.691337 18.190343
## [13] 42.149178 12.052854 81.348673 56.093152
## [17] 49.868675 76.773096 13.215394 57.569796
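Because runif() draws random numbers, rerunning the chunk above gives different values each time. If you want to reproduce a random result exactly, a common approach is to fix the seed with set.seed() first. The sketch below assumes an arbitrary seed of 42; the chunk above did not set a seed, so these values will not match the ones printed there.

```r
# Fixing the random seed makes runif() return the same numbers on every run.
set.seed(42)
first <- runif(n = 20, min = 0, max = 100)

# Resetting the same seed reproduces the identical sequence.
set.seed(42)
second <- runif(n = 20, min = 0, max = 100)

identical(first, second)
range(first)  # every value falls between min = 0 and max = 100
```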

When using functions outside of base R, I reference the parent package explicitly with the package::function notation to avoid confusion:

microbenchmark::microbenchmark(runif(n = 20, min = 0, max = 100))
## Unit: microseconds
##                               expr   min    lq
##  runif(n = 20, min = 0, max = 100) 1.417 1.501
##     mean median    uq   max neval
##  1.69272  1.584 1.709 8.125   100
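The :: operator calls a function from a package's namespace without attaching the whole package with library(). The short sketch below uses stats and tools, which ship with every R installation, so it runs without installing anything; the example file name is made up.

```r
# stats::median() is the same function that a bare median() call finds,
# because stats is attached by default in a normal R session.
x <- c(3, 1, 2)
stats::median(x)

# tools is installed with R but not attached by default;
# :: lets us use file_ext() without calling library(tools).
tools::file_ext("report.Rmd")
```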

In longer chains of code, I typically use %>% from magrittr as a pipe. This is standard practice in code that uses tidyverse packages, so it is a good habit to start early.
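The pipe passes the result on its left as the first argument of the function on its right, so a nested call can be read from left to right instead of inside out. A minimal sketch, assuming magrittr is installed (it is installed as part of the tidyverse):

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested version: read from the innermost call outward.
nested <- round(mean(sqrt(x)), 2)

# Piped version: each step feeds its result into the next function.
piped <- x %>% sqrt() %>% mean() %>% round(2)

nested
piped
```

Both expressions take the square roots (1, 2, 3), average them, and round the result, so they return the same value.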

Finally, here is the R version I am currently using:

##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          0.5                         
## year           2021                        
## month          03                          
## day            31                          
## svn rev        80133                       
## language       R                           
## version.string R version 4.0.5 (2021-03-31)
## nickname       Shake and Throw
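The table above looks like the printed form of R.version, a list describing the running R build. The sketch below shows a few ways to query your own version; which call the book itself used is an assumption, and the values you get depend on your installation.

```r
# R.version is a list; its version.string element summarises the release.
R.version$version.string

# getRversion() returns a version object that can be compared directly.
v <- getRversion()
v >= "4.0.0"  # TRUE on R 4.0.0 or later

# sessionInfo() also lists attached packages and their versions,
# which is useful when reporting problems.
sessionInfo()
```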

1.2 R packages commonly used in this book

  • tidyverse: a collection of packages for data science, including dplyr, purrr, stringr, forcats, readr, and ggplot2.

  • caret: package for training and evaluating machine learning models, with support for methods such as ranger, rpart, xgbTree, and svmLinear.

  • mlbench: package for benchmarks and datasets in machine learning.

  • broom: package for summarizing model estimates as tidy data frames.

  • ggpubr: package for publication-ready data visualizations.

  • Shiny: package for designing and building interactive web apps.

1.3 Installing R packages

All R packages used in this book are available on CRAN, so they can be installed simply by running install.packages(). For packages not on CRAN (or if you want a development version of a package), you can install straight from a GitHub repository by running devtools::install_github().
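A common pattern is to install only the packages that are missing. The helper below, install_if_missing(), is a hypothetical sketch rather than part of any package: it checks each name with requireNamespace() and passes only the missing ones to install.packages(). The GitHub repository name at the end is a placeholder, not a real project.

```r
# Hypothetical helper: install only the packages that are not yet available.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    # installs from the CRAN mirror set in getOption("repos")
    install.packages(missing)
  }
  invisible(missing)  # returns the names that needed installing
}

# Packages used throughout this book (see Section 1.2)
book_pkgs <- c("tidyverse", "caret", "mlbench", "broom", "ggpubr", "shiny")
# install_if_missing(book_pkgs)

# Development versions from GitHub use "owner/repo" notation;
# "someuser/somepkg" is a placeholder for illustration only.
# devtools::install_github("someuser/somepkg")
```

Calling the helper with packages that ship with R, such as "stats", installs nothing and returns an empty character vector.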

1.4 Code availability

All code used to compile this book, as well as the individual R Markdown files, is available in my repository here.

1.5 Website hosting

This book is hosted on the shinyapps server and deployed with the R package rsconnect. The R Markdown files are compiled into book format with the R package bookdown.