1 Introduction

Since the turn of the century, we have witnessed remarkable advancements and innovations, particularly in statistics, information technology, computer science, and the rapidly emerging field of data science. However, one challenge of these developments is the overuse of buzzwords like big data, machine learning, and deep learning. While these terms are powerful in context, they can sometimes obscure the foundational principles underlying their application.

Every substantive field often has its own specialized metric subfield, such as:

  • Econometrics in economics
  • Psychometrics in psychology
  • Chemometrics in chemistry
  • Sabermetrics in sports analytics
  • Biostatistics in public health and medicine

To the layperson, these disciplines are often grouped under broader terms like:

  • Data Science
  • Applied Statistics
  • Computational Social Science

As exciting as it is to explore these new tools and techniques, I must admit that retaining these concepts can be challenging. For me, the most effective way to internalize and apply these ideas has been to document the data analysis process from start to finish.

With that in mind, let’s dive in and explore the fascinating world of data analysis together.

1.1 General Recommendations

  • The journey of mastering data analysis is fueled by practice and repetition. The more lines of code you write, the more functions you familiarize yourself with, and the more you experiment, the more enjoyable and rewarding this process becomes.

  • Readers can approach this book in several ways:

    • Focused Learning: If you are interested in specific methods or tools, you can jump directly to the relevant section by navigating through the table of contents.
    • Sequential Learning: To follow a traditional path of data analysis, start with the Linear Regression section.
    • Experimental Approach: If you are interested in designing experiments and testing hypotheses, explore the Analysis of Variance (ANOVA) section.
  • For those primarily interested in applications and less concerned with theoretical foundations, focus on the summary and application sections of each chapter.

  • If a concept is unclear, consider researching the topic online. This book serves as a guide, and external resources like tutorials or articles can provide additional insights.

  • To customize the code examples provided in this book, use R’s built-in help functions. For instance:

    • To learn more about a specific function, type help(function_name) or ?function_name in the R console.
    • For example, to find details about the hist function, type ?hist or help(hist) in the console.
  • Additionally, searching online is a powerful resource (e.g., Google, ChatGPT, etc.). Different practitioners often use various R packages to achieve similar results. For instance, if you need to create a histogram in R, a simple search like “histogram in R” will provide multiple approaches and examples.

By adopting these strategies, you can tailor your learning experience and maximize the value of this book.

Tools of statistics

  • Probability Theory
  • Mathematical Analysis
  • Computer Science
  • Numerical Analysis
  • Database Management

Code Replication

This book was built with R version 4.2.3 (2023-03-15 ucrt) and the following packages:

package version source
abind 1.4-5 CRAN (R 4.2.0)
agridat 1.21 CRAN (R 4.2.3)
ape 5.7-1 CRAN (R 4.2.3)
assertthat 0.2.1 CRAN (R 4.2.3)
backports 1.4.1 CRAN (R 4.2.0)
bookdown 0.35 CRAN (R 4.2.3)
boot 1.3-28.1 CRAN (R 4.2.3)
broom 1.0.5 CRAN (R 4.2.3)
bslib 0.6.1 CRAN (R 4.2.3)
cachem 1.0.8 CRAN (R 4.2.3)
callr 3.7.3 CRAN (R 4.2.3)
car 3.1-2 CRAN (R 4.2.3)
carData 3.0-5 CRAN (R 4.2.3)
cellranger 1.1.0 CRAN (R 4.2.3)
cli 3.6.1 CRAN (R 4.2.3)
coda 0.19-4 CRAN (R 4.2.3)
colorspace 2.1-0 CRAN (R 4.2.3)
corpcor 1.6.10 CRAN (R 4.2.0)
crayon 1.5.2 CRAN (R 4.2.3)
cubature 2.1.0 CRAN (R 4.2.3)
curl 5.1.0 CRAN (R 4.2.3)
data.table 1.14.8 CRAN (R 4.2.3)
DBI 1.2.0 CRAN (R 4.2.3)
dbplyr 2.4.0 CRAN (R 4.2.3)
desc 1.4.3 CRAN (R 4.2.3)
devtools 2.4.5 CRAN (R 4.2.3)
digest 0.6.31 CRAN (R 4.2.3)
dplyr 1.1.2 CRAN (R 4.2.3)
ellipsis 0.3.2 CRAN (R 4.2.3)
evaluate 0.23 CRAN (R 4.2.3)
extrafont 0.19 CRAN (R 4.2.2)
extrafontdb 1.0 CRAN (R 4.2.0)
fansi 1.0.4 CRAN (R 4.2.3)
faraway 1.0.8 CRAN (R 4.2.3)
fastmap 1.1.1 CRAN (R 4.2.3)
forcats 1.0.0 CRAN (R 4.2.3)
foreign 0.8-84 CRAN (R 4.2.3)
fs 1.6.3 CRAN (R 4.2.3)
generics 0.1.3 CRAN (R 4.2.3)
ggplot2 3.4.4 CRAN (R 4.2.3)
glue 1.6.2 CRAN (R 4.2.3)
gtable 0.3.4 CRAN (R 4.2.3)
haven 2.5.3 CRAN (R 4.2.3)
Hmisc 5.1-0 CRAN (R 4.2.3)
hms 1.1.3 CRAN (R 4.2.3)
htmltools 0.5.7 CRAN (R 4.2.3)
htmlwidgets 1.6.2 CRAN (R 4.2.3)
httr 1.4.7 CRAN (R 4.2.3)
investr 1.4.2 CRAN (R 4.2.3)
jpeg 0.1-10 CRAN (R 4.2.2)
jquerylib 0.1.4 CRAN (R 4.2.3)
jsonlite 1.8.8 CRAN (R 4.2.3)
kableExtra 1.3.4 CRAN (R 4.2.3)
knitr 1.45 CRAN (R 4.2.3)
lattice 0.21-8 CRAN (R 4.2.3)
latticeExtra 0.6-30 CRAN (R 4.2.3)
lifecycle 1.0.4 CRAN (R 4.2.3)
lme4 1.1-35.1 CRAN (R 4.2.3)
lmerTest 3.1-3 CRAN (R 4.2.3)
lsr 0.5.2 CRAN (R 4.2.3)
ltm 1.2-0 CRAN (R 4.2.3)
lubridate 1.9.2 CRAN (R 4.2.3)
magrittr 2.0.3 CRAN (R 4.2.3)
MASS 7.3-60 CRAN (R 4.2.3)
matlib 0.9.6 CRAN (R 4.2.3)
Matrix 1.6-1 CRAN (R 4.2.3)
MCMCglmm 2.35 CRAN (R 4.2.3)
memoise 2.0.1 CRAN (R 4.2.3)
mgcv 1.9-0 CRAN (R 4.2.3)
minqa 1.2.6 CRAN (R 4.2.3)
modelr 0.1.11 CRAN (R 4.2.3)
munsell 0.5.0 CRAN (R 4.2.3)
nlme 3.1-163 CRAN (R 4.2.3)
nloptr 2.0.3 CRAN (R 4.2.3)
nlstools 2.0-0 CRAN (R 4.2.3)
nnet 7.3-19 CRAN (R 4.2.3)
numDeriv 2016.8-1.1 CRAN (R 4.2.0)
openxlsx 4.2.5.2 CRAN (R 4.2.3)
pbkrtest 0.5.2 CRAN (R 4.2.3)
pillar 1.9.0 CRAN (R 4.2.3)
pkgbuild 1.4.3 CRAN (R 4.2.3)
pkgconfig 2.0.3 CRAN (R 4.2.3)
pkgload 1.3.3 CRAN (R 4.2.3)
png 0.1-8 CRAN (R 4.2.2)
ppsr 0.0.2 CRAN (R 4.2.3)
prettyunits 1.2.0 CRAN (R 4.2.3)
processx 3.8.2 CRAN (R 4.2.3)
ps 1.7.5 CRAN (R 4.2.3)
pscl 1.5.5.1 CRAN (R 4.2.3)
purrr 1.0.2 CRAN (R 4.2.3)
R6 2.5.1 CRAN (R 4.2.3)
RColorBrewer 1.1-3 CRAN (R 4.2.0)
Rcpp 1.0.11 CRAN (R 4.2.3)
readr 2.1.4 CRAN (R 4.2.3)
readxl 1.4.3 CRAN (R 4.2.3)
remotes 2.4.2.1 CRAN (R 4.2.3)
reprex 2.0.2 CRAN (R 4.2.3)
rgl 1.2.1 CRAN (R 4.2.3)
rio 1.0.1 CRAN (R 4.2.3)
rlang 1.1.1 CRAN (R 4.2.3)
RLRsim 3.1-8 CRAN (R 4.2.3)
rmarkdown 2.25 CRAN (R 4.2.3)
rprojroot 2.0.4 CRAN (R 4.2.3)
rstudioapi 0.15.0 CRAN (R 4.2.3)
Rttf2pt1 1.3.12 CRAN (R 4.2.2)
rvest 1.0.3 CRAN (R 4.2.3)
sass 0.4.8 CRAN (R 4.2.3)
scales 1.3.0 CRAN (R 4.2.3)
sessioninfo 1.2.2 CRAN (R 4.2.3)
stringi 1.7.12 CRAN (R 4.2.2)
stringr 1.5.1 CRAN (R 4.2.3)
svglite 2.1.1 CRAN (R 4.2.3)
systemfonts 1.0.5 CRAN (R 4.2.3)
tensorA 0.36.2 CRAN (R 4.2.0)
testthat 3.1.10 CRAN (R 4.2.3)
tibble 3.2.1 CRAN (R 4.2.3)
tidyr 1.3.0 CRAN (R 4.2.3)
tidyselect 1.2.0 CRAN (R 4.2.3)
tidyverse 2.0.0 CRAN (R 4.2.3)
tzdb 0.4.0 CRAN (R 4.2.3)
usethis 2.2.2 CRAN (R 4.2.3)
utf8 1.2.3 CRAN (R 4.2.3)
vctrs 0.6.3 CRAN (R 4.2.3)
viridisLite 0.4.2 CRAN (R 4.2.3)
webshot 0.5.5 CRAN (R 4.2.3)
withr 2.5.2 CRAN (R 4.2.3)
xfun 0.39 CRAN (R 4.2.3)
xml2 1.3.6 CRAN (R 4.2.3)
xtable 1.8-4 CRAN (R 4.2.3)
yaml 2.3.7 CRAN (R 4.2.3)
zip 2.3.0 CRAN (R 4.2.3)
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.3 (2023-03-15 ucrt)
#>  os       Windows 10 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/Los_Angeles
#>  date     2024-02-08
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  bookdown      0.35    2023-08-09 [1] CRAN (R 4.2.3)
#>  bslib         0.6.1   2023-11-28 [1] CRAN (R 4.2.3)
#>  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.2.3)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.3)
#>  codetools     0.2-19  2023-02-01 [1] CRAN (R 4.2.3)
#>  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.2.3)
#>  desc          1.4.3   2023-12-10 [1] CRAN (R 4.2.3)
#>  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.2.3)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.3)
#>  dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.2.3)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.3)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.2.3)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.3)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.3)
#>  forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.2.3)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.2.3)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.3)
#>  ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.2.3)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.3)
#>  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.2.3)
#>  highr         0.10    2022-12-22 [1] CRAN (R 4.2.3)
#>  hms           1.1.3   2023-03-21 [1] CRAN (R 4.2.3)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.2.3)
#>  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.2.3)
#>  httpuv        1.6.11  2023-05-11 [1] CRAN (R 4.2.3)
#>  jpeg        * 0.1-10  2022-11-29 [1] CRAN (R 4.2.2)
#>  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.3)
#>  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.2.3)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.2.3)
#>  later         1.3.1   2023-05-02 [1] CRAN (R 4.2.3)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.2.3)
#>  lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.2.3)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.3)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.3)
#>  mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
#>  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.3)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.3)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.3)
#>  pkgbuild      1.4.3   2023-12-10 [1] CRAN (R 4.2.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.3)
#>  pkgload       1.3.3   2023-09-22 [1] CRAN (R 4.2.3)
#>  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.2.3)
#>  promises      1.2.1   2023-08-10 [1] CRAN (R 4.2.3)
#>  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.2.3)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.3)
#>  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.2.3)
#>  readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.2.3)
#>  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.2.3)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.2.3)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.2.3)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.2.3)
#>  sass          0.4.8   2023-12-06 [1] CRAN (R 4.2.3)
#>  scales      * 1.3.0   2023-11-28 [1] CRAN (R 4.2.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.3)
#>  shiny         1.7.5   2023-08-12 [1] CRAN (R 4.2.3)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
#>  stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.2.3)
#>  tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.2.3)
#>  tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.2.3)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.3)
#>  tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.2.3)
#>  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.2.3)
#>  tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.2.3)
#>  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.3)
#>  usethis       2.2.2   2023-07-06 [1] CRAN (R 4.2.3)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.3)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.2.3)
#>  withr         2.5.2   2023-10-30 [1] CRAN (R 4.2.3)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.2.3)
#>  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.3)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.3)
#> 
#>  [1] C:/Program Files/R/R-4.2.3/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────