1 Introduction

Since the beginning of the century, we have been bombarded with amazing advancements and inventions, especially in statistics, information technology, computer science, and the newly emerging field of data science. However, I believe the downside of these developments is that we use big and trendy words too often (e.g., big data, machine learning, deep learning).

Each substantive field tends to have its own quantitative ("metric") subfield:

  • Econometrics in economics

  • Psychometrics in psychology

  • Chemometrics in chemistry

  • Sabermetrics in baseball

  • Biostatistics in public health and medicine

But to laymen, these are known as:

  • Data Science

  • Applied Statistics

  • Computational Social Science

Learning these new tools is fun and exciting, but I have to admit that I hardly retain any of these new ideas. The solution I came up with is to write down a data analysis process from beginning to end. Accordingly, let’s dive right in.

Some general recommendations:

  • The more you practice, the more lines of code you write, and the more functions you memorize, the more I think you will enjoy this journey.

  • Readers can follow this book in several ways:

    • If you are interested in particular methods/tools, you can jump to that section by clicking the section name.
    • If you want to follow a traditional path of data analysis, read the Linear Regression section.
    • If you want to design your own experiment and test your hypothesis, read the Analysis of Variance (ANOVA) section.
    • Alternatively, if you would rather see the models applied and skip the theory and underlying mechanisms, you can go straight to the summary and application portion of each section.

  • If you don’t understand a part, search for the title of that part on Google and read more about the subject. This book is just a general guide.

  • If you want to customize your code beyond what is provided in this book, run help(code) or ?code in the console. For example, if I want more information on the hist function, I’ll type ?hist or help(hist) in the console.

  • Another way is to search on Google. Different people use different packages to achieve the same result in R. Accordingly, if you want to create a histogram, search Google for “histogram in R”, and you should find multiple ways to create a histogram in R.
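As a quick illustration of the help lookup mentioned in the recommendations, here is a minimal sketch in base R. The vector x is simulated data that I made up purely for this example, and the argument values passed to hist (breaks, main, xlab, col) are arbitrary choices, not settings used elsewhere in this book.

```r
# Simulated data, used only for illustration (an assumption, not from the book)
set.seed(1)
x <- rnorm(200)

# Open the documentation for hist(); the two forms are equivalent
help(hist)
?hist

# Arguments described on the help page can then be customized
h <- hist(x, breaks = 20, main = "Histogram of x", xlab = "x", col = "grey")

# hist() invisibly returns an object holding the bin boundaries and counts
class(h)
sum(h$counts)
```

The help page lists every argument that hist accepts, so it is usually the first place to look before searching online for an alternative package.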

Tools of statistics

  • Probability Theory
  • Mathematical Analysis
  • Computer Science
  • Numerical Analysis
  • Database Management

Code Replication

This book was built with R version 4.2.3 (2023-03-15 ucrt) and the following packages:

