Chapter 1 Introduction

This guide is an attempt to streamline and demystify the data analysis process.

This is by no means an ultimate guide, nor am I a great source of knowledge, nor do I claim to be a statistician or econometrician. But I am a strong proponent of learning by teaching and by doing. Hence, this is more of a learning experience for both you and me.

Since the beginning of the century, we have been bombarded with amazing advancements and inventions, especially in the fields of statistics, information technology, and computer science. However, I believe the downside of their introduction is that we use big and trendy words too often (e.g., big data, machine learning, deep learning).

It’s all fun and exciting to learn these new tools, but I have to admit that I hardly retain any of them. Writing down a data analysis process from beginning to end is the solution that I came up with. Accordingly, let’s dive right in.

Some general recommendations:

  • The more you practice and build habits, the more lines of code you write, and the more functions you memorize, the more I think you will like this journey.

  • Readers can follow this book in several ways:

    • If you are interested in particular methods/tools, you can jump to that section by clicking the section name.
    • If you want to follow a traditional path of data analysis, read the Linear Regression section.
    • If you want to create your experiment and test your hypothesis, read the Analysis of Variance (ANOVA) section.
  • Alternatively, if you would rather see the application of models and disregard any theory or underlying mechanisms, you can skip to the summary and application portion of each section.

  • If you don’t understand a part, search for the title of that part on Google and read more about the subject. This is just a general guide.

  • If you want to customize your code beyond what is provided in this book, run help(code) or ?code in the console, where code is the name of the function you are interested in. For example, if I want more information on the hist function, I’ll type ?hist or help(hist) in the console.

  • Another way is to search on Google. Different people use different packages to achieve the same result in R. Accordingly, if you want to create a histogram, search Google for “histogram in R”, and you should find multiple ways to create a histogram in R.
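As a small illustration of the last two recommendations, the sketch below looks up the documentation for hist and then draws the same histogram in two ways. The data vector x is simulated purely for illustration and does not come from this book, and the ggplot2 version is only one of many possible alternatives; it runs only if ggplot2 is already installed.

```r
# Simulated data, purely for illustration (not taken from this book)
set.seed(1)
x <- rnorm(100)

# Look up the documentation; ?hist is shorthand for the same call
help("hist")

# Way 1: base R. hist() draws the plot and invisibly returns the binned data
h <- hist(x, breaks = 10, main = "Histogram of x", xlab = "x")
h$counts  # number of observations in each bin

# Way 2: ggplot2, run only if that package is already installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(data.frame(x = x), ggplot2::aes(x = x)) +
    ggplot2::geom_histogram(bins = 10)
  print(p)
}
```

Both versions bin the same data, so the choice between them is mostly a matter of which plotting style you prefer to customize later.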

The information in this book comes from various sources, but most of the content is based on several courses that I have taken formally. I’d like to give the professors credit accordingly.

Course               Professor
Data Analysis I      Erin M. Schliep
Data Analysis II     Christopher Wikle
Applied Econometric  Alyssa Carlson

Tools of statistics

  • Probability Theory
  • Mathematical Analysis
  • Computer Science
  • Numerical Analysis
  • Database Management

Setup Working Environment

if (!require("pacman"))
    install.packages("pacman")
if (!require("devtools"))
    install.packages("devtools")
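
The install-if-missing pattern used for pacman and devtools can also be wrapped in a small function, which may be convenient when more packages are needed later. This is only a sketch: ensure_package is a hypothetical helper name introduced here for illustration, not a function from pacman, devtools, or this book.

```r
# ensure_package() is a hypothetical helper, not part of pacman or devtools
ensure_package <- function(pkg) {
  # requireNamespace() checks availability without attaching the package
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Needs internet access and a configured CRAN mirror
    install.packages(pkg)
  }
  # TRUE if the package can now be loaded, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call installs nothing
ensure_package("stats")
```

Unlike require(), this helper does not attach the package, so you would still call library() afterwards for the packages you want to use directly.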