2.4 Instructions

This book is intended to work in two ways. One is as a manual that you can follow along to accomplish an entire data science project. The other is as a companion to my online video recordings. If you can get the videos, that is great. If you cannot, that is also fine; the only drawback is that you have to read the whole contents line by line.

I will use the following stickers to indicate whether a piece of text is an explanation, an instruction, or an action you need to carry out, so you know what you have to read word by word and what you can skip.


Code appears with a code sticker, like this:

# Load raw data
train <- read.csv("train.csv", header = TRUE)
test <- read.csv("test.csv", header = TRUE)

Code is what you have to read word by word, type (or copy and paste) into your script, and run. It is also a good idea to record the results and plots in a file, so you can always come back to check them.
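One possible way to record results and plots in files is sketched below. The toy data frame, the file names results.csv and results.pdf, and the use of a temporary directory are all assumptions made for illustration; in your own project you would use your real results and a folder of your choice.

```r
# Toy results standing in for real output (hypothetical values)
results <- data.frame(pclass = c(1, 2, 3), count = c(3, 2, 5))

# Hypothetical file names, placed in a temporary directory for this sketch
out_dir <- tempdir()
table_file <- file.path(out_dir, "results.csv")
plot_file <- file.path(out_dir, "results.pdf")

# Record the table so it can be checked later
write.csv(results, table_file, row.names = FALSE)

# Record the plot: open a PDF device, draw, then close the device
pdf(plot_file)
barplot(results$count, names.arg = results$pclass,
        xlab = "pclass", ylab = "count")
dev.off()
```

Reading the table back with read.csv(table_file) lets you check the saved values without rerunning the analysis.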


Tips, like this one,

Within the console, you can use the up and down arrows to find recent commands, and hitting tab will auto-complete commands and object names if possible.

are general advice. You can skip them if you already know them. They can save you time, but skipping them will not affect your learning.


Actions are, by default, steps you are expected to carry out. They appear with an action sticker, like this:

Change data type: Go back to the Kaggle data description for pclass. It is a proxy for social class (richer or poorer). It should be a factor; it does not make sense to keep it as an int, because we do not add or calculate with it.

data.combined$pclass <- as.factor(data.combined$pclass)

Note that actions are in sequential order. If you have not carried out the previous actions, you may not be able to do the current one, because a dataset processed in an earlier action may be used later on. So you must carry out the actions one by one, and not jump to later ones without completing the earlier ones.
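To see what the pclass conversion in the example action does, here is a minimal sketch on a toy data frame. The data frame toy and its values are made up for illustration; they are not the real data.combined.

```r
# Toy stand-in for data.combined (hypothetical values, not the real data)
toy <- data.frame(pclass = c(1L, 3L, 2L, 3L))

# Before the conversion the column is a plain integer
class(toy$pclass)

# Convert the class codes to a factor, as in the action
toy$pclass <- as.factor(toy$pclass)

# After the conversion R treats the values as categories
class(toy$pclass)
levels(toy$pclass)

# Counting passengers per class is a natural operation on a factor
table(toy$pclass)
```

Once the column is a factor, R treats 1, 2 and 3 as labels for categories rather than as numbers to be added or averaged.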


Exercises at the end of each chapter are provided so that you can periodically explore alternative solutions or strengthen some key techniques. It is always good to do the exercises if you can.

The default protocol is that I have written some code (in the appendix), which you download and open in your RStudio. Then you run it (or copy and paste it) line by line in RStudio, so that you understand what each line does and why it is written that way. Once you understand the code, you can change it or write some new code. While you are doing that, simply comment out my code rather than delete it, just in case you need to come back and look at it again. Once you can write your own code, it shows you have learned.
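As a small illustration of commenting out code rather than deleting it, consider the sketch below. The toy data frame and the label names "first", "second" and "third" are my assumptions for this example only.

```r
# Toy stand-in for a data frame loaded earlier (hypothetical values)
toy <- data.frame(pclass = c(1L, 3L, 2L))

# Original line from the downloaded script, commented out rather than deleted:
# toy$pclass <- as.factor(toy$pclass)

# Your own variation: the same conversion, but with descriptive labels
toy$pclass <- factor(toy$pclass,
                     levels = c(1, 2, 3),
                     labels = c("first", "second", "third"))

toy$pclass
```

If the variation does not work out, you remove the # in front of the original line and you are back where you started.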

Before you go, let’s try it,

Open a new project called “MyDataScience”, set up the working directory as “~/MyDataScienceWithR”, and create a first R program called DSPR1:

setwd("~/MyDataScienceWithR")
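Setting the working directory fails if the folder does not exist yet. A minimal sketch that creates the folder first and then confirms the result might look like this; the variable name project_dir is my own choice for illustration.

```r
# Folder used as the working directory in this book
project_dir <- "~/MyDataScienceWithR"

# Create the folder only if it is not there yet
if (!dir.exists(project_dir)) {
  dir.create(project_dir)
}

# Make it the working directory and confirm where we are
setwd(project_dir)
getwd()
```

After this, relative file names such as "train.csv" are looked up inside that folder.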

Okay. Save your file and move to the next chapter.