Part I - Basic Data Preparation and Visualization

Part I covers the basic data preparation and visualization tools of data science, such as plotting axes, points, lines, columns, bars, pies, and donuts. We do so using only one dataset at a time, without combining datasets.

Chapter 1 starts with how to download the separate datasets from the Malaysian public Covid data repository (https://github.com/MoH-Malaysia/covid19-public) directly into the R environment. We then show basic data manipulation (wrangling) commands, with examples that use selected Malaysian Covid datasets that we have downloaded.
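As a minimal sketch of the download step, a CSV file can be read straight from the repository into a data frame. The raw-file base path, the file name epidemic/cases_malaysia.csv, and the helper names moh_url() and read_moh() are assumptions made for illustration; check the repository for its current layout.

```r
# Hypothetical helper: builds the raw-file URL for one CSV in the repository.
# The base path is an assumption about how the repository is served.
moh_url <- function(path,
                    base = "https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main") {
  paste(base, path, sep = "/")
}

# Hypothetical helper: reads one dataset directly into the R session.
# Requires internet access when called.
read_moh <- function(path) {
  read.csv(moh_url(path), stringsAsFactors = FALSE)
}

# Example call (not run here; the file name is an assumption):
# cases_malaysia <- read_moh("epidemic/cases_malaysia.csv")
# str(cases_malaysia)
```

Wrapping the URL in a small function keeps the repository location in one place, so a change in the repository layout only needs one edit.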

Chapter 2 introduces the main visualization tool. We show simple examples of the basic plotting functions of the ggplot2 package, again using one dataset at a time.

Chapter 3 shows detailed step-by-step examples of how to visualize the distribution of a single variable using univariate graphs. The variable can be categorical (e.g., state or cluster type) or quantitative (e.g., daily cases or deaths).

Chapter 4 shows examples of bivariate graphs that display the relationship between two variables.

Chapter 5 goes to the next level to show multivariable or multivariate graphs that display the relationships among three or more variables through grouping and faceting.

Chapter 6 separately covers time-dependent graphs, which are directly relevant to the Covid data because the data change from day to day. It also covers moving averages for smoothing the time-series data.
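To illustrate what a moving average does, here is a minimal sketch in base R using made-up daily counts. It uses stats::filter() for a trailing 7-day window; this is one possible approach and not necessarily the function used in Chapter 6.

```r
# Toy daily counts standing in for a column such as daily new cases.
daily <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)

# Trailing 7-day moving average: each value is the mean of the current day
# and the six days before it. sides = 1 means only past values are used.
ma7 <- stats::filter(daily, rep(1 / 7, 7), sides = 1)

# The first six entries are NA because a full 7-day window is not yet available.
as.numeric(ma7)
```

A trailing window is a natural choice for daily reporting because it never relies on days that have not happened yet.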

Chapter 7 looks at how to customize the appearance of the non-data parts of graphs, such as axes, legends, colors, and text annotations.

We like the flow of the initial chapters in Data Visualization with R and have followed it in Part I. These chapters are the building blocks for the later chapters in the book. We have purposely arranged them to progress from simple one-variable graphs to multivariable graphs, and we develop the examples step by step.

By the end of Part I, we will have covered the following basic techniques of data visualization:

  • plotting points, lines, bars, columns, pies, and donuts
  • converting a dataset into long format
  • using the dplyr verbs to prepare the data and add columns or variables by combining columns from the same dataset
  • differentiating with colors and symbols
  • customizing titles, subtitles, captions, and axes
  • using groups and facets
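Two of the techniques listed above, adding a column with a dplyr verb and converting a dataset into long format, can be sketched as follows. The data frame and its column names are made up for illustration, and pivot_longer() from the tidyr package is an assumed choice for the reshaping step.

```r
library(dplyr)
library(tidyr)

# Toy data frame; the column names are made up for illustration.
toy <- data.frame(
  date = as.Date(c("2021-10-01", "2021-10-02")),
  cases_new = c(100, 120),
  cases_recovered = c(80, 90)
)

# dplyr verb: add a column computed from columns of the same dataset.
toy <- toy %>%
  mutate(net_change = cases_new - cases_recovered)

# tidyr: reshape to long format, giving one row per date and measure.
toy_long <- toy %>%
  pivot_longer(cols = -date, names_to = "measure", values_to = "count")

toy_long
```

The long format is convenient for ggplot2 because the measure column can then be mapped to color or used for faceting.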

Date of datasets

Part I is intended to cover the basic techniques and options of data visualization. While we were preparing the examples, the columns of some of the datasets changed[1]: some new columns were added, and some column names were changed.

Since the purpose is to learn the techniques rather than to analyze the visuals, we decided to use the datasets as downloaded on 2021-10-01. Readers should check the column names and positions when they download the latest version of the datasets, and may have to change the column references in our example code accordingly.
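One way to carry out this check is to compare the column names an example expects against those in the freshly downloaded data. The sketch below uses a made-up data frame and a hypothetical list of expected columns.

```r
# Toy stand-in for a freshly downloaded dataset; column names are made up.
latest <- data.frame(date = "2021-10-01", cases_new = 100, cases_active = 500)

# Columns that an example expects to find (hypothetical list).
expected <- c("date", "cases_new", "cases_import")

# Inspect what is actually there.
names(latest)

# Expected columns that are missing from the download.
missing_cols <- setdiff(expected, names(latest))
missing_cols
```

Referring to columns by name rather than by position also makes example code less sensitive to new columns being inserted.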