We wrote this book to help new R users with limited programming and statistical backgrounds. We understand the struggles they go through in moving from point-and-click statistical software such as SPSS or MS Excel to more code-centric software such as R and Python. In our experience, frustration sets in early when learning code-centric software. It often demotivates new users to the extent that they abandon it and return to point-and-click statistical software. This book aims to minimise these struggles and to gently help these excited but fragile new users learn, quickly and effectively, the code and workflows for performing data and statistical analysis using the R programming language.

The audience for this book includes postgraduate students, public health researchers, epidemiologists, and biostatisticians. We designed and wrote this book based on our experiences teaching students in the public health, epidemiology and biostatistics programmes at the School of Medical Sciences, Universiti Sains Malaysia. Between KIM and WNA, we have over 30 years of experience teaching and training undergraduate and postgraduate students, and most of our postgraduate students are enrolled in these programmes. The courses we teach include basic and advanced statistics, multivariable data analysis, structural equation modelling, advanced numerical data analysis, and advanced categorical data analysis.

The main objective of this book is to help new R users (such as our undergraduate and postgraduate students) quickly understand the R programming language, make plots, explore data, summarise data, and perform statistical analyses inside the RStudio IDE. We believe the book will achieve this objective. We also provide interpretations of the graphs, tables and statistical models that are relevant to our students and to us. Our students typically do not have strong mathematical and statistical backgrounds; however, in their careers, they are very much involved in collecting, analysing, and interpreting data. Some will work at medical and health institutions and organisations. Their applied knowledge and skills in data analysis and in epidemiological and statistical modelling will help them formulate health policies and design evidence-based public health interventions.

We used the rmarkdown and bookdown packages to write this book inside the RStudio IDE. We are truly grateful to everyone who has developed both packages and to Posit Software PBC. Posit Software PBC, previously known as RStudio PBC, is an open-source data science company. It continuously supports open-source data science initiatives and develops and distributes the fantastic RStudio IDE. When writing the physical book, we used R version 4.2.2 and RStudio version 2022.07.2 Build 576.

The source code for the book is available on our GitHub repository, and the datasets are also available on GitHub. We are indebted to George Knott, the statistics editor at CRC Press, who was very supportive and understanding while we were chasing the book's deadlines. In addition, CRC Press has kindly allowed us to host the online version of the book on the bookdown website. Thank you, George and CRC Press.

All in all, we hope you enjoy this book!


Kamarul Imran Musa


Wan Nor Arifin Wan Mansor


Tengku Muhammad Hanis Tengku Mokhtar


School of Medical Sciences,

Universiti Sains Malaysia