1 Readme. before you get started

1.1 A starting point and reference book

My journey with R began rather suddenly one night. After many months of contemplating learning R and too many excuses not to get started, I put all logic aside and decided to use R on my most significant project to date. Admittedly, at the time, I had some knowledge of other programming languages and felt maybe overconfident learning a new tool. This is certainly not how I would recommend learning R. In hindsight, though, it enabled me to do things that pushed the project much further than I could have imagined. However, I invested a lot of additional time on top of my project responsibilities to learn this programming language. In many ways, R opened the door to research I would not have thought of a year earlier. Today, I completely changed my style of conducting research, collaborating with others, composing my blog posts and writing my research papers.

It is true what everyone told me back then: R programming has a steep learning curve and is challenging. To this date, I would agree with this sentiment if I ignored all the available resources. The one thing I wish I had to help me get started was a book dedicated to analysing data from a Social Scientist perspective and which guides me through the analytical steps I partially knew already. Instead, I spent hours searching on different blogs and YouTube to find solutions to problems in my code. However, at no time I was tempted to revert to my trusted statistics software of choice, i.e. SPSS. I certainly am an enthusiast of learning programming languages, and I do not expect that this is true for everyone. Nevertheless, the field of Social Sciences is advancing rapidly and learning at least one programming language can take you much further than you think.

Thus, the aim of this book is narrowly defined: A springboard into the world of R without having to become a full-fledged R programmer or possess abundant knowledge in other programming languages. This book will guide you through the most common challenges in empirical research in the Social Sciences. Each chapter is dedicated to a common task we have to achieve to answer our research questions. At the end of each chapter, exercises are provided to hone your skills and allow you to revisit key aspects. In addition, the appendix offers several in-depth case studies that showcase how a research project would be carried out from start to finish in R using real datasets.

This book likely caters to your needs irrespective of whether you are a seasoned analyst and want to learn a new tool or have barely any knowledge about data analysis. However, it would be wrong to assume that the book covers everything you possibly could know about R or the analytical techniques covered. There are dedicated resources available to you to dig deeper. Several of these resources are cited in this book or are covered in Chapter 15. The primary focal point of the book is on learning R in the context of Social Sciences research projects. As such, it serves as a starting point on what hopefully becomes an enriching, enjoyable, adventurous, and lasting journey.

1.2 Download the companion R package

The learning approach of this book is twofold: Convey knowledge and offer opportunities to practice. Therefore, more than 50% of the book are dedicated to examples, case studies, exercises and code you can directly use yourself. To facilitate this interactive part, this book is accompanied by a so-called ‘R package’ (see Chapter 5.4), which contains all datasets used in this book and enables you to copy and paste any code snippet1 and work along with the book.

Once you worked through Chapter 3 you can easily download the package r4np using the following code snippet in your console (see Chapter 4.1):

devtools::install_github("ddauber/r4np")

1.3 A ‘tidyverse’ approach with some basic R

As you likely know, every language has its flavours in the form of dialects. This is no different to programming languages. The chosen ‘dialect’ of R in this book is the tidyverse approach. Not only is it a modern way of programming in R, but it is also a more accessible entry point. The code written with tidyverse reads almost like a regular sentence and, therefore, is much easier to read, understand, and remember. Unfortunately, if you want a comprehensive introduction to learning the underlying basic R terminology, you will have to consult other books. While it is worthwhile to learn different ways of conducting research in R, basic R syntax is much harder to learn, and I opted against covering it as an entry point for novice users. After working through this book, you will find exploring some of the R functions from other ‘dialects’ relatively easy, but you likely miss the ease of use from the tidyverse approach.

1.4 Understanding the formatting of this book

The formatting of this book carries special meaning. For example, you will find actual R code in boxes like these.

name <- "Daniel"
food <- "Apfelstrudel"

paste("My name is ", name, ", and I love ", food, ".", sep = "")
#> [1] "My name is Daniel, and I love Apfelstrudel."

You can easily copy this code chunk by using the button in the top-right corner. Of course, you are welcome to write the code from scratch, which I would recommend because it accelerates your learning.

Besides these blocks of code, you sometimes find that certain words are formatted in a particular way. For example, datasets, like imdb_top_250, included in the R package r4np, are highlighted. Every time you find a highlighted word, it refers to one of the following:

  • A dataset,

  • A variable,

  • A data type,

  • The output of code,

  • The name of an R package,

  • The name of a function or one of its components.

This formatting style is consistent with other books and resources on R and, therefore, easy to recognise when consulting other content, such as those covered in Chapter 15.


  1. A code snippet is a piece of programming code that can be used directly as is.↩︎