Part 1: Foundations

Creating a new building first requires solid foundations. In learning, acquiring new knowledge and skills in a step-by-step fashion generally promotes understanding. The guiding principle of this book is that we aim for transparency: explaining not just what to do, but also why and how to do it. Transparency in data science involves conceptual clarity, a clear structure of contents, and a range of tools to solve typical problems. Implementing this principle means that we supplement the practical parts (on solving tasks with R and particular R packages) with contextual information and theoretical reflections. But mere definitions and conceptual distinctions tend to be abstract and risk boring students. Hence, we aim to motivate topics by putting them into context, point out key implications, and provide engaging exercises that allow us to check our understanding.

This initial part contains three chapters:

  • Chapter 1 motivates the what and why of data science. Based on the need for data literacy, we consider the concepts of data and science, and discuss the mix of skills that aspiring data scientists should have. Further reflections on the notions of ecological rationality and transparency will steer us towards designing more intuitive tools. As this chapter contains some reflective musings and excessive ramblings, it can be skimmed or skipped. However, please make sure that you do not miss out on R Markdown, the only technological tool introduced in this chapter. R Markdown is a framework for editing and creating documents that enables reproducible research by rendering the scientific process more transparent.

  • Chapter 2 introduces basic R data types and shapes. In R, clarity initially requires understanding the difference between data and functions, as well as different types and shapes of data. Whereas we will distinguish between the types of logical, numeric, and character data, the only data shape that we will encounter in this chapter is the atomic vector.

  • Chapter 3 introduces R data structures. Once we grasp the different types and shapes of data, we can combine them into data structures. Two basic data structures in R are vectors (for homogeneous data) and lists (for heterogeneous data). The key data structure used throughout this book comes in the shape of a rectangular table that stores data in rows of observations and columns of variables. In R, such tables are known as data frames (or tibbles) and are implemented as lists of vectors of equal length.
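
To preview the distinctions drawn in Chapters 2 and 3, here is a minimal sketch in base R. The object names (is_adult, age, name, person, df) and their values are made up for illustration and do not stem from the chapters themselves:

```r
# Atomic vectors: all elements of a vector share the same data type.
is_adult <- c(TRUE, FALSE, TRUE)     # logical
age      <- c(34, 12, 51)            # numeric (stored as double)
name     <- c("Ann", "Ben", "Cleo")  # character

typeof(is_adult)  # "logical"
typeof(age)       # "double"
typeof(name)      # "character"

# A list can combine elements of different types and lengths.
person <- list(name = "Ann", age = 34, scores = c(1, 2, 3))
length(person)    # 3

# A data frame is a list of vectors of equal length:
# each vector is a column (variable), each position a row (observation).
df <- data.frame(name = name, age = age, is_adult = is_adult,
                 stringsAsFactors = FALSE)

is.list(df)       # TRUE
dim(df)           # 3 3 (rows, columns)
sapply(df, typeof)
```

Setting stringsAsFactors = FALSE explicitly keeps the name column a character vector in older versions of R as well. The later chapters cover these objects in detail, so this sketch only serves to make the terminology concrete.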

Taken together, this part lays the foundations on which we can build basic programming and visualization skills (in Parts 2 and 3 of this book).