Part 2: Programming basics

Part 1 laid the foundations for our work with R: We can now distinguish between various data types and shapes, create new data objects (usually as vectors or data frames), access their elements (e.g., by numerical or logical indexing), and manipulate them with functions.

The next step of our journey involves some basic concepts of computer programming. In our introduction, we quoted Frank Harrell (Section 1.2.3.1):

Can one be a good data analyst without being a half-good programmer?
The short answer to that is, ‘No.’
The long answer to that is, ‘No.’

Frank Harrell (1999), S-PLUS User Conference, New Orleans

and remarked that the notion of a “half-good programmer” remains somewhat vague. This part provides an essential programming curriculum for new data analysts, organized into three chapters:

  • Chapter 4 discusses conditionals for verifying data and distinguishing between cases. After introducing basic and advanced conditionals, we will see that we were actually using a variant of conditionals in Chapters 2 and 3.

  • Chapter 5 enables us to create our own functions. Functions are a powerful tool for abstraction and modularization, so learning to write them advances our programming skills by a crucial step.

  • Chapter 6 introduces iteration for executing parts of code repeatedly. In R, iteration can be explicit or implicit. Explicit iteration uses R's for, while, or repeat loops. Implicit iteration uses vectorized functions or the base R apply() or purrr map() families of functions to apply functions directly to data structures.
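To give a first impression of how these three topics fit together, here is a minimal sketch in base R. The function name classify_number() and the example vector x are hypothetical illustrations and do not appear in the chapters themselves. The sketch wraps a conditional in a function and then applies that function to every element of a vector, once by explicit and twice by implicit iteration:

```r
# Conditional inside a function (topics of Chapters 4 and 5).
# classify_number() is a hypothetical helper, used for illustration only.
classify_number <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

# Hypothetical example data:
x <- c(3, 8, 10, 7)

# Explicit iteration (Chapter 6): a for loop fills a pre-allocated vector.
labels_loop <- character(length(x))
for (i in seq_along(x)) {
  labels_loop[i] <- classify_number(x[i])
}

# Implicit iteration (1): base R vapply() applies the function to each element.
labels_apply <- vapply(x, classify_number, FUN.VALUE = character(1))

# Implicit iteration (2): the vectorized ifelse() tests all elements at once.
labels_vec <- ifelse(x %% 2 == 0, "even", "odd")

# Check that all three approaches agree:
identical(labels_loop, labels_apply)
identical(labels_apply, labels_vec)
```

All three approaches should yield the same character vector. Which one is preferable depends on the task: vectorized solutions tend to be the most compact, whereas loops make each step explicit.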

Although a basic familiarity with these concepts will not make us expert programmers, it will strengthen the foundations on which we can build more sophisticated programming skills later.