R is a programming language, but also a software environment (R Core Team, 2020). To start learning R, you first need to install some software programs on your computer. The names and purposes of these programs may initially seem a bit confusing, but do not allow this to discourage you. Once all is in place, you will simply open a program and start using R to write scripts, text documents, and visualizations.
After working through this chapter, you should be able to:
- explain why R is and is not like a Swiss knife;
- categorize R objects into data vs. functions;
- distinguish between different shapes (e.g., scalars, vectors, rectangles) and types (e.g., numeric, character, logical) of data;
- create and change R objects (by assignment);
- apply arithmetic functions to numeric objects;
- create and modify vectors and rectangular tables of data;
- select elements from vectors and rectangular tables of data (by indexing);
- recognize some more advanced issues (e.g., factors, lists, random sampling, conditionals, and loops).
This chapter assumes the following:
Software: You have installed the software prerequisites specified in the Introduction. Specifically,
Once R and RStudio are installed and running, additional R packages can be installed by evaluating the function install.packages() in the Console of your R interface, with the name of the desired package enclosed in quotation marks:10
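A minimal sketch of this installation step, assuming the two packages in question are tidyverse and ds4psy (the ones loaded later in this chapter) and that an internet connection is available. The object names pkgs, is_installed, and missing_pkgs, as well as the interactive() guard, are illustrative additions rather than part of the original instructions:

```r
# Packages used in this chapter (assumed from the text):
pkgs <- c("tidyverse", "ds4psy")

# Check which of them can already be found in the local library:
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing, and only when working interactively:
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

For a single package, typing install.packages("ds4psy") at the Console prompt achieves the same result.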
Packages only need to be installed once (unless you want to install an updated version), but need to be loaded every time they are used. Loading the two packages just installed can be achieved with the following commands:
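As a hedged sketch, and again assuming that the two packages are tidyverse and ds4psy, loading could look as follows. In an interactive session, the two commands library(tidyverse) and library(ds4psy) are all that is needed; the loop and the requireNamespace() check are illustrative additions that report a missing package instead of stopping with an error:

```r
# Load (attach) each package for the current session.
# Assumes that installation has already happened.
pkgs <- c("tidyverse", "ds4psy")

for (pkg in pkgs) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)  # equivalent to library(tidyverse) etc.
  } else {
    message("Package '", pkg, "' is not installed yet.")
  }
}
```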
The terminology of R packages is explained in more detail in Section 1.1.3 below.
Other introductions to R may require slightly different packages. For instance, see the software requirements of Chapter 1.4 of r4ds and tidyverse.org for current information on the tidyverse packages.
Readings: You have read the introductory chapters of r4ds and are familiar with the setup and terminology of R and RStudio:
This implies that you understand and can do the following:
- Enter and run R commands at the prompt in the Console window of RStudio, and check their results;
- Use R as a calculator for simple arithmetic;
- Assign numeric values and characters to named objects;
- Call simple R functions on objects;
- Enter and run R scripts in the Editor window of RStudio;
- Collect and store all course-related files in a dedicated directory and corresponding RStudio project.
Working through this book requires several software products (listed in the introductory chapter and above). To understand the need for installing and loading multiple components, the following distinctions are important:
R core vs. contributed packages: A working installation of R can be thought of as your R engine and consists of several modules of code, typically called packages. About 30 of them belong to a set of core packages that provide essential R functionality and thus come with every R installation. We will sometimes refer to these packages as base R, even though the package base is only one of these core packages. By contrast, the Comprehensive R Archive Network (aka CRAN) is an online catalogue and global distribution platform for over 16,000 additional packages, which provide more specialized tools. As the official guidelines for writing R extensions can be intimidating, many R authors choose not to submit their packages to CRAN and instead provide them as archives in other places. The hierarchy of R packages implies different levels of generality and quality: Whereas the set of core packages is written and checked by experts on the R development core team, the vast majority of existing packages have been contributed by committed R developers and users. Consequently, R is like a Swiss knife insofar as it consists of a set of basic tools, plus thousands of more specialized tools that can be added when you happen to work on a corresponding task. But beware: Just as you do not trust every article on someone’s website, you should not blindly trust any R package. In this respect, R is more similar to Wikipedia: Both are the result of a collaborative effort of many volunteers that is administered by a team of highly dedicated experts. And although both products come for free, without any a priori guarantees, and could potentially be abused and undermined by evil interests, there are mechanisms to recognize and reward quality over time.
Installing vs. loading R packages: When starting R on your machine, a small set of core packages (typically including base, datasets, and graphics, but also methods and stats) is loaded by default.11 However, the majority of R packages that we will use need to be installed additionally on your computer (once, typically via the install.packages() command) and loaded (every time) before they can be used. The need to load packages before using them is the reason why many R programs begin with a library(pkg) command to load a package named pkg.12 More specifically, when some package named pkg defines a command fancy_fun(), we can only use fancy_fun() in our code after installing and loading pkg. Alternatively, we can install pkg and then use the command pkg::fancy_fun(), which essentially instructs R to look for the fancy_fun() command in package pkg. Again, the Swiss knife analogy is helpful: In order to use some specific tool, we first need to have it available (or installed) on our knife. But to actually use a tool, we still need to open (or load) it first.13
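The difference between the two ways of calling a function can be tried out with a package that ships with R. The following sketch uses the tools package and its file_ext() function as stand-ins for pkg and fancy_fun(); it assumes that tools is installed with R but not attached at startup, which is the usual default:

```r
# Check whether the tools package is currently attached:
"package:tools" %in% search()

# Option 1: Call a function from an installed package without loading it:
ext_1 <- tools::file_ext("01_basics.R")

# Option 2: Load the package first, then call the function by its name:
library(tools)
ext_2 <- file_ext("01_basics.R")

ext_1  # "R"
ext_2  # "R"
```

The pkg::fun() notation is handy when a function is needed only once, or when two loaded packages define functions with the same name.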
R vs. graphical user interfaces (GUIs): By default, R is an interpreted language that assumes that commands are entered at a prompt (typically shown as >) and then evaluated by the underlying program. Over time, this basic way of interaction has been supplemented by graphical user interfaces (GUIs) that provide tools and separate windows for editing programs, displaying outputs, showing system information and libraries, etc. On most platforms, R comes with some GUI pre-installed, but the most versatile platforms to interact with R are so-called integrated development environments (IDEs) that need to be installed separately. Here, we will use the currently most popular and powerful IDE provided by RStudio (in its free, open source Desktop edition).14
Explain in which respects R is and is not like a Swiss knife.
What would be a fitting analogy for a GUI or IDE?
Answer: If packages are viewed as tools for specific tasks, the obvious candidate would be a toolbox. However, as individual R functions can also be viewed as tools (see Section 1.2.2), every package is a toolbox in itself. Thus, this line of thinking leads to an elaborate system of Matryoshka dolls: Swiss-knife-like tools in toolboxes that are contained in more elaborate toolboxes. Before we get too dizzy, we should remind ourselves that computers essentially are universal machines with many layers of inter-related systems, each of which can be described in terms of the tool vs. toolbox analogy. Thus, all such descriptions are somewhat arbitrary and crucially depend on our current perspective and interests.
- Find out how you can view the packages currently installed in your R library and the packages that are being pre-loaded when you start R.
Hint: There are R commands for this, but your GUI/IDE also provides access to and information about packages.
- There are multiple R packages that define a filter() function. Use ?filter() to find at least two corresponding packages. How could you call the corresponding function of a specific package?
Note: When loading a package, any conflicts with pre-loaded objects are displayed in the Console (as “masked” objects). For instance, when starting R and then only loading the dplyr package, we see that it re-defines several objects from R’s base and stats packages.
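One possible way to address a specific version of filter() is the pkg::fun() notation introduced above. The following sketch assumes that the two packages found are stats and dplyr; the objects x and ma are made up for illustration, and the dplyr call only runs if that package is installed:

```r
# stats::filter() applies a linear filter to a series
# (here: a moving average over 3 neighbouring values):
x  <- 1:5
ma <- stats::filter(x, rep(1/3, 3))
as.numeric(ma)  # NA 2 3 4 NA

# dplyr::filter() selects rows of a data frame that meet a condition:
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::filter(data.frame(x = x), x > 3)
}
```

Writing the package name explicitly removes any ambiguity about which function is called, regardless of the order in which packages were loaded.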
1.1.4 Getting ready
We start our first session by creating an R script (with the file extension .R) and loading the R packages of the tidyverse and ds4psy.
To facilitate finding information in a script, always structure it by inserting explicit headings, plenty of space between different parts, and meaningful comments (i.e., lines preceded by the # symbol).
A neat feature of the RStudio editor is its foldable sections, which automatically appear when a commented line ends with 4 or more consecutive dashes (i.e., # ----) and allow closing and opening the corresponding section (by clicking on the small triangle on the left or using the Cmd + Alt + O and Cmd + Alt + Shift + O keyboard shortcuts).
Here’s an example of what your initial R script could look like:
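The following is only a sketch of such a script, not the original example. The header text, section names, and the object result are placeholders; the requireNamespace() checks are an added precaution, so that the script also runs when a package has not been installed yet (otherwise, plain library(tidyverse) and library(ds4psy) commands suffice):

```r
## 01_basics.R | R basics
## <your name> | <date>
## ---------------------------

## Preparations: ----------

# Load packages (assumed to be installed already):
if (requireNamespace("tidyverse", quietly = TRUE)) library(tidyverse)
if (requireNamespace("ds4psy", quietly = TRUE)) library(ds4psy)

## Topic 1: R as a calculator ----------

result <- 1 + 2  # assign the sum to a named object
result           # evaluate the object to show its value

## End of file. ----------
```

Each comment line that ends with 4 or more dashes marks the beginning of a foldable section in the RStudio editor.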
Save your R script (e.g., as 01_basics.R in the R folder of your project) and remember to save it regularly as you keep adding content to it.
R Core Team. (2020). R base: A language and environment for statistical computing. Retrieved from https://www.R-project.org
Installing a package requires an internet connection. More specifically, your system is downloading packages from a mirror of the Comprehensive R Archive Network (aka CRAN), which is a nifty way of making over 16,000 packages available world-wide.↩
Evaluating getOption("defaultPackages") in your R console shows which packages belong to this exclusive set.↩
Evaluating library() without the name of a specific package prints the location of your package library and a list of all packages currently installed in it.↩
The analogy breaks down when it comes to using multiple tools at once: In R, we typically load many packages in parallel. On a Swiss knife, this would be difficult or dangerous.↩