R is a programming language, but also a software environment. To start learning R, you first need to install some software programs on your computer. The names and purposes of these programs may initially seem a bit confusing, but do not allow this to discourage you. Once all is in place, you will simply open a program and start using R to write scripts, text documents, and visualizations.
After working through this chapter, you should be able to:
- explain why R is or is not like a Swiss knife;
- categorize R objects into data vs. functions;
- distinguish between different shapes (e.g., scalars, vectors, rectangles) and types (e.g., numeric, character, logical) of data;
- create and change R objects (by assignment);
- apply arithmetic functions to numeric objects;
- create and modify vectors and rectangular tables of data;
- select elements from vectors and rectangular tables of data (by indexing);
- recognize and interpret basic if-then statements and for-loops.
This chapter assumes the following:
Software: You have installed the software prerequisites.
In R, packages can be installed by evaluating the function install.packages() with the name of the desired package enclosed in quotation marks.6
Readings: You have read the following chapters of r4ds:
This implies that you understand and can do the following:
- enter and run R commands at the prompt in the Console window of R Studio, and check their results;
- use R as a calculator for simple arithmetic;
- assign numeric values and characters to named objects;
- call simple R functions on objects;
- enter and run R scripts in the Editor window of R Studio.
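As a quick self-check, the skills listed above could look roughly as follows when typed at the Console prompt. This is only a minimal sketch: the object names (sum_1, sum_2, age, name) and their values are made up for illustration and do not stem from r4ds.

```r
# Use R as a calculator for simple arithmetic:
sum_1 <- 1 + 2 * 3      # multiplication is evaluated before addition
sum_2 <- (1 + 2) * 3    # parentheses change the order of evaluation

# Assign numeric values and characters to named objects:
age  <- 21              # a numeric object (hypothetical value)
name <- "Ada"           # a character object (hypothetical value)

# Call simple functions on objects:
root  <- sqrt(age + 4)  # square root of 25
n_chr <- nchar(name)    # number of characters in name
upper <- toupper(name)  # name in upper case

# Typing the name of an object prints its current value:
sum_1
root
```

Entering the same lines in the Editor window and running them from there yields the same results, but also preserves the commands as a script.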
Working through this book requires several software products (listed in the introductory chapter and above). To understand the need for installing and loading multiple components, the following distinctions are important:
R core vs. contributed packages: A working installation of R consists of several modules of code, typically called packages. About 30 of them belong to a set of core packages that provide essential R functionality and thus come with every R installation. We will sometimes refer to these packages as base R, even though the package base is only one of these core packages. By contrast, the Comprehensive R Archive Network (aka CRAN) is an online catalogue and global distribution platform for over 15,000 packages. And as the official guidelines for writing R extensions can be scary and intimidating, many R authors choose not to submit their packages to CRAN and instead provide them as archives on their own websites. This hierarchy of packages implies different levels of generality and quality: Whereas the set of core packages is written and checked by experts on the R core development team, the vast majority of existing packages have been contributed by committed R users. Consequently, R is like a Swiss knife that consists of a set of basic tools, plus thousands of more specialized tools that can be downloaded and tried out if you happen to work on a corresponding task. But beware: Just as you do not trust every article on someone’s website, you should not blindly trust any R package. In this respect, R is similar to Wikipedia in that both are the result of a collaborative effort of many volunteers that is administered by a team of highly dedicated individuals. And although both products come for free, without any a priori guarantees, and could potentially be abused and undermined by evil interests, there are mechanisms to recognize and reward quality over time.
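To see this distinction on your own machine, the following sketch sorts the packages in your library by their priority. It assumes that installed.packages() marks core packages with a priority of "base" or "recommended". The object names are chosen for illustration, and the counts will differ between installations.

```r
# Packages that ship with every R installation:
base_pkgs <- rownames(installed.packages(priority = "base"))

# Recommended packages are usually (but not always) installed alongside them:
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))

# All other packages in the library were contributed and installed separately:
all_pkgs <- rownames(installed.packages())
contributed_pkgs <- setdiff(all_pkgs, c(base_pkgs, recommended_pkgs))

# How many packages of each kind are there?
length(base_pkgs)
length(recommended_pkgs)
length(contributed_pkgs)
```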
Installing vs. loading R packages: When starting R on your machine, a small set of core packages (typically including base, datasets, graphics, methods, and stats) is loaded by default.7 However, the majority of R packages that we will use need to be installed additionally on your computer (once, typically via the install.packages() command) and loaded (every time) before they can be used. The need to load packages before using them is the reason why many R programs begin with a library(pkg) command to load a package named pkg.8 More specifically, when some package named pkg defines a command fancy_fun(), we can only use fancy_fun() in our code after installing and loading pkg. Alternatively, we can install pkg and then use the command pkg::fancy_fun(), which essentially instructs R to look for the fancy_fun() command in package pkg. Again, the Swiss knife analogy is helpful: In order to use some specific tool, we first need to have it available (or installed) on our knife. But to actually use a tool, we still need to open (or load) it first.9
R vs. graphical user interfaces (GUIs): By default, R is an interpreted language that assumes that commands are entered at a prompt (typically shown as >) and then evaluated by the underlying program. Over time, this basic way of interaction has been supplemented by graphical user interfaces (GUIs) that provide tools and separate windows for editing programs, displaying outputs, showing system information and libraries, etc. On most platforms, R comes with some GUI pre-installed, but the most versatile platforms to interact with R are so-called integrated development environments (IDEs) that need to be installed separately. Here, we will use the currently most popular and powerful IDE provided by R Studio (in its free, open source Desktop edition).10
Explain in which respects R is or is not like a Swiss knife.
What would be a fitting analogy for a GUI or IDE?
Answer: If packages are viewed as tools for specific tasks, the obvious candidate would be a toolbox. However, as individual R functions can also be viewed as tools (see Section 1.2.2), every package is a toolbox in itself. Thus, this line of thinking leads to an elaborate system of Matryoshka dolls: Swiss-knife-like tools in toolboxes that are contained in more elaborate toolboxes. Before we get too dizzy, we should remind ourselves that computers essentially are universal machines with many layers of systems, each of which can be described in terms of the tool vs. toolbox analogy.
- Find out how you can view the packages currently installed in your R library and the packages currently loaded when you start R.
Hint: There are R commands for this, but your GUI/IDE also provides access to and information about packages.
- There are multiple R packages that define a filter() command. Use ?filter() to find at least 2 corresponding packages. How could you call the corresponding commands?
?filter  # shows that dplyr and stats define this command

# Calling commands from installed packages:
dplyr::filter()  # would call the filter() command of dplyr
stats::filter()  # would call the filter() command of stats
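For the first of these two exercises, one possible approach uses only commands that come with R. This is a sketch rather than the only solution: the object names are made up for illustration, and the output depends on your installation.

```r
# Names of the first few packages installed in your library:
installed <- rownames(installed.packages())
head(installed)

# Packages that are loaded by default when R starts:
default_pkgs <- getOption("defaultPackages")
default_pkgs

# Packages attached in the current session:
attached <- .packages()  # returns its result invisibly, so we assign it
attached

# The search path shows where R looks for objects, in this order:
search()
```

In R Studio, the Packages tab provides the same information: it lists all installed packages and marks those that are currently loaded.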
1.1.4 Getting ready
We start our first session by creating an R script (.R) and loading the R packages of the tidyverse and ds4psy. To facilitate finding information in a script, always structure it by inserting explicit headings, plenty of space between different parts, and meaningful comments (i.e., lines preceded by the # symbol). A neat feature of the editor in R Studio is its foldable sections, which automatically appear when a commented line contains 4 or more consecutive dashes (i.e., # ----) and allow closing and opening the corresponding section (by clicking on the small triangle on the left or using the Cmd + Alt + O and Cmd + Alt + Shift + O keyboard shortcuts).
Here’s an example of what your initial script could look like:
## R basics | ds4psy
## Your Name | 2020 February 10
## ----------------------------

## Preparations: ----------

library(tidyverse)
library(ds4psy)

## Topic: ----------

# ...

## End of file (eof). ----------
Save your script (e.g., as 01_basics.R in the R folder of your project) and remember to save it regularly as you keep adding content to it.
6. Installing a package requires an internet connection. More specifically, your system downloads packages from a mirror of the Comprehensive R Archive Network (aka CRAN), which is a nifty way of making over 15,000 packages available worldwide.
7. Evaluating getOption("defaultPackages") in your R console shows which packages belong to this exclusive set.
8. Evaluating library() without the name of a specific package prints the location of your package library and a list of all packages currently installed in it.
9. The analogy breaks down when it comes to using multiple tools at once: In R, we typically load many packages in parallel. On a Swiss knife, this would be difficult or dangerous.