3 Speaking R

This chapter introduces the R environment.

3.1 Working directory

When we start using R, a good first step is to set a working directory. A working directory is the default location for ongoing tasks, including reading and writing data files, opening and saving scripts, and saving the workspace image. It is the folder we return to for the problem we are working on.

Use the command setwd() to set the working directory. If we are in RStudio, we can also use the drop-down menu Session > Set Working Directory > Choose Directory… to conveniently change the working directory.

Use the command getwd() to get the current working directory.

It is recommended that we use separate working directories for different projects.

If we open a file and do not specify an absolute path, R will assume that the file is in our working directory.
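As a minimal sketch of how relative paths follow the working directory, the snippet below switches temporarily to R's session temporary folder and writes a file there. Using tempdir() as a stand-in for a project folder and the file name example.txt are assumptions made purely for illustration.

```r
# Remember the current working directory so we can return to it
old_wd <- getwd()

# tempdir() stands in for a real project folder (illustrative assumption)
setwd(tempdir())

# A relative path is resolved against the working directory
writeLines("hello", "example.txt")
wrote_here <- file.exists(file.path(getwd(), "example.txt"))
wrote_here

# Go back to where we started
setwd(old_wd)
```

Restoring the original directory at the end keeps the sketch from changing where later commands read and write files.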

3.2 R command

Now if we move to the console, the R program issues a prompt >, which signals that R is waiting for our input.

using R interactively

To get started, let’s just treat R as a calculator. When we enter an expression at the command prompt, R will evaluate the expression and either print the result or respond with an error message.

1 + 1

## [1] 2

mean(c(1, 2, 3, 4, 5))

## [1] 3

sqrt(4) #square root

## [1] 2

abs(-1) #absolute value

## [1] 1

cos(c(0, pi/4, pi/2, pi)) #pi is a built-in constant

## [1]  1.000000e+00  7.071068e-01  6.123234e-17 -1.000000e+00

print

Unlike many other programming languages, R lets us display the result of an expression without calling a print() function explicitly.

1 + 1

## [1] 2

print(1 + 1)

## [1] 2

Implicitly, when we enter an expression, R evaluates the expression and calls the print() function.

We can use the print() function for generic printing of any object.

print(matrix(1:12, nrow = 3, ncol = 4))

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
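The implicit call to print() applies only to expressions entered at the top level of the console. Inside a loop, for example, a bare expression is evaluated silently, so print() has to be called explicitly. A small sketch (loops themselves are covered later; the loop here is only for illustration):

```r
# A bare expression inside a loop is evaluated but nothing is printed
for (i in 1:3) i

# Calling print() explicitly shows each value
for (i in 1:3) print(i)
```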

built-in constants

R has a small number of built-in constants: pi, LETTERS, letters, month.abb, and month.name.

pi is the ratio of the circumference of a circle to its diameter.

pi

## [1] 3.141593

LETTERS contains the 26 upper-case letters of the Roman alphabet.

LETTERS

##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
## [21] "U" "V" "W" "X" "Y" "Z"

letters contains the 26 lower-case letters of the Roman alphabet.

letters

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
## [21] "u" "v" "w" "x" "y" "z"

month.abb contains the three-letter abbreviations for the English month names.

month.abb

##  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

month.name contains the English names for the months of the year.

month.name

##  [1] "January"   "February"  "March"     "April"     "May"       "June"     
##  [7] "July"      "August"    "September" "October"   "November"  "December"
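The constants can be used in expressions like any other value. The sketch below is only an illustration: the variable names are made up, and the square-bracket indexing it uses is covered with vectors in the next chapter.

```r
# Circumference of a circle with radius 2
radius <- 2
circumference <- 2 * pi * radius
circumference

# The character constants can be indexed to pick out individual elements
first_letters <- letters[1:3]
first_letters

winter <- month.name[c(12, 1, 2)]
winter
```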

What is “[1]” that accompanies each returned value?

It means that the index of the first item displayed in the row is 1.

In R, any number that we enter in the console is interpreted as a vector of length one. A vector is an ordered collection of elements of the same type, such as numbers. We’ll see what a vector is in the next chapter.
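A short sketch makes the index markers easier to see. The vector x below is just an illustrative example; how many elements fit on one row depends on the width of the console, so the row markers may differ from one machine to another.

```r
# A single number is a vector of length 1
length(5)
is.vector(5)

# A longer vector usually wraps across several rows in the console;
# each row then starts with the index of its first element
x <- 101:130
x
```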

assignment

Like most other languages, R lets us assign values to variables and refer to them by name.

The assignment operator is <-.

An assignment evaluates an expression, and passes the value to a variable. But the result is not automatically printed. The value is stored in the variable that we have defined. That would be a in the example below.

a <- 1 + 1

To print the variable value, we simply type the variable name (a in this case).

a

## [1] 2

Left-to-right assignment also works, but it is unconventional and not recommended.

1 + 1 -> a
a

## [1] 2

A single equal sign = can also be used as an assignment operator (also not recommended). In other programming languages, it is common to use = as an assignment operator. But in R the two are not fully interchangeable: <- can be used anywhere, whereas = assigns only at the top level or inside a braced block of expressions; inside a function call, = names an argument instead. In general, <- is preferred. There is an in-depth discussion on the differences between = and <- assignment operators on Stack Overflow.

a = 1 + 1

If the object already exists, its previous value is overwritten.

a

## [1] 2

a <- 2 + 2
a

## [1] 4
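One place where = and <- behave differently is inside a function call. The sketch below is an illustration under a few assumptions: it uses median() as an arbitrary example function, and it wraps everything in local() so that the experiment runs in a throwaway environment and does not touch our workspace.

```r
result <- local({
  # With =, x is matched to median()'s argument named x; no variable is created
  m1 <- median(x = 1:10)
  created_by_equals <- exists("x", envir = environment(), inherits = FALSE)

  # With <-, x is first assigned in the calling environment, then passed on
  m2 <- median(x <- 1:10)
  created_by_arrow <- exists("x", envir = environment(), inherits = FALSE)

  c(created_by_equals = created_by_equals, created_by_arrow = created_by_arrow)
})
result
```

Assigning inside a call is rarely a good idea in practice; the point is only that the two operators do not mean the same thing there.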

variable names

A variable name may contain letters, digits, periods (.), and underscores (_). It must start with a letter or a period; if it starts with a period, the second character must not be a digit. A name cannot start with a digit or an underscore.

Variable names are case-sensitive. For instance, age, Age and AGE are three different variables.

Reserved words cannot be used as variables. These include TRUE, FALSE, NULL, NA, if, else, while, function, for, next, break, repeat, and a few others. Use ?Reserved to learn more about the Reserved Words in R.

Names are effectively unlimited in length.

Whichever naming convention you choose, make sure that you keep to one convention.

myvar <- "Jane Doe"
my_var <- "Jane Doe"
myVar <- "Jane Doe"
MYVAR <- "Jane Doe"
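One way to check whether a name follows these rules is to compare it with the output of make.names(), which rewrites an invalid name into a valid one. The candidate names below are made up for illustration.

```r
candidates <- c("age", ".x", "my_var", "2x", "_y", ".2way", "if")

# make.names() leaves a valid name unchanged and alters an invalid one,
# so comparing before and after tells us which names are valid
is_valid <- candidates == make.names(candidates)
data.frame(name = candidates, valid = is_valid)
```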

R commands are case-sensitive

Variables A and a are different.

Functions nrow() and NROW() are different.
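A brief sketch of both points, using made-up variable names. The difference between nrow() and NROW() shows up when they are given a plain vector rather than a matrix.

```r
# Two different variables that differ only in case
score <- 1
Score <- 2
score == Score

# nrow() returns NULL for a plain vector, while NROW() treats the
# vector as a one-column matrix and returns its length
v <- c(10, 20, 30)
nrow(v)
NROW(v)
```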

Commands are separated either by a semi-colon `;` or by a newline

Semi-colon:

a <- 1 + 1 ; b <- 2 + 2

Newline:

a <- 1 + 1
b <- 2 + 2

incomplete commands

If a command is not complete at the end of a line, R will give a prompt + on second and subsequent lines and continue to read input until the command is complete.

Press Esc to cancel an incomplete command.
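For instance, a line that ends with an operator cannot be complete, so R keeps reading on the next line. The variable name total is just an example.

```r
# The first line ends with +, so the expression is incomplete and R
# keeps reading (the console shows + at the start of the next line)
total <- 1 + 2 +
  3 + 4
total
```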

auto-completion

R includes automatic completions for object names.

Type the first few characters of a name in the console, and press the Tab key to see the list of possible completions for what has been typed so far.

recalling previous commands; command-line editing

We can recall, correct, and reexecute our commands in R easily.

By pressing the up arrow or the down arrow, we will be able to scroll through previous commands.

By using the left and right arrow keys to move within the command, we can tweak a previous command to repeat it or correct a mistake.

`source()`

After we finish working on a problem, we want to keep a record of every step that we have taken.

Those commands can be stored in an external file (e.g., project1.R) in the working directory. Later we will be able to use the source() function to read and execute the code without having to retype it.

source("project1.R")

If we are working on a project, we can break our long script down into separate scripts for each task, and then read and execute the code from individual scripts using source().

source("cleaning.R")
source("models.R")
source("graphics.R")
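To try source() without preparing any files beforehand, the sketch below first writes a tiny script and then runs it. The file name project1_demo.R, its location in tempdir(), and the variable demo_total are all assumptions made for illustration.

```r
# Create a two-line script in the session's temporary folder
script_path <- file.path(tempdir(), "project1_demo.R")
writeLines(c("demo_total <- 1 + 1",
             "print(demo_total)"),
           script_path)

# Read and execute the script; objects it creates end up in our workspace
source(script_path)
demo_total
```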

3.3 Comment

A comment starts with a hash mark #.

When executing commands, R ignores everything from the # to the end of the line.

Comments can be used to explain R code, make code more readable, or prevent execution when testing alternative commands.

# 1 + 1
1 #+2

## [1] 1

c(1:2) # this is a vector

## [1] 1 2

To create multiline comments, we need to insert a # for each line.

# This is a comment
# written in
# more than just one line

3.4 Object

The entities that R creates and manipulates are known as objects. Everything in R is an object. These may be numeric vectors, character strings, lists, functions, etc. For now, let’s just think about an object as a “thing” that is represented by the computer.

workspace

The collection of objects currently stored in memory is called the workspace.

The function ls() displays the names of the objects in our workspace.

ls()

##   [1] "a"                      "AAPL"                   "admitted"              
##   [4] "admitted2"              "AFL"                    "age"                   
##   [7] "applicants"             "applicants2"            "area"                  
##  [10] "area2"                  "b"                      "base_stocks"           
##  [13] "BC"                     "book"                   "books"                 
##  [16] "c"                      "c0"                     "c1"                    
##  [19] "c2"                     "c3"                     "country"               
##  [22] "d"                      "date"                   "date_string"           
##  [25] "demo"                   "deposit"                "exclude_trades"        
##  [28] "f"                      "file1"                  "file2"                 
##  [31] "file3"                  "file4"                  "file5"                 
##  [34] "file5_2"                "file6"                  "file7"                 
##  [37] "flavor"                 "flavor_f"               "founded"               
##  [40] "Founded1"               "Founded2"               "Founded3"              
##  [43] "fun"                    "function_letters"       "g"                     
##  [46] "gcd"                    "genre"                  "get_divisor"           
##  [49] "get_intersect"          "get_score"              "i"                     
##  [52] "incl_nt"                "laureate"               "lcm"                   
##  [55] "len"                    "m1"                     "m2"                    
##  [58] "m3"                     "MMM"                    "MMM_SMA"               
##  [61] "MMM_SMA_EMA"            "my_function"            "myEMA"                 
##  [64] "mylist"                 "myMom"                  "mySMA"                 
##  [67] "n"                      "neu_set"                "no"                    
##  [70] "nobel_prize_literature" "now"                    "nth"                   
##  [73] "num"                    "output"                 "output_new"            
##  [76] "output2"                "pop"                    "pop_den"               
##  [79] "pop2"                   "qm"                     "result1"               
##  [82] "result2"                "sale_cond"              "sample"                
##  [85] "score_sum"              "sent_compound"          "sent_tokens"           
##  [88] "sp500"                  "sp500stocks"            "sp500tickers"          
##  [91] "starwars"               "state.df"               "state.list"            
##  [94] "stocks_br"              "string"                 "string1"               
##  [97] "string2"                "survey_results"         "t"                     
## [100] "tidy_stocks"            "time"                   "tq"                    
## [103] "tq_to_wide"             "tri"                    "tweets"                
## [106] "u"                      "v"                      "v1"                    
## [109] "v2"                     "v3"                     "values"                
## [112] "vec"                    "vec1"                   "vec2"                  
## [115] "w"                      "x"                      "y"                     
## [118] "year"                   "year1"                  "year2"                 
## [121] "year3"                  "years"                  "years_1"               
## [124] "years_2"                "yes"                    "z"

To remove objects from our workspace, we use the function rm(). There is no “undo”; once the variable is gone, it’s gone.

rm(a, b)

We can remove all the objects in memory. This erases our entire workspace at once.

rm(list = ls())

Now, our workspace is empty. ls() returns an empty vector.

ls()

## character(0)

We can save all the current objects with save.image(). By default, the workspace is written to a file named .RData in the current working directory, and R can restore the workspace from this file when it is started at a later time from the same directory. A workspace saved under a different file name can be reloaded with load().

load("week1.Rdata")
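A complete round trip might look like the following sketch, meant to be run at the console. The object demo_value and the file name week1_demo.RData in tempdir() are hypothetical; in a real project the file would normally live in the working directory.

```r
# An object to save (illustrative)
demo_value <- 42

# Write every object in the workspace to a file
image_path <- file.path(tempdir(), "week1_demo.RData")
save.image(image_path)

# Remove the object, then bring it back from the saved workspace
rm(demo_value)
load(image_path)
demo_value
```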

`q()` terminates an R session

3.5 Function

Every operation in R is a function. Essentially, when we enter an expression into the console, R parses the expression, translates the expression into a functional form, and evaluates that function.

A function is an object that takes some input objects (arguments) and returns an output object. Most functions are in the form f(argument1, argument2, ...). We’ve already seen a few functions (e.g., print(), mean()).
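To illustrate, even an operator such as + can be called in the functional form by wrapping its name in backticks. The sketch also shows one argument matched by position and one matched by name; the variable names are only examples.

```r
# The infix form and the function-call form are the same operation
1 + 1
`+`(1, 1)
same <- identical(1 + 1, `+`(1, 1))
same

# The first argument is matched by position, na.rm is matched by name
avg <- mean(c(1, 2, NA), na.rm = TRUE)
avg
```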

We’ll discuss functions in detail in the chapter Functions and learn how to write our own functions.

3.6 Package

Packages are the primary extension mechanism for R.

A package is a related set of functions, documentation, and data that have been bundled together. It is the fundamental unit of shareable code in R.

The design philosophy behind R is to build smaller, specialized tools that each do one thing well, instead of large programs that do everything. For instance, we can have packages for drawing graphics, packages for performing statistical tests, packages for the latest machine learning techniques, and so on.

When we download R, we get the base packages, or the standard packages. These packages contain the basic functions that allow R to work, and are automatically available to us.

We can download and install many more packages from package repositories (usually CRAN) that are specialized in certain things (e.g., statistical methods) or designed for a purpose (e.g., textbook companion).

We can also build our own packages if we want to share code or data with other people, or if we want to pack them up in a form that’s easy to reuse.

In the first half of the bootcamp, we will be focusing on the base R packages to understand how the language works. After we transition to the more practical side of the world, we will turn to add-on packages for tasks like data manipulation, visualization, and collecting web data.

installing packages

Users can install packages from multiple places, including CRAN, GitHub, Bitbucket, Bioconductor (genomics), and R-Forge.

To install packages from CRAN repositories, we use the function install.packages(). Make sure we are connected to the Internet, and put the name of the package in quotes.

install.packages("tidyverse")
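A common pattern is to install a package only when it is missing, so that a script can be rerun without downloading anything again. The sketch below uses stats as a placeholder because it ships with R and therefore triggers no download; in practice we would put the name of the package we need in pkg.

```r
# Placeholder package name; stats is always installed with R
pkg <- "stats"

# requireNamespace() returns FALSE if the package cannot be found
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

installed <- requireNamespace(pkg, quietly = TRUE)
installed
```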

GitHub is where much of the open-source development of R packages takes place. From GitHub, we can install development versions of packages that have a stable version on CRAN as well as packages not submitted to CRAN yet.

devtools::install_github("tidyverse/ggplot2")
remotes::install_github("tidyverse/dplyr")

We can install different versions of packages.

For instance, if we need to install an older version of a package so that it works in an earlier version of R, we can download the package from its archive.

path_to_file <- "https://cran.r-project.org/src/contrib/Archive/nanotime/nanotime_0.3.2.tar.gz"
install.packages(path_to_file, repos = NULL, type = "source")

updating packages

To view the installed packages, use the function library() with no arguments. It prints a list of installed packages in a new window.

To update the installed packages, use update.packages().
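The same information is also available as data we can work with in code. As a rough sketch, installed.packages() returns a matrix with one row per installed package, and packageVersion() reports the version of a single package (stats is used here only because it is always present).

```r
# One row per installed package; row names are the package names
pkgs <- installed.packages()
head(rownames(pkgs))
nrow(pkgs)

# Version of one particular package
stats_version <- packageVersion("stats")
stats_version
```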

loading packages

Make sure that a package is loaded into memory before using its functionalities. Only when a package is loaded are its contents available.

We use the function library() to load the package. Libraries are directories containing installed packages. As end users of R, we typically interact with installed packages that live in libraries.

library(tidyverse)

Use the function search() with no arguments to see the search path, which lists the packages currently loaded and attached in the R session.

Use the function detach() to unload a package that is currently loaded.

detach(package:tidyverse)
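The whole cycle can be tried with a package that comes with every R installation. The sketch below uses tools purely as an example; it also shows the :: operator, which calls a single function from an installed package without attaching the package first.

```r
# Attach the package and confirm that it appears on the search path
library(tools)
attached <- "package:tools" %in% search()
attached

# Detach it again
detach("package:tools")
still_attached <- "package:tools" %in% search()
still_attached

# package::function works without library()
ext <- tools::file_ext("project1.R")
ext
```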

3.7 Getting help

help() is the primary interface to R’s help systems. It displays the documentation for a function. ? is the shortcut for help().

help(mean)
?mean

For reserved words and for features specified by special characters (such as operators), the argument must be enclosed in double or single quotes.

help("if")
help("function")
help("[[")

example() runs an Examples section from the online help.

example(median)

## 
## median> median(1:4)                # = 2.5 [even number]
## [1] 2.5
## 
## median> median(c(1:3, 100, 1000))  # = 3 [odd, robust]
## [1] 3

help.search() searches all the installed packages to find help pages on a vague topic.

help.search("state space")
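Two further base R helpers can be handy when we only half remember a name; they are not mentioned above and are offered here as an optional sketch. apropos() lists objects whose names match a pattern, and args() shows a function's arguments without opening its help page.

```r
# Objects on the search path whose names start with "read."
read_functions <- apropos("^read\\.")
read_functions

# Argument list of a function
args(median)

# ??topic is a shortcut for help.search("topic"), for example:
# ??"state space"
```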

In addition to the official documentation (e.g., reference manuals, vignettes), there are several other frequently used ways to get help.

Stack Overflow

For troubleshooting, a good place to ask questions is the forum Stack Overflow, which is a searchable Q&A site oriented toward programming issues.

CRAN Task Views

If we simply have a general interest in a topic, CRAN Task Views provide some guidance on which R packages on CRAN are relevant for tasks related to that topic. Each page gives a brief overview of the included packages and links to them.

Just to give an idea of the level and range of the topics, the tasks include Bayesian Inference, Causal Inference, Databases with R, Empirical Finance, Natural Language Processing, Web Technologies and Services, and many more.

search engines and chat bots

Google has been our best friend to help us find solutions to specific tasks, such as debugging or suggestions of how to write a program.

Now we have several new friends. Chat bots or reasoning models (for complex tasks) from OpenAI, Anthropic (Claude), or DeepSeek can all be useful coding assistants. For example, we can use them to explain a code snippet to us, translate code between programming languages, or generate sample code as a starting point for further development. These tools can be particularly helpful if we are not familiar with a certain language but need to understand it.