Chapter 1 The first steps in R

1.1 RStudio for the first time

When you open RStudio, it is typically split into four panels, as you can see in the figure below:

Figure 1: Illustration of RStudio

These panels show:

  1. Your script file, which is an ordinary text file with suffix “.R”. For instance, yourFavoritFileName.R. This file contains your code.
  2. All objects we have defined in the environment.
  3. Help files and plots.
  4. The console, which allows you to type commands and display results.

Some other terms that will be useful:

  • Working directory: The directory you are currently working in. Useful commands: getwd() returns the location of your current working directory, and setwd() sets a new location for it. This is very useful when you want to load or save several files: set the working directory once, and all files are loaded from or saved into this directory by default.
  • Workspace: The collection of all objects you have created in the current session (e.g., data, matrices, vectors, variables, functions). R can save it to a hidden file named .RData in the working directory. Useful commands: ls() lists all objects in the current workspace, and rm(list=ls()) deletes all of them. It is also possible to remove only some objects with rm(object1, object2).
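
The commands from both bullet points can be tried out in a few lines. This is a minimal sketch, assuming the temporary directory returned by tempdir() stands in for your project folder; the object names a and b are made up for illustration:

```r
old_wd <- getwd()        # remember the current working directory
setwd(tempdir())         # switch to a temporary directory (stand-in for a project folder)
new_wd <- getwd()        # the working directory is now the temporary one

a <- 1:3                 # hypothetical objects, created only for this demo
b <- "some text"
objs_before <- ls()      # names of all objects in the workspace, including "a" and "b"
rm(a)                    # remove a single object
objs_after <- ls()       # "a" is gone, "b" is still there

setwd(old_wd)            # restore the original working directory
```

Restoring the old working directory at the end keeps the demo from affecting where your later files are loaded from or saved to.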

1.2 Simple Calculations

We start with a very basic calculation. Let’s type 1+1 into the console window and press enter. We get

 1+1 # first calculation
## [1] 2

i.e. the result of the calculation is returned.

Note that everything written after the # sign is ignored by R, which is very useful for commenting your code. The lines starting with ## show the output.

Let’s consider a second calculation

1+2+3+ # second calculation
## Error: <text>:2:0: unexpected end of input
## 1: 1+2+3+ # second calculation
##    ^

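This calculation ended with an error. The reason is that the expression is incomplete: the trailing + still expects another number, and R can only process a command that is entered completely. (In the interactive console, R would instead show a + prompt and wait for the rest of the input.) The corrected command works: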

1+2+3+4 # second calculation
## [1] 10

A similar thing happens when parentheses are not closed.

 2*(2+3
## Error: <text>:2:0: unexpected end of input
## 1:  2*(2+3
##    ^
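
If you want to check whether a piece of code is syntactically complete without running it, the base R function parse() can help. The sketch below wraps it in a small helper; the name check_syntax is hypothetical and not part of R:

```r
# parse() only checks the syntax of the code; it does not evaluate it
check_syntax <- function(code) {
  tryCatch(
    {
      parse(text = code)   # raises an error for incomplete or invalid code
      "complete"
    },
    error = function(e) "incomplete or invalid"
  )
}

res1 <- check_syntax("1+2+3+")    # "incomplete or invalid"
res2 <- check_syntax("2*(2+3")    # "incomplete or invalid"
res3 <- check_syntax("2*(2+3)")   # "complete"
```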

It is helpful to use a script file such as yourFavoritFileName.R to store your R commands. Otherwise, you would have to type your code again whenever an error occurs. You can send single lines or selected regions of your R code to the console by pressing CTRL+ENTER (the key is labelled STRG on German keyboards; on macOS use CMD+ENTER).

1.3 Assignments

The assignment operator will be your most frequently used tool. Here is an example in which a scalar variable is created:

x <- 9
x
## [1] 9
x+1
## [1] 10

Note: The R community favors the <- assignment operator, whose syntax is unusual compared with most other programming languages. Alternatively, you can use the = operator:

x = 9
x
## [1] 9
4 -> y # possible but unusual
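
One practical difference between <- and = shows up inside function calls, where = names an argument instead of creating an object. A small sketch (the object name w is made up for illustration):

```r
m1 <- mean(x = c(2, 4, 6))    # here = only names the argument x of mean(); no object is created
m2 <- mean(w <- c(2, 4, 6))   # here <- creates the object w and passes its value to mean()
w
## [1] 2 4 6
```

Both calls return the same mean; only the second one leaves a new object behind in the workspace.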

We now consider a vector, an object you will use frequently:

z = c(1,3,5,6)
z
## [1] 1 3 5 6

As discussed, ls() lists the contents of the workspace, whereas rm(list=ls()) deletes all objects in it.

ls()
## [1] "x" "y" "z"
rm(list =ls())
ls()
## character(0)

In the next step we will consider some vector multiplication. There are three different ways to multiply a vector with itself: element by element, using the inner product, or using the outer product. Element-by-element multiplication gives you a vector of the same length.

z = c(1,3,5,6)
z*z                #multiplication element by element
## [1]  1  9 25 36

The function t() gives the transpose of a vector (or matrix). Therefore, the inner product of the vector is given by

z = c(1,3,5,6)
t(z)%*%z                #multiplication inner product
##      [,1]
## [1,]   71
class(t(z)%*%z)
## [1] "matrix"
class(z)
## [1] "numeric"

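Note that R stores the inner product t(z) %*% z as a 1×1 matrix, not as a plain number. In R 4.0.0 and later, class() of a matrix returns "matrix" "array" instead of only "matrix".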

Finally, we have the outer product that gives us a matrix

z = c(1,3,5,6)
z%*%t(z)                #multiplication outer product
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    6
## [2,]    3    9   15   18
## [3,]    5   15   25   30
## [4,]    6   18   30   36

Be very careful with %*% versus *: they do not lead to the same result. Using the transpose explicitly might help to avoid mistakes.
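
To make the difference concrete, the following sketch computes the inner and outer product in more than one way. It only uses base R functions (sum(), drop(), outer()); the variable names are chosen for illustration:

```r
z <- c(1, 3, 5, 6)

inner_matrix  <- t(z) %*% z          # inner product, stored as a 1 x 1 matrix
inner_scalar  <- sum(z * z)          # same value as a plain number: 71
inner_dropped <- drop(t(z) %*% z)    # drop() removes the matrix dimensions

outer_a <- z %*% t(z)                # outer product via matrix multiplication
outer_b <- outer(z, z)               # outer product via the base R function outer()
same_outer <- isTRUE(all.equal(outer_a, outer_b))   # TRUE
```

If you need the inner product as an ordinary number for later calculations, sum(z * z) or drop() avoids carrying a 1×1 matrix around.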

You can also multiply a vector with a scalar:

z*4
## [1]  4 12 20 24

A strength of R is that operations can be applied to each element of a vector in one statement:

z^2
## [1]  1  9 25 36
log(z)
## [1] 0.000000 1.098612 1.609438 1.791759
(z-mean(z))/sd(z)
## [1] -1.2402159 -0.3382407  0.5637345  1.0147221
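
The last statement standardizes z (subtract the mean, divide by the standard deviation). As a quick check, and assuming you are happy to use the base R function scale() for comparison, the sketch below confirms that both routes agree:

```r
z <- c(1, 3, 5, 6)

z_std  <- (z - mean(z)) / sd(z)    # element-wise standardization, as above
z_std2 <- as.numeric(scale(z))     # scale() centers and scales; as.numeric() drops the matrix form

same_result <- isTRUE(all.equal(z_std, z_std2))   # TRUE
mean_is_zero <- abs(mean(z_std)) < 1e-12          # standardized values have mean 0 (up to rounding)
sd_is_one    <- isTRUE(all.equal(sd(z_std), 1))   # and standard deviation 1
```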

So far, we have seen c(), which can be used to combine objects. All function calls use the same general notation: a function name is always followed by round parentheses. Often, the parentheses include arguments, such as

z <- seq(from = 1, to = 5, by = 1)

z
## [1] 1 2 3 4 5
z[1:3]                  #refer to particular elements
## [1] 1 2 3
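
Arguments can be given by name in any order or by position, and square brackets accept more than simple ranges. A short sketch with a few common variants (the variable names are made up for illustration):

```r
z  <- seq(from = 1, to = 5, by = 1)
z2 <- seq(by = 1, to = 5, from = 1)     # named arguments may appear in any order
z3 <- seq(1, 5, 1)                      # positional arguments: from, to, by

three_points <- seq(1, 5, length.out = 3)   # 1 3 5: ask for a number of points instead of a step size

first_last <- z[c(1, 5)]    # 1 5: pick elements by position
without_1  <- z[-1]         # 2 3 4 5: a negative index drops that element
large_only <- z[z > 3]      # 4 5: a logical condition selects matching elements
```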

We now consider a matrix:

M <- matrix(data=1:9, nrow=3, ncol=3)   # define matrix M
M
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
dim(M)                                  # dimension of M
## [1] 3 3
dim(z)                                  # dimension of z  
## NULL
length(z)
## [1] 5
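
As the output above shows, matrix() fills the matrix column by column. The sketch below uses the byrow argument of matrix() to fill it row by row instead, and shows that a vector only gets dimensions once it is converted to a matrix (M_byrow and z_dim are illustrative names):

```r
M       <- matrix(data = 1:9, nrow = 3, ncol = 3)                 # filled column by column (default)
M_byrow <- matrix(data = 1:9, nrow = 3, ncol = 3, byrow = TRUE)   # filled row by row
M_byrow
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

same_as_transpose <- identical(M_byrow, t(M))   # TRUE: row-wise filling equals the transpose here

z     <- seq(from = 1, to = 5, by = 1)
z_dim <- dim(as.matrix(z))                      # 5 1: as a matrix, z is a single column
```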

Element by element multiplication for M is given by

M * M
##      [,1] [,2] [,3]
## [1,]    1   16   49
## [2,]    4   25   64
## [3,]    9   36   81

In addition

M %*% M
##      [,1] [,2] [,3]
## [1,]   30   66  102
## [2,]   36   81  126
## [3,]   42   96  150
t(M) %*% M
##      [,1] [,2] [,3]
## [1,]   14   32   50
## [2,]   32   77  122
## [3,]   50  122  194
M %*% z[1:3]
##      [,1]
## [1,]   30
## [2,]   36
## [3,]   42

is the standard matrix multiplication. We can access the elements of a matrix with

M[,3]                 # extract the third column of M
## [1] 7 8 9
M[3,3]                # extract the element in the third row and third column of M
## [1] 9
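
The same bracket notation also selects rows and sub-matrices. A brief sketch of further access patterns (the result names are made up for illustration):

```r
M <- matrix(data = 1:9, nrow = 3, ncol = 3)

row2    <- M[2, ]                 # 2 5 8: the second row, returned as a vector
sub     <- M[1:2, c(1, 3)]        # rows 1 to 2 of columns 1 and 3, a 2 x 2 matrix
col3_mx <- M[, 3, drop = FALSE]   # the third column, kept as a 3 x 1 matrix
```

By default R simplifies a single row or column to a vector; drop = FALSE keeps the matrix shape, which matters when the result is used in %*% afterwards.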

1.4 Further Data Objects

Besides classical data objects such as scalars, vectors, and matrices, there are three further data objects in R:

  1. The array: Like a matrix, but with more than two dimensions. Here is an example of a 2×2×2 array:

myFirst.Array <- array(c(1:8), dim=c(2,2,2))

  2. The list: In lists you can organize different kinds of data. For example:

myFirst.List <- list("Some_Numbers" = c(66, 76, 55, 12, 4, 66, 8, 99), 
                     "Animals"      = c("Rabbit", "Cat", "Elephant"),
                     "My_Series"    = c(30:1)) 

  3. The data frame: A data.frame is a list object with some additional formal restrictions (e.g., all columns must have the same number of rows). As its name indicates, a data.frame object is designed to store data:

myFirst.Dataframe <- data.frame("Credit_Default"   = c( 0, 0, 1, 0, 1, 1), 
                                "Age"              = c(35,41,55,36,44,26), 
                                "Loan_in_1000_EUR" = c(55,65,23,12,98,76)) 

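Once these objects exist, you will want to get values back out of them. The following sketch re-creates the three objects and shows some common ways to access their contents; the result names on the left-hand side are made up for illustration:

```r
myFirst.Array <- array(c(1:8), dim = c(2, 2, 2))
myFirst.List  <- list("Some_Numbers" = c(66, 76, 55, 12, 4, 66, 8, 99),
                      "Animals"      = c("Rabbit", "Cat", "Elephant"),
                      "My_Series"    = c(30:1))
myFirst.Dataframe <- data.frame("Credit_Default"   = c( 0, 0, 1, 0, 1, 1),
                                "Age"              = c(35,41,55,36,44,26),
                                "Loan_in_1000_EUR" = c(55,65,23,12,98,76))

arr_elem   <- myFirst.Array[1, 2, 2]           # 7: row 1, column 2 of the second 2 x 2 slice
second_pet <- myFirst.List$Animals[2]          # "Cat": $ selects a list element by name
first_ser  <- myFirst.List[["My_Series"]][1]   # 30: [[ ]] also selects by name

n_rows     <- nrow(myFirst.Dataframe)          # 6 observations
mean_age   <- mean(myFirst.Dataframe$Age)      # 39.5: a column is accessed like a list element
defaulters <- myFirst.Dataframe[myFirst.Dataframe$Credit_Default == 1, ]   # rows with a default
```
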
1.5 R Help

There are help files for all commands in R. For example, consider the help file of the function sum(). You can access the help file with

?sum

The file usually contains the categories: Description, Usage, Arguments, Details, Value and References.

After the general description follows the syntax of the command (Usage). This usually indicates which (optional) arguments the command allows. For sum(), one of them is na.rm, which indicates whether missing values (NA) will be removed (rm) before the calculation. The default is FALSE.

x <- c(1,2,NA,4,5)
x
## [1]  1  2 NA  4  5
sum(x)
## [1] NA
sum(x, na.rm=TRUE)   # TRUE is safer than the abbreviation T, which can be overwritten
## [1] 12
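
Many other summary functions have the same na.rm argument. A small sketch, using mean() and is.na() from base R (the result names are made up for illustration):

```r
x <- c(1, 2, NA, 4, 5)

n_missing <- sum(is.na(x))           # 1: is.na() flags missing values, sum() counts the TRUEs
mean_all  <- mean(x)                 # NA: one missing value makes the whole result missing
mean_obs  <- mean(x, na.rm = TRUE)   # 3: mean of the four observed values
```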

Details describes the exact calculation (trivial in this case) and some command-specific characteristics. Value describes what is returned; in the case of sum(), it is the sum of the input values, given that numeric data was used as input.

1.6 R-packages

There are thousands of R packages that are useful for different aspects of data analysis. Some packages that can help you make beautiful plots and that you might want to check out are ggplot2 and the tidyverse collection of packages (which includes ggplot2). You can install a package with install.packages("packagename"); note the plural and the quotation marks. In order to use the package, type library(packagename). A package has to be installed only once, but you have to load it with library(packagename) every time you start a new R session and want to use it.
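
A common pattern is to check whether a package is available before loading it. The sketch below uses ggplot2 as the example and deliberately does not install anything by itself; it assumes you run install.packages() manually once if the package is missing:

```r
pkg <- "ggplot2"    # example package name

# requireNamespace() returns TRUE if the package is installed and can be loaded
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  library(pkg, character.only = TRUE)   # character.only = TRUE because pkg is a string
} else {
  message("Package ", pkg, " is not installed. Run install.packages(\"", pkg, "\") once.")
}
```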