
# Appendix{#s_99Appendix}

7.1 Introduction to R

For anyone who wants to explore these topics in more depth, I strongly recommend the eBook R for Data Science.

7.1.1 Getting started

Once you have started R, there are several ways to find help. First of all, (almost) every command comes with a help page that can be accessed via ?... (provided the package is loaded). If the command is part of a package that is not loaded, or you have no clue about the command itself, you can run a full-text search of the entire help using ??.... Be aware that certain very high-level commands (keywords and operators) need to be put in quotation marks, e.g. ?'function'. Many packages come with a demo() (get a list of all available demos using demo(package=.packages(all.available = TRUE))) and/or a vignette(), a document explaining the purpose of the package and demonstrating how it works using suitable examples (find all available vignettes with vignette(package=.packages(all.available = TRUE))). If you want to learn how to do a certain task (e.g. conducting an event study), look for a matching vignette, e.g. vignette("eventstudies")¹¹.
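The help commands described above can be tried out as follows. This is only a minimal sketch using functions that ship with base R; the topic rnorm, the search phrase and the variable names (h, hits, v) are arbitrary choices for illustration. The results are assigned to variables here; printing them (e.g. by typing h) displays the actual page.

```r
# Help page for a command from a loaded package (stats is loaded by default)
h <- help("rnorm")            # same as typing ?rnorm; print h to show the page

# Full-text search; restricting it to one package keeps the search fast
hits <- help.search("normal distribution", package = "stats")  # like ??"normal distribution"
nrow(hits$matches)            # number of matching help entries

# Keywords and operators have to be quoted
h_fun <- help("function")     # same as ?"function"

# List the vignettes of a single package (grid ships with R)
v <- vignette(package = "grid")
v$results[, "Item"]           # names that can be passed to vignette()
```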

Executing code in RStudio is simple. Either highlight the exact portion of the code that you want to execute and hit Ctrl+Enter, or place the cursor anywhere in a line and use the same shortcut to execute that line.¹²

7.1.2 Working directory

Before we start to learn how to program, we have to set a working directory. First, create a folder “researchmethods” somewhere on citrix/your laptop (preferably never use directory names containing special characters or spaces). This will be your working directory: R looks here for code and files to load, and saves here everything that is not designated by a full path (e.g. “D:/R/LAB/SS2018/…”). Note: In contrast to Windows paths, you have to use either “/” instead of “\” or a doubled backslash “\\”. Now set the working directory using setwd() and check it with getwd().

setwd("D:/R/researchmethods")
getwd()
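The two path notations can be compared directly. The following sketch assumes nothing beyond base R; the variable names are made up for illustration, and a temporary folder stands in for the real working directory so that the code runs on any machine.

```r
# Two equivalent ways to write the same Windows path in R
p_forward <- "D:/R/researchmethods"
p_escaped <- "D:\\R\\researchmethods"    # each "\\" stands for one backslash
gsub("\\", "/", p_escaped, fixed = TRUE) == p_forward

# file.path() joins path components with "/"
file.path("D:", "R", "researchmethods")

# Create the folder in a temporary location, switch to it and switch back
old_wd <- getwd()
wd <- file.path(tempdir(), "researchmethods")
dir.create(wd, showWarnings = FALSE)
setwd(wd)
basename(getwd())
setwd(old_wd)
```

Storing the old working directory before calling setwd() makes it easy to return to where you started.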

7.1.3 Basic calculations

3+5; 3-5; 3*5; 3/5
# More complex including brackets
(5+3-1)/(5*10)
# is different to 
5+3-1/5*10
# power of a variable
4*4*4
4^300
# root of a variable
sqrt(16)
16^(1/2)
16^0.5
# exponential and logarithms
exp(3)
log(exp(3))
exp(1)
# Log to the basis of 2
log2(8)
2^log2(8)
# raise the number of digits shown
options(digits=6)
exp(1)
# Rounding
20/3
round(20/3,2)
floor(20/3)
ceiling(20/3)

7.1.4 Mapping variables

Defining variables (objects) in R is done via the arrow operator <- that works in both directions ->. Sometimes you will see someone use the equal sign = but for several (more complex) reasons, this is not advisable.

n <- 10
n
n <- 11
n
12 -> n
n
n <- n^2
n

In the last case, we overwrite a variable recursively. You might want to do that for several reasons, but I advise you to rarely do that. The reason is that - depending on how often you have executed this part of the code already - n will have a different value. In addition, if you are checking the output of some calculation, it is not nice if one of the input variables always has a different value.
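A small sketch of this effect, using the hypothetical names k and k_squared so that n from above is left untouched:

```r
# Recursive overwriting: the result depends on how often the line is executed
k <- 12
k <- k^2        # first execution
k <- k^2        # executing the same line again changes k once more
k

# Alternative: keep the input untouched and store the result under a new name
k <- 12
k_squared <- k^2
k_squared <- k^2   # re-running this line leaves both values unchanged
k; k_squared
```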

In a next step, we will check variables. This is a very important part of programming.

# check if m==10
m <- 11
m==10 # is equal to
m==11
m!=11 # is not equal to
m>10  # is larger than
m<10  # is smaller than
m<=11 # is smaller than or equal to
m>=12 # is larger than or equal to

If one wants to find out which variables are already set use ls(). Delete (Remove) variables using rm() (you sometimes might want to do that to save memory - in this case always follow the rm() command with gc()).

ls()  # list variables
rm(m) # remove m
ls()  # list variables again (m is missing)

Of course, we often want to store not only numbers but also characters. In this case, enclose the value in quotation marks: name <- "test". If you want to check whether a variable has a certain format, use the commands starting with is. (e.g. is.numeric()). If you want to change the format of a variable, use the commands starting with as. (e.g. as.character()).

name <- "test"
is.numeric(n)
is.numeric(name)
is.character(n)
is.character(name)

If you do want to find out the format of a variable you can use class(). Slightly different information will be given by mode() and typeof()

class(n)
class(name)
mode(n)
mode(name)
typeof(n)
typeof(name)
# Let's change formats:
n1 <- n
is.character(n1)
n1 <- as.character(n)
is.character(n1)
as.numeric(name) # New thing: NA

Before we learn about NA, we have to define logical variables that are very important when programming (e.g., as options in a function). Logical (boolean) variables will either assume TRUE or FALSE.

# last but not least we need boolean (logical) variables
n2 <- TRUE
is.numeric(n2)
class(n2)
is.logical(n2)
as.logical(2) # all values except 0 will be converted to TRUE
as.logical(0)

Now we can check whether a condition holds true. In this case, we check if n is equal to 10. The output (as you have seen before) is of type logical.

is.logical(n==10)
n3 <- n==10 # we can assign the logical output to a new variable
is.logical(n3)

Assignment: Create numeric variable x, set x equal to 5/3. What happens if you divide by 0? By Inf? Set y<-NA. What could this mean? Check if the variable is “na”. Is Inf numeric? Is NA numeric?

7.1.5 Sequences, vectors and matrices

In this chapter, we are going to learn about higher-dimensional objects (storing more information than just one number).

7.1.5.1 Sequences

We define sequences of elements (numbers/characters/logicals) via the concatenation function c() and assign them to a variable. If one of the elements of a sequence is of type character, the whole sequence will be converted to character; a sequence of numbers will be of type numeric (for other possibilities check the help ?vector). At the same time, it will be a vector.

x <- c(1,3,5,6,7)
class(x)
is.vector(x)
is.numeric(x)
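The conversion rule for mixed sequences can be checked with a few small vectors. The names below are hypothetical and only serve this sketch:

```r
v_num    <- c(1, 3, 5)         # only numbers: numeric
v_mixed  <- c(1, "a", TRUE)    # one character element: everything becomes character
v_numlog <- c(1, TRUE, FALSE)  # numbers and logicals: logicals become 1 and 0
v_log    <- c(TRUE, FALSE)     # only logicals: logical

class(v_num); class(v_mixed); class(v_numlog); class(v_log)
v_mixed
v_numlog
```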

To create ordered sequences make use of the command seq(from,to,by). Please note that programmers are often lazy and just write seq(1,10,2) instead of seq(from=1,to=10,by=2). However, this makes code much harder to understand, can produce unintended results, and, if a function is changed (which happens, as R is always under construction), can yield something very different from what was intended. Therefore I strongly encourage you to always specify the arguments of a function by name. To do this I advise you to make heavy use of the Tab key. Tab completes commands, produces a list of commands starting with the same letters (if you do not completely remember the spelling, for example), helps you to find out about the arguments and even gives information about the intended/possible values of the arguments. A convenient shortcut for creating regular sequences with distance (by=) one is the : operator: 1:10 yields the same values as seq(from=1,to=10,by=1).

x1 <- seq(from=1,to=5,by=1)
x2 <- 1:5
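As a brief illustration of why named arguments are more robust, the sketch below (with made-up variable names) builds the same sequence three times. With names, the order of the arguments no longer matters.

```r
s_pos   <- seq(1, 10, 2)                   # positional: meaning depends on the order
s_named <- seq(from = 1, to = 10, by = 2)  # named: self-explanatory
s_mixed <- seq(by = 2, to = 10, from = 1)  # named arguments may come in any order
identical(s_pos, s_named)
identical(s_named, s_mixed)

# The : shortcut gives the same values as seq() with by = 1
all(1:5 == seq(from = 1, to = 5, by = 1))
```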

One can operate with sequences in the same way as with numbers. Be aware of the order of the commands and use brackets where necessary!

1:10-1      # (1:10)-1, i.e. 0, 1, ..., 9
1:(10-1)    # 1, 2, ..., 9
1:10^2-2 *3 # (1:(10^2))-(2*3), i.e. -5, -4, ..., 94

Assignment: 1. Create a series from -1 to 5 with distances 0.5? Can you find another way to do it using the : operator and standard mathematical operations? 2. Create the same series, but this time using the “length”-option 3. Create 20 ones in a row (hint: find a function to do just that)

Of course, all logical operations are possible for vectors, too. In this case, the output is a vector of logicals having the same size as the input vector. You can check if a condition is true for any() or all() parts of the vector.
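A short sketch of these vector comparisons, using the hypothetical name x_demo for the example vector:

```r
x_demo <- c(1, 3, 5, 6, 7)
x_demo > 4        # one logical per element: FALSE FALSE TRUE TRUE TRUE
any(x_demo > 4)   # TRUE: at least one element is larger than 4
all(x_demo > 4)   # FALSE: not every element is larger than 4
all(x_demo > 0)   # TRUE: every element is positive
sum(x_demo > 4)   # 3: TRUE counts as 1, so this counts the matches
```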

7.1.5.2 Random Sequences

One of the most important tasks of any programming language that is used for data analysis and research is the ability to generate random numbers. In R all the random number commands start with an r..., e.g. random normal numbers rnorm(). To find out more about the command use the help ?rnorm. All of these commands are a part of the stats package, where you find available commands using the package help: library(help=stats). Notice that whenever you generate random numbers, they are different. If you prefer to work with the same set of random numbers (e.g. for testing purposes) you can fix the starting value of the random number generator by setting the seed to a chosen number set.seed(123). Notice that you have to execute set.seed() every time before (re)using the random number generator.

rand1 <- rnorm(n = 100) # 100 standard normally distributed random numbers
set.seed(134) # fix the starting value of the random number generator (then it will always produce the same numbers)
rand1a <- rnorm(n = 100)
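To see the effect of the seed, one can draw twice with and without resetting it. The variable names in this sketch are hypothetical:

```r
set.seed(123)
draw_a <- rnorm(n = 5)
draw_b <- rnorm(n = 5)        # no new seed: the generator has moved on

set.seed(123)                 # reset the seed to the same value
draw_a_again <- rnorm(n = 5)

identical(draw_a, draw_a_again)  # same seed, same numbers
identical(draw_a, draw_b)        # different numbers without resetting
```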

Assignment: 1. Create a random sequence of 20 N(0,2)-distributed variables and assign it to the variable rand2. 2. Create a random sequence of 200 Uniform(-1,1) distributed variables and save to rand3. 3. What other distributions can you find in the stats package? 4. Use the functions mean and sd. Manipulate the random variables to have a different mean and standard deviation. Do you remember the normalization process (z-score)?

As in the last assignment, you can use all the functions you learned about in statistics to calculate the mean(), the standard deviation sd(), skewness() and kurtosis() (the latter two after installing and loading the moments package). To install a package we use install.packages() (only once) and then load it in every new session with require() or library().

#install.packages("moments") # only once, no need to reinstall every time
require(moments)
mean(rand1a)
sd(rand1a)
skewness(rand1a)
kurtosis(rand1a)
summary(rand1a)
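If the moments package is not available, skewness and kurtosis can also be computed by hand. The sketch below uses the usual moment-based definitions (third and fourth central moment divided by the appropriate power of the second); the helper names are made up, and the assumption that skewness() and kurtosis() from moments follow the same definitions should be checked against the package help.

```r
# Hypothetical helpers: moment-based sample skewness and kurtosis in base R
central_moment <- function(v, k) mean((v - mean(v))^k)
skew_manual <- function(v) central_moment(v, 3) / central_moment(v, 2)^(3/2)
kurt_manual <- function(v) central_moment(v, 4) / central_moment(v, 2)^2

set.seed(134)
z <- rnorm(n = 100)
skew_manual(z)   # should be near 0 for a symmetric distribution
kurt_manual(z)   # should be near 3 for normal data (this is not the excess kurtosis)
```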

7.1.6 Vectors and matrices

We have created (random) sequences above and can determine their properties, such as their length(). We also know how to manipulate sequences through mathematical operations, such as +-*/^. If you want to calculate a vector product, R provides the %*% operator. In many cases (such as %*%) vectors behave like matrices, and R decides automatically whether they should be treated as row or column vectors. However, to make this more explicit, transform your vector into a matrix using as.matrix(). Now it has a dimension and the class matrix. You can transpose the matrix using t(), calculate its inverse using solve() and manipulate it in any other way imaginable. To create matrices use matrix() and be careful about the available options!

x <- c(2,4,5,8,10,12)
length(x)
dim(x)
x^2/2-1
x %*% x # R automatically multiplies row and column vector
is.vector(x)
y <- as.matrix(x)
is.matrix(y); is.matrix(x)
dim(y); dim(x)
t(y) %*% y
y %*% t(y)
mat <- matrix(data = x,nrow = 2,ncol = 3, byrow = TRUE)
dim(mat); ncol(mat); nrow(mat)
mat2 <- matrix(c(1,2,3,4),2,2) # set a new (square) matrix
mat2i <- solve(mat2)
mat2 %*% mat2i
mat2i %*% mat2

Assignment: 1. Create this matrix matrix(c(1,2,2,4),2,2) and try to calculate its inverse. What is the problem? Remember the determinant? Calculate using det(). What do you learn? 2. Create a 4x3 matrix of ones and/or zeros. Try to matrix-multiply with any of the vectors/matrices used before. 3. Try to add/subtract/multiply matrices, vectors and scalars.

A variety of special matrices is available, such as diagonal matrices using diag(). You can glue matrices together columnwise (cbind()) or rowwise (rbind()).

diag(3)
diag(c(1,2,3,4))
mat4 <- matrix(0,3,3)
mat5 <- matrix(1,3,3)
cbind(mat4,mat5)
rbind(mat4,mat5)

7.1.6.1 The indexing system

We can access the row/column elements of any object with at least one dimension using [].

########################################################
### 8) The INDEXING System
# We can access the single values of a vector/matrix
x[2] # one-dim
mat[,2] # two-dim column
mat[2,] # two-dim row
i <- c(1,3)
mat[i]
mat[1,2:3] # two-dim select second and third column, first row
mat[-1,] # two-dim suppress first row
mat[,-2] # two-dim suppress second column
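Indexing a matrix with a single vector, as in mat[i] above, is worth a closer look: R stores matrices column by column, and a single index counts along that storage order. A sketch with a hypothetical copy of the matrix:

```r
m_demo <- matrix(data = c(2, 4, 5, 8, 10, 12), nrow = 2, ncol = 3, byrow = TRUE)
as.vector(m_demo)                 # storage order (column by column): 2 8 4 10 5 12
m_demo[c(1, 3)]                   # first and third stored element: 2 4
# To pick elements by (row, column) pairs, index with a two-column matrix
m_demo[cbind(c(1, 2), c(3, 1))]   # elements (1,3) and (2,1): 5 8
```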

Now we can use logical vectors/matrices to subset vectors/matrices. This is very useful for data mining.

mat>=5 # which elements are larger than or equal to 5?
mat[mat>=5] # What are these elements?
which(mat>=5, arr.ind = TRUE) # another way with more explicit information

We can do something even more useful and name the rows and columns of a matrix using colnames() and rownames().

colnames(mat) <- c("a","b","c")
rownames(mat) <- c("A","B")
mat["A",c("b","c")]

7.1.7 Functions in R

7.1.7.1 Useful Functions

Of course, there are thousands of functions available in R, especially through the use of packages. In the following you find a demo of the most useful ones.

x <- c(1,2,4,-1,2,8)  # example vector 1
x1 <- c(1,2,4,-1,2,8,NA,Inf) # example vector 2 (more complex)
sqrt(x) # square root of x
x^3 # x to the power of ...
sum(x) # sum of the elements of x
prod(x) # product of the elements of x
max(x) # maximum of the elements of x
min(x) # minimum of the elements of x
which.max(x) # returns the index of the greatest element of x
which.min(x) # returns the index of the smallest element of x
# statistical functions - use rand1 and rand1a created before (both have length 100)
range(rand1) # returns the minimum and maximum of the elements of rand1
mean(rand1) # mean of the elements of rand1
median(rand1) # median of the elements of rand1
var(rand1) # variance of the elements of rand1
sd(rand1) # standard deviation of the elements of rand1
cor(cbind(rand1, rand1a)) # correlation matrix of the columns
cov(rand1, rand1a) # covariance between rand1 and rand1a
cor(rand1, rand1a) # linear correlation between rand1 and rand1a
# more complex functions
round(x/3, 2) # rounds the elements of x/3 to 2 decimals
rev(x) # reverses the elements of x
sort(x) # sorts the elements of x in increasing order
rank(x) # ranks of the elements of x
log(x) # computes natural logarithms of x
cumsum(x) # a vector whose i-th element is the sum from x[1] to x[i]
cumprod(x) # id. for the product
cummin(x) # id. for the minimum
cummax(x) # id. for the maximum
unique(x) # duplicate elements are suppressed

7.1.7.2 More complex objects in R

Next to numbers, sequences/vectors and matrices, R offers a variety of different and more complex objects that can store more complex information than just numbers and characters (e.g. functions, output text, etc.). The most important ones are data.frames (extended matrices) and lists. Check the examples below to see how to create these objects and how to access specific elements.

df <- data.frame(col1=c(2,3,4), col2=sin(c(2,3,4)), col3=c("a","b", "c"))
li <- list(x=c(2,3,4), y=sin(c(2,3,4)), z=c("a","b", "c","d","e"), fun=mean)
# to grab elements from a list or dataframe use $ or [[]]
df$col3; li$x # get variables
df[,"col3"]; li[["x"]] # get specific elements that can also be numbered
df[,3]; li[[1]]
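Two details of these objects are easy to miss: a list can hold a function that is called like any other, and single brackets return a sub-list while double brackets return the element itself. A sketch with hypothetical objects li_demo and df_demo:

```r
li_demo <- list(x = c(2, 3, 4), z = c("a", "b"), fun = mean)
li_demo$fun(li_demo$x)      # call the stored function on the stored vector
class(li_demo["x"])         # single brackets: still a list
class(li_demo[["x"]])       # double brackets: the numeric vector itself

df_demo <- data.frame(col1 = c(2, 3, 4), col3 = c("a", "b", "c"))
df_demo[df_demo$col1 > 2, ] # logical indexing selects rows of a data.frame
str(df_demo)                # compact overview of columns and their types
```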

Assignment: 1. Get the second entry of element y of list li

7.1.7.3 Create simple functions in R

To create our own functions in R we need to give them a name, determine necessary input variables and whether these variables should be pre-specified or not. I use a couple of examples to show how to do this below.

?"function" # "function" is such a high-level object that it is interpreted before the "help"-command

# 1. Let's create a function that squares an entry x and name it square
square <- function(x){x^2}
square(5)
square(c(1,2,3))

# 2. Let us define a function that returns a list of several different results (statistics of a random vector v)
stats <- function(v){
  v.m <- mean(v) # create a variable that is only valid in the function
  v.sd <- sd(v)
  v.var <- var(v)
  v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
  return(v.output)
}
v <- rnorm(1000,mean=1,sd=5)
stats(v)
stats(v)$Mean

# 3. A function can have default arguments.
### This time we also create a random vector within the function and use its length as an input
stats2 <- function(n,m=0,s=1){
  v <- rnorm(n,mean=m,sd=s)
  v.m <- mean(v) # create a variable that is only valid in the function
  v.sd <- sd(v)
  v.var <- var(v)
  v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
  return(v.output)
}
stats2(1000000)
stats2(1000,m=1)
stats2(1000,m=1,s=10)
stats2(m=1) # what happens if an obligatory argument is left out?
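The last call stops with an error because n has no default value. If a script should continue after such an error, the call can be wrapped in tryCatch(). The following is a minimal sketch with a hypothetical function f_demo, not part of the examples above:

```r
# Hypothetical function with one obligatory (n) and one default argument (m)
f_demo <- function(n, m = 0) {
  rnorm(n, mean = m)
}

# Capture the error message instead of stopping the script
res <- tryCatch(
  f_demo(m = 1),
  error = function(e) conditionMessage(e)
)
res                           # the error message as a character string

length(f_demo(n = 3, m = 1))  # works once n is supplied
```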

Assignment: 1. Create a function that creates two random samples with length n and m from the normal and the uniform distribution resp., given the mean and sd for the first and min and max for the second distribution. The function shall then calculate the covariance-matrix and the correlation-matrix which it gives back in a named list.

7.1.8 Plotting

Plotting in R can be done very easily. Check the examples below to get a reference and idea about the plotting capabilities in R. A very good source for color names (that work in R) is (http://en.wikipedia.org/wiki/Web_colors).

?plot
?colors # lists the color names built into R
y1 <- rnorm(50,0,1)
plot(y1)
# set title, and x- and y-labels
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample")
# now make a line between elements, and color everything blue
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# if you want to save plots or open them in separate windows you can use x11, pdf, png, ...
?Devices
# x11 (opens separate window; dev.new() is a platform-independent alternative)
x11(8,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# pdf
pdf("plot1.pdf",6,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
dev.off()
# more extensive example
X11(6,6)
par(mfrow=c(2,1),cex=0.9,mar=c(3,3,1,3)+0.1)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2,ylim=c(-2.5,2.5),lwd=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
barplot(y1,col="blue") # making a barplot
# plotting a histogram
hist(y1) # there is a nicer version available once we get to time series analysis
# create a second sample
y2 <- rnorm(50)
# scatterplot
plot(y1,y2)
# boxplot
boxplot(y1,y2)

7.1.9 Control Structures

Last but not least for this lecture, we learn about control structures. These structures (for-loops, if/else checks, etc.) are very useful if you want to translate a tedious manual task (e.g. in Excel) into something R does for you step by step (e.g. column by column). Again, see below for a variety of commands used in simple examples.

x <- sample(-15:15,10) # sample() randomly draws 10 numbers from the input vector -15:15
# 1. We square every element of vector x in a loop
y <- NULL # 1.a) set an empty variable (NULL means it is truly nothing and has no pre-specified dimension/length)
is.null(y)
# 1.b) Use an easy for-loop:
for (i in 1:length(x)){
  y[i] <- x[i]^2
}
# 2. Now we use an if-condition to only replace negative values
y <- NULL
for (i in 1:length(x)){
  y[i] <- x[i]
  if(x[i]<0) {y[i] <- x[i]^2}
}
# ASSIGNMENT: let's calculate the 100th square root of the square root of the square root ...
y <- rep(NA,101)
y[1] <- 500
for (i in 1:100){
  print(i)
  y[i+1] <- sqrt(y[i])
}
plot(y,type="l")
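Many element-by-element loops can also be written without a loop, because most R functions work on whole vectors. As a sketch (with hypothetical names), the second loop above, which squares only the negative values, can be compared with a vectorized version based on ifelse():

```r
set.seed(1)
x_demo <- sample(-15:15, 10)

# Loop version with a pre-allocated result vector
y_loop <- numeric(length(x_demo))
for (i in seq_along(x_demo)) {
  if (x_demo[i] < 0) {
    y_loop[i] <- x_demo[i]^2
  } else {
    y_loop[i] <- x_demo[i]
  }
}

# Vectorized version: ifelse() checks the condition for every element at once
y_vec <- ifelse(x_demo < 0, x_demo^2, x_demo)
all(y_loop == y_vec)
```

seq_along(x_demo) is used instead of 1:length(x_demo) because it also behaves sensibly for an empty vector.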


¹¹ If this command shows an error message, you need to install the package first; see further down for how to do that.
¹² Under certain circumstances (either using pipes or within loops) RStudio will execute the entire loop/pipe structure. In this case you have to highlight the particular line that you want to execute.