Basic Terminology
Prerequisites
You will need to have both R and RStudio installed on your computer.
R, available from https://cran.r-project.org/.
RStudio, an integrated development environment (IDE) that bundles an editor, build tools and more: http://www.rstudio.com/.
Contributed R packages can be installed in either of the following versions.
Release versions:
Comprehensive R Archive Network (CRAN): install.packages("mypkg").
Bioconductor: devtools::install_bioc("mypkg") or BiocManager::install("mypkg").
Development versions:
R-Forge: install.packages("mypkg", repos = "https://R-Forge.R-project.org").
GitHub: devtools::install_github("maintainer/mypkg").
To work with R,
create a script (a file, e.g. myscript.R) containing the R source code; and
run the script interactively by executing it line-by-line.
To run the current line, press Control+R, or right-click and select Run line or selection (in RStudio, press Control+Enter or click Run). If you want to understand a command, enter ?command in the R console.
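You can also run a whole script at once with source() instead of going line by line. The minimal sketch below writes a throwaway two-line script to a temporary file and then sources it. The file and its contents are hypothetical.

```r
# Write a hypothetical two-line script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 2", "y <- x * 10"), script_path)

# Run every line of the script, in order, in the current session
source(script_path)
y  # 20
```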
It is good practice to clear all objects from memory before starting an R session. To do this, make rm(list = ls()) the first line of each of your R scripts.
Creating Objects in R
In R, everything is an object — numbers, text, vectors, matrices, and even functions. You can assign values to objects using the assignment operator <-.
Scalars
A scalar is a single value (a number or a piece of text).
Assign a value to the object scalar1 and then type its name. The value appears in the R Console:
## [1] 1
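The code chunk that produced this output is not shown. The minimal sketch below prints the same result. It assumes scalar1 was assigned the value 1.

```r
# Assign the value 1 to scalar1 (assumed value, matching the output above)
scalar1 <- 1
# Typing an object's name prints its value in the console
scalar1
# A "scalar" in R is really a vector of length 1
length(scalar1)
```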
You can also create other types of scalars:
To check the type of any object:
## [1] "character"
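The code for these lines is also missing. The sketch below creates scalars of a few other types and uses class(), which is one way to check an object's type. The values assigned here are assumptions.

```r
scalar2 <- "Hello"   # character (text)
scalar3 <- TRUE      # logical
scalar4 <- 2.5       # numeric (double)
class(scalar2)       # "character"
class(scalar3)       # "logical"
class(scalar4)       # "numeric"
```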
Vectors
A vector is a sequence of elements of the same type (numeric, character, or logical).
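Vectors are usually built with c() ("combine"). Mixing types does not raise an error. Instead, R silently converts every element to the most general type present, which here is character. A short sketch:

```r
v_num <- c(1, 2, 3)             # numeric vector
v_chr <- c("a", "b", "c")       # character vector
v_lgl <- c(TRUE, FALSE, TRUE)   # logical vector
v_mix <- c(1, "a", TRUE)        # coerced to character: "1" "a" "TRUE"
class(v_mix)                    # "character"
```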
Generating sequences
The : operator and the seq() function can be used to create numerical sequences.
##  [1]  1  2  3  4  5  6  7  8  9 10
## [1] 1 3 5 7 9
## [1] 1 3 5 7 9
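The original code is not shown. The three outputs above are consistent with calls such as the ones in this sketch, although the exact arguments used are an assumption:

```r
sequence1 <- 1:10                        # integers from 1 to 10
sequence2 <- seq(1, 10, by = 2)          # start at 1 and step by 2: 1 3 5 7 9
sequence3 <- seq(1, 9, length.out = 5)   # 5 evenly spaced values from 1 to 9
sequence1
sequence2
sequence3
```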
Repeating values
The rep() function allows us to repeat elements or entire sequences.
## [1] 5 5
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
##  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
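The sketch below reproduces the three outputs above. The object names come from the ls() listing further down. The arguments are inferred from the printed values and are therefore an assumption. The key difference: times repeats the whole vector, while each repeats every element in turn.

```r
repeated1 <- rep(5, times = 2)              # 5 5
repeated.sequence1 <- rep(1:5, times = 3)   # the whole sequence three times
repeated.sequence2 <- rep(1:5, each = 3)    # each element three times in a row
repeated.sequence1
repeated.sequence2
```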
Matrices
A matrix is a two-dimensional array (rows × columns) with elements of the same type.
Creating matrices
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    8    8    8    8    8    8
## [2,]    8    8    8    8    8    8
## [3,]    8    8    8    8    8    8
## [4,]    8    8    8    8    8    8
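The two matrices above are consistent with the calls in this sketch, although the original code is not shown. By default, matrix() fills its entries column by column. A single value is recycled to fill the whole matrix.

```r
# A 4 x 5 matrix filled column by column with the numbers 1 to 20
matrix1 <- matrix(1:20, nrow = 4, ncol = 5)
# A 4 x 6 matrix in which every entry is 8 (the single value is recycled)
matrix2 <- matrix(8, nrow = 4, ncol = 6)
dim(matrix1)                            # 4 5
# byrow = TRUE fills row by row instead
matrix(1:6, nrow = 2, byrow = TRUE)
```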
Combine matrices and vectors
##         [,1] [,2] [,3] [,4] [,5]
##            1    5    9   13   17
##            2    6   10   14   18
##            3    7   11   15   19
##            4    8   12   16   20
## vector1    1    1    1    1    1
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    5    9   13   17    8
## [2,]    2    6   10   14   18    8
## [3,]    3    7   11   15   19    8
## [4,]    4    8   12   16   20    8
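Two details in the output point to how it was produced. The row label vector1 suggests rbind() was given an object called vector1 holding five 1s. The unlabelled sixth column of 8s suggests cbind() was given a single value, which R recycles. Here is a sketch under those assumptions:

```r
matrix1 <- matrix(1:20, nrow = 4)
vector1 <- rep(1, 5)        # five 1s, one per column of matrix1
rbind(matrix1, vector1)     # append vector1 as a fifth row labelled "vector1"
cbind(matrix1, 8)           # append a column of 8s (the value is recycled)
```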
Matrix operations
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16
## [5,]   17   18   19   20
# Multiply two matrices (compatible dimensions required)
matrixA <- matrix(1:6, nrow = 2)
matrixB <- matrix(7:12, nrow = 3)
matrixA %*% matrixB  # matrix multiplication
##      [,1] [,2]
## [1,]   76  103
## [2,]  100  136
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    2   10   18   26   34
## [2,]    4   12   20   28   36
## [3,]    6   14   22   30   38
## [4,]    8   16   24   32   40
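matrixA %*% matrixB works because matrixA has as many columns (3) as matrixB has rows (3). The sketch below checks this condition before multiplying. safe_matmul() is a hypothetical helper, not a base R function.

```r
safe_matmul <- function(A, B) {
  # %*% requires ncol(A) == nrow(B); stop with a clear message otherwise
  if (ncol(A) != nrow(B)) {
    stop("ncol(A) = ", ncol(A), " but nrow(B) = ", nrow(B))
  }
  A %*% B
}
matrixA <- matrix(1:6, nrow = 2)     # 2 x 3
matrixB <- matrix(7:12, nrow = 3)    # 3 x 2
safe_matmul(matrixA, matrixB)        # 2 x 2 result
dim(safe_matmul(matrixB, matrixA))   # 3 3: the order of the factors matters
```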
Strings
Character data (text) is also a common object type in R. You can store single or multiple strings in a variable and manipulate them easily.
Creating and combining strings
## [1] "Hello!"
## [1] "Hello!"          "Happy New Year."
# Join the two elements of string2 into a single string, with "---" in between
string3 <- paste(string2, collapse = "---")
string3
## [1] "Hello!---Happy New Year."
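paste() uses two different separators. sep goes between the arguments it is given, and collapse joins the elements of a vector into one string. The sketch below shows both, plus a few common string functions from base R. The strings themselves are illustrative.

```r
string1 <- "Hello!"
string2 <- c(string1, "Happy New Year.")
paste("Row", 1:3, sep = " ")        # three strings: "Row 1" "Row 2" "Row 3"
paste(string2, collapse = " ")      # one string: "Hello! Happy New Year."
paste0("ID", 7)                     # paste with sep = "": "ID7"
nchar(string1)                      # number of characters: 6
toupper(string1)                    # "HELLO!"
substr(string2[2], 1, 5)            # characters 1 to 5: "Happy"
```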
Naming rows and columns in a matrix
We can use strings to label matrix dimensions for easier interpretation.
## NULL
# Assign new column names
labels <- c("A", "B", "C", "D", "E")
colnames(matrix3) <- paste("Column", labels, sep = " ") 
colnames(matrix3)
## [1] "Column A" "Column B" "Column C" "Column D"
## [5] "Column E"
## [1] "Row 1" "Row 2" "Row 3" "Row 4" "Row 5"
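matrix3 is never created in the visible code. Its printed values in the next section match matrix1 with vector1 (five 1s) appended as a fifth row, and such a matrix starts with no column names, consistent with the NULL above. The sketch below works under that assumption. It also shows that elements can be selected by name once labels are set.

```r
matrix1 <- matrix(1:20, nrow = 4)
vector1 <- rep(1, 5)
matrix3 <- rbind(matrix1, vector1)                            # assumed construction
colnames(matrix3) <- paste("Column", c("A", "B", "C", "D", "E"))  # sep = " " is the default
rownames(matrix3) <- paste("Row", 1:5)
dimnames(matrix3)                # a list holding the row names and the column names
matrix3["Row 2", "Column C"]     # select an element by its labels: 10
```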
Basic Calculations
This section introduces basic mathematical operations in R, including scalar multiplication, transposition, inverse (elementwise), and matrix algebra. You will also learn how to perform both element-by-element operations and true matrix multiplication.
Transpose and scalar multiplication
You can multiply a matrix by a scalar, and you can also take the transpose of a matrix using the t() function.
##       Column A Column B Column C Column D
## Row 1        1        5        9       13
## Row 2        2        6       10       14
## Row 3        3        7       11       15
## Row 4        4        8       12       16
## Row 5        1        1        1        1
##       Column E
## Row 1       17
## Row 2       18
## Row 3       19
## Row 4       20
## Row 5        1
##          Row 1 Row 2 Row 3 Row 4 Row 5
## Column A     1     2     3     4     1
## Column B     5     6     7     8     1
## Column C     9    10    11    12     1
## Column D    13    14    15    16     1
## Column E    17    18    19    20     1
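The sketch below shows both operations. It assumes matrix3 is matrix1 with a fifth row of 1s, labelled as above. Multiplying by a scalar rescales every entry. t() swaps rows and columns and carries the row and column names along.

```r
# Assumed construction of matrix3: matrix1 with a fifth row of 1s
matrix3 <- rbind(matrix(1:20, nrow = 4), rep(1, 5))
dimnames(matrix3) <- list(paste("Row", 1:5), paste("Column", LETTERS[1:5]))

doubled <- 2 * matrix3      # every entry multiplied by 2
transposed <- t(matrix3)    # rows become columns and vice versa
transposed["Column B", ]    # the former column B, now a row
```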
Inverse
When you raise a matrix to a negative power using ^-1, R performs the operation elementwise, not as a true matrix inverse.
##        Column A  Column B   Column C   Column D
## Row 1 1.0000000 0.2000000 0.11111111 0.07692308
## Row 2 0.5000000 0.1666667 0.10000000 0.07142857
## Row 3 0.3333333 0.1428571 0.09090909 0.06666667
## Row 4 0.2500000 0.1250000 0.08333333 0.06250000
## Row 5 1.0000000 1.0000000 1.00000000 1.00000000
##         Column E
## Row 1 0.05882353
## Row 2 0.05555556
## Row 3 0.05263158
## Row 4 0.05000000
## Row 5 1.00000000
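For a true matrix inverse, use solve(). matrix3 itself has no inverse, because row 2 minus row 1 equals the fifth row of 1s, which makes the matrix singular. The sketch below therefore uses a small invertible 2 × 2 matrix, chosen purely for illustration.

```r
M <- matrix(c(2, 1, 1, 3), nrow = 2)   # illustrative invertible matrix [2 1; 1 3]
M_inv <- solve(M)                      # true inverse: [0.6 -0.2; -0.2 0.4]
M_inv %*% M                            # the identity matrix (up to rounding)
M^-1                                   # elementwise reciprocals, NOT the inverse
```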
Element-by-element operations
You can perform mathematical functions on all elements of a matrix at once. This is called vectorisation — R automatically applies the function to each element.
##       Column A Column B Column C Column D
## Row 1 1.000000 2.236068 3.000000 3.605551
## Row 2 1.414214 2.449490 3.162278 3.741657
## Row 3 1.732051 2.645751 3.316625 3.872983
## Row 4 2.000000 2.828427 3.464102 4.000000
## Row 5 1.000000 1.000000 1.000000 1.000000
##       Column E
## Row 1 4.123106
## Row 2 4.242641
## Row 3 4.358899
## Row 4 4.472136
## Row 5 1.000000
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    5   10   15   20
## [2,]    1    6   11   16   21
## [3,]    2    7   12   17   22
## [4,]    3    8   13   18   23
## [5,]    4    9   14   19   24
##       Column A Column B Column C Column D
## Row 1        0       25       90      195
## Row 2        2       36      110      224
## Row 3        6       49      132      255
## Row 4       12       64      156      288
## Row 5        4        9       14       19
##       Column E
## Row 1      340
## Row 2      378
## Row 3      418
## Row 4      460
## Row 5       24
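The original code is not shown. The three outputs above are consistent with sqrt(matrix3), a new matrix4 <- matrix(0:24, nrow = 5), and the elementwise product matrix3 * matrix4. The sketch below demonstrates the same vectorised behaviour on a small matrix whose results are easy to check.

```r
m <- matrix(c(1, 4, 9, 16), nrow = 2)
sqrt(m)       # square root of each entry: 1 2 3 4
m + 10        # the scalar is added to every entry
m * m         # entries multiplied pairwise (not matrix multiplication)
exp(log(m))   # log and exp undo each other entry by entry
```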
Matrix algebra
The %*% operator performs standard matrix multiplication. This is different from elementwise multiplication (*).
##       [,1] [,2] [,3] [,4] [,5]
## Row 1  130  355  580  805 1030
## Row 2  140  390  640  890 1140
## Row 3  150  425  700  975 1250
## Row 4  160  460  760 1060 1360
## Row 5   10   35   60   85  110
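The output above is consistent with matrix3 %*% matrix4. For example, row 1 of matrix3 times column 1 of matrix4 gives 1·0 + 5·1 + 9·2 + 13·3 + 17·4 = 130. The 2 × 2 sketch below contrasts * and %*% on numbers that are easy to check by hand. The matrices are illustrative.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # [1 3; 2 4]
B <- matrix(c(5, 6, 7, 8), nrow = 2)   # [5 7; 6 8]
A * B     # elementwise:   [1*5 3*7; 2*6 4*8] = [5 21; 12 32]
A %*% B   # row by column: [1*5+3*6 1*7+3*8; 2*5+4*6 2*7+4*8] = [23 31; 34 46]
```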
Control Flow
Control flow statements allow you to make decisions and repeat tasks. We’ll look at conditional statements (if) and two types of loops (for and while).
if Statement
An if statement runs a block of code only if a condition is true. Use else to specify an alternative action.
## [1] "Scalar smaller or equal 4"
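The code behind this output is not shown. The sketch below is an if/else statement that produces the same message. Both the value being tested and the text of the other branch are hypothetical. Note that if only accepts a single TRUE/FALSE value. For a whole vector, the vectorised ifelse() is the usual alternative.

```r
scalar <- 1   # hypothetical value
if (scalar > 4) {
  message_text <- "Scalar greater than 4"
} else {
  message_text <- "Scalar smaller or equal 4"
}
message_text

# For a vector, ifelse() checks every element
ifelse(c(1, 5, 9) > 4, "greater than 4", "smaller or equal 4")
```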
for Loops
for loops are useful when you want to iterate over rows or columns of a matrix, or repeat a fixed number of operations.
The code below constructs a new matrix (matrix5) whose entries are calculated using a formula involving the indices i and j.
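# Start from a matrix of zeros with the same dimensions as matrix4
matrix5 <- matrix(0, nrow(matrix4), ncol(matrix4))
# Visit every cell: i indexes rows, j indexes columns
for (i in 1:nrow(matrix4)) {
  for (j in 1:ncol(matrix4)) {
    matrix5[i, j] <- matrix4[i, j] + i * j
  }
}
matrix5
##      [,1] [,2] [,3] [,4] [,5]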
## [1,]    1    7   13   19   25
## [2,]    3   10   17   24   31
## [3,]    5   13   21   29   37
## [4,]    7   16   25   34   43
## [5,]    9   19   29   39   49
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    2    4    6    8   10
## [3,]    3    6    9   12   15
## [4,]    4    8   12   16   20
## [5,]    5   10   15   20   25
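The second output above equals i * j, which is consistent with matrix5 - matrix4. Because R is vectorised, the double loop can be replaced by one expression: outer() builds the matrix of products i * j. The sketch below assumes matrix4 is matrix(0:24, nrow = 5), which matches its printed values earlier. The loop version here also uses seq_len(n) rather than 1:n, because 1:0 would loop over c(1, 0) when a matrix has no rows.

```r
matrix4 <- matrix(0:24, nrow = 5)

# Loop version
matrix5 <- matrix(0, nrow(matrix4), ncol(matrix4))
for (i in seq_len(nrow(matrix4))) {
  for (j in seq_len(ncol(matrix4))) {
    matrix5[i, j] <- matrix4[i, j] + i * j
  }
}

# Vectorised version: outer(1:5, 1:5) is the 5 x 5 matrix with entries i * j
matrix5_vec <- matrix4 + outer(seq_len(nrow(matrix4)), seq_len(ncol(matrix4)))
all(matrix5 == matrix5_vec)   # TRUE
```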
while Loops
A while loop continues running as long as a condition remains true. Here we replicate the for loop using nested while loops.
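# Same result as the for loop, but the counters i and j are managed by hand
matrix6 <- matrix(0, nrow(matrix4), ncol(matrix4))
i <- 1
while (i <= nrow(matrix4)) {
  j <- 1                      # reset the column counter for each row
  while (j <= ncol(matrix4)) {
    matrix6[i, j] <- matrix4[i, j] + i * j
    j <- j + 1                # without this step the inner loop never ends
  }
  i <- i + 1
}
# A matrix of zeros confirms that both loops give the same result
matrix5 - matrix6
##      [,1] [,2] [,3] [,4] [,5]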
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
## [3,]    0    0    0    0    0
## [4,]    0    0    0    0    0
## [5,]    0    0    0    0    0
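Here the nested while loops run a fixed number of steps, which a for loop handles more simply. A while loop is most useful when the number of iterations is not known in advance. The minimal sketch below counts how many times 1 must be doubled before it exceeds 1000.

```r
x <- 1
n_doublings <- 0
while (x <= 1000) {   # the condition is checked before each pass
  x <- x * 2
  n_doublings <- n_doublings + 1
}
n_doublings   # 10, since 2^10 = 1024 is the first power of 2 above 1000
```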
Random Number Generation
Random number generation is widely used in simulation and Monte Carlo methods. R provides many functions to draw from standard distributions, calculate probabilities, and generate reproducible random samples.
Normal distribution
Generating random values
Use rnorm() to generate random numbers from a normal distribution.
## [1]  0.08237627  0.58868915 -0.12874060
## [4]  0.04999958 -1.64991674
## [1]  8.17287509 -0.03845795  6.06127875
## [4]  9.78780107  4.85550440
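The sketch below shows rnorm() alongside the related density (dnorm()), distribution (pnorm()) and quantile (qnorm()) functions. The arguments used for the second set of draws above are not shown. Here, the mean of 5 and standard deviation of 4 are arbitrary choices. The name rand.norm comes from the ls() listing further down.

```r
rand.norm <- rnorm(5)          # 5 draws from N(0, 1)
rnorm(5, mean = 5, sd = 4)     # 5 draws from a normal with mean 5 and sd 4
dnorm(0)                       # density of N(0, 1) at 0: 1 / sqrt(2 * pi), about 0.399
pnorm(1.96)                    # P(Z <= 1.96), about 0.975
qnorm(0.975)                   # the inverse of pnorm: about 1.96
```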
Student-t distribution
Use the rt(), dt(), pt(), and qt() functions for the Student’s t distribution.
# Set degrees of freedom
dof <- 4
# Draw 5 random values from a Student t distribution 
rt(5, df = dof)
## [1]  0.9616083 -0.2697091 -0.1342671  1.2530546
## [5]  0.5825625
## [1]  7.416752  8.600067 11.598013  4.826010
## [5] 10.110188
## [1] 0.06968985
## [1] 0.06077732
## [1] -2.131847
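The last output, -2.131847, matches qt(0.05, df = 4), the 5% quantile of a t distribution with 4 degrees of freedom. The sketch below exercises all four functions. The argument values are illustrative.

```r
dof <- 4
rt(5, df = dof)              # 5 random draws
dt(0, df = dof)              # density at 0: exactly 0.375 for 4 degrees of freedom
pt(-2, df = dof)             # P(T <= -2)
q05 <- qt(0.05, df = dof)    # 5% quantile, about -2.1318
pt(q05, df = dof)            # pt undoes qt: 0.05
```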
Sampling with and without replacement
Use the sample() function to select random elements from a sequence.
##  [1]  6  7  5  8  2  4  1  9  3 10
##  [1]  3  7  7  5  4  3  4 10  8  5
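The two outputs above are consistent with sample(1:10), a random permutation, and sample(1:10, replace = TRUE), in which values can repeat. The sketch below shows these calls plus a size argument. The coin-toss example is illustrative.

```r
sample(1:10)                                   # each value exactly once, in random order
sample(1:10, replace = TRUE)                   # values may repeat, some may be missing
sample(1:10, size = 3)                         # 3 distinct values
sample(c("H", "T"), size = 3, replace = TRUE)  # e.g. three coin tosses
```

One caveat: when x is a single number n, sample(x) draws from 1:n rather than returning x. This can surprise you when x comes from filtering a longer vector.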
Controlling random draws
Random numbers in R are pseudo-random — they are generated deterministically based on an internal seed. Use set.seed() to make your random results reproducible.
## [1] -0.84085548  1.38435934 -1.25549186
## [4]  0.07014277  1.71144087
## [1] -0.84085548  1.38435934 -1.25549186
## [4]  0.07014277  1.71144087
## [1] -0.6029080 -0.4721664 -0.6353713 -0.2857736
## [5]  0.1381082
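The sketch below shows the idea. The seed 123 is arbitrary. Resetting the same seed reproduces the same draws. Drawing again without resetting moves the generator on and gives new numbers.

```r
set.seed(123)
a <- rnorm(5)
set.seed(123)
b <- rnorm(5)          # same seed, so the same five numbers
c_draw <- rnorm(5)     # no reset: the generator has moved on
identical(a, b)        # TRUE
identical(a, c_draw)   # FALSE
```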
Data and Dates
R includes many built-in datasets that you can explore. This section demonstrates how to load a dataset, inspect it, and work with time series data.
Built-in datasets
##    Tree  age circumference
## 1     1  118            30
## 2     1  484            58
## 3     1  664            87
## 4     1 1004           115
## 5     1 1231           120
## 6     1 1372           142
## 7     1 1582           145
## 8     2  118            33
## 9     2  484            69
## 10    2  664           111
## 11    2 1004           156
## 12    2 1231           172
## 13    2 1372           203
## 14    2 1582           203
## 15    3  118            30
## 16    3  484            51
## 17    3  664            75
## 18    3 1004           108
## 19    3 1231           115
## 20    3 1372           139
## 21    3 1582           140
## 22    4  118            32
## 23    4  484            62
## 24    4  664           112
## 25    4 1004           167
## 26    4 1231           179
## 27    4 1372           209
## 28    4 1582           214
## 29    5  118            30
## 30    5  484            49
## 31    5  664            81
## 32    5 1004           125
## 33    5 1231           142
## 34    5 1372           174
## 35    5 1582           177
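The printout above is the full Orange dataset (35 rows, 3 columns), which ships with R. For larger data it is usually easier to inspect a few rows and a summary than to print the whole table. A sketch:

```r
head(Orange)                   # first 6 rows
dim(Orange)                    # 35 rows, 3 columns
str(Orange)                    # type of each column
summary(Orange$circumference)  # summary statistics of one column
```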
Listing objects in memory
You can use ls() to see all objects currently stored in your R environment.
##  [1] "dof"                "i"                 
##  [3] "j"                  "labels"            
##  [5] "matrix1"            "matrix2"           
##  [7] "matrix3"            "matrix4"           
##  [9] "matrix5"            "matrix6"           
## [11] "matrixA"            "matrixB"           
## [13] "rand.norm"          "repeated.sequence1"
## [15] "repeated.sequence2" "repeated1"         
## [17] "scalar1"            "scalar2"           
## [19] "scalar3"            "scalar4"           
## [21] "sequence1"          "sequence2"         
## [23] "sequence3"          "sequence4"         
## [25] "string1"            "string2"           
## [27] "string3"            "v1"                
## [29] "v2"                 "vector1"           
## [31] "your.choice"
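ls() pairs naturally with rm(), which removes objects one at a time or, as rm(list = ls()), all at once. The sketch below uses a throwaway object x.

```r
x <- 42
"x" %in% ls()                   # TRUE: x is now in the workspace
rm(x)                           # remove a single object
exists("x", inherits = FALSE)   # FALSE: x is gone from this environment
```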
Converting data to time series
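Here we convert the first six rows of the Orange dataset (the measurements of tree 1) into a time series object using the ts() function. Because ts() turns the data frame into a numeric matrix, the factor Tree is replaced by its integer code. Tree is an ordered factor with levels 3 < 1 < 5 < 2 < 4, so tree 1 appears as 2.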
## Time Series:
## Start = 1995 
## End = 2000 
## Frequency = 1 
##      Tree  age circumference
## 1995    2  118            30
## 1996    2  484            58
## 1997    2  664            87
## 1998    2 1004           115
## 1999    2 1231           120
## 2000    2 1372           142
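ts() also handles sub-annual data through its frequency argument. In the sketch below, the same six circumference values are treated as quarterly observations starting in the first quarter of 1995. The quarterly framing is purely for illustration and is not part of the Orange data.

```r
circ <- c(30, 58, 87, 115, 120, 142)   # tree 1 circumferences from Orange
circ_q <- ts(circ, start = c(1995, 1), frequency = 4)
circ_q              # printed as a table with Qtr1 to Qtr4 columns
frequency(circ_q)   # 4
end(circ_q)         # 1996 2: six quarters from 1995 Q1
time(circ_q)        # 1995.00, 1995.25, 1995.50, ...
```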
Exploring time series
You can inspect the start and end times, calculate differences, and create lagged versions of the data.
## [1] 1995    1
## [1] 2000    1
## Time Series:
## Start = 1996 
## End = 2000 
## Frequency = 1 
## [1] 28 29 28  5 22
## Time Series:
## Start = 1993 
## End = 1998 
## Frequency = 1 
## [1]  30  58  87 115 120 142
## Time Series:
## Start = 1993 
## End = 2000 
## Frequency = 1 
##      Orange1TS.Tree Orange1TS.age
## 1993             NA            NA
## 1994             NA            NA
## 1995              2           118
## 1996              2           484
## 1997              2           664
## 1998              2          1004
## 1999              2          1231
## 2000              2          1372
##      Orange1TS.circumference diff(Orange1TS[, 3])
## 1993                      NA                   NA
## 1994                      NA                   NA
## 1995                      30                   NA
## 1996                      58                   28
## 1997                      87                   29
## 1998                     115                   28
## 1999                     120                    5
## 2000                     142                   22
##      lag(Orange1TS[, 3], 2)
## 1993                     30
## 1994                     58
## 1995                     87
## 1996                    115
## 1997                    120
## 1998                    142
## 1999                     NA
## 2000                     NA
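The outputs above are consistent with start(), end(), diff(), lag() and cbind() applied to Orange1TS. The self-contained sketch below uses the circumference column alone. Note that stats::lag(x, 2) leaves the values unchanged but shifts the time index two periods earlier, so the series starts in 1993 instead of 1995. This shift is why cbind() aligns it as shown. The explicit stats:: prefix avoids a clash with dplyr::lag() when dplyr is loaded.

```r
circ_ts <- ts(c(30, 58, 87, 115, 120, 142), start = 1995)
start(circ_ts)                       # 1995 1
end(circ_ts)                         # 2000 1
diff(circ_ts)                        # 28 29 28 5 22, starting in 1996
lagged <- stats::lag(circ_ts, 2)     # same values, time index shifted to 1993-1998
start(lagged)                        # 1993 1
cbind(circ_ts, diff(circ_ts), lagged)  # aligned over 1993-2000, padded with NA
```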
Writing Functions
Defining your own functions in R makes it easier to reuse code and keep your analysis tidy. A function typically has three parts:
- Inputs (arguments),
- Operations (what the function does), and
- Outputs (what it returns).
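The minimal sketch below shows all three parts and adds a default argument value. The function and its arguments are hypothetical.

```r
# Inputs: a numeric vector x and a scaling factor k (default 1)
scale_and_sum <- function(x, k = 1) {
  scaled <- k * x       # operations
  total <- sum(scaled)
  return(total)         # output
}
scale_and_sum(1:4)          # 10, using the default k = 1
scale_and_sum(1:4, k = 2)   # 20
```

If return() is omitted, R returns the value of the last evaluated expression.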
 
Example: summarising and plotting a matrix
The following function plots all columns of a matrix on the same graph and returns summary statistics.
summarize.matrix <- function(mat) {
  # Plots all columns of a matrix in one graph and returns summary statistics
  nc <- ncol(mat)
  dev.new()  # open a new graphics window
  # Plot the first column as a line; set the y-axis range to cover all columns
  plot(mat[, 1], type = "l", ylim = c(min(mat), max(mat)))
  # Add the remaining columns, each in a different colour
  if (nc > 1) for (j in 2:nc) lines(mat[, j], col = j)
  legend("bottomleft", paste("Column", 1:nc, sep = " "), col = 1:nc, lty = 1, cex = .8)
  return(summary(mat))
}
# Apply the function to several matrices
summarize.matrix(matrix1)
##        V1             V2             V3       
##  Min.   :1.00   Min.   :5.00   Min.   : 9.00  
##  1st Qu.:1.75   1st Qu.:5.75   1st Qu.: 9.75  
##  Median :2.50   Median :6.50   Median :10.50  
##  Mean   :2.50   Mean   :6.50   Mean   :10.50  
##  3rd Qu.:3.25   3rd Qu.:7.25   3rd Qu.:11.25  
##  Max.   :4.00   Max.   :8.00   Max.   :12.00  
##        V4              V5       
##  Min.   :13.00   Min.   :17.00  
##  1st Qu.:13.75   1st Qu.:17.75  
##  Median :14.50   Median :18.50  
##  Mean   :14.50   Mean   :18.50  
##  3rd Qu.:15.25   3rd Qu.:19.25  
##  Max.   :16.00   Max.   :20.00
summarize.matrix(matrix2)
##        V1          V2          V3          V4   
##  Min.   :8   Min.   :8   Min.   :8   Min.   :8  
##  1st Qu.:8   1st Qu.:8   1st Qu.:8   1st Qu.:8  
##  Median :8   Median :8   Median :8   Median :8  
##  Mean   :8   Mean   :8   Mean   :8   Mean   :8  
##  3rd Qu.:8   3rd Qu.:8   3rd Qu.:8   3rd Qu.:8  
##  Max.   :8   Max.   :8   Max.   :8   Max.   :8  
##        V5          V6   
##  Min.   :8   Min.   :8  
##  1st Qu.:8   1st Qu.:8  
##  Median :8   Median :8  
##  Mean   :8   Mean   :8  
##  3rd Qu.:8   3rd Qu.:8  
##  Max.   :8   Max.   :8
summarize.matrix(matrix3)
##     Column A      Column B      Column C   
##  Min.   :1.0   Min.   :1.0   Min.   : 1.0  
##  1st Qu.:1.0   1st Qu.:5.0   1st Qu.: 9.0  
##  Median :2.0   Median :6.0   Median :10.0  
##  Mean   :2.2   Mean   :5.4   Mean   : 8.6  
##  3rd Qu.:3.0   3rd Qu.:7.0   3rd Qu.:11.0  
##  Max.   :4.0   Max.   :8.0   Max.   :12.0  
##     Column D       Column E 
##  Min.   : 1.0   Min.   : 1  
##  1st Qu.:13.0   1st Qu.:17  
##  Median :14.0   Median :18  
##  Mean   :11.8   Mean   :15  
##  3rd Qu.:15.0   3rd Qu.:19  
##  Max.   :16.0   Max.   :20
summarize.matrix(matrix4)
##        V1          V2          V3    
##  Min.   :0   Min.   :5   Min.   :10  
##  1st Qu.:1   1st Qu.:6   1st Qu.:11  
##  Median :2   Median :7   Median :12  
##  Mean   :2   Mean   :7   Mean   :12  
##  3rd Qu.:3   3rd Qu.:8   3rd Qu.:13  
##  Max.   :4   Max.   :9   Max.   :14  
##        V4           V5    
##  Min.   :15   Min.   :20  
##  1st Qu.:16   1st Qu.:21  
##  Median :17   Median :22  
##  Mean   :17   Mean   :22  
##  3rd Qu.:18   3rd Qu.:23  
##  Max.   :19   Max.   :24
summarize.matrix(matrix5)
##        V1          V2           V3    
##  Min.   :1   Min.   : 7   Min.   :13  
##  1st Qu.:3   1st Qu.:10   1st Qu.:17  
##  Median :5   Median :13   Median :21  
##  Mean   :5   Mean   :13   Mean   :21  
##  3rd Qu.:7   3rd Qu.:16   3rd Qu.:25  
##  Max.   :9   Max.   :19   Max.   :29  
##        V4           V5    
##  Min.   :19   Min.   :25  
##  1st Qu.:24   1st Qu.:31  
##  Median :29   Median :37  
##  Mean   :29   Mean   :37  
##  3rd Qu.:34   3rd Qu.:43  
##  Max.   :39   Max.   :49
Example: applying the function to Orange
You can also use the same function on time series data.
summarize.matrix(Orange1TS)
##       Tree        age         circumference   
##  Min.   :2   Min.   : 118.0   Min.   : 30.00  
##  1st Qu.:2   1st Qu.: 529.0   1st Qu.: 65.25  
##  Median :2   Median : 834.0   Median :101.00  
##  Mean   :2   Mean   : 812.2   Mean   : 92.00  
##  3rd Qu.:2   3rd Qu.:1174.2   3rd Qu.:118.75  
##  Max.   :2   Max.   :1372.0   Max.   :142.00
More
Much of what you need in R has already been programmed and is available in packages. Some packages need to be installed first, but the common ones are already installed on LSE computers.
You can load a package using library("packagename"). For example, library("QRM") loads the QRM package, which provides the QQplot() function used below.
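A common pattern is to install a package only if it is missing and then load it. The sketch below wraps that pattern in load_or_install(), a hypothetical helper that is not part of base R. requireNamespace() checks availability without attaching the package. character.only = TRUE lets library() accept the package name as a string variable.

```r
load_or_install <- function(pkg) {
  # Install from CRAN only if the package is not already available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}
load_or_install("stats")   # already installed, so it is simply attached
```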
Exploring functions
You can see the code of a user-defined function or a function from a package by typing its name:
summarize.matrix
## function (mat) 
## {
##     nc <- ncol(mat)
##     dev.new()
##     plot(mat[, 1], type = "l", ylim = c(min(mat), max(mat)))
##     if (nc > 1) 
##         for (j in 2:nc) lines(mat[, j], col = j)
##     legend("bottomleft", paste("Column", 1:nc, sep = " "), col = 1:nc, 
##         lty = 1, cex = 0.8)
##     return(summary(mat))
## }
## <bytecode: 0x000001a929e95e48>
## <environment: 0x000001a9237f3e38>
QQplot
## function (x, a = 0.5, reference = c("normal", "exp", "student"), 
##     ...) 
## {
##     n <- length(x)
##     reference <- match.arg(reference)
##     plot.points <- ppoints(n, a)
##     func <- switch(reference, normal = qnorm, exp = qexp, student = qt)
##     xp <- func(plot.points, ...)
##     y <- sort(x)
##     plot(xp, y, xlab = paste("Theoretical", reference), ylab = "Empirical")
##     invisible(list(x = x, y = y))
## }
## <bytecode: 0x000001a92e22ec20>
## <environment: namespace:QRM>
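Besides printing a whole function, you can inspect its parts separately. formals() lists the arguments with their defaults, body() returns the code, and args() shows the signature. The sketch below uses a small hypothetical function and the base function sd().

```r
f <- function(x, k = 2) x^k   # hypothetical example function
formals(f)                    # the arguments: x (no default) and k = 2
body(f)                       # x^k
args(sd)                      # also works for package functions: function (x, na.rm = FALSE)
```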
For more information, references, or examples, you can use the help command by adding a ? in front of the function name. Try ?QQplot.
You can also try running the function directly:
# Example: QQ plot for testing exponential data against the exponential distribution
QQplot(rexp(1000), reference = "exp", rate = 0.3) 
Further reading
There is a lot of free material online for learning R and good coding practices. Some recommended resources include:
Introduction to R: http://cran.r-project.org/doc/manuals/R-intro.pdf
On good coding style (short): http://adv-r.had.co.nz/Style.html
On good coding style (detailed): http://cran.r-project.org/web/packages/rockchalk/vignettes/Rstyle.pdf