4 Fundamentals

This chapter covers the basics of R and RStudio.

4.1 Syntax

R Markdown documents are written in Markdown. The basic formatting syntax is:

Plain text
End a line with two spaces to start a new paragraph.
*italics* and _italics_
**bold** and __bold__
superscript^2^
~~strikethrough~~
[link](url)

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6

endash: --
emdash: ---
ellipsis: ...
inline equation: $A = \pi*r^{2}$
image: ![alt text](path)
horizontal rule (or slide break): ***

> block quote

* unordered list
* item 2
    + sub-item 1
    + sub-item 2

1. ordered list
2. item 2
    + sub-item 1
    + sub-item 2

Table Header  | Second Header
------------- | -------------
Table Cell    | Cell 2
Cell 3        | Cell 4

4.2 Vectors

A vector is a collection of elements of the same mode (numeric, character, or logical):

v.n <- c(3, 4, 5, 6, NA)        # numeric; a missing value is coded as NA
v.c <- c("Tom", "Jim", "Tim")   # character
v.l <- c(TRUE, TRUE, FALSE)     # logical

A vector can be created with the c() function (concatenate), with the functions seq() and rep(), or with the colon operator:

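v.n1 <- rep(2, 4)                                # 2 2 2 2
v.n2 <- rep(v.n, 4)                              # the whole of v.n repeated 4 times
v.n3 <- rep(v.n, each = 4)                       # each element of v.n repeated 4 times
v.n4 <- seq(from = 3, to = 10, length.out = 10)  # 10 evenly spaced values from 3 to 10
v.n5 <- seq(from = 3, to = 10, by = 0.5)         # 3, 3.5, 4, ..., 10
v.n6 <- 1:10                                     # the integers 1 to 10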
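To see how rep() and seq() differ, the sketch below rebuilds the vectors above and checks their lengths and first elements. It only restates the definitions above. The expected values follow from how rep() repeats a whole vector (`times`) versus each element (`each`).

```r
v.n <- c(3, 4, 5, 6, NA)

v.n1 <- rep(2, 4)                                # 2 repeated 4 times
v.n2 <- rep(v.n, 4)                              # whole vector repeated 4 times
v.n3 <- rep(v.n, each = 4)                       # each element repeated 4 times
v.n4 <- seq(from = 3, to = 10, length.out = 10)  # 10 evenly spaced values
v.n5 <- seq(from = 3, to = 10, by = 0.5)         # step of 0.5
v.n6 <- 1:10                                     # integer sequence

length(v.n2)   # 20
v.n2[1:5]      # 3 4 5 6 NA  (the original vector, before it repeats)
v.n3[1:5]      # 3 3 3 3 4   (each value four times in a row)
length(v.n5)   # 15
diff(v.n4)[1]  # the step, (10 - 3) / 9
```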

4.3 Matrix

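The function matrix() creates a matrix from a vector of values, a number of rows, and a number of columns:

m1 <- matrix(rnorm(12), 3, 4)   # 12 normal random numbers in 3 rows and 4 columns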

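A matrix can also be built by binding vectors together, as columns with cbind() or as rows with rbind():

v1 <- runif(10)       # 10 uniform random numbers
v2 <- rnorm(10)       # 10 normal random numbers
m1 <- cbind(v1, v2)   # 10 x 2 matrix: v1 and v2 as columns
m2 <- rbind(v1, v2)   # 2 x 10 matrix: v1 and v2 as rows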
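A short sketch of how the shape of the result depends on the function used. Because the values are random, only the dimensions are checked. The small integer matrices show the effect of the `byrow` argument, which is used in the next section.

```r
v1 <- runif(10)   # 10 uniform random numbers
v2 <- rnorm(10)   # 10 standard normal random numbers

dim(cbind(v1, v2))   # 10 2: v1 and v2 become columns
dim(rbind(v1, v2))   # 2 10: v1 and v2 become rows

# matrix() fills column by column unless byrow = TRUE
m.col <- matrix(1:6, nrow = 2)
m.row <- matrix(1:6, nrow = 2, byrow = TRUE)
m.col[1, ]   # 1 3 5
m.row[1, ]   # 1 2 3
```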

4.4 Data frame

The data frame is probably the most commonly used data object in R. It is laid out like a matrix, but its mode is list. Each column is a variable and each row is an observation. A column can be numeric, character, or logical, and different columns can have different types. Each column has a unique name.

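m1 <- matrix(rnorm(6), 2, 3, byrow = TRUE)   # 2 x 3 matrix, filled row by row
m2 <- rbind(m1, c(1, 1, 2))                  # add a third row: 3 x 3
m2 <- cbind(m2, c(1, 1, 2))                  # add a fourth column: 3 x 4

d1 <- data.frame(m2)                            # convert the matrix to a data frame
d2 <- data.frame(v1 = rnorm(3), v2 = runif(3))  # build a data frame from named columns
names(d2)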
## [1] "v1" "v2"
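Unlike a matrix, a data frame can hold columns of different types. The sketch below builds one from scratch. The column names and values are made up for illustration.

```r
# Illustrative data: three runners with a name, a speed, and a flag
d3 <- data.frame(name  = c("Tom", "Jim", "Tim"),
                 speed = c(3.5, 4, 3),
                 pro   = c(TRUE, FALSE, TRUE))

str(d3)        # one line per column, with its type
d3$speed       # a single column, returned as a vector
d3[2, ]        # the second row (observation)
nrow(d3); ncol(d3)
```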

A data frame is also a list. A list is a collection of elements, like a vector, but its elements can be of different modes:

l1 <- list(c(1, 2, 3), matrix(rnorm(9), 3, 3),
           c("Tim", "Tom", "Jim"))
l1
## [[1]]
## [1] 1 2 3
## 
## [[2]]
##            [,1]       [,2]       [,3]
## [1,]  1.2735046  1.2839747 -1.9313985
## [2,] -0.4933836 -1.1187934  0.6498940
## [3,]  1.2601138  0.0290398  0.1638518
## 
## [[3]]
## [1] "Tim" "Tom" "Jim"
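Elements of a list are extracted with double brackets [[ ]], or with $ when the elements are named. The sketch below uses a named version of l1. The element names nums, mat, and people are chosen here for illustration.

```r
# Same contents as l1, but with illustrative element names
l2 <- list(nums   = c(1, 2, 3),
           mat    = matrix(rnorm(9), 3, 3),
           people = c("Tim", "Tom", "Jim"))

l2[[1]]           # the first element itself: 1 2 3
l2$people         # an element by name: "Tim" "Tom" "Jim"
l2[1]             # single brackets return a list containing the element
length(l2)        # 3 elements
sapply(l2, mode)  # each element keeps its own mode
```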

More precisely, a data frame is a list of vectors of the same length. Data imported from a spreadsheet (for example, a sheet saved as CSV and read with read.csv()) are stored as a data frame.
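Because a data frame is a list of equal-length vectors, list tools work on it directly. A minimal check, reusing the structure of d2 from above:

```r
d2 <- data.frame(v1 = rnorm(3), v2 = runif(3))

is.list(d2)          # TRUE: a data frame is a list underneath
length(d2)           # 2: the list has one element per column
sapply(d2, length)   # every column has the same length, 3
d2[["v2"]]           # list-style extraction of a column

# Columns of unequal length are rejected
res <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(res, "try-error")   # TRUE
```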

4.5 Importing Data

The most basic R function for reading text data is scan(), which reads values into a vector or a list. In practice, read.table() and its variant read.csv() are the most useful, because they import a text file directly into a data frame.
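A minimal sketch of both functions. To stay self-contained, it reads from an in-memory string through the `text` argument instead of a file. With real data, the first argument would be a file path. The column names and values are illustrative.

```r
# Illustrative CSV content; normally this would be a file on disk
csv <- "name,sex,speed
oliver,M,3.5
olivia,F,4
henry,M,3"

runners <- read.csv(text = csv)   # read.csv uses header = TRUE by default
str(runners)

# scan() returns a plain vector rather than a data frame
nums <- scan(text = "3 4 5 6", quiet = TRUE)
nums
```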

4.5.1 Extract elements

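v <- c(4, -2, NA, 7, 1)   # example vector (values chosen for illustration)
m <- matrix(1:6, 2, 3)    # example matrix (values chosen for illustration)
s  <- v[2]                # the second element of v, stored in s
v2 <- v[2:3]              # the 2nd and 3rd elements
v3 <- v[c(1, 3, 4)]       # the 1st, 3rd, and 4th elements
v4 <- m[, 2]              # the 2nd column of a matrix or data frame, as a vector
v5 <- v[v > 0 & !is.na(v)]  # all positive values of v (NA excluded)
v6 <- v[!is.na(v)]        # all non-missing values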
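The same bracket rules carry over to data frames, where [rows, columns] accepts positions, names, or logical conditions. A minimal sketch with a small, made-up data frame d:

```r
# Illustrative data frame with one missing score
d <- data.frame(id = 1:4, score = c(12, -3, 8, NA))

d[, 2]                   # second column as a vector (like m[, 2])
d[, "score"]             # the same column, selected by name
d[2:3, ]                 # rows 2 and 3, all columns
d[!is.na(d$score), ]     # rows where score is not missing
d[which(d$score > 0), ]  # rows with a positive score; which() drops the NA
```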

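4.5.2 Read data from the internet

read.table() and read.csv() accept a complete URL in place of a local file name, so data can be read directly from the web. Data in fixed-width format, where each field occupies a fixed number of characters, are read with read.fwf():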

read.fwf(file, widths, header = FALSE, sep = "\t", skip = 0, row.names, col.names, n = -1, buffersize = 2000, fileEncoding = "", ...)
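A minimal sketch of read.fwf(). It writes three fixed-width lines to a temporary file and then cuts each line into fields of 3, 3, and 4 characters. The data and the column names are invented for illustration.

```r
# Illustrative fixed-width data: name (3 chars), age (3), height (4)
lines <- c("Tom0231.80",
           "Jim0311.75",
           "Tim0271.68")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

people <- read.fwf(tmp, widths = c(3, 3, 4),
                   col.names = c("name", "age", "height"))
people
unlink(tmp)   # remove the temporary file
```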

4.6 Types of data

4.6.1 Variables

  1. Integer (ex. 100L; a plain 100 is stored as numeric)
  2. Numeric (ex. 0.05)
  3. Character (ex. "hello")
  4. Logical (ex. TRUE)
  5. Factor (ex. "Green")
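The class() function reports which of these types a value has. The factor levels below ("Green", "Red") are illustrative.

```r
class(100L)      # "integer"
class(100)       # "numeric": without the L suffix, a number is a double
class(0.05)      # "numeric"
class("hello")   # "character"
class(TRUE)      # "logical"

# Illustrative factor with two categories
colour <- factor(c("Green", "Red", "Green"))
class(colour)    # "factor"
levels(colour)   # "Green" "Red": the distinct categories
```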

4.6.2 Types of data objects

  1. Vector
  2. Matrix
  3. List and dataframe
  4. Array
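An array generalizes a matrix to more than two dimensions. A minimal sketch with illustrative values 1 to 24:

```r
a <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers
dim(a)           # 2 3 4
a[1, 2, 3]       # row 1, column 2 of the third layer: 15
a[, , 1]         # the first layer is an ordinary 2 x 3 matrix
is.matrix(a[, , 1])
```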

4.6.2.1 Numeric vectors

x <- c(2, 6, 1, 5, 2.5)
y <- c(0, 6, 3, 2.6, 9.4)

x[3] #Accessing elements of a vector
## [1] 1

4.6.2.2 Vector operations

z <- x + y #sum by elements
z[2] #the second element
## [1] 12
z[-2] #all but the second element
## [1]  2.0  4.0  7.6 11.9
z[c(2,4)] #the second and the fourth elements
## [1] 12.0  7.6
z[c(2:4)] #elements 2 to 4
## [1] 12.0  4.0  7.6
z[-c(2:4)] #all except elements 2 to 4
## [1]  2.0 11.9
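Other arithmetic operators also work element by element, and summary functions reduce a vector to a single number. A sketch using the same x and y as above:

```r
x <- c(2, 6, 1, 5, 2.5)
y <- c(0, 6, 3, 2.6, 9.4)

x * 2       # every element doubled: 4 12 2 10 5
x - y       # element-wise difference
x * y       # element-wise product (not a matrix product)
sum(x)      # 16.5
mean(x)     # 3.3
length(x)   # 5
```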

4.6.2.3 Vector of logical values

TRUE and FALSE are the two logical values.

z>10 #compares each element of z to 10 and returns a vector of logic values
## [1] FALSE  TRUE FALSE FALSE  TRUE
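In arithmetic, TRUE counts as 1 and FALSE as 0, so logical vectors can be summarized directly. A sketch with the z from above:

```r
z <- c(2, 12, 4, 7.6, 11.9)   # same values as x + y above

sum(z > 10)     # 2: how many elements exceed 10
mean(z > 10)    # 0.4: the proportion that exceed 10
which(z > 10)   # 2 5: positions of the TRUE values
any(z > 10)     # TRUE: at least one element exceeds 10
all(z > 10)     # FALSE: not every element does
```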

4.6.2.4 Logic comparisons

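  1. < and > (less than, greater than)
  2. <= and >= (less than or equal to, greater than or equal to)
  3. == (equal to; a single = is assignment, not a comparison)
  4. != (not equal to)
  5. &, |, ! and xor() (and, or, not, exclusive or), which combine logical values
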
x>y #element-wise comparison
## [1]  TRUE FALSE FALSE  TRUE FALSE
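Comparisons can be combined with the logical operators. A sketch with the same x and y. The thresholds 1 and 5 are arbitrary choices for illustration.

```r
x <- c(2, 6, 1, 5, 2.5)
y <- c(0, 6, 3, 2.6, 9.4)

x == y               # FALSE TRUE FALSE FALSE FALSE
x != y               # TRUE FALSE TRUE TRUE TRUE
(x > 1) & (y > 1)    # both conditions hold: FALSE TRUE FALSE TRUE TRUE
(x > 5) | (y > 5)    # at least one holds:   FALSE TRUE FALSE FALSE TRUE
xor(x > y, y > 5)    # exactly one holds:    TRUE TRUE FALSE TRUE TRUE
```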

4.7 Subsetting vectors

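A logical vector inside [ ] keeps the elements where it is TRUE:

names <- c("oliver", "olivia", "henry", "mary")
sex   <- c("M", "F", "M", "F")
speed <- c(3.5, 4, 3, 3.25)
names[sex == "F"]     # names of the females: "olivia" "mary"
speed[sex == "M"]     # speeds of the males: 3.5 3.0
names[speed <= 3.5]   # names with speed at most 3.5: "oliver" "henry" "mary"
z[z > 10]             # elements of z greater than 10: 12.0 11.9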
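A logical condition can also be turned into positions with which(), combined with &, or built with %in%. A minimal sketch with the same vectors. The thresholds and the names looked up are illustrative.

```r
names <- c("oliver", "olivia", "henry", "mary")
sex   <- c("M", "F", "M", "F")
speed <- c(3.5, 4, 3, 3.25)

idx <- which(sex == "F")                # positions of the TRUE values: 2 4
names[idx]                              # same result as names[sex == "F"]
names[sex == "F" & speed > 3.5]         # both conditions: "olivia"
names[names %in% c("mary", "henry")]    # membership test: "henry" "mary"
```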

4.8 Sorting data

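order() returns the positions that would put a vector in ascending order. Using those positions as an index sorts the vector.

order(names)   # 3 4 1 2
order(speed)   # 3 4 1 2

names[order(speed)]   # names sorted by speed: "henry" "mary" "oliver" "olivia"
names[order(names)]   # names sorted alphabetically (the same order here)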
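sort() returns the sorted values themselves, while order() returns positions that can reorder other vectors or the rows of a data frame. A sketch with the same vectors. The data frame runners is built here for illustration.

```r
names <- c("oliver", "olivia", "henry", "mary")
speed <- c(3.5, 4, 3, 3.25)

sort(speed)                             # the sorted values: 3.00 3.25 3.50 4.00
order(speed)                            # the positions that sort them: 3 4 1 2
names[order(speed, decreasing = TRUE)]  # largest speed first

# Sorting the rows of a data frame by one column
runners <- data.frame(names, speed)
runners[order(runners$speed), ]
```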

4.9 Merging vectors

z <- c(x, y)   # concatenates x and y into one vector of length 10 (cbind(x, y) would put them side by side as columns)
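The sketch below compares the ways of combining two vectors: c() joins them end to end, cbind() and rbind() build a matrix, and append() inserts values at a chosen position. The inserted value 99 is illustrative.

```r
x <- c(2, 6, 1, 5, 2.5)
y <- c(0, 6, 3, 2.6, 9.4)

z  <- c(x, y)      # one long vector: length 10
zc <- cbind(x, y)  # 5 x 2 matrix: x and y side by side as columns
zr <- rbind(x, y)  # 2 x 5 matrix: x and y stacked as rows
append(x, 99, after = 2)   # 2 6 99 1 5 2.5
```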

4.10 Generating vectors

c(3, 5, 6)            # 3 5 6
2:5                   # 2 3 4 5
seq(2, 3, by = 0.5)   # 2.0 2.5 3.0
rep(1:2, each = 3)    # 1 1 1 2 2 2
rep(1:2, times = 3)   # 1 2 1 2 1 2
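A few more generators are handy in practice. seq_len() and seq_along() are safer than the colon operator when a length may be zero, because 1:0 counts down instead of returning an empty vector. A minimal sketch:

```r
seq_len(4)                    # 1 2 3 4
seq_len(0)                    # integer(0): an empty sequence
1:0                           # 1 0: the colon operator counts down instead
seq_along(c("a", "b", "c"))   # 1 2 3: one index per element
rev(2:5)                      # 5 4 3 2
numeric(3)                    # 0 0 0: a zero-filled numeric vector of length 3
seq(10, 1, by = -3)           # 10 7 4 1
```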