4 Fundamentals
This chapter covers the basics of R and RStudio.
4.1 Syntax
Plain text
End a line with two spaces to start a new paragraph.
*italics* and _italics_
**bold** and __bold__
superscript^2^
~~strikethrough~~
[link](url)
# Header 1 (#)
## Header 2 (##)
### Header 3 (###)
#### Header 4 (####)
##### Header 5 (#####)
###### Header 6 (######)
endash: --
emdash: ---
ellipsis: ...
inline equation: $A = \pi*r^{2}$
image: ![](path/to/image)
horizontal rule (or slide break):
***
> block quote
* unordered list
* item 2
    + sub-item 1
    + sub-item 2
1. ordered list
2. item 2
    + sub-item 1
    + sub-item 2
Table Header | Second Header
------------- | -------------
Table Cell | Cell 2
Cell 3 | Cell 4
4.2 Vectors
A vector is a collection of elements that all have the same mode (numeric, character, or logical):
v.n <- c(3,4,5,6, NA)
v.c <- c("Tom","Jim","Tim")
v.l <- c(TRUE,TRUE,FALSE)
#Missing value is coded as NA
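Because every element must share one mode, R coerces mixed input to the most general mode. The following is a minimal sketch of that behaviour and of how `NA` is handled; the name `mixed` is only illustrative and does not come from the text above.

```r
# Mixing numbers, strings and logicals in c() coerces everything to character
mixed <- c(1, "Tom", TRUE)
mode(mixed)                    # "character"
mixed                          # "1" "Tom" "TRUE"

# NA takes on the mode of the vector it sits in
v.n <- c(3, 4, 5, 6, NA)
mode(v.n)                      # "numeric"
is.na(v.n)                     # TRUE only for the last element
m <- mean(v.n, na.rm = TRUE)   # 4.5, computed without the NA
```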
We can create a vector with the c function (concatenation) or with the functions seq and rep:
v.n1 <- rep(2, 4)
v.n2 <- rep(v.n, 4)
v.n3 <- rep(v.n, each=4)
v.n4 <- seq(from=3, to=10, length.out=10)
v.n5 <- seq(from=3, to=10, by=0.5)
v.n6 <- 1:10
4.3 Matrix
The R function matrix creates a matrix from a vector of values, a number of rows, and a number of columns:
m1 <- matrix(rnorm(12), 3, 4)
The functions cbind and rbind also create a matrix, by binding vectors together as columns or as rows:
v1 <- runif(10)
v2 <- rnorm(10)
m1 <- cbind(v1, v2)
m2 <- rbind(v1, v2)
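To make the difference between the two binding functions concrete, the sketch below checks the dimensions of each result. The seed value and the names `m.col` and `m.row` are arbitrary choices for illustration.

```r
set.seed(1)               # arbitrary seed so the random draws are reproducible
v1 <- runif(10)
v2 <- rnorm(10)

m.col <- cbind(v1, v2)    # the vectors become columns
m.row <- rbind(v1, v2)    # the vectors become rows
dim(m.col)                # 10 2
dim(m.row)                # 2 10
colnames(m.col)           # "v1" "v2"

# matrix() fills column by column unless byrow = TRUE is given
a <- matrix(1:6, nrow = 2)
b <- matrix(1:6, nrow = 2, byrow = TRUE)
a[1, 2]                   # 3
b[1, 2]                   # 2
```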
4.4 Dataframe
The data frame is probably the most commonly used data object. It has the form of a matrix but the mode of a list. Each column is a variable and each row is an observation. A column can be numeric, character, or logical, and each column has a unique name.
m1 <- matrix(rnorm(6),2,3,byrow=TRUE)
m2 <- rbind(m1,c(1,1,2))
m2 <- cbind(m2, c(1,1,2))
d1 <- data.frame(m2)
d2 <- data.frame(v1=rnorm(3), v2=runif(3))
names(d2)
## [1] "v1" "v2"
A data frame is a list. A list is a collection of elements (just like a vector), but the elements of a list can be of different modes:
l1 <- list(c(1,2,3), matrix(rnorm(9), 3, 3),
c("Tim","Tom","Jim"))
l1
## [[1]]
## [1] 1 2 3
##
## [[2]]
##            [,1]       [,2]       [,3]
## [1,]  1.2735046  1.2839747 -1.9313985
## [2,] -0.4933836 -1.1187934  0.6498940
## [3,]  1.2601138  0.0290398  0.1638518
##
## [[3]]
## [1] "Tim" "Tom" "Jim"
A data frame is a list of vectors of the same length. When data are imported from a spreadsheet file, the result is a data frame by default.
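The list nature of a data frame can be checked directly. The sketch below uses made-up values and illustrative column names (`name`, `age`, `member`) to show list-style and matrix-style access side by side.

```r
# Three columns of different modes, all of length 3 (values are invented)
d <- data.frame(name = c("Tim", "Tom", "Jim"),
                age = c(31, 25, 40),
                member = c(TRUE, FALSE, TRUE))

is.list(d)          # TRUE: a data frame is stored as a list
length(d)           # 3: one list element per column
sapply(d, length)   # every column holds 3 values
d$age               # access a column by name, as in a list
d[["age"]]          # the same column with list-style indexing
d[2, ]              # the second observation, with matrix-style indexing
```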
4.5 Importing Data
The basic R function for reading text data is scan. The most useful functions are read.table and read.csv. Both import the text file into a data frame.
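As a rough illustration of both functions, the sketch below writes a tiny comma-separated file to a temporary location and reads it back. The file contents and the names `tmp`, `d.csv`, and `d.tab` are assumptions made for the example.

```r
# Write a small CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,speed", "oliver,3.5", "olivia,4"), tmp)

d.csv <- read.csv(tmp)                              # header = TRUE and sep = "," by default
d.tab <- read.table(tmp, header = TRUE, sep = ",")  # same file via read.table
class(d.csv)                                        # "data.frame"
str(d.csv)                                          # column names and types
unlink(tmp)                                         # remove the temporary file
```

read.csv is a wrapper around read.table with defaults suited to comma-separated files, so for simple files the choice between them is mostly a matter of convenience.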
4.5.1 Extract elements
s <- v[2]            # extracts the second element of v and stores it in s
v2 <- v[2:3]         # extracts the 2nd and 3rd elements
v3 <- v[c(1,3,4)]    # extracts the 1st, 3rd and 4th elements
v4 <- m[,2]          # stores the 2nd column of a matrix or data frame m in vector v4
v4                   # prints v4
4.5.2 Read data from internet
Fixed width format and read.fwf:
read.fwf(file, widths, header = FALSE, sep = "\t", skip = 0, row.names, col.names, n = -1, buffersize = 2000, ...)
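A small sketch of how the widths argument splits each line may help here. The two data lines, the column names, and the temporary file are invented for illustration.

```r
# Each line packs three fields with no separator: name (3), age (2), score (2)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Tom2335", "Jim3142"), tmp)

d.fwf <- read.fwf(tmp, widths = c(3, 2, 2),
                  col.names = c("name", "age", "score"))
d.fwf          # two rows: Tom 23 35 and Jim 31 42
unlink(tmp)    # remove the temporary file
```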
4.6 Types of data
4.6.1 Variables
- Integer (ex. 100L)
- Numeric (ex. 0.05)
- Character (ex. "hello")
- Logical (ex. TRUE)
- Factor (ex. "Green")
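The class function reports which of these types a value has. A brief sketch follows; the factor `f` and its levels are illustrative.

```r
class(100L)       # "integer": the L suffix requests an integer
class(100)        # "numeric": a plain number is stored as a double
class(0.05)       # "numeric"
class("hello")    # "character"
class(TRUE)       # "logical"

f <- factor(c("Green", "Red", "Green"))
class(f)          # "factor"
levels(f)         # "Green" "Red"
```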
4.6.2 Types of data objects
- Vector
- Matrix
- List and dataframe
- Array
4.6.2.1 Numeric vectors
x <- c(2, 6, 1, 5, 2.5)
y <- c(0, 6, 3, 2.6, 9.4)
x[3] #Accessing elements of a vector
## [1] 1
4.6.2.2 Vector operations
z <- x + y #sum by elements
z[2] #the second element
## [1] 12
z[-2] #all but the second element
## [1] 2.0 4.0 7.6 11.9
z[c(2,4)] #the second and the fourth elements
## [1] 12.0 7.6
z[c(2:4)] #elements 2 to 4
## [1] 12.0 4.0 7.6
z[-c(2:4)] #all except elements 2 to 4
## [1] 2.0 11.9
4.6.2.3 Vector of logical values
TRUE and FALSE are logical values
z>10 #compares each element of z to 10 and returns a vector of logical values
## [1] FALSE TRUE FALSE FALSE TRUE
4.6.2.4 Logical comparisons
- <
- >
- ==
- <=
- >=
- xor
x>y #element-wise comparison
## [1] TRUE FALSE FALSE TRUE FALSE
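The other comparisons work element by element in the same way. The sketch below reuses the x and y defined earlier; the `&`, `|`, `any`, and `all` lines go slightly beyond the list above and are included only to show how comparison results are usually combined.

```r
x <- c(2, 6, 1, 5, 2.5)
y <- c(0, 6, 3, 2.6, 9.4)

x == y                     # FALSE  TRUE FALSE FALSE FALSE
x >= y                     #  TRUE  TRUE FALSE  TRUE FALSE
e <- xor(x > 2, y > 2)     # TRUE where exactly one of the two conditions holds
e                          # FALSE FALSE  TRUE FALSE FALSE
(x > 2) & (y > 2)          # element-wise AND
(x > 2) | (y > 2)          # element-wise OR
any(x > y)                 # TRUE: at least one comparison is TRUE
all(x > y)                 # FALSE: not every comparison is TRUE
```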
4.7 Subsetting vectors
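names <- c("oliver", "olivia", "henry", "mary")
sex <- c("M", "F", "M", "F")
speed <- c(3.5, 4, 3, 3.25)
names[sex=="F"]      #names of the females
speed[sex=="M"]      #speeds of the males
names[speed<=3.5]    #names of those with speed at most 3.5
z[z>10]              #elements of z greater than 10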
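Conditions can also be combined, or turned into positions with which. A short sketch using the same three vectors follows; the names `fast.f` and `pos` are illustrative.

```r
names <- c("oliver", "olivia", "henry", "mary")
sex <- c("M", "F", "M", "F")
speed <- c(3.5, 4, 3, 3.25)

fast.f <- names[sex == "F" & speed > 3.5]   # females faster than 3.5
fast.f                                      # "olivia"
pos <- which(speed <= 3.5)                  # positions where the condition is TRUE
pos                                         # 1 3 4
mean(speed[sex == "M"])                     # 3.25, the mean speed of the males
```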
4.8 Sorting data
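order(names)          #positions that sort names alphabetically
order(speed)          #positions that sort speed from low to high
names[order(speed)]   #names arranged from slowest to fastest
names[order(names)]   #names in alphabetical order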
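As a sketch of how order differs from sort, and how the same idea extends to a data frame, consider the following. The names `idx` and `d` are illustrative.

```r
names <- c("oliver", "olivia", "henry", "mary")
speed <- c(3.5, 4, 3, 3.25)

sort(speed)                              # 3.00 3.25 3.50 4.00: the sorted values
idx <- order(speed, decreasing = TRUE)   # positions, fastest first
idx                                      # 2 1 4 3
names[idx]                               # "olivia" "oliver" "mary" "henry"

# The same index reorders the rows of a data frame
d <- data.frame(names, speed)
d[order(d$speed), ]                      # rows from slowest to fastest
```

sort returns the values themselves, while order returns positions, which is why order is the one to use when a second vector or a whole data frame has to follow the same arrangement.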
4.9 Merging vectors
z <- c(x, y) #Concatenates x and y into one longer vector
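The distinction between joining vectors end to end and placing them side by side can be sketched as follows, reusing the x and y from the numeric vector example; `xy` is an illustrative name.

```r
x <- c(2, 6, 1, 5, 2.5)
y <- c(0, 6, 3, 2.6, 9.4)

z <- c(x, y)        # one vector: the elements of x followed by those of y
length(z)           # 10
xy <- cbind(x, y)   # a matrix with x and y as its two columns
dim(xy)             # 5 2
```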
4.10 Generating vectors
c(3,5,6)             #3 5 6
2:5                  #2 3 4 5
seq(2, 3, by=0.5)    #2.0 2.5 3.0
rep(1:2, each=3)     #1 1 1 2 2 2
rep(1:2, times=3)    #1 2 1 2 1 2