Data structures
Vectors
A vector is a collection of values that all have the same data type. Depending on the type of its elements, it is called a numeric vector, a character vector, and so on.
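The code that produced the output shown next is not preserved here. A minimal sketch that would print the same values, assuming the vectors are named `numbers` and `colors` (both names are used later in this section) and that the two checks were `is.numeric()` and `is.character()`:

```r
# Numeric vector (the name `numbers` is reused in the list example later on)
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8)
numbers

# Character vector (the name `colors` is reused in the dataframe example)
colors <- c("red", "yellow", "blue", "red", "blue")
colors

# Type checks; which checks the original ran is an assumption
is.numeric(numbers)     # TRUE
is.character(colors)    # TRUE
```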
## [1] 1 2 3 4 5 6 7 8
## [1] "red" "yellow" "blue" "red" "blue"
## [1] TRUE
## [1] TRUE
You can also select specific elements of a vector. Below are some examples.
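A sketch of indexing calls that would give the results shown next. The exact expressions used originally are not known, so these are one plausible choice; `numbers` and `colors` are redefined so the block runs on its own.

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8)
colors <- c("red", "yellow", "blue", "red", "blue")

# Select by position (R indexes from 1)
numbers[3]                # 3

# Select several positions at once with an index vector
colors[c(1, 4)]           # "red" "red"

# Alternative: select by a logical condition, which gives the same result here
colors[colors == "red"]   # "red" "red"
```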
## [1] 3
## [1] "red" "red"
Factors
A factor can be viewed as a special case of a vector. We usually use factors to represent categorical data, that is, data with a fixed set of possible values. The set of possible categories is referred to as the levels of the factor.
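A minimal sketch of how the factor printed next could be created, assuming it is named `size` (the name used in the dataframe example later on). By default `factor()` sorts the levels alphabetically, which matches the level order in the output. The type checks that followed in the original are not recoverable, so the two shown here are only illustrative.

```r
# Build a factor from a character vector; levels are sorted alphabetically
size <- factor(c("small", "large", "small", "medium", "medium"))
size

# Inspect the set of categories
levels(size)          # "large" "medium" "small"

# Illustrative checks (assumed, not taken from the original)
is.factor(size)       # TRUE
is.character(size)    # FALSE
```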
## [1] small large small medium medium
## Levels: large medium small
## [1] TRUE
## [1] FALSE
## [1] TRUE
Matrices
A matrix is a two-dimensional generalization of a vector. The values are arranged in rows and columns, and the elements must have the same data type.
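The matrix printed next can be reproduced by reshaping the eight-element numeric vector into four rows and two columns. This is a sketch; the variable name `m` is hypothetical, and `matrix()` fills column by column by default.

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Fill a 4 x 2 matrix column by column (the default, byrow = FALSE)
m <- matrix(numbers, nrow = 4, ncol = 2)
m

# Check the result (assumed to be the check behind the TRUE in the output)
is.matrix(m)    # TRUE
```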
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
## [4,] 4 8
## [1] TRUE
You can also combine vectors by rows or by columns to create a matrix.
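A sketch using `cbind()` and `rbind()` that would produce the two matrices printed next. The vector names `a` and `b` come from the row and column labels in the output; the result names `by_col` and `by_row` are hypothetical.

```r
a <- 1:5
b <- 11:15

# Bind the vectors as columns: 5 rows, columns named a and b
by_col <- cbind(a, b)
by_col
is.matrix(by_col)    # TRUE

# Bind the vectors as rows: 2 rows named a and b, 5 columns
by_row <- rbind(a, b)
by_row
is.matrix(by_row)    # TRUE
```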
## a b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15
## [1] TRUE
## [,1] [,2] [,3] [,4] [,5]
## a 1 2 3 4 5
## b 11 12 13 14 15
## [1] TRUE
To select specific elements in the matrix, you can do:
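A sketch of matrix indexing with `[row, column]`, written to match the results printed next. The original expressions are not preserved, so these are assumptions; `by_col` is a hypothetical name for the column-bound matrix.

```r
by_col <- cbind(a = 1:5, b = 11:15)

# Single element: row 3, column 2 (the column name b is kept as a label)
by_col[3, 2]

# First three rows, all columns (leave the column index empty)
by_col[1:3, ]

# All rows, selected columns
by_col[, c("a", "b")]
```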
## b
## 13
## a b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## a b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15
Dataframes
A dataframe is a collection of vectors with the same length (but they can be of different data types). We usually use a dataframe to represent an entire dataset.
dataset <- data.frame(shape = c("circle", "triangle", "rectangle", "circle", "circle"),
                      size = size, color = colors,
                      score = c(5, 4, 2, 9, 8))
dataset
## shape size color score
## 1 circle small red 5
## 2 triangle large yellow 4
## 3 rectangle small blue 2
## 4 circle medium red 9
## 5 circle medium blue 8
## [1] TRUE
## [1] FALSE
## [1] TRUE
To select a specific variable (vector) or a subset of the dataframe, do:
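A sketch of common ways to subset a dataframe, chosen to resemble the results printed next. The original expressions are not preserved, so each line is an assumption. The dataframe is rebuilt here so the block runs on its own; `color` and `size` are created as factors and `shape` is kept as character to mirror how they print in the output.

```r
dataset <- data.frame(
  shape = c("circle", "triangle", "rectangle", "circle", "circle"),
  size  = factor(c("small", "large", "small", "medium", "medium")),
  color = factor(c("red", "yellow", "blue", "red", "blue")),
  score = c(5, 4, 2, 9, 8),
  stringsAsFactors = FALSE   # keep shape as character on any R version
)

dataset[1, "color"]             # one cell: row 1 of the color column
dataset$color                   # a whole column, by name
dataset$weight                  # a column that does not exist gives NULL
dataset$shape                   # character column
dataset$shape[3]                # "rectangle"
dataset[dataset$score > 5, ]    # rows that meet a condition (rows 4 and 5)
```

Before R 4.0, `data.frame()` converted character columns to factors by default, so the same code can print columns differently across R versions unless `stringsAsFactors` is set explicitly.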
## [1] red
## Levels: blue red yellow
## [1] red yellow blue red blue
## Levels: blue red yellow
## NULL
## [1] "circle" "triangle" "rectangle" "circle" "circle"
## [1] "rectangle"
## shape size color score
## 4 circle medium red 9
## 5 circle medium blue 8
Lists
A list is a collection of data objects. The components can have different data types and lengths.
all_combined <- list(names = c("Bob", "Anne"),
                     age = c(26, 43),
                     numbers = numbers, samples = dataset)
all_combined
## $names
## [1] "Bob" "Anne"
##
## $age
## [1] 26 43
##
## $numbers
## [1] 1 2 3 4 5 6 7 8
##
## $samples
## shape size color score
## 1 circle small red 5
## 2 triangle large yellow 4
## 3 rectangle small blue 2
## 4 circle medium red 9
## 5 circle medium blue 8
## [1] TRUE
You can extract specific components/elements of the list in various ways.
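A sketch of list extraction that would give the results printed next. The original expressions are not preserved, so these are assumptions, and the list is a trimmed version of `all_combined` so the block runs on its own.

```r
# Trimmed version of the list above (the dataframe component is left out)
all_combined <- list(names = c("Bob", "Anne"),
                     age = c(26, 43),
                     numbers = c(1, 2, 3, 4, 5, 6, 7, 8))

# Three equivalent ways to extract a whole component
all_combined$names         # "Bob" "Anne"
all_combined[["names"]]    # "Bob" "Anne"
all_combined[[1]]          # "Bob" "Anne"

# Extract a single element from a component
all_combined$names[1]      # "Bob"
all_combined[[1]][2]       # "Anne"
all_combined$age[1]        # 26
```

Single brackets, as in `all_combined["names"]`, return a one-component list rather than the vector itself, which is why double brackets or `$` are used here.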
## [1] "Bob" "Anne"
## [1] "Bob" "Anne"
## [1] "Bob" "Anne"
## [1] "Bob"
## [1] "Anne"
## [1] 26