## Data structure

### Vectors

A vector is an ordered collection of values that all have the same data type. Depending on the data type of its elements, it can be, for example, a numeric vector or a character vector.

``numbers <- 1:8  ; numbers   # numeric vector``
``## [1] 1 2 3 4 5 6 7 8``
``colors <- c("red", "yellow", "blue", "red", "blue")  ; colors    # character vector``
``## [1] "red"    "yellow" "blue"   "red"    "blue"``
``is.vector(numbers)     # returns TRUE if 'numbers' is a vector``
``## [1] TRUE``
``is.vector(colors)      # returns TRUE if 'colors' is a vector``
``## [1] TRUE``
``# as.vector(object) : this attempts to coerce 'object' into a vector``
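
Because all elements of a vector must share one data type, R coerces mixed inputs to a common type. The sketch below illustrates this; the names `mixed` and `num_lgl` are illustrative and do not appear in the examples above.

```r
# Mixing types in c() coerces every element to the most flexible type
mixed <- c(1, "two", TRUE)
class(mixed)        # "character"
mixed               # "1" "two" "TRUE"

# Logical values become numbers when combined with numeric values
num_lgl <- c(10, TRUE, FALSE)
class(num_lgl)      # "numeric"
num_lgl             # 10 1 0
```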

You can also select specific elements of a vector by their position. Below are some examples.

``numbers[3]     # selects the third element in 'numbers'``
``## [1] 3``
``colors[c(1, 4)]    # selects the first and fourth elements in 'colors'``
``## [1] "red" "red"``
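
Besides positive positions, elements can also be selected with negative indices or with logical conditions. The following is a minimal sketch that recreates `numbers` and `colors` with the same values as above; the result names (`without_first`, `large_numbers`, `reds`, `blue_positions`) are illustrative.

```r
numbers <- 1:8
colors <- c("red", "yellow", "blue", "red", "blue")

without_first  <- numbers[-1]              # drops the first element: 2 3 4 5 6 7 8
large_numbers  <- numbers[numbers > 5]     # keeps elements greater than 5: 6 7 8
reds           <- colors[colors == "red"]  # keeps elements equal to "red"
blue_positions <- which(colors == "blue")  # positions of "blue": 3 5
```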

### Factors

A factor can be viewed as a special case of a vector. We usually use factors to represent categorical data, which has a fixed set of possible values. The set of possible categories is referred to as the levels of the factor.

``size <- factor(c("small", "large", "small", "medium", "medium")) ; size``
``````## [1] small  large  small  medium medium
## Levels: large medium small``````
``is.factor(size)      # returns TRUE if 'size' is a factor``
``## [1] TRUE``
``is.factor(colors)    # returns TRUE if 'colors' is a factor``
``## [1] FALSE``
``````colors <- as.factor(colors)    # attempts to coerce 'colors' vector into a factor
is.factor(colors)``````
``## [1] TRUE``
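
By default the levels are sorted alphabetically, as the output for `size` shows. If a different order is wanted, it can be given through the `levels` argument of `factor()`. A minimal sketch, with `size_raw` and `size_ordered` as illustrative names:

```r
size_raw <- c("small", "large", "small", "medium", "medium")

# Specify the level order explicitly instead of the alphabetical default
size_ordered <- factor(size_raw, levels = c("small", "medium", "large"))

levels(size_ordered)       # "small" "medium" "large"
table(size_ordered)        # counts per level, listed in level order
as.integer(size_ordered)   # underlying integer codes: 1 3 1 2 2
```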

### Matrices

A matrix is a two-dimensional generalization of a vector. The values are arranged in rows and columns, and the elements must have the same data type.

``m <- matrix(1:8, nrow = 4, ncol = 2) ; m  # creates a matrix with 4 rows and 2 columns``
``````##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8``````
``is.matrix(m)    # returns TRUE if 'm' is a matrix``
``## [1] TRUE``
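
As the output above shows, `matrix()` fills the values column by column. The sketch below, using the illustrative name `m_byrow`, fills the same values row by row instead and checks the dimensions.

```r
# byrow = TRUE fills the matrix one row at a time
m_byrow <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)

m_byrow[1, ]   # first row: 1 2
m_byrow[, 1]   # first column: 1 3 5 7
dim(m_byrow)   # number of rows and columns: 4 2
```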

You can also combine vectors by rows or columns to create a matrix.

``````a <- 1:5
b <- 11:15
ab <- cbind(a, b)  ; ab   # combine 'a' and 'b' by columns``````
``````##      a  b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15``````
``is.matrix(ab)``
``## [1] TRUE``
``ab2 <- rbind(a, b)  ; ab2   # combine 'a' and 'b' by rows``
``````##   [,1] [,2] [,3] [,4] [,5]
## a    1    2    3    4    5
## b   11   12   13   14   15``````
``is.matrix(ab2)``
``## [1] TRUE``
``size <- as.matrix(size)    # attempts to coerce 'size' into a matrix``

To select specific elements in the matrix, you can do:

``ab[3, 2]    # selects the element in row 3 and column 2 of 'ab'``
``````##  b
## 13``````
``ab[1:3, ]   # selects rows 1-3 of 'ab'``
``````##      a  b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13``````
``ab[, 1:2]   # selects columns 1-2 of 'ab'``
``````##      a  b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15``````
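
Note that selecting a single row or column returns a plain vector by default. If the result should remain a matrix, `drop = FALSE` can be added. A minimal sketch that rebuilds `ab` as above; `col_vec` and `col_mat` are illustrative names.

```r
a <- 1:5
b <- 11:15
ab <- cbind(a, b)

col_vec <- ab[, 2]                 # a plain vector of length 5
col_mat <- ab[, 2, drop = FALSE]   # still a matrix with 5 rows and 1 column

is.matrix(col_vec)   # FALSE
is.matrix(col_mat)   # TRUE
dim(col_mat)         # 5 1
```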

### Dataframes

A dataframe is a collection of vectors with the same length (but they can be of different data types). We usually use a dataframe to represent an entire dataset.

``````dataset <- data.frame(shape=c("circle", "triangle", "rectangle", "circle", "circle"),
size = size, color = colors,
score = c(5, 4, 2, 9, 8))
dataset``````
``````##       shape   size  color score
## 1    circle  small    red     5
## 2  triangle  large yellow     4
## 3 rectangle  small   blue     2
## 4    circle medium    red     9
## 5    circle medium   blue     8``````
``is.data.frame(dataset)``
``## [1] TRUE``
``is.data.frame(ab)``
``## [1] FALSE``
``````ab <- as.data.frame(ab)  # attempts to coerce 'ab' matrix into a dataframe
is.data.frame(ab)``````
``## [1] TRUE``
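
Once a dataframe exists, it is often useful to check its dimensions and the data type of each variable. The sketch below builds a small stand-in dataframe, `df_demo` (a hypothetical name, with values mirroring `dataset`), so that it runs on its own. `stringsAsFactors = FALSE` is set explicitly here so that character columns are not converted to factors in any R version.

```r
df_demo <- data.frame(
  shape = c("circle", "triangle", "rectangle", "circle", "circle"),
  size  = c("small", "large", "small", "medium", "medium"),
  color = factor(c("red", "yellow", "blue", "red", "blue")),
  score = c(5, 4, 2, 9, 8),
  stringsAsFactors = FALSE
)

dim(df_demo)             # number of rows and columns: 5 4
names(df_demo)           # variable names
sapply(df_demo, class)   # data type of each variable
str(df_demo)             # compact overview of the whole dataframe
```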

To select a specific variable (vector) or a subset of the dataframe, do:

``dataset[4, 3]      # selects the element in row 4 and column 3``
``````## [1] red
## Levels: blue red yellow``````
``dataset[[3]]    # selects the third variable in 'dataset'``
``````## [1] red    yellow blue   red    blue
## Levels: blue red yellow``````
``dataset[["colors"]]   # returns NULL: the variable is named 'color', not 'colors'``
``## NULL``
``dataset$shape   # selects 'shape' variable in 'dataset'``
``## [1] "circle"    "triangle"  "rectangle" "circle"    "circle"``
``dataset$shape[3]   # selects the third element of 'shape' variable in 'dataset'``
``## [1] "rectangle"``
``subset(dataset, size=="medium")  # selects a subset of data that satisfies 'size=medium'``
``````##    shape   size color score
## 4 circle medium   red     9
## 5 circle medium  blue     8``````
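
The same kind of selection can also be written with a logical condition inside square brackets. The following is a minimal sketch on a stand-in dataframe, `df_demo` (a hypothetical name, with values mirroring part of `dataset`); `medium_rows`, `via_subset`, and `high_circles` are illustrative names as well.

```r
df_demo <- data.frame(
  shape = c("circle", "triangle", "rectangle", "circle", "circle"),
  size  = c("small", "large", "small", "medium", "medium"),
  score = c(5, 4, 2, 9, 8),
  stringsAsFactors = FALSE
)

# Rows where 'size' equals "medium", written with logical indexing
medium_rows <- df_demo[df_demo$size == "medium", ]

# The same rows selected with subset()
via_subset <- subset(df_demo, size == "medium")

# Conditions can be combined, and columns chosen by name
high_circles <- df_demo[df_demo$shape == "circle" & df_demo$score > 5,
                        c("shape", "score")]
```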

### Lists

A list is a collection of data objects. The components can have different data types and lengths.

``````all_combined <- list(names = c("Bob", "Anne"),
age = c(26, 43),
numbers = numbers, samples = dataset)
all_combined``````
``````## $names
## [1] "Bob"  "Anne"
##
## $age
## [1] 26 43
##
## $numbers
## [1] 1 2 3 4 5 6 7 8
##
## $samples
##       shape   size  color score
## 1    circle  small    red     5
## 2  triangle  large yellow     4
## 3 rectangle  small   blue     2
## 4    circle medium    red     9
## 5    circle medium   blue     8``````
``is.list(all_combined)``
``## [1] TRUE``
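
To see what a list contains without printing everything, you can check its length and the names and lengths of its components. A minimal sketch on a smaller stand-in list, `people` (a hypothetical name):

```r
people <- list(names = c("Bob", "Anne"),
               age = c(26, 43),
               numbers = 1:8)

length(people)           # number of components: 3
names(people)            # component names
sapply(people, length)   # length of each component
str(people)              # compact overview of the list
```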

You can extract specific components/elements of the list in various ways.

``all_combined$names   # returns 'names' component``
``## [1] "Bob"  "Anne"``
``all_combined[['names']]``
``## [1] "Bob"  "Anne"``
``all_combined[[1]]    # returns the first component``
``## [1] "Bob"  "Anne"``
``all_combined$names[1]   # returns the first element of 'names'``
``## [1] "Bob"``
``all_combined[['names']][2]  # returns the second element of 'names'``
``## [1] "Anne"``
``all_combined[[2]][1]     # returns the first element of the second component``
``## [1] 26``
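
Single and double square brackets behave differently on lists: `[ ]` returns a shorter list, while `[[ ]]` returns the component itself. A minimal sketch on a stand-in list, `people` (a hypothetical name); `sub_list` and `component` are illustrative names.

```r
people <- list(names = c("Bob", "Anne"), age = c(26, 43))

sub_list  <- people[1]     # a list of length 1 that contains 'names'
component <- people[[1]]   # the character vector itself

class(sub_list)    # "list"
class(component)   # "character"
```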