Data structures

Vectors

A vector is a collection of values that all have the same data type. It can be a numeric or character vector depending on the data type of the elements.
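The output below can be reproduced with a sketch along these lines. The variable names `numbers` and `colors` are assumptions, since only the printed results are shown; the type checks are one plausible way to obtain the two `TRUE` values.

```r
# Hypothetical variable names; the values match the printed output.
numbers <- 1:8
colors <- c("red", "yellow", "blue", "red", "blue")

numbers               # a numeric (integer) vector
colors                # a character vector
is.numeric(numbers)   # type check for the first vector
is.character(colors)  # type check for the second vector
```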

## [1] 1 2 3 4 5 6 7 8
## [1] "red"    "yellow" "blue"   "red"    "blue"
## [1] TRUE
## [1] TRUE

You can also select specific elements of a vector. Below are some examples.
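A minimal sketch of two common ways to index a vector, by position and by logical condition. The variable names are assumptions, and the original may have used different index expressions to obtain the same results.

```r
# Hypothetical variable names, redefined so the block runs on its own.
numbers <- 1:8
colors <- c("red", "yellow", "blue", "red", "blue")

numbers[3]               # third element (R indexes from 1)
colors[colors == "red"]  # every element equal to "red"
```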

## [1] 3
## [1] "red" "red"

Factors

A factor can be viewed as a special case of a vector. We usually use factors to represent categorical data, that is, data with a fixed set of possible values. The set of possible categories is referred to as the levels of the factor.
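As a sketch, a factor like the one printed below can be built with `factor()`. The name `size` and the three checks are assumptions chosen to be consistent with the `TRUE`, `FALSE`, `TRUE` results; the original calls are not shown.

```r
# Hypothetical variable name; the values match the printed output.
size <- factor(c("small", "large", "small", "medium", "medium"))

size                        # levels are sorted alphabetically by default
is.factor(size)             # a factor is its own kind of object
is.vector(size)             # FALSE, because a factor carries class and levels attributes
is.character(levels(size))  # the levels themselves are stored as strings
```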

## [1] small  large  small  medium medium
## Levels: large medium small
## [1] TRUE
## [1] FALSE
## [1] TRUE

Matrices

A matrix is a two-dimensional generalization of a vector. The values are arranged in rows and columns, and the elements must have the same data type.
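One way to produce the matrix shown below is `matrix()`, which fills values column by column unless told otherwise. The variable name `m` is an assumption.

```r
# Hypothetical variable name; 1:8 is laid out in 4 rows, filled column-wise.
m <- matrix(1:8, nrow = 4)

m
is.matrix(m)
```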

##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
## [1] TRUE

You can also combine vectors by rows or columns to create a matrix.
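A sketch using `cbind()` (vectors become columns) and `rbind()` (vectors become rows). The names `a` and `b` come from the printed output; `by_col` and `by_row` are hypothetical.

```r
a <- 1:5
b <- 11:15

by_col <- cbind(a, b)  # each vector becomes a column
by_col
is.matrix(by_col)

by_row <- rbind(a, b)  # each vector becomes a row
by_row
is.matrix(by_row)
```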

##      a  b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15
## [1] TRUE
##   [,1] [,2] [,3] [,4] [,5]
## a    1    2    3    4    5
## b   11   12   13   14   15
## [1] TRUE

To select specific elements in the matrix, index it by row and column:
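A minimal sketch of matrix indexing with `[row, column]`, where leaving a position empty selects everything along that dimension. The name `m` and the exact index expressions are assumptions consistent with the printed results.

```r
# Hypothetical variable name; same matrix as built with cbind() above.
m <- cbind(a = 1:5, b = 11:15)

m[3, 2]   # single element: row 3, column 2 (the column name is kept)
m[1:3, ]  # first three rows, all columns
m[, 1:2]  # all rows, columns 1 and 2 (here, the whole matrix)
```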

##  b 
## 13
##      a  b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
##      a  b
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15

Dataframes

A dataframe is a collection of vectors of the same length, which may have different data types. We usually use a dataframe to represent an entire dataset.
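A sketch of how such a dataframe might be built with `data.frame()`. The name `samples` is borrowed from the list example later in this section. Storing `size` and `color` as factors and `shape` as character is an assumption inferred from how those columns print. The three checks are one plausible source of the `TRUE`, `FALSE`, `TRUE` results.

```r
# Column types are assumptions inferred from the printed output.
samples <- data.frame(
  shape = c("circle", "triangle", "rectangle", "circle", "circle"),
  size  = factor(c("small", "large", "small", "medium", "medium")),
  color = factor(c("red", "yellow", "blue", "red", "blue")),
  score = c(5, 4, 2, 9, 8),
  stringsAsFactors = FALSE  # keep shape as character on older R versions
)

samples
is.data.frame(samples)
is.matrix(samples)      # a dataframe is not a matrix
is.list(samples)        # but it is a list of equal-length columns
```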

##       shape   size  color score
## 1    circle  small    red     5
## 2  triangle  large yellow     4
## 3 rectangle  small   blue     2
## 4    circle medium    red     9
## 5    circle medium   blue     8
## [1] TRUE
## [1] FALSE
## [1] TRUE

To select a specific variable (vector) or a subset of the dataframe, do:
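A sketch of column access with `$` and row filtering with a logical condition. The dataframe is rebuilt so the block runs on its own. The specific expressions are assumptions that reproduce the printed results, including the `NULL`, which here comes from asking for the levels of a character column.

```r
# Hypothetical reconstruction of the dataframe used in this section.
samples <- data.frame(
  shape = c("circle", "triangle", "rectangle", "circle", "circle"),
  size  = factor(c("small", "large", "small", "medium", "medium")),
  color = factor(c("red", "yellow", "blue", "red", "blue")),
  score = c(5, 4, 2, 9, 8),
  stringsAsFactors = FALSE
)

samples$color[1]                     # first value of one column
samples$color                        # the whole column (a factor)
levels(samples$shape)                # NULL: shape is character, not a factor
samples$shape                        # a character column
samples$shape[3]                     # third value of that column
samples[samples$size == "medium", ]  # rows that satisfy a condition
```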

## [1] red
## Levels: blue red yellow
## [1] red    yellow blue   red    blue  
## Levels: blue red yellow
## NULL
## [1] "circle"    "triangle"  "rectangle" "circle"    "circle"
## [1] "rectangle"
##    shape   size color score
## 4 circle medium   red     9
## 5 circle medium  blue     8

Lists

A list is a collection of data objects. The components can have different data types and lengths.
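A sketch of a list holding components of different types and lengths. The component names come from the printed output; the list name `people` and the reconstructed `samples` dataframe are assumptions.

```r
# Hypothetical reconstruction of the objects stored in the list.
samples <- data.frame(
  shape = c("circle", "triangle", "rectangle", "circle", "circle"),
  size  = factor(c("small", "large", "small", "medium", "medium")),
  color = factor(c("red", "yellow", "blue", "red", "blue")),
  score = c(5, 4, 2, 9, 8),
  stringsAsFactors = FALSE
)

people <- list(
  names   = c("Bob", "Anne"),  # character vector
  age     = c(26, 43),         # numeric vector
  numbers = 1:8,               # a longer integer vector
  samples = samples            # a whole dataframe
)

people
is.list(people)
```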

## $names
## [1] "Bob"  "Anne"
## 
## $age
## [1] 26 43
## 
## $numbers
## [1] 1 2 3 4 5 6 7 8
## 
## $samples
##       shape   size  color score
## 1    circle  small    red     5
## 2  triangle  large yellow     4
## 3 rectangle  small   blue     2
## 4    circle medium    red     9
## 5    circle medium   blue     8
## [1] TRUE

You can extract specific components/elements of the list in various ways.
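A minimal sketch of the usual extraction forms: `$name`, `[[name]]`, and `[[position]]` each return the component itself, which can then be indexed further. The list name `people` and the exact expressions are assumptions consistent with the printed results.

```r
# Hypothetical list, trimmed to the components used here.
people <- list(names = c("Bob", "Anne"), age = c(26, 43))

people$names       # component by name
people[["names"]]  # same component, name given as a string
people[[1]]        # same component, by position
people$names[1]    # first element of that component
people[[1]][2]     # second element of that component
people$age[1]      # first element of another component
```

Note that single brackets, as in `people["names"]`, return a one-component list rather than the vector itself.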

## [1] "Bob"  "Anne"
## [1] "Bob"  "Anne"
## [1] "Bob"  "Anne"
## [1] "Bob"
## [1] "Anne"
## [1] 26