1.7 Object types
Objects in R can be of various types, or classes. You can use the class function to view an object’s class, and one of the many ‘is.’ functions to test whether an object belongs to a given class.
1.7.1 Numeric single value
A numeric object contains only numbers (e.g., 1, 2, 4.13, pi, etc.).
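The code that produced the output below is not shown. A minimal sketch that would produce it, assuming the object is named x (the name is an assumption):

```r
x <- 5          # assign a single numeric value
x               # print the value
class(x)        # view the object's class
is.numeric(x)   # test whether the object is numeric
```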
## [1] 5
## [1] "numeric"
## [1] TRUE
1.7.2 Character single value
A character string is a set of characters within one pair of quotes and may include spaces. For example, “elevated” and “elevated blood pressure” are each character objects with a single character value (a single string).
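The output below could come from code along these lines. The variable name x and the choice of the two tests that return FALSE are assumptions, since the original calls are not shown:

```r
x <- "elevated blood pressure"   # one string, spaces included
x
class(x)
is.character(x)   # TRUE: x is a character object
is.numeric(x)     # assumed test: a string is not numeric
is.factor(x)      # assumed test: a string is not a factor
```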
## [1] "elevated blood pressure"
## [1] "character"
## [1] TRUE
## [1] FALSE
## [1] FALSE
1.7.3 Vector
A vector contains an ordered collection of numbers or character strings indexed by the integers 1, 2, …, n, where n is the length of the vector.
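A short sketch consistent with the output below, assuming the vector is named x and that the test shown is is.vector (both are assumptions):

```r
x <- c(5, 8, 12)   # c() combines values into a vector
x
class(x)
is.vector(x)
x[2]               # square brackets extract by index: the second element
```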
## [1] 5 8 12
## [1] "numeric"
## [1] TRUE
## [1] 8
Vectors can contain either numeric or character values, but not both.
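For instance, the two vectors printed below could be created as follows (a sketch; the original calls are not shown):

```r
c(5, 8, 12)      # a numeric vector
c("yes", "no")   # a character vector
```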
## [1] 5 8 12
## [1] "yes" "no"
If you attempt to create a vector containing both numeric and character values, R will convert the numeric values to character.
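A call such as the following (a sketch; the original call is not shown) produces the output below:

```r
c(5, "no")   # the number 5 is silently coerced to the string "5"
```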
## [1] "5" "no"
Other ways of creating vectors are the following.
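The four vectors printed below can be produced with the colon operator, seq, and rep. The exact calls in the original are not shown, so the following is one plausible set:

```r
1:7                  # consecutive integers from 1 to 7
seq(2, 10, by = 2)   # from 2 to 10 in steps of 2
rep(3, 5)            # the value 3 repeated 5 times
rep(1:2, 3)          # the sequence 1, 2 repeated 3 times
```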
## [1] 1 2 3 4 5 6 7
## [1] 2 4 6 8 10
## [1] 3 3 3 3 3
## [1] 1 2 1 2 1 2
1.7.4 Factor
A factor is a special kind of vector which contains underlying numeric values 1, 2, …, n, but each of these n values has an associated character label (which may or may not be the numeric value). These labeled values are the levels of the factor. One common use of a factor is to store a categorical variable for use in a data analysis. Once you have created a factor vector with specific levels, no element of that vector can take on a value that is not one of its pre-assigned levels.
You can create a factor from a character vector, and R will assume that the unique values are the labels for the levels.
y <- factor(c("underweight", "underweight", "normal", "overweight", "normal"))
levels(y) # The levels function shows the possible values of a factor
y         # Printing a factor shows its values followed by its levels
class(y)
## [1] "normal" "overweight" "underweight"
## [1] underweight underweight normal overweight normal
## Levels: normal overweight underweight
## [1] "factor"
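To make the description above concrete, the following sketch shows the underlying numeric codes and what happens when you assign a value that is not an existing level. The object name w and the value "obese" are made up for illustration:

```r
w <- factor(c("underweight", "normal", "overweight"))
levels(w)        # alphabetical: "normal" "overweight" "underweight"
as.integer(w)    # underlying codes: 3 1 2
w[2] <- "obese"  # not an existing level: R issues a warning and stores NA
is.na(w[2])      # TRUE
```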
If you want to change the labels, you can do so by assigning a new value to its levels. For example, suppose we want the labels to be capitalized.
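A sketch of such an assignment, which would produce the output below (the original call is not shown). Note that the new labels are matched to the existing levels by position, so they must be listed in the current (alphabetical) level order:

```r
y <- factor(c("underweight", "underweight", "normal", "overweight", "normal"))
# Current levels are "normal", "overweight", "underweight", in that order
levels(y) <- c("Normal", "Overweight", "Underweight")
levels(y)
```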
## [1] "Normal" "Overweight" "Underweight"
Alternatively, you can assign new labels to the levels when you create the factor. This has the added advantage of allowing you to decide what order the levels should appear in. When we created the factor, R automatically assigned the levels by taking the unique values of y and putting them in alphabetical order. For various reasons (like making a barchart later) you might want the levels to be in a different order. You can specify the order of the levels when you create the variable, but be careful because if you leave out a value that appears in the data that value will end up set to missing (NA).
In the previous example, we would like the order to be from lower to higher weight.
# Enter ORIGINAL values in levels
# Enter the NEW level labels in labels
# Make sure the orderings of levels and labels correspond
y <- factor(c("underweight", "underweight", "normal", "overweight", "normal"),
            levels = c("underweight", "normal", "overweight"),
            labels = c("Underweight", "Normal", "Overweight"))
levels(y)
y
class(y)
## [1] "Underweight" "Normal" "Overweight"
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
## [1] "factor"
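The caution about leaving out a level can be checked with a small sketch. Here "overweight" is deliberately omitted from levels; the object name v is made up for illustration:

```r
v <- factor(c("underweight", "normal", "overweight"),
            levels = c("underweight", "normal"))  # "overweight" left out on purpose
v          # the third element prints as <NA>
is.na(v)   # FALSE FALSE TRUE
```

R does not warn you when this happens, so it is worth checking for unexpected missing values after creating a factor with explicit levels.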
1.7.5 Matrix
A matrix contains a two-dimensional collection of numeric or character values indexed by pairs of integers (i, j).
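The two matrices printed below could be created with the matrix function, which fills values column by column by default. The object names x and y are assumptions, as the original calls are not shown:

```r
x <- matrix(1:6, nrow = 2, ncol = 3)   # numeric matrix, filled by column
x
class(x)
y <- matrix(c("a", "b", "c", "d", "e", "f"), nrow = 2, ncol = 3)   # character matrix
y
class(y)
```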
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [1] "matrix" "array"
## [,1] [,2] [,3]
## [1,] "a" "c" "e"
## [2,] "b" "d" "f"
## [1] "matrix" "array"
Notice that the matrix object has 2 classes. It is a matrix, but in some cases will be treated as an array object (see next subsection).
You can combine matrices by columns (cbind) or by rows (rbind), as long as they have the same number of rows or columns, respectively.
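A sketch consistent with the output below. The object names x, y, and z are assumptions, as the original calls are not shown:

```r
x <- matrix(1:6, nrow = 2, ncol = 3)
y <- matrix(c(6, 5), nrow = 2, ncol = 1)      # same number of rows as x
z <- matrix(c(6, 5, 4), nrow = 1, ncol = 3)   # same number of columns as x
x
y
cbind(x, y)   # append y as a new column
x
z
rbind(x, z)   # append z as a new row
```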
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [,1]
## [1,] 6
## [2,] 5
## [,1] [,2] [,3] [,4]
## [1,] 1 3 5 6
## [2,] 2 4 6 5
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [,1] [,2] [,3]
## [1,] 6 5 4
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 6 5 4
1.7.6 Array
An array contains an n-dimensional collection of numeric or character values indexed by n-tuples of integers (e.g., (i,j,k) for a 3-dimensional array).
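The 2 x 2 x 2 array printed below could be created as follows, assuming the object is named x (the original call is not shown):

```r
x <- array(1:8, dim = c(2, 2, 2))   # fills the first dimension fastest, then the second, then the third
x
class(x)
```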
## , , 1
##
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## , , 2
##
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
## [1] "array"
1.7.7 List
A list contains an ordered collection of objects, and the objects in the list can be of different types.
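A sketch that would produce the output below, combining a character string, a numeric vector, and the factor from the Factor subsection. The object name x is an assumption, and y is recreated here so the example runs on its own:

```r
y <- factor(c("underweight", "underweight", "normal", "overweight", "normal"),
            levels = c("underweight", "normal", "overweight"),
            labels = c("Underweight", "Normal", "Overweight"))
x <- list("5", c(1, 2, 3), y)   # three elements of three different types
x
class(x)
```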
## [[1]]
## [1] "5"
##
## [[2]]
## [1] 1 2 3
##
## [[3]]
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
## [1] "list"
1.7.8 Data frame
A data.frame is a type of list commonly used to store datasets, with a row for each observation and a column for each variable, and each variable can be of a different type (e.g., numeric, character, etc.). When you create a data.frame, you can optionally name the elements.
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = c("yes", "yes", "no", "no"),
                age      = c(24, 55, 39, 18))
x
class(x)
## outcome exposure age
## 1 1 yes 24
## 2 0 yes 55
## 3 1 no 39
## 4 1 no 18
## [1] "data.frame"
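The statement that a data.frame is a type of list can be checked directly. The following sketch recreates the data.frame above and uses is.list, sapply, and the $ operator, which are not used elsewhere in this section:

```r
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = c("yes", "yes", "no", "no"),
                age      = c(24, 55, 39, 18))
is.list(x)         # TRUE: each column is one element of the list
sapply(x, class)   # the class of each column (variable)
x$age              # extract one column by name
```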