8.2 Creating matrices and dataframes

There are a number of ways to create your own matrix and dataframe objects in R. The most common functions are presented in Table 8.1. Because matrices and dataframes are just combinations of vectors, each function takes one or more vectors as inputs, and returns a matrix or a dataframe.

Table 8.1: Functions to create matrices and dataframes.
| Function | Description | Example |
| --- | --- | --- |
| cbind(a, b, c) | Combine vectors as columns in a matrix | cbind(1:5, 6:10, 11:15) |
| rbind(a, b, c) | Combine vectors as rows in a matrix | rbind(1:5, 6:10, 11:15) |
| matrix(data, nrow, ncol, byrow) | Create a matrix from a vector data | matrix(data = 1:12, nrow = 3, ncol = 4) |
| data.frame() | Create a dataframe from named columns | data.frame(age = c(19, 21), sex = c("m", "f")) |

8.2.1 cbind(), rbind()

cbind() and rbind() both create matrices by combining several vectors of the same length. cbind() combines vectors as columns, while rbind() combines them as rows.

Let’s use these functions to create matrices containing the numbers 1 through 15. First, we’ll create three vectors of length 5, then we’ll combine them into one matrix, once with cbind() (vectors become columns) and once with rbind() (vectors become rows).

x <- 1:5
y <- 6:10
z <- 11:15

# Create a matrix where x, y and z are columns
cbind(x, y, z)
##      x  y  z
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15

# Create a matrix where x, y and z are rows
rbind(x, y, z)
##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y    6    7    8    9   10
## z   11   12   13   14   15
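As a quick check on the two layouts, the sketch below stores both results and compares their dimensions. The names col_mat and row_mat are made up for this illustration; t() is base R's transpose function.

```r
x <- 1:5
y <- 6:10
z <- 11:15

col_mat <- cbind(x, y, z)  # each vector becomes a column
row_mat <- rbind(x, y, z)  # each vector becomes a row

dim(col_mat)  # rows first, then columns: 5 3
dim(row_mat)  # 3 5

# Transposing the column version gives the same values as the row version
all(t(col_mat) == row_mat)
```

In other words, cbind() and rbind() hold the same 15 numbers; only the orientation differs.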

8.2.2 matrix()

Remember: a matrix can contain either numbers or character strings, not both! If you try to create a matrix with both numbers and characters, R will turn all the numbers into characters:

# Creating a matrix with numeric and character columns will make everything a character:

cbind(c(1, 2, 3, 4, 5),
      c("a", "b", "c", "d", "e"))
##      [,1] [,2]
## [1,] "1"  "a" 
## [2,] "2"  "b" 
## [3,] "3"  "c" 
## [4,] "4"  "d" 
## [5,] "5"  "e"
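One way to confirm the conversion is to ask R for the storage type of the result. This is a small sketch; the object name mixed is just for illustration.

```r
mixed <- cbind(c(1, 2, 3),
               c("a", "b", "c"))

# The whole matrix is stored as character, including the first column
typeof(mixed)  # "character"

# To do arithmetic on the first column again, convert it back to numeric
as.numeric(mixed[, 1]) + 1  # 2 3 4
```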

The matrix() function creates a matrix from a single vector of data. The function has 4 main inputs: data (a vector of data), nrow (the number of rows you want in the matrix), ncol (the number of columns you want in the matrix), and byrow (a logical value indicating whether you want to fill the matrix by rows; the default, FALSE, fills it by columns). Check out the help menu for the matrix function (?matrix) to see some additional inputs.

Let’s use the matrix() function to create a matrix containing the values from 1 to 10.

# Create a matrix of the integers 1:10,
#  with 5 rows and 2 columns

matrix(data = 1:10,
       nrow = 5,
       ncol = 2)
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

# Now with 2 rows and 5 columns
matrix(data = 1:10,
       nrow = 2,
       ncol = 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

# Now with 2 rows and 5 columns, but fill by row instead of by column
matrix(data = 1:10,
       nrow = 2,
       ncol = 5,
       byrow = TRUE)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
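The sketch below ties the three examples together. The object names (by_col, by_row, inferred) are made up for this illustration. It also shows that you can leave out ncol and let matrix() work it out from the length of data.

```r
# Column-filled 5 x 2 matrix (the default, byrow = FALSE)
by_col <- matrix(data = 1:10, nrow = 5, ncol = 2)

# Row-filled 2 x 5 matrix
by_row <- matrix(data = 1:10, nrow = 2, ncol = 5, byrow = TRUE)

# Transposing the column-filled matrix gives the row-filled one
all(t(by_col) == by_row)

# With only nrow supplied, ncol is inferred: 10 values / 2 rows = 5 columns
inferred <- matrix(data = 1:10, nrow = 2)
dim(inferred)  # 2 5
```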

8.2.3 data.frame()

To create a dataframe from vectors, use the data.frame() function. The data.frame() function works much like cbind(): each vector becomes a column. The main differences are that you give each column a name as you define it, and that the result is a dataframe rather than a matrix. Unlike matrices, dataframes can contain both string vectors and numeric vectors within the same object. Because they are more flexible than matrices, most large datasets in R will be stored as dataframes.

Let’s create a simple dataframe called survey using the data.frame() function with a mixture of text and numeric columns:

# Create a dataframe of survey data

survey <- data.frame("index" = c(1, 2, 3, 4, 5),
                     "sex" = c("m", "m", "m", "f", "f"),
                     "age" = c(99, 46, 23, 54, 23))
survey
##   index sex age
## 1     1   m  99
## 2     2   m  46
## 3     3   m  23
## 4     4   f  54
## 5     5   f  23
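To see the difference from cbind() concretely, here is a minimal sketch that combines the same two vectors both ways. The vectors age and name and the objects m and d are made up for this illustration; d$age pulls out the age column of the dataframe.

```r
age  <- c(19, 21)
name <- c("ann", "bob")

m <- cbind(age, name)                    # matrix: everything becomes character
d <- data.frame(age = age, name = name)  # dataframe: each column keeps its type

is.numeric(m[, "age"])  # FALSE, the ages were converted to strings
is.numeric(d$age)       # TRUE, the ages are still numbers
```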

8.2.3.1 stringsAsFactors = FALSE

There is one key argument to data.frame() and similar functions called stringsAsFactors. In versions of R before 4.0.0, the data.frame() function automatically converts any string columns to a specific type of object called a factor; since R 4.0.0 the default is stringsAsFactors = FALSE, so strings stay strings unless you ask otherwise. A factor is a nominal variable that has a well-specified set of possible values that it can take on. For example, one can create a factor sex that can only take on the values "male" and "female".

However, as I’m sure you’ll discover, having R automatically convert your string data to factors can lead to lots of strange results. For example: if you have a factor of sex data, but then you want to add a new value called other, R will yell at you with a warning and quietly replace the new value with NA. I hate, hate, HATE when this happens. While there are very, very rare cases when I find factors useful, I almost always don’t want or need them. For this reason, I avoid them at all costs.
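Here is a minimal sketch of that problem, using a small made-up factor called sex. The last two lines show one possible workaround, assuming you really do want to keep the factor: add the new level first, then assign the value.

```r
sex <- factor(c("m", "m", "f"))
levels(sex)  # "f" "m"

# "other" is not one of the levels, so R warns and stores NA instead
sex[2] <- "other"
became_na <- is.na(sex[2])
became_na  # TRUE

# Workaround: declare the new level before using it
levels(sex) <- c(levels(sex), "other")
sex[2] <- "other"
```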

To tell R not to convert your string columns to factors, include the argument stringsAsFactors = FALSE when using functions such as data.frame(). Since R 4.0.0 this is already the default, but spelling it out does no harm.

For example, let’s use the str() function to look at the classes of the columns in the survey dataframe that we just created (we’ll go over this function in section XXX). The output below is what you get in versions of R before 4.0.0:

# Show me the structure of the survey dataframe
str(survey)
## 'data.frame':    5 obs. of  3 variables:
##  $ index: num  1 2 3 4 5
##  $ sex  : Factor w/ 2 levels "f","m": 2 2 2 1 1
##  $ age  : num  99 46 23 54 23

AAAAA!!! R has converted the column sex to a factor with only two possible levels! This can cause major problems later! Let’s create the dataframe again using the argument stringsAsFactors = FALSE to make sure that this doesn’t happen:

# Create a dataframe of survey data WITHOUT factors
survey <- data.frame("index" = c(1, 2, 3, 4, 5),
                     "sex" = c("m", "m", "m", "f", "f"),
                     "age" = c(99, 46, 23, 54, 23),
                     stringsAsFactors = FALSE)

Now let’s look at the new version and make sure there are no factors:

# Print the result (it looks the same as before)
survey
##   index sex age
## 1     1   m  99
## 2     2   m  46
## 3     3   m  23
## 4     4   f  54
## 5     5   f  23

# Look at the structure: no more factors!
str(survey)
## 'data.frame':    5 obs. of  3 variables:
##  $ index: num  1 2 3 4 5
##  $ sex  : chr  "m" "m" "m" "f" ...
##  $ age  : num  99 46 23 54 23

8.2.4 Dataframes pre-loaded in R

Now you know how to use functions like cbind() and data.frame() to manually create your own matrices and dataframes in R. However, for demonstration purposes, it’s frequently easier to use existing dataframes rather than always having to create your own. Thankfully, R has us covered: R has several datasets that come pre-installed in a package called datasets – you don’t need to install this package, it’s included in the base R software. While you probably won’t make any major scientific discoveries with these datasets, they allow all R users to test and compare code on the same sets of data. To see a complete list of all the datasets included in the datasets package, run the code: library(help = "datasets"). Table 8.2 shows a few datasets that we will be using in future examples:

Table 8.2: A few datasets you can access in R.
| Dataset | Description | Rows | Columns |
| --- | --- | --- | --- |
| ChickWeight | Experiment on the effect of diet on early growth of chicks. | 578 | 4 |
| InsectSprays | The counts of insects in agricultural experimental units treated with different insecticides. | 72 | 2 |
| ToothGrowth | Length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. | 60 | 3 |
| PlantGrowth | Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. | 30 | 2 |
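Because the datasets package is attached automatically when R starts, you can refer to these dataframes by name without loading anything. A short sketch using only base R functions:

```r
# Number of rows and columns
dim(ToothGrowth)   # 60 3
nrow(ChickWeight)  # 578

# Column names
names(PlantGrowth)  # "weight" "group"

# Print the first few rows
head(ToothGrowth, 3)
```

Typing ?ToothGrowth opens the help page for a dataset, which describes each of its columns.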