## 8.2 Creating matrices and dataframes

There are a number of ways to create your own matrix and dataframe objects in R. The most common functions are presented in Table 8.1. Because matrices and dataframes are just combinations of vectors, each function takes one or more vectors as inputs, and returns a matrix or a dataframe.

Function | Description | Example |
---|---|---|
`cbind(a, b, c)` | Combine vectors as columns in a matrix | `cbind(1:5, 6:10, 11:15)` |
`rbind(a, b, c)` | Combine vectors as rows in a matrix | `rbind(1:5, 6:10, 11:15)` |
`matrix(data, nrow, ncol, byrow)` | Create a matrix from a vector `data` | `matrix(data = 1:12, nrow = 3, ncol = 4)` |
`data.frame()` | Create a dataframe from named columns | `data.frame("age" = c(19, 21), sex = c("m", "f"))` |

### 8.2.1 `cbind()`, `rbind()`

`cbind()` and `rbind()` both create matrices by combining several vectors of the same length. `cbind()` combines vectors as columns, while `rbind()` combines them as rows.

Let’s use these functions to create a matrix containing the numbers 1 through 15. First, we’ll create three vectors of length 5, then we’ll combine them into one matrix, once with `cbind()` and once with `rbind()`.

```
x <- 1:5
y <- 6:10
z <- 11:15

# Create a matrix where x, y and z are columns
cbind(x, y, z)
##      x  y  z
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15

# Create a matrix where x, y and z are rows
rbind(x, y, z)
##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y    6    7    8    9   10
## z   11   12   13   14   15
```
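
As a quick check on the two results above, `dim()` reports the number of rows and columns of a matrix. The sketch below reuses the same `x`, `y` and `z` vectors; the object names `col_mat` and `row_mat` are only illustrative and do not appear elsewhere in this chapter.

```r
x <- 1:5
y <- 6:10
z <- 11:15

# Store the two matrices so we can inspect them
col_mat <- cbind(x, y, z)   # the vectors become columns
row_mat <- rbind(x, y, z)   # the vectors become rows

# dim() returns c(number of rows, number of columns)
dim(col_mat)
## [1] 5 3
dim(row_mat)
## [1] 3 5

# Transposing one matrix gives the same values as the other
all(t(col_mat) == row_mat)
## [1] TRUE
```

In other words, the two functions hold the same data and differ only in the orientation of the result.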

### 8.2.2 `matrix()`

**Remember**: A matrix can contain either numbers *or* characters, not both! If you try to combine numeric and character vectors into one matrix, R will turn all the numbers into characters:

```
# Creating a matrix with numeric and character columns will make everything a character:
cbind(c(1, 2, 3, 4, 5),
      c("a", "b", "c", "d", "e"))
##      [,1] [,2]
## [1,] "1"  "a"
## [2,] "2"  "b"
## [3,] "3"  "c"
## [4,] "4"  "d"
## [5,] "5"  "e"
```
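
One way to confirm this conversion, sketched here with a hypothetical object called `mixed`, is to ask R what type of data the matrix holds. Square-bracket indexing is used only to pull out the first column for the last step.

```r
# Combine a numeric vector and a character vector
mixed <- cbind(c(1, 2, 3), c("a", "b", "c"))

# The whole matrix is stored as character data
is.character(mixed)
## [1] TRUE
is.numeric(mixed)
## [1] FALSE

# To do arithmetic on the "numbers", convert them back first
as.numeric(mixed[, 1]) + 1
## [1] 2 3 4
```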

The `matrix()` function creates a matrix from a single vector of data. The function has 4 main inputs: `data`, a vector of data; `nrow`, the number of rows you want in the matrix; `ncol`, the number of columns you want in the matrix; and `byrow`, a logical value indicating whether you want to fill the matrix by rows (by default, the matrix is filled by columns). Check out the help menu for the matrix function (`?matrix`) to see some additional inputs.

Let’s use the `matrix()` function to create a matrix containing the values from 1 to 10.

```
# Create a matrix of the integers 1:10,
# with 5 rows and 2 columns
matrix(data = 1:10,
       nrow = 5,
       ncol = 2)
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

# Now with 2 rows and 5 columns
matrix(data = 1:10,
       nrow = 2,
       ncol = 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

# Now with 2 rows and 5 columns, but fill by row instead of columns
matrix(data = 1:10,
       nrow = 2,
       ncol = 5,
       byrow = TRUE)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
```
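
To connect `matrix()` back to `rbind()`, the following sketch checks that filling by row gives the same values as binding the two rows together directly. The names `by_row` and `bound` are made up for this illustration.

```r
# Fill by row with matrix() ...
by_row <- matrix(data = 1:10, nrow = 2, ncol = 5, byrow = TRUE)

# ... or bind the two rows together directly
bound <- rbind(1:5, 6:10)

# Both approaches put the same values in the same positions
all(by_row == bound)
## [1] TRUE

# If you supply only nrow, R works out ncol from the length of the data
dim(matrix(data = 1:10, nrow = 2))
## [1] 2 5
```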

### 8.2.3 `data.frame()`

To create a dataframe from vectors, use the `data.frame()` function. The `data.frame()` function works very similarly to `cbind()`; the main difference is that in `data.frame()` you assign a name to each column as you define it. Unlike matrices, dataframes can contain *both* string vectors and numeric vectors within the same object. Because they are more flexible than matrices, most large datasets in R will be stored as dataframes.

Let’s create a simple dataframe called `survey` using the `data.frame()` function with a mixture of text and numeric columns:

```
# Create a dataframe of survey data
survey <- data.frame("index" = c(1, 2, 3, 4, 5),
                     "sex" = c("m", "m", "m", "f", "f"),
                     "age" = c(99, 46, 23, 54, 23))
survey
##   index sex age
## 1     1   m  99
## 2     2   m  46
## 3     3   m  23
## 4     4   f  54
## 5     5   f  23
```
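
A short sketch of how you might check what `data.frame()` produced: `names()` and `dim()` describe the shape of the dataframe, and the `$` operator (used here only to pull out one column) shows that the numeric columns stay numeric even though `sex` holds text.

```r
survey <- data.frame("index" = c(1, 2, 3, 4, 5),
                     "sex" = c("m", "m", "m", "f", "f"),
                     "age" = c(99, 46, 23, 54, 23))

# Column names and dimensions (rows, columns)
names(survey)
## [1] "index" "sex"   "age"
dim(survey)
## [1] 5 3

# Unlike in a matrix, the numeric columns are still numbers
is.numeric(survey$age)
## [1] TRUE
mean(survey$age)
## [1] 49
```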

#### 8.2.3.1 `stringsAsFactors = FALSE`

There is one key argument to `data.frame()` and similar functions called `stringsAsFactors`. In versions of R before 4.0.0, the `data.frame()` function by default converts any string columns to a specific type of object called a **factor** (since R 4.0.0, the default is `stringsAsFactors = FALSE`, so strings stay strings unless you ask otherwise). A factor is a nominal variable that has a well-specified set of possible values that it can take on. For example, one can create a factor `sex` that can *only* take on the values `"male"` and `"female"`.

However, as I’m sure you’ll discover, having R automatically convert your string data to factors can lead to lots of strange results. For example, if you have a factor of sex data and then try to add a new value called `other`, R will yell at you with a warning and store a missing value (`NA`) instead. I *hate*, *hate*, *HATE* when this happens. While there are very, very rare cases when I find factors useful, I almost always don’t want or need them. For this reason, I avoid them at all costs.
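
Here is a minimal sketch of that problem, using a small made-up factor called `sex` and a character vector `sex_chr` for comparison. In current versions of R, assigning a value that is not among a factor's levels produces a warning and an `NA`, while the same assignment on a character vector simply works.

```r
# A factor that only knows about the levels "f" and "m"
sex <- factor(c("m", "m", "f"))
levels(sex)
## [1] "f" "m"

# Assigning a value that is not one of the levels triggers a warning,
# and the entry becomes NA
sex[1] <- "other"
is.na(sex[1])
## [1] TRUE

# With a plain character vector the same assignment just works
sex_chr <- c("m", "m", "f")
sex_chr[1] <- "other"
sex_chr
## [1] "other" "m"     "f"
```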

To tell R to *not* convert your string columns to factors, you need to include the argument `stringsAsFactors = FALSE` when using functions such as `data.frame()`. For example, let’s look at the classes of the columns in the dataframe `survey` that we just created using the `str()` function (we’ll go over this function in section XXX). The output shown is what you get when strings are converted by default, as in R before 4.0.0:

```
# Show me the structure of the survey dataframe
str(survey)
## 'data.frame':    5 obs. of  3 variables:
##  $ index: num  1 2 3 4 5
##  $ sex  : Factor w/ 2 levels "f","m": 2 2 2 1 1
##  $ age  : num  99 46 23 54 23
```

AAAAA!!! R has converted the column `sex` to a factor with *only* two possible levels! This can cause major problems later! Let’s create the dataframe again using the argument `stringsAsFactors = FALSE` to make sure that this doesn’t happen:

```
# Create a dataframe of survey data WITHOUT factors
survey <- data.frame("index" = c(1, 2, 3, 4, 5),
                     "sex" = c("m", "m", "m", "f", "f"),
                     "age" = c(99, 46, 23, 54, 23),
                     stringsAsFactors = FALSE)
```

Now let’s look at the new version and make sure there are no factors:

```
# Print the result (it looks the same as before)
survey
##   index sex age
## 1     1   m  99
## 2     2   m  46
## 3     3   m  23
## 4     4   f  54
## 5     5   f  23

# Look at the structure: no more factors!
str(survey)
## 'data.frame':    5 obs. of  3 variables:
##  $ index: num  1 2 3 4 5
##  $ sex  : chr  "m" "m" "m" "f" ...
##  $ age  : num  99 46 23 54 23
```

### 8.2.4 Dataframes pre-loaded in R

Now you know how to use functions like `cbind()` and `data.frame()` to manually create your own matrices and dataframes in R. However, for demonstration purposes, it’s frequently easier to use existing dataframes rather than always having to create your own. Thankfully, R has us covered: R has several datasets that come pre-installed in a package called `datasets`. You don’t need to install this package; it’s included in the base R software. While you probably won’t make any major scientific discoveries with these datasets, they allow all R users to test and compare code on the same sets of data. To see a complete list of all the datasets included in the `datasets` package, run the code `library(help = "datasets")`. Table 8.2 shows a few datasets that we will be using in future examples:

Dataset | Description | Rows | Columns |
---|---|---|---|
`ChickWeight` | Experiment on the effect of diet on early growth of chicks. | 578 | 4 |
`InsectSprays` | The counts of insects in agricultural experimental units treated with different insecticides. | 72 | 2 |
`ToothGrowth` | Length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. | 60 | 3 |
`PlantGrowth` | Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. | 30 | 2 |
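
Because the `datasets` package is attached by default in a standard R session, these dataframes can be used by name without creating or loading anything. The sketch below checks the row and column counts from the table with `dim()` and peeks at the first few rows with `head()`.

```r
# dim() returns c(number of rows, number of columns)
dim(ChickWeight)
## [1] 578   4
dim(InsectSprays)
## [1] 72  2
dim(ToothGrowth)
## [1] 60  3
dim(PlantGrowth)
## [1] 30  2

# Look at the first few rows of one of the datasets
head(ToothGrowth)
```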