Chapter 2: Getting Data into R

install.packages("dplyr", repos = "https://cran.us.r-project.org")
install.packages("tidyr", repos = "https://cran.us.r-project.org")
install.packages("stringr", repos = "https://cran.us.r-project.org")
install.packages("lubridate", repos = "https://cran.us.r-project.org")
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

Read in the data

compensation <- read.csv("/Users/peteapicella/Documents/R_tutorials/GSwR/compensation.csv")
head(compensation)
##    Root Fruit  Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed
knitr::kable(head(compensation))
|  Root | Fruit | Grazing  |
|------:|------:|:---------|
| 6.225 | 59.77 | Ungrazed |
| 6.487 | 60.98 | Ungrazed |
| 4.919 | 14.73 | Ungrazed |
| 5.130 | 19.28 | Ungrazed |
| 5.417 | 34.25 | Ungrazed |
| 5.359 | 35.53 | Ungrazed |
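
The absolute path above exists only on the author's machine. As a minimal sketch of a portable read, the example below writes a small stand-in file to a temporary folder and reads it back. The data frame `toy` and the file name `compensation_toy.csv` are hypothetical; the three rows are copied from the head() output above.

```r
# Hypothetical stand-in for compensation.csv: the first three rows shown above
toy <- data.frame(
  Root    = c(6.225, 6.487, 4.919),
  Fruit   = c(59.77, 60.98, 14.73),
  Grazing = c("Ungrazed", "Ungrazed", "Ungrazed")
)

# Write to a temporary folder so the example does not depend on a local path
csv_path <- file.path(tempdir(), "compensation_toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

# file.path() assembles the path from its parts; read.csv() reads it as usual
toy_read <- read.csv(csv_path)
head(toy_read)
```

Building the path with file.path() from a project folder, rather than typing a full user-specific path, is one way to keep a script runnable on another computer.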

2.1 Checking that your data are your data

List the names of the columns (variables):

names(compensation)
## [1] "Root"    "Fruit"   "Grazing"

Return the number of observations (rows) followed by the number of variables (columns):

dim(compensation) 
## [1] 40  3

Review structure of the data:

str(compensation)
## 'data.frame':    40 obs. of  3 variables:
##  $ Root   : num  6.22 6.49 4.92 5.13 5.42 ...
##  $ Fruit  : num  59.8 61 14.7 19.3 34.2 ...
##  $ Grazing: chr  "Ungrazed" "Ungrazed" "Ungrazed" "Ungrazed" ...
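
The checks above are read by eye. They can also be written as assertions that stop the script when the data do not look as expected. This is only a sketch on a hypothetical three-row data frame `toy` with the same columns as compensation; the expectations in `expected_names` are assumptions to adapt to your own file.

```r
# Hypothetical stand-in with the same columns as compensation
toy <- data.frame(
  Root    = c(6.225, 6.487, 4.919),
  Fruit   = c(59.77, 60.98, 14.73),
  Grazing = c("Ungrazed", "Ungrazed", "Grazed")
)

expected_names <- c("Root", "Fruit", "Grazing")

# stopifnot() raises an error as soon as one condition is FALSE
stopifnot(
  identical(names(toy), expected_names),  # same check as names()
  ncol(toy) == 3,                         # second element of dim()
  is.numeric(toy$Root),                   # types that str() would report
  is.numeric(toy$Fruit)
)

# Count missing values per column as a further check
na_counts <- colSums(is.na(toy))
na_counts
```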

2.2 Appendix advanced activity: dealing with untidy data

nasty.format <- read.csv("/Users/peteapicella/Documents/R_tutorials/GSwR/nasty format.csv")
head(nasty.format)
##      Species Bottle Temp X1.2.13 X2.2.13 X3.2.13 X4.2.13 X6.2.13 X8.2.13
## 1 P.caudatum  7-P.c   22   100.0    58.8    67.5     6.8    0.93    0.39
## 2 P.caudatum  8-P.c   22    62.5    71.3    67.5     7.9    0.90    0.36
## 3 P.caudatum  9-P.c   22    75.0    72.5    62.3     7.9    0.88    0.25
## 4 P.caudatum 22-P.c   20    75.0    73.8    76.3    31.3    3.12    1.01
## 5 P.caudatum 23-P.c   20    50.0      NA    81.3    32.5    3.75    1.06
## 6 P.caudatum 24-P.c   20    87.5      NA    62.5    28.8    3.12    1.00
##   X10.2.13 X12.2.13
## 1     0.19     0.46
## 2     0.16     0.34
## 3     0.23     0.31
## 4     0.56     0.50
## 5     0.49     0.38
## 6     0.41     0.46

Review data structure:

str(nasty.format)
## 'data.frame':    37 obs. of  11 variables:
##  $ Species : chr  "P.caudatum" "P.caudatum" "P.caudatum" "P.caudatum" ...
##  $ Bottle  : chr  "7-P.c" "8-P.c" "9-P.c" "22-P.c" ...
##  $ Temp    : int  22 22 22 20 20 20 15 15 15 22 ...
##  $ X1.2.13 : num  100 62.5 75 75 50 87.5 75 50 75 37.5 ...
##  $ X2.2.13 : num  58.8 71.3 72.5 73.8 NA NA NA NA NA 52.5 ...
##  $ X3.2.13 : num  67.5 67.5 62.3 76.3 81.3 62.5 90 78.8 78.3 23.8 ...
##  $ X4.2.13 : num  6.8 7.9 7.9 31.3 32.5 28.8 72.5 92.5 77.5 1.25 ...
##  $ X6.2.13 : num  0.93 0.9 0.88 3.12 3.75 ...
##  $ X8.2.13 : num  0.39 0.36 0.25 1.01 1.06 1 67.5 72.5 60 0.96 ...
##  $ X10.2.13: num  0.19 0.16 0.23 0.56 0.49 0.41 37.5 52.5 60 0.33 ...
##  $ X12.2.13: num  0.46 0.34 0.31 0.5 0.38 ...
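  • this dataset is untidy: the sampling dates are spread across eight column headers (X1.2.13 to X12.2.13) instead of being stored in one variable, and the file carries an extra 37th row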

Remove the extra (37th) row, whose Bottle field is empty:

nasty.format <- filter(nasty.format, Bottle != "") # '!=' means 'not equal to' (≠)
tail(nasty.format)
##         Species Bottle Temp X1.2.13 X2.2.13 X3.2.13 X4.2.13 X6.2.13 X8.2.13
## 31 S. fonticola     19   20    25.0    87.5    85.0    98.8   78.75   71.25
## 32 S. fonticola     20   20    87.5    63.8    81.3    76.3   72.50   85.00
## 33 S. fonticola     21   20    50.0    77.5    83.8    97.5   68.75   71.25
## 34 S. fonticola     34   15    50.0      NA   101.3    93.8   70.00   91.25
## 35 S. fonticola     35   15    62.5      NA    65.0    72.5   61.25   72.50
## 36 S. fonticola     36   15   112.5      NA    76.3    67.5   61.25   77.50
##    X10.2.13 X12.2.13
## 31     68.8   101.25
## 32     72.5    85.00
## 33     60.0    98.75
## 34     76.3    80.00
## 35     66.3   102.50
## 36     91.3    77.50
  • filter() keeps every row in which the variable ‘Bottle’ is not an empty string, so the blank 37th row is dropped
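
To make the row selection concrete, here is a small sketch on a hypothetical data frame `toy` whose last row mimics the blank line in the CSV. It shows the dplyr call next to a base R equivalent.

```r
library(dplyr)

# Hypothetical toy data: the last row is an empty line picked up from the CSV
toy <- data.frame(
  Species = c("P.caudatum", "S. fonticola", ""),
  Bottle  = c("7-P.c", "19", ""),
  Temp    = c(22L, 20L, NA)
)

# dplyr: keep rows where Bottle is not the empty string
kept_dplyr <- filter(toy, Bottle != "")

# Base R equivalent using logical row indexing
kept_base <- toy[toy$Bottle != "", ]

nrow(kept_dplyr)  # 2
```

The two forms agree here because the blank Bottle is an empty string. If it were NA instead, filter() would drop the row, while base indexing would return a row of NAs.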

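Gather the eight date columns into two new variables, Date (the former column header) and Abundance (the value that was in that column):

tidy_data <- gather(nasty.format, 
                    Date, Abundance, # names of the two variables to be created 
                    4:11) # columns 4 to 11 of nasty.format, whose headers are dates 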
head(tidy_data)
##      Species Bottle Temp    Date Abundance
## 1 P.caudatum  7-P.c   22 X1.2.13     100.0
## 2 P.caudatum  8-P.c   22 X1.2.13      62.5
## 3 P.caudatum  9-P.c   22 X1.2.13      75.0
## 4 P.caudatum 22-P.c   20 X1.2.13      75.0
## 5 P.caudatum 23-P.c   20 X1.2.13      50.0
## 6 P.caudatum 24-P.c   20 X1.2.13      87.5
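
In tidyr 1.0.0 and later, gather() is marked as superseded in favour of pivot_longer(). The sketch below runs both on a hypothetical two-row, two-date data frame `wide` to show that they produce the same long layout; the values are copied from the first rows printed above.

```r
library(tidyr)

# Hypothetical toy data in the same wide layout: one column per sampling date
wide <- data.frame(
  Species = c("P.caudatum", "P.caudatum"),
  Bottle  = c("7-P.c", "8-P.c"),
  Temp    = c(22L, 22L),
  X1.2.13 = c(100.0, 62.5),
  X2.2.13 = c(58.8, 71.3)
)

# gather(): key column, value column, then the columns to collapse
long_gather <- gather(wide, Date, Abundance, 4:5)

# pivot_longer(): the same reshaping with named arguments
long_pivot <- pivot_longer(
  wide,
  cols      = 4:5,
  names_to  = "Date",
  values_to = "Abundance"
)

dim(long_gather)  # 4 rows, 5 columns
```

The two results hold the same values but in a different row order: gather() stacks one date column after another, whereas pivot_longer() works through the original rows one at a time and returns a tibble.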

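Remove the ‘X’ that precedes the date in each observation (read.csv() added it because a column name may not start with a digit):

tidy_data <- mutate(tidy_data, Date = substr(Date, 2, 20)) # keep characters 2 to 20, dropping the leading 'X'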
head(tidy_data)
##      Species Bottle Temp   Date Abundance
## 1 P.caudatum  7-P.c   22 1.2.13     100.0
## 2 P.caudatum  8-P.c   22 1.2.13      62.5
## 3 P.caudatum  9-P.c   22 1.2.13      75.0
## 4 P.caudatum 22-P.c   20 1.2.13      75.0
## 5 P.caudatum 23-P.c   20 1.2.13      50.0
## 6 P.caudatum 24-P.c   20 1.2.13      87.5
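
As a small base R sketch of this step, the example below applies the same substr() call to a few of the column names shown earlier and compares it with sub(), which strips a leading "X" only where one is present. The vector `dates_raw` is a hypothetical subset of the real headers.

```r
# A few column names as read.csv() produced them (taken from the output above)
dates_raw <- c("X1.2.13", "X10.2.13", "X12.2.13")

# substr() approach used above: keep characters 2 through 20
dates_substr <- substr(dates_raw, 2, 20)

# Alternative: remove a leading "X" only if the string starts with one
dates_sub <- sub("^X", "", dates_raw)

dates_sub

# make.names() shows where the "X" comes from
make.names("1.2.13")  # "X1.2.13"
```

The upper limit of 20 in substr() is simply larger than any of the strings, so everything after the first character is kept.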

Display all unique dates:

unique(tidy_data$Date) # '$' selects the variable 'Date' from the 'tidy_data' dataframe
## [1] "1.2.13"  "2.2.13"  "3.2.13"  "4.2.13"  "6.2.13"  "8.2.13"  "10.2.13"
## [8] "12.2.13"

Convert the date strings to Date objects; dmy() from lubridate reads them as day, month, year:

tidy_data <- mutate(tidy_data, Date = dmy(Date))
head(tidy_data)
##      Species Bottle Temp       Date Abundance
## 1 P.caudatum  7-P.c   22 2013-02-01     100.0
## 2 P.caudatum  8-P.c   22 2013-02-01      62.5
## 3 P.caudatum  9-P.c   22 2013-02-01      75.0
## 4 P.caudatum 22-P.c   20 2013-02-01      75.0
## 5 P.caudatum 23-P.c   20 2013-02-01      50.0
## 6 P.caudatum 24-P.c   20 2013-02-01      87.5
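
For comparison, the same conversion can be sketched in base R by spelling out the format that dmy() infers. The vector `date_strings` is a hypothetical subset of the dates listed by unique() above.

```r
# Date strings in day.month.year order, as listed by unique() above
date_strings <- c("1.2.13", "10.2.13", "12.2.13")

# Base R: state the format explicitly (%d = day, %m = month, %y = two-digit year)
parsed <- as.Date(date_strings, format = "%d.%m.%y")
parsed

# The result has class Date, so date arithmetic works
class(parsed)
diff(parsed)  # days between consecutive sampling dates
```

The order of the parts matters: read as day, month, year, "1.2.13" is 1 February 2013, which matches the output above; read as month, day, year, it would be 2 January 2013.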

separate() Splits the information in one column into several new columns

unite() Puts information from several columns into one column

rbind() Puts datasets with exactly the same columns together

cbind() Puts datasets with exactly the same rows together, side by side

full_join() Joins two datasets with one or more columns in common

merge() Base R counterpart of the dplyr joins; merge(x, y, all = TRUE) matches full_join(), while the default keeps only rows found in both datasets
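
The functions listed above can be tried on a few rows. The following is a minimal sketch on hypothetical data frames (`bottles` and `extra` are made up for illustration, with values borrowed from the output earlier in the chapter), not a full treatment of each function.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data
bottles <- data.frame(Bottle = c("7-P.c", "8-P.c"), Temp = c(22, 20))

# separate(): split one column into two at the "-"
sep <- separate(bottles, Bottle, into = c("Number", "Code"), sep = "-")

# unite(): paste the two columns back into one
uni <- unite(sep, "Bottle", Number, Code, sep = "-")

# rbind(): stack datasets that share the same columns
stacked <- rbind(bottles, data.frame(Bottle = "9-P.c", Temp = 15))

# cbind(): place datasets with the same number of rows side by side
side <- cbind(bottles, data.frame(Abundance = c(100.0, 62.5)))

# full_join() and merge(all = TRUE): keep all rows from both tables
extra  <- data.frame(Bottle = c("8-P.c", "9-P.c"), Abundance = c(62.5, 75.0))
joined <- full_join(bottles, extra, by = "Bottle")
merged <- merge(bottles, extra, by = "Bottle", all = TRUE)

nrow(joined)  # 3: bottles 7, 8 and 9, with NA where a table has no match
```

Note that cbind() matches rows purely by position, so it is only safe when both datasets are in the same row order; a join matches rows by the shared column instead.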