Chapter 3: Data Management and Manipulation

install.packages("dplyr", repos = "https://cran.us.r-project.org")
install.packages("tidyr", repos = "https://cran.us.r-project.org")
install.packages("stringr", repos = "https://cran.us.r-project.org")
install.packages("lubridate", repos = "https://cran.us.r-project.org")
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

Read in the data:

compensation <- read.csv("/Users/peteapicella/Documents/R_tutorials/GSwR/compensation.csv")
head(compensation)
##    Root Fruit  Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed

Summarize the data in each variable:

summary(compensation)
##       Root            Fruit          Grazing         
##  Min.   : 4.426   Min.   : 14.73   Length:40         
##  1st Qu.: 6.083   1st Qu.: 41.15   Class :character  
##  Median : 7.123   Median : 60.88   Mode  :character  
##  Mean   : 7.181   Mean   : 59.41                     
##  3rd Qu.: 8.510   3rd Qu.: 76.19                     
##  Max.   :10.253   Max.   :116.05

3.1 Subsetting

Create a new data frame containing only specific variable(s), here Fruit:

head(select(compensation, Fruit))
##   Fruit
## 1 59.77
## 2 60.98
## 3 14.73
## 4 19.28
## 5 34.25
## 6 35.53

Select all columns except one:

head(select(compensation, -Root))
##   Fruit  Grazing
## 1 59.77 Ungrazed
## 2 60.98 Ungrazed
## 3 14.73 Ungrazed
## 4 19.28 Ungrazed
## 5 34.25 Ungrazed
## 6 35.53 Ungrazed

Select a range of rows by position, here rows 2 through 10:

head(slice(compensation, 2:10))
##    Root Fruit  Grazing
## 1 6.487 60.98 Ungrazed
## 2 4.919 14.73 Ungrazed
## 3 5.130 19.28 Ungrazed
## 4 5.417 34.25 Ungrazed
## 5 5.359 35.53 Ungrazed
## 6 7.614 87.73 Ungrazed

Select specific rows by position, here rows 2, 3, and 10:

head(slice(compensation, c(2,3,10)))
##    Root Fruit  Grazing
## 1 6.487 60.98 Ungrazed
## 2 4.919 14.73 Ungrazed
## 3 6.930 64.34 Ungrazed

Keep only observations in which Fruit is exactly equal to 80 (no observation matches, so the result has zero rows):

filter(compensation, Fruit == 80) 
## [1] Root    Fruit   Grazing
## <0 rows> (or 0-length row.names)
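
Exact equality tests on decimal measurements can miss values that differ only by floating-point rounding. As a hedged sketch, the block below rebuilds the six rows printed by head(compensation) into a small data frame (the name demo and the value computed are assumptions for illustration, not part of the original data file) and compares == with dplyr's near(), which tests equality within a small tolerance.

```r
library(dplyr)

# Hypothetical demo data: the six rows printed by head(compensation)
demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = "Ungrazed"
)

# Exact match on a value typed in exactly as it is stored
exact <- filter(demo, Fruit == 59.77)

# A value that is off by a tiny amount, as can happen after arithmetic
computed <- 59.77 + 1e-12

strict   <- filter(demo, Fruit == computed)      # == requires an exact match
tolerant <- filter(demo, near(Fruit, computed))  # near() allows a small tolerance

nrow(strict)
nrow(tolerant)
```

For measured values, range conditions such as Fruit > 79.5 & Fruit < 80.5 are often a safer choice than ==.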

Grab observations when Fruit is not equal to 80:

head(filter(compensation, Fruit !=80))
##    Root Fruit  Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed

Grab any observations in which Fruit is less than or equal to 80 (use < for strictly less than):

head(filter(compensation, Fruit <=80))
##    Root Fruit  Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed

Grab any observations in which Fruit is greater than 95 OR less than 15:

head(filter(compensation, Fruit > 95 | Fruit < 15))
##     Root  Fruit  Grazing
## 1  4.919  14.73 Ungrazed
## 2 10.253 116.05   Grazed
## 3  6.106  14.95   Grazed
## 4  9.844 105.07   Grazed
## 5  9.351  98.47   Grazed

Grab any observations in which Fruit is greater than 50 AND less than 55:

head(filter(compensation, Fruit > 50 & Fruit < 55))
##    Root Fruit  Grazing
## 1 6.248 52.92 Ungrazed
## 2 6.013 53.61 Ungrazed
## 3 5.928 54.86 Ungrazed
## 4 7.354 50.08   Grazed
## 5 8.158 52.26   Grazed

Order data by Fruit from lowest to highest observation:

head(arrange(compensation, Fruit))
##    Root Fruit  Grazing
## 1 4.919 14.73 Ungrazed
## 2 6.106 14.95   Grazed
## 3 4.426 18.89 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 4.975 24.25 Ungrazed
## 6 5.451 32.35 Ungrazed
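
arrange() sorts in ascending order by default. As a minimal sketch, wrapping the column in dplyr's desc() reverses the order. The data frame demo below is an assumption for illustration: it holds only the six rows printed by head(compensation), not the full data set.

```r
library(dplyr)

# Hypothetical demo data: the six rows printed by head(compensation)
demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = "Ungrazed"
)

# Order by Fruit from highest to lowest observation
highest_first <- arrange(demo, desc(Fruit))
highest_first
```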

Keep only observations with Fruit values above 80, then return just the corresponding Root values:

head(select(filter(compensation, Fruit>80), Root))
##     Root
## 1  7.614
## 2  7.001
## 3 10.253
## 4  9.039
## 5  8.988
## 6  8.975
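
Nested calls such as select(filter(...)) are read from the inside out. As a hedged sketch, the same steps can be written left to right with the pipe operator %>%, which dplyr re-exports from magrittr. The data frame demo is an assumption for illustration: it holds only the five rows printed earlier by the Fruit > 95 | Fruit < 15 filter.

```r
library(dplyr)

# Hypothetical demo data: the five rows returned by the OR filter earlier
demo <- data.frame(
  Root    = c(4.919, 10.253, 6.106, 9.844, 9.351),
  Fruit   = c(14.73, 116.05, 14.95, 105.07, 98.47),
  Grazing = c("Ungrazed", "Grazed", "Grazed", "Grazed", "Grazed")
)

# Nested form: filter first (inner call), then select (outer call)
nested <- select(filter(demo, Fruit > 80), Root)

# Piped form: the same steps in reading order
piped <- demo %>%
  filter(Fruit > 80) %>%
  select(Root)

identical(nested, piped)
```

Both forms return the same data frame; the piped form tends to be easier to extend when more steps are added.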

3.2 Calculating summary statistics about groups of your data

Calculate the mean of Fruit for each Grazing group:

summarise(
  group_by(compensation, Grazing), # group the data frame by the Grazing variable
  meanFruit = mean(Fruit)) # create a column, meanFruit, holding the mean of Fruit within each group
## # A tibble: 2 × 2
##   Grazing  meanFruit
##   <chr>        <dbl>
## 1 Grazed        67.9
## 2 Ungrazed      50.9

Calculate several summary statistics at once and store the result in a new data frame:

mean.fruit <- summarise(
  group_by(compensation, Grazing),
  meanFruit = mean(Fruit), sdfruit = sd(Fruit)) # multiple statistics can be calculated within one summarise() call
mean.fruit
## # A tibble: 2 × 3
##   Grazing  meanFruit sdfruit
##   <chr>        <dbl>   <dbl>
## 1 Grazed        67.9    25.0
## 2 Ungrazed      50.9    21.8

Calculate the standard error of the mean for each group:

x <- sum(with(compensation, Grazing == "Grazed")) # number of observations in the Grazed group
x
## [1] 20
SE.mean.fruit <- summarise(
  group_by(compensation, Grazing),
  meanFruit = mean(Fruit),
  SEfruit = sd(Fruit) / sqrt(n())) # n() is each group's own size (20 in both groups here), so the SE stays correct if group sizes differ
SE.mean.fruit
## # A tibble: 2 × 3
##   Grazing  meanFruit SEfruit
##   <chr>        <dbl>   <dbl>
## 1 Grazed        67.9    5.58
## 2 Ungrazed      50.9    4.87
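
A standard error divides each group's standard deviation by the square root of that group's own sample size. As a hedged sketch, dplyr's n() returns the number of rows in the current group, so the calculation does not depend on one count fitting every group. The data frame demo is an assumption for illustration: it combines ten rows printed earlier in this chapter into deliberately unequal groups, and its results do not describe the full data set.

```r
library(dplyr)

# Hypothetical demo data with unequal group sizes (6 Ungrazed, 4 Grazed),
# built from Fruit values printed earlier in the chapter
demo <- data.frame(
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53,
              116.05, 14.95, 105.07, 98.47),
  Grazing = c(rep("Ungrazed", 6), rep("Grazed", 4))
)

se_by_group <- summarise(
  group_by(demo, Grazing),
  meanFruit = mean(Fruit),
  nObs      = n(),                     # number of rows in each group
  SEfruit   = sd(Fruit) / sqrt(n())    # standard error using that group's size
)
se_by_group
```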