Chapter 3: Data Management and Manipulation
install.packages("dplyr",repos = "https://cran.us.r-project.org")
install.packages("tidyr",repos = "https://cran.us.r-project.org")
install.packages("stringr",repos = "https://cran.us.r-project.org")
install.packages("lubridate",repos = "https://cran.us.r-project.org")
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
Read in the data:
compensation <- read.csv("/Users/peteapicella/Documents/R_tutorials/GSwR/compensation.csv")
head(compensation)
## Root Fruit Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed
Summarize the data in each variable:
summary(compensation)
## Root Fruit Grazing
## Min. : 4.426 Min. : 14.73 Length:40
## 1st Qu.: 6.083 1st Qu.: 41.15 Class :character
## Median : 7.123 Median : 60.88 Mode :character
## Mean : 7.181 Mean : 59.41
## 3rd Qu.: 8.510 3rd Qu.: 76.19
## Max. :10.253 Max. :116.05
3.1 Subsetting
Create new dataframe comprised of specific variable(s):
head(select(compensation, Fruit))
## Fruit
## 1 59.77
## 2 60.98
## 3 14.73
## 4 19.28
## 5 34.25
## 6 35.53
Select all columns except one:
head(select(compensation, -Root))
## Fruit Grazing
## 1 59.77 Ungrazed
## 2 60.98 Ungrazed
## 3 14.73 Ungrazed
## 4 19.28 Ungrazed
## 5 34.25 Ungrazed
## 6 35.53 Ungrazed
Grab a contiguous range of rows by position (here rows 2 through 10; head() shows only the first six):
head(slice(compensation, 2:10))
## Root Fruit Grazing
## 1 6.487 60.98 Ungrazed
## 2 4.919 14.73 Ungrazed
## 3 5.130 19.28 Ungrazed
## 4 5.417 34.25 Ungrazed
## 5 5.359 35.53 Ungrazed
## 6 7.614 87.73 Ungrazed
Grab a non-contiguous set of rows by position (here rows 2, 3, and 10):
head(slice(compensation, c(2,3,10)))
## Root Fruit Grazing
## 1 6.487 60.98 Ungrazed
## 2 4.919 14.73 Ungrazed
## 3 6.930 64.34 Ungrazed
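The row-position logic of slice() can be tried on a small made-up data frame. This is a minimal sketch: the `toy` data frame and its `id` column are hypothetical and are not part of the compensation data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(id = 1:5, Fruit = c(19, 88, 101, 60, 35))

# A contiguous range of row positions
rows_2_to_4 <- slice(toy, 2:4)

# A non-contiguous set of row positions
picked <- slice(toy, c(1, 3, 5))

# Negative positions drop rows instead of keeping them
without_first <- slice(toy, -1)

rows_2_to_4
picked
without_first
```

Note that slice() works on row positions, not on the values stored in any column, so the result depends on the current row order.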
Keep only the observations for which a condition is TRUE; here, Fruit equal to exactly 80 (no observations match):
filter(compensation, Fruit == 80)
## [1] Root Fruit Grazing
## <0 rows> (or 0-length row.names)
Grab observations when Fruit is not equal to 80:
head(filter(compensation, Fruit !=80))
## Root Fruit Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed
Grab any observations in which Fruit is less than or equal to 80 (use < for strictly less than):
head(filter(compensation, Fruit <=80))
## Root Fruit Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed
Grab any observations in which Fruit is greater than 95 OR less than 15:
head(filter(compensation, Fruit >95|Fruit<15))
## Root Fruit Grazing
## 1 4.919 14.73 Ungrazed
## 2 10.253 116.05 Grazed
## 3 6.106 14.95 Grazed
## 4 9.844 105.07 Grazed
## 5 9.351 98.47 Grazed
Grab any observations in which Fruit is greater than 50 AND less than 55:
head(filter(compensation, Fruit >50 & Fruit<55))
## Root Fruit Grazing
## 1 6.248 52.92 Ungrazed
## 2 6.013 53.61 Ungrazed
## 3 5.928 54.86 Ungrazed
## 4 7.354 50.08 Grazed
## 5 8.158 52.26 Grazed
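How the OR and AND operators combine inside filter() is easier to check on data small enough to read at a glance. The sketch below assumes a made-up data frame, `toy`, which is hypothetical and not the compensation data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  Fruit   = c(10, 52, 80, 97),
  Grazing = c("Grazed", "Ungrazed", "Grazed", "Ungrazed")
)

# OR: keep rows where at least one condition holds
either <- filter(toy, Fruit > 95 | Fruit < 15)

# AND: keep rows where both conditions hold
both <- filter(toy, Fruit > 50 & Fruit < 55)

# Conditions separated by commas are also combined with AND
both_comma <- filter(toy, Fruit > 50, Fruit < 55)

either
both
both_comma
```

The comma form and the & form return the same rows; which one to use is a matter of readability.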
Order data by Fruit from lowest to highest observation:
head(arrange(compensation, Fruit))
## Root Fruit Grazing
## 1 4.919 14.73 Ungrazed
## 2 6.106 14.95 Grazed
## 3 4.426 18.89 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 4.975 24.25 Ungrazed
## 6 5.451 32.35 Ungrazed
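arrange() sorts from lowest to highest by default. As a small sketch on hypothetical data (the `toy` data frame is made up for illustration), desc() reverses the order, and listing several columns sorts by the first and breaks ties with the next.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  Fruit   = c(52, 10, 97, 80),
  Grazing = c("Ungrazed", "Grazed", "Ungrazed", "Grazed")
)

# desc() sorts the column from highest to lowest
high_to_low <- arrange(toy, desc(Fruit))

# Sort by Grazing first, then by Fruit within each Grazing level
by_group <- arrange(toy, Grazing, Fruit)

high_to_low
by_group
```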
Create a new data frame that keeps only observations with Fruit values above 80 and contains only the corresponding Root values:
head(select(filter(compensation, Fruit>80), Root))
## Root
## 1 7.614
## 2 7.001
## 3 10.253
## 4 9.039
## 5 8.988
## 6 8.975
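Nesting one dplyr call inside another, as in select(filter(...)), reads from the inside out. The same steps can be written left to right with the pipe operator %>%, which dplyr makes available when it is loaded. The sketch below uses a hypothetical `toy` data frame rather than the compensation data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  Root  = c(5.1, 7.6, 9.0, 6.2),
  Fruit = c(19, 88, 101, 60)
)

# Nested form: filter() runs first, then select()
nested <- select(filter(toy, Fruit > 80), Root)

# Piped form: the same steps written in the order they run
piped <- toy %>%
  filter(Fruit > 80) %>%
  select(Root)

nested
piped
```

Both forms give the same result; the piped version tends to be easier to read once more than two steps are chained.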
3.2 Calculating summary statistics about groups of your data
Calculate a summary statistic for each group in the data frame:
summarise(
group_by(compensation, Grazing), # group the data frame by the Grazing variable
meanFruit = mean(Fruit)) # meanFruit is a new column holding the mean of Fruit within each group
## # A tibble: 2 × 2
## Grazing meanFruit
## <chr> <dbl>
## 1 Grazed 67.9
## 2 Ungrazed 50.9
Calculate several summary statistics at once and store the result in a new data frame:
mean.fruit <- summarise(
group_by(compensation, Grazing),
meanFruit = mean(Fruit), sdfruit = sd(Fruit)) # multiple statistics can be calculated within summarise
mean.fruit
## # A tibble: 2 × 3
## Grazing meanFruit sdfruit
## <chr> <dbl> <dbl>
## 1 Grazed 67.9 25.0
## 2 Ungrazed 50.9 21.8
x <- sum(with(compensation, Grazing == "Grazed")) # counts the observations for which Grazing equals "Grazed"
x
## [1] 20
SE.mean.fruit <- summarise(
group_by(compensation, Grazing),
meanFruit = mean(Fruit),
SEfruit = sd(Fruit) / sqrt(x)) # standard error of the mean; x is the Grazed count, which works here only because both groups have 20 observations
SE.mean.fruit
## # A tibble: 2 × 3
## Grazing meanFruit SEfruit
## <chr> <dbl> <dbl>
## 1 Grazed 67.9 5.58
## 2 Ungrazed 50.9 4.87
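Dividing by a single count works above because the two groups are the same size. When group sizes differ, one option is to compute the count inside summarise() with dplyr's n(), so that each group's standard error uses its own sample size. The sketch below assumes a hypothetical, deliberately unbalanced `toy` data frame; the column name `nObs` is also made up for illustration.

```r
library(dplyr)

# Hypothetical unbalanced toy data, invented for illustration only
toy <- data.frame(
  Grazing = c("Grazed", "Grazed", "Grazed", "Grazed", "Ungrazed", "Ungrazed"),
  Fruit   = c(60, 70, 80, 90, 40, 50)
)

se_by_group <- summarise(
  group_by(toy, Grazing),
  nObs      = n(),                    # number of rows in the current group
  meanFruit = mean(Fruit),
  SEfruit   = sd(Fruit) / sqrt(nObs)  # standard error using the group's own size
)

se_by_group
```

Because nObs is calculated per group, the same code stays correct if observations are added to or removed from one group only.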