9 dplyr practice
9.1 starwars data
starwars is a tibble in dplyr containing 14 variables that describe the features of 87 characters from the Star Wars movies.
## # A tibble: 87 x 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke~ 172 77 blond fair blue 19 male mascu~
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu~
## 3 R2-D2 96 32 <NA> white, bl~ red 33 none mascu~
## 4 Dart~ 202 136 none white yellow 41.9 male mascu~
## 5 Leia~ 150 49 brown light brown 19 fema~ femin~
## 6 Owen~ 178 120 brown, gr~ light blue 52 male mascu~
## 7 Beru~ 165 75 brown light blue 47 fema~ femin~
## 8 R5-D4 97 32 <NA> white, red red NA none mascu~
## 9 Bigg~ 183 84 black light brown 24 male mascu~
## 10 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male mascu~
## # ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
## Rows: 87
## Columns: 14
## $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
## $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
## $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
## $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
## $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
## $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
## $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
## $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
## $ films <list> [<"The Empire Strikes Back", "Revenge of the Sith", "Re...
## $ vehicles <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, ...
## $ starships <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced ...
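The two printouts above look like the default tibble print and the output of glimpse(). A minimal sketch that should reproduce them, assuming only that dplyr is installed (column widths and truncation depend on your console):

```r
library(dplyr)  # provides the starwars tibble and glimpse()

# Print the first rows of the tibble
starwars

# One line per variable: name, type, and the first few values
glimpse(starwars)
```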
Exercise 1
How many humans are contained in starwars overall? (Hint: use count().)
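One possible solution sketch, assuming that "humans" means rows whose species is "Human" (the object name n_humans is hypothetical). filter() drops rows where species is NA, so characters of unknown species are not counted:

```r
library(dplyr)

# Keep only humans, then count the remaining rows
n_humans <- starwars %>%
  filter(species == "Human") %>%
  count()

n_humans  # a one-row tibble with the count in column n
```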
Exercise 2
How many humans are contained in starwars by gender?
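A sketch under the same assumption as above (humans are rows with species == "Human"); passing a variable to count() produces one row per gender value, including NA if any human has a missing gender. The name humans_by_gender is hypothetical:

```r
library(dplyr)

# Keep humans, then count rows within each gender category
humans_by_gender <- starwars %>%
  filter(species == "Human") %>%
  count(gender)

humans_by_gender
```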
Exercise 3
Which homeworld do the most individuals (rows) come from?
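One way to answer this, sketched under the assumption that characters with an unknown (NA) homeworld should be excluded; count(..., sort = TRUE) puts the most frequent value first. The name homeworld_counts is hypothetical:

```r
library(dplyr)

# Count rows per homeworld, ignoring missing homeworlds,
# and sort so the most frequent homeworld comes first
homeworld_counts <- starwars %>%
  filter(!is.na(homeworld)) %>%
  count(homeworld, sort = TRUE)

# Look at the top few rows to spot possible ties
head(homeworld_counts, 3)
```

Keeping the NA rows instead would treat "unknown" as its own homeworld, which is usually not what the question intends.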
Exercise 4
What is the mean height of all individuals with orange eyes from the most popular homeworld?
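A two-step sketch, assuming "most popular homeworld" means the most frequent non-missing homeworld and that eye color is matched exactly as "orange". If several homeworlds tie, slice(1) simply takes the first one. The names top_homeworld and orange_mean are hypothetical:

```r
library(dplyr)

# Step 1: find the most frequent (non-missing) homeworld
top_homeworld <- starwars %>%
  filter(!is.na(homeworld)) %>%
  count(homeworld, sort = TRUE) %>%
  slice(1) %>%
  pull(homeworld)

# Step 2: mean height of orange-eyed characters from that homeworld,
# ignoring missing heights
orange_mean <- starwars %>%
  filter(homeworld == top_homeworld, eye_color == "orange") %>%
  summarise(mean_height = mean(height, na.rm = TRUE))

orange_mean
```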
Exercise 5
Compute the median, mean, and standard deviation of height for all droids.
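A sketch using a single summarise() call, assuming droids are rows with species == "Droid" and that missing heights should be ignored (na.rm = TRUE). The name droid_height is hypothetical:

```r
library(dplyr)

# Summary statistics of height for droids only
droid_height <- starwars %>%
  filter(species == "Droid") %>%
  summarise(
    median_height = median(height, na.rm = TRUE),
    mean_height   = mean(height, na.rm = TRUE),
    sd_height     = sd(height, na.rm = TRUE)
  )

droid_height
```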
9.2 data sets in the ds4psy package
The ds4psy package includes posPsy_wide, a tibble of positive psychology data in wide format: one row per participant, with ahi and cesd item scores recorded at several measurement occasions.
## # A tibble: 295 x 294
## id intervention sex age educ income occasion.0 elapsed.days.0 ahi01.0
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 4 2 35 5 3 0 0 2
## 2 2 1 1 59 1 1 0 0 3
## 3 3 4 1 51 4 3 0 0 3
## 4 4 3 1 50 5 2 0 0 2
## 5 5 2 2 58 5 2 0 0 1
## 6 6 1 1 31 5 1 0 0 2
## 7 7 3 1 44 5 2 0 0 3
## 8 8 2 1 57 4 2 0 0 3
## 9 9 1 1 36 4 3 0 0 2
## 10 10 2 1 45 4 3 0 0 2
## # ... with 285 more rows, and 285 more variables: ahi02.0 <dbl>, ahi03.0 <dbl>,
## # ahi04.0 <dbl>, ahi05.0 <dbl>, ahi06.0 <dbl>, ahi07.0 <dbl>, ahi08.0 <dbl>,
## # ahi09.0 <dbl>, ahi10.0 <dbl>, ahi11.0 <dbl>, ahi12.0 <dbl>, ahi13.0 <dbl>,
## # ahi14.0 <dbl>, ahi15.0 <dbl>, ahi16.0 <dbl>, ahi17.0 <dbl>, ahi18.0 <dbl>,
## # ahi19.0 <dbl>, ahi20.0 <dbl>, ahi21.0 <dbl>, ahi22.0 <dbl>, ahi23.0 <dbl>,
## # ahi24.0 <dbl>, cesd01.0 <dbl>, cesd02.0 <dbl>, cesd03.0 <dbl>,
## # cesd04.0 <dbl>, cesd05.0 <dbl>, cesd06.0 <dbl>, cesd07.0 <dbl>,
## # cesd08.0 <dbl>, cesd09.0 <dbl>, cesd10.0 <dbl>, cesd11.0 <dbl>,
## # cesd12.0 <dbl>, cesd13.0 <dbl>, cesd14.0 <dbl>, cesd15.0 <dbl>,
## # cesd16.0 <dbl>, cesd17.0 <dbl>, cesd18.0 <dbl>, cesd19.0 <dbl>,
## # cesd20.0 <dbl>, ahiTotal.0 <dbl>, cesdTotal.0 <dbl>, occasion.1 <dbl>,
## # elapsed.days.1 <dbl>, ahi01.1 <dbl>, ahi02.1 <dbl>, ahi03.1 <dbl>,
## # ahi04.1 <dbl>, ahi05.1 <dbl>, ahi06.1 <dbl>, ahi07.1 <dbl>, ahi08.1 <dbl>,
## # ahi09.1 <dbl>, ahi10.1 <dbl>, ahi11.1 <dbl>, ahi12.1 <dbl>, ahi13.1 <dbl>,
## # ahi14.1 <dbl>, ahi15.1 <dbl>, ahi16.1 <dbl>, ahi17.1 <dbl>, ahi18.1 <dbl>,
## # ahi19.1 <dbl>, ahi20.1 <dbl>, ahi21.1 <dbl>, ahi22.1 <dbl>, ahi23.1 <dbl>,
## # ahi24.1 <dbl>, cesd01.1 <dbl>, cesd02.1 <dbl>, cesd03.1 <dbl>,
## # cesd04.1 <dbl>, cesd05.1 <dbl>, cesd06.1 <dbl>, cesd07.1 <dbl>,
## # cesd08.1 <dbl>, cesd09.1 <dbl>, cesd10.1 <dbl>, cesd11.1 <dbl>,
## # cesd12.1 <dbl>, cesd13.1 <dbl>, cesd14.1 <dbl>, cesd15.1 <dbl>,
## # cesd16.1 <dbl>, cesd17.1 <dbl>, cesd18.1 <dbl>, cesd19.1 <dbl>,
## # cesd20.1 <dbl>, ahiTotal.1 <dbl>, cesdTotal.1 <dbl>, occasion.2 <dbl>,
## # elapsed.days.2 <dbl>, ahi01.2 <dbl>, ahi02.2 <dbl>, ahi03.2 <dbl>,
## # ahi04.2 <dbl>, ahi05.2 <dbl>, ...
Exercise 6
What are the dimensions, variables, and variable types of posPsy_wide?
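A sketch of base R and dplyr calls that answer each part, assuming ds4psy is installed. The last line tabulates the first class of every column, which summarizes the variable types compactly:

```r
library(dplyr)   # glimpse()
library(ds4psy)  # posPsy_wide

dim(posPsy_wide)      # number of rows and columns
names(posPsy_wide)    # variable names
glimpse(posPsy_wide)  # each variable with its type and first values

# How many columns of each type (first class of each column)
table(vapply(posPsy_wide, function(x) class(x)[1], character(1)))
```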
Exercise 7
From posPsy_wide, select id, intervention, sex, age, educ, income, and the six ahi01 items (one per measurement wave), and assign the result to posPsy_wide_subset.
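library(dplyr)   # select(), matches(), and the %>% pipe
library(ds4psy)  # provides posPsy_wide
# matches() selects columns whose names match a regular expression.
# In "ahi01.\\d", `\\d` stands for any single digit and `.` matches any
# character (here, the dot in names such as ahi01.0).
# A regular expression allows us to match arbitrary character strings;
# we will discuss regular expressions later.
posPsy_wide_subset <- posPsy_wide %>%
  select(id:income, matches("ahi01.\\d"))
posPsy_wide_subset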
## # A tibble: 295 x 12
## id intervention sex age educ income ahi01.0 ahi01.1 ahi01.2 ahi01.3
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 4 2 35 5 3 2 3 NA NA
## 2 2 1 1 59 1 1 3 3 3 3
## 3 3 4 1 51 4 3 3 NA 3 NA
## 4 4 3 1 50 5 2 2 3 NA 1
## 5 5 2 2 58 5 2 1 1 2 2
## 6 6 1 1 31 5 1 2 NA 3 3
## 7 7 3 1 44 5 2 3 NA NA NA
## 8 8 2 1 57 4 2 3 2 2 3
## 9 9 1 1 36 4 3 2 NA NA NA
## 10 10 2 1 45 4 3 2 NA NA NA
## # ... with 285 more rows, and 2 more variables: ahi01.4 <dbl>, ahi01.5 <dbl>
Exercise 8
Using pivot_longer() from tidyr, make posPsy_wide_subset longer.
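A minimal sketch, assuming tidyr 1.0.0 or later (where pivot_longer() was introduced). It rebuilds posPsy_wide_subset so it runs on its own; the names posPsy_long_subset, item, and score are hypothetical choices:

```r
library(dplyr)
library(tidyr)
library(ds4psy)

# Recreate the subset from the previous exercise
posPsy_wide_subset <- posPsy_wide %>%
  select(id:income, matches("ahi01.\\d"))

# Stack the six ahi01.* columns into two columns:
# item (the former column name) and score (its value)
posPsy_long_subset <- posPsy_wide_subset %>%
  pivot_longer(
    cols = starts_with("ahi01"),
    names_to = "item",
    values_to = "score"
  )

posPsy_long_subset
```

Each participant now occupies six rows, one per former ahi01 column, so the long tibble has six times as many rows as the wide one.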
Exercise 9
Using separate() from tidyr, create a variable Wave that indicates the measurement index (i.e., 0, 1, 2, 3, 4, 5).
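A sketch that splits the former column names (such as "ahi01.3") at the dot, assuming the long data are built as in the previous exercise; the names posPsy_long_subset, item_wave, item, and score are hypothetical. convert = TRUE turns the Wave strings into integers. Note that recent tidyr versions mark separate() as superseded by separate_wider_delim(), but it still works:

```r
library(dplyr)
library(tidyr)
library(ds4psy)

# Rebuild the long data so this block runs on its own
posPsy_long_subset <- posPsy_wide %>%
  select(id:income, matches("ahi01.\\d")) %>%
  pivot_longer(
    cols = starts_with("ahi01"),
    names_to = "item_wave",
    values_to = "score"
  )

# Split "ahi01.3" into item = "ahi01" and Wave = 3
# (sep is a regular expression, so the dot must be escaped)
posPsy_long_subset <- posPsy_long_subset %>%
  separate(item_wave, into = c("item", "Wave"), sep = "\\.", convert = TRUE)

posPsy_long_subset
```

Alternatively, pivot_longer() can do both steps at once via names_to = c("item", "Wave") together with names_sep = "\\.".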