9 dplyr practice

9.1 starwars data

starwars is a tibble included in dplyr that contains 14 variables describing 87 characters from the Star Wars films.

starwars
## # A tibble: 87 x 14
##    name  height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke~    172    77 blond      fair       blue            19   male  mascu~
##  2 C-3PO    167    75 <NA>       gold       yellow         112   none  mascu~
##  3 R2-D2     96    32 <NA>       white, bl~ red             33   none  mascu~
##  4 Dart~    202   136 none       white      yellow          41.9 male  mascu~
##  5 Leia~    150    49 brown      light      brown           19   fema~ femin~
##  6 Owen~    178   120 brown, gr~ light      blue            52   male  mascu~
##  7 Beru~    165    75 brown      light      blue            47   fema~ femin~
##  8 R5-D4     97    32 <NA>       white, red red             NA   none  mascu~
##  9 Bigg~    183    84 black      light      brown           24   male  mascu~
## 10 Obi-~    182    77 auburn, w~ fair       blue-gray       57   male  mascu~
## # ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>
# glimpse() is similar to str() in base R: one line per column, showing its type and first values
glimpse(starwars)
## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
## $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
## $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
## $ films      <list> [<"The Empire Strikes Back", "Revenge of the Sith", "Re...
## $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, ...
## $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced ...

Exercise 1

How many humans does starwars contain overall? (Hint: use count().)
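One possible approach, sketched here under the assumption that dplyr is loaded (the starwars tibble ships with it): keep the rows whose species is "Human" and let count() tally them. The object name humans_total is only illustrative.

```r
library(dplyr)

# Keep only rows whose species is exactly "Human".
# filter() drops rows where the condition is NA (unknown species),
# so characters with a missing species are not counted.
humans_total <- starwars %>%
  filter(species == "Human") %>%
  count()   # with no grouping variables, count() returns one row with column n

humans_total
```

An equivalent check without filtering first is `starwars %>% count(species)`, which lists every species and lets you read off the "Human" row.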

Exercise 2

How many humans does starwars contain, broken down by gender?
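A minimal sketch, assuming dplyr is loaded: the only change from counting all humans is passing gender to count(), which then returns one row per gender value. The name humans_by_gender is illustrative.

```r
library(dplyr)

# Restrict to humans, then tally rows for each value of `gender`
# (in the starwars data this column holds "masculine" / "feminine").
humans_by_gender <- starwars %>%
  filter(species == "Human") %>%
  count(gender)

humans_by_gender
```

The data also has a separate sex column; replacing gender with sex in count() gives the breakdown by that variable instead.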

Exercise 3

Which homeworld do the most individuals (rows) come from?
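One way to answer this, sketched under the assumption that dplyr is loaded: count rows per homeworld and sort the counts in decreasing order. Dropping missing homeworlds first is a design choice here, so that NA cannot appear as the "winner". The name homeworld_counts is illustrative.

```r
library(dplyr)

# Count individuals per homeworld, largest group first.
# Rows with an unknown (NA) homeworld are removed before counting.
homeworld_counts <- starwars %>%
  filter(!is.na(homeworld)) %>%
  count(homeworld, sort = TRUE)

# The first row is the most common homeworld; look at the next rows too,
# in case several homeworlds are tied for the top spot.
head(homeworld_counts, 3)
```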

Exercise 4

What is the mean height of all individuals with orange eyes from the most common homeworld (see Exercise 3)?
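A sketch in two steps, assuming dplyr is loaded: first store the most common non-missing homeworld in a variable, then filter on both conditions and summarise. The names top_homeworld and orange_eyes_height are illustrative, and n() is included only to show how many individuals the mean is based on.

```r
library(dplyr)

# Step 1: the most common (non-missing) homeworld, as a single string.
top_homeworld <- starwars %>%
  filter(!is.na(homeworld)) %>%
  count(homeworld, sort = TRUE) %>%
  slice(1) %>%
  pull(homeworld)

# Step 2: mean height of orange-eyed individuals from that homeworld.
# na.rm = TRUE ignores individuals whose height is unknown.
orange_eyes_height <- starwars %>%
  filter(homeworld == top_homeworld, eye_color == "orange") %>%
  summarise(n = n(), mean_height = mean(height, na.rm = TRUE))

orange_eyes_height
```

Storing the homeworld in a variable, instead of typing its name, keeps the answer correct if the underlying data change in a later dplyr version.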

Exercise 5

Compute the median, mean, and standard deviation of height for all droids.
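A minimal sketch, assuming dplyr is loaded: filter the droids and compute all three statistics in one summarise() call. na.rm = TRUE is set in case some droids have a missing height; the name droid_height is illustrative.

```r
library(dplyr)

# Keep the droids and summarise their heights in a single row.
droid_height <- starwars %>%
  filter(species == "Droid") %>%
  summarise(
    n             = n(),                          # number of droids
    median_height = median(height, na.rm = TRUE),
    mean_height   = mean(height, na.rm = TRUE),
    sd_height     = sd(height, na.rm = TRUE)
  )

droid_height
```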

9.2 data sets in the ds4psy package

library(ds4psy)
## Warning: package 'ds4psy' was built under R version 4.0.3
## Welcome to ds4psy (v0.5.0)!
## 
## Attaching package: 'ds4psy'
## The following object is masked from 'package:gcookbook':
## 
##     countries
posPsy_wide
## # A tibble: 295 x 294
##       id intervention   sex   age  educ income occasion.0 elapsed.days.0 ahi01.0
##    <dbl>        <dbl> <dbl> <dbl> <dbl>  <dbl>      <dbl>          <dbl>   <dbl>
##  1     1            4     2    35     5      3          0              0       2
##  2     2            1     1    59     1      1          0              0       3
##  3     3            4     1    51     4      3          0              0       3
##  4     4            3     1    50     5      2          0              0       2
##  5     5            2     2    58     5      2          0              0       1
##  6     6            1     1    31     5      1          0              0       2
##  7     7            3     1    44     5      2          0              0       3
##  8     8            2     1    57     4      2          0              0       3
##  9     9            1     1    36     4      3          0              0       2
## 10    10            2     1    45     4      3          0              0       2
## # ... with 285 more rows, and 285 more variables: ahi02.0 <dbl>, ahi03.0 <dbl>,
## #   ahi04.0 <dbl>, ahi05.0 <dbl>, ahi06.0 <dbl>, ahi07.0 <dbl>, ahi08.0 <dbl>,
## #   ahi09.0 <dbl>, ahi10.0 <dbl>, ahi11.0 <dbl>, ahi12.0 <dbl>, ahi13.0 <dbl>,
## #   ahi14.0 <dbl>, ahi15.0 <dbl>, ahi16.0 <dbl>, ahi17.0 <dbl>, ahi18.0 <dbl>,
## #   ahi19.0 <dbl>, ahi20.0 <dbl>, ahi21.0 <dbl>, ahi22.0 <dbl>, ahi23.0 <dbl>,
## #   ahi24.0 <dbl>, cesd01.0 <dbl>, cesd02.0 <dbl>, cesd03.0 <dbl>,
## #   cesd04.0 <dbl>, cesd05.0 <dbl>, cesd06.0 <dbl>, cesd07.0 <dbl>,
## #   cesd08.0 <dbl>, cesd09.0 <dbl>, cesd10.0 <dbl>, cesd11.0 <dbl>,
## #   cesd12.0 <dbl>, cesd13.0 <dbl>, cesd14.0 <dbl>, cesd15.0 <dbl>,
## #   cesd16.0 <dbl>, cesd17.0 <dbl>, cesd18.0 <dbl>, cesd19.0 <dbl>,
## #   cesd20.0 <dbl>, ahiTotal.0 <dbl>, cesdTotal.0 <dbl>, occasion.1 <dbl>,
## #   elapsed.days.1 <dbl>, ahi01.1 <dbl>, ahi02.1 <dbl>, ahi03.1 <dbl>,
## #   ahi04.1 <dbl>, ahi05.1 <dbl>, ahi06.1 <dbl>, ahi07.1 <dbl>, ahi08.1 <dbl>,
## #   ahi09.1 <dbl>, ahi10.1 <dbl>, ahi11.1 <dbl>, ahi12.1 <dbl>, ahi13.1 <dbl>,
## #   ahi14.1 <dbl>, ahi15.1 <dbl>, ahi16.1 <dbl>, ahi17.1 <dbl>, ahi18.1 <dbl>,
## #   ahi19.1 <dbl>, ahi20.1 <dbl>, ahi21.1 <dbl>, ahi22.1 <dbl>, ahi23.1 <dbl>,
## #   ahi24.1 <dbl>, cesd01.1 <dbl>, cesd02.1 <dbl>, cesd03.1 <dbl>,
## #   cesd04.1 <dbl>, cesd05.1 <dbl>, cesd06.1 <dbl>, cesd07.1 <dbl>,
## #   cesd08.1 <dbl>, cesd09.1 <dbl>, cesd10.1 <dbl>, cesd11.1 <dbl>,
## #   cesd12.1 <dbl>, cesd13.1 <dbl>, cesd14.1 <dbl>, cesd15.1 <dbl>,
## #   cesd16.1 <dbl>, cesd17.1 <dbl>, cesd18.1 <dbl>, cesd19.1 <dbl>,
## #   cesd20.1 <dbl>, ahiTotal.1 <dbl>, cesdTotal.1 <dbl>, occasion.2 <dbl>,
## #   elapsed.days.2 <dbl>, ahi01.2 <dbl>, ahi02.2 <dbl>, ahi03.2 <dbl>,
## #   ahi04.2 <dbl>, ahi05.2 <dbl>, ...

Exercise 6

What are the dimensions of posPsy_wide, and what are its variables and their types?
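One possible approach, sketched with base R functions and assuming the ds4psy package is installed: dim() gives rows and columns, and applying class() to every column lists the variable types. The name var_types is illustrative. Calling glimpse(posPsy_wide) also works, but with 294 columns its output is long.

```r
library(ds4psy)

# Dimensions: number of rows (participants) and columns (variables).
dim(posPsy_wide)

# First class of every column, named by variable.
var_types <- vapply(posPsy_wide, function(x) class(x)[1], character(1))
head(var_types)

# How many columns there are of each type.
table(var_types)
```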

Exercise 7

From posPsy_wide, select id, intervention, sex, age, educ, income, and the six ahi01 items (one per measurement wave, ahi01.0 to ahi01.5), and assign the result to posPsy_wide_subset.

# `\\d` is a regular expression that matches any single digit (0-9), and
# `\\.` matches a literal dot (an unescaped `.` would match any character).
# Regular expressions let us match arbitrary character patterns;
# we will discuss them in more detail later.
posPsy_wide_subset <- posPsy_wide %>%
  select(id:income, matches("ahi01\\.\\d"))
posPsy_wide_subset
## # A tibble: 295 x 12
##       id intervention   sex   age  educ income ahi01.0 ahi01.1 ahi01.2 ahi01.3
##    <dbl>        <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1     1            4     2    35     5      3       2       3      NA      NA
##  2     2            1     1    59     1      1       3       3       3       3
##  3     3            4     1    51     4      3       3      NA       3      NA
##  4     4            3     1    50     5      2       2       3      NA       1
##  5     5            2     2    58     5      2       1       1       2       2
##  6     6            1     1    31     5      1       2      NA       3       3
##  7     7            3     1    44     5      2       3      NA      NA      NA
##  8     8            2     1    57     4      2       3       2       2       3
##  9     9            1     1    36     4      3       2      NA      NA      NA
## 10    10            2     1    45     4      3       2      NA      NA      NA
## # ... with 285 more rows, and 2 more variables: ahi01.4 <dbl>, ahi01.5 <dbl>

Exercise 8

Using pivot_longer() from tidyr, reshape posPsy_wide_subset into long format.
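A sketch of one solution, assuming ds4psy, dplyr, and tidyr are installed. The block rebuilds the subset from Exercise 7 so it runs on its own; the new column names item and response are arbitrary choices, not names required by tidyr.

```r
library(ds4psy)
library(dplyr)
library(tidyr)

# Recreate the subset from Exercise 7 so this block is self-contained.
posPsy_wide_subset <- posPsy_wide %>%
  select(id:income, matches("ahi01\\.\\d"))

# Stack the six ahi01.* columns into two columns:
# `item` holds the former column name (e.g. "ahi01.3"),
# `response` holds the value that was stored in that column.
posPsy_long <- posPsy_wide_subset %>%
  pivot_longer(
    cols      = starts_with("ahi01"),
    names_to  = "item",
    values_to = "response"
  )

posPsy_long
```

Each participant now occupies six rows, one per wave. Missing answers stay in the data as NA; setting values_drop_na = TRUE in pivot_longer() would remove them.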

Exercise 9

Using separate() from tidyr, create a variable Wave that indicates the measurement occasion (i.e., 0, 1, 2, 3, 4, 5).
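A minimal sketch, assuming ds4psy, dplyr, and tidyr are installed. It rebuilds the long data from Exercises 7 and 8 (using the illustrative column names item and response) and then splits each former column name such as "ahi01.3" at the dot. The output column name Item is also an illustrative choice.

```r
library(ds4psy)
library(dplyr)
library(tidyr)

# Rebuild the long data from Exercises 7 and 8 so this block runs on its own.
posPsy_long <- posPsy_wide %>%
  select(id:income, matches("ahi01\\.\\d")) %>%
  pivot_longer(starts_with("ahi01"), names_to = "item", values_to = "response")

# Split "ahi01.3" into "ahi01" and "3". `sep` is a regular expression,
# so the dot is escaped; convert = TRUE turns Wave into a number.
posPsy_waves <- posPsy_long %>%
  separate(item, into = c("Item", "Wave"), sep = "\\.", convert = TRUE)

posPsy_waves
```

In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim() but still works. pivot_longer() can also do both steps at once through its names_sep argument.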