Chapter 10 The dplyr package practice (No need to submit)

suppressMessages(library(tidyverse))

10.1 starwars data

starwars is a tibble in dplyr containing 14 variables that describe 87 characters from the Star Wars films.

starwars
## # A tibble: 87 x 14
##    name    height  mass hair_color  skin_color eye_color birth_year sex   gender
##    <chr>    <int> <dbl> <chr>       <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke S~    172    77 blond       fair       blue            19   male  mascu~
##  2 C-3PO      167    75 <NA>        gold       yellow         112   none  mascu~
##  3 R2-D2       96    32 <NA>        white, bl~ red             33   none  mascu~
##  4 Darth ~    202   136 none        white      yellow          41.9 male  mascu~
##  5 Leia O~    150    49 brown       light      brown           19   fema~ femin~
##  6 Owen L~    178   120 brown, grey light      blue            52   male  mascu~
##  7 Beru W~    165    75 brown       light      blue            47   fema~ femin~
##  8 R5-D4       97    32 <NA>        white, red red             NA   none  mascu~
##  9 Biggs ~    183    84 black       light      brown           24   male  mascu~
## 10 Obi-Wa~    182    77 auburn, wh~ fair       blue-gray       57   male  mascu~
## # ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>
# glimpse() is similar to str() in base R
glimpse(starwars)
## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or~
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2~
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.~
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N~
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "~
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",~
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, ~
## $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",~
## $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini~
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T~
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma~
## $ films      <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return~
## $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp~
## $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",~

Problem 1. How many humans are contained in starwars overall? (Hint: use count())

  • answer: 35 humans
starwars %>%
  filter(species == "Human") %>%
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1    35
starwars %>%
  count(species)
## # A tibble: 38 x 2
##    species       n
##    <chr>     <int>
##  1 Aleena        1
##  2 Besalisk      1
##  3 Cerean        1
##  4 Chagrian      1
##  5 Clawdite      1
##  6 Droid         6
##  7 Dug           1
##  8 Ewok          1
##  9 Geonosian     1
## 10 Gungan        3
## # ... with 28 more rows
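
As a cross-check, the same count can be computed in base R. This is only a sketch, and the names n_dplyr and n_base are illustrative, not part of the exercise. It assumes that species contains missing values: filter() silently drops rows where the comparison is NA, so the base R version needs na.rm = TRUE to match.

```r
library(dplyr)

# Count humans with dplyr; filter() drops rows where species is NA
n_dplyr <- starwars %>%
  filter(species == "Human") %>%
  nrow()

# Same count in base R; NA comparisons must be excluded explicitly
n_base <- sum(starwars$species == "Human", na.rm = TRUE)

# Both approaches should agree
n_dplyr == n_base
```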

Problem 2. How many feminine humans are contained in starwars?

  • answer: 9 feminine humans
starwars %>%
  group_by(gender, species) %>%
  count()
## # A tibble: 42 x 3
## # Groups:   gender, species [42]
##    gender    species        n
##    <chr>     <chr>      <int>
##  1 feminine  Clawdite       1
##  2 feminine  Droid          1
##  3 feminine  Human          9
##  4 feminine  Kaminoan       1
##  5 feminine  Mirialan       2
##  6 feminine  Tholothian     1
##  7 feminine  Togruta        1
##  8 feminine  Twi'lek        1
##  9 masculine Aleena         1
## 10 masculine Besalisk       1
## # ... with 32 more rows
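
The grouped table above lists every gender and species combination. When only one combination is of interest, filtering first returns a single row. A minimal sketch (feminine_humans is an illustrative name):

```r
library(dplyr)

# Keep only feminine humans, then count the remaining rows
feminine_humans <- starwars %>%
  filter(species == "Human", gender == "feminine") %>%
  count()

feminine_humans
```

Note that count(gender, species) is a shorthand for group_by(gender, species) followed by count(), except that the result is returned ungrouped.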

Problem 3. Which homeworld do the most individuals come from?

  • answer: Naboo
starwars %>%
  group_by(homeworld) %>%
  count() %>%
  arrange(desc(n))
## # A tibble: 49 x 2
## # Groups:   homeworld [49]
##    homeworld     n
##    <chr>     <int>
##  1 Naboo        11
##  2 Tatooine     10
##  3 <NA>         10
##  4 Alderaan      3
##  5 Coruscant     3
##  6 Kamino        3
##  7 Corellia      2
##  8 Kashyyyk      2
##  9 Mirial        2
## 10 Ryloth        2
## # ... with 39 more rows
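
The table above also contains a row for characters whose homeworld is missing (NA). A possible variant, sketched here with the illustrative name homeworld_counts, removes those rows first and uses the sort argument of count() in place of a separate arrange() step:

```r
library(dplyr)

# Drop characters with an unknown homeworld, then count and sort in one call
homeworld_counts <- starwars %>%
  filter(!is.na(homeworld)) %>%
  count(homeworld, sort = TRUE)

# The first row is the most common known homeworld
head(homeworld_counts, 3)
```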

Problem 4. What is the mean height of all individuals with orange eyes from the most popular homeworld (Naboo, from Problem 3)?

  • answer: 208.6667

starwars %>%
  filter(homeworld == "Naboo", eye_color == "orange") %>%
  summarise(mean_height = mean(height))
## # A tibble: 1 x 1
##   mean_height
##         <dbl>
## 1        209.
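
The solution above hard-codes the homeworld name. As a sketch of one alternative, the most common homeworld can be looked up first and then reused in the filter. The names top_world and orange_eyed are illustrative, and na.rm = TRUE is added as a precaution in case a height is missing.

```r
library(dplyr)

# Most common non-missing homeworld, extracted as a character vector
top_world <- starwars %>%
  filter(!is.na(homeworld)) %>%
  count(homeworld) %>%
  slice_max(n, n = 1) %>%
  pull(homeworld)

# Mean height of the orange-eyed characters from that homeworld
orange_eyed <- starwars %>%
  filter(homeworld %in% top_world, eye_color == "orange") %>%
  summarise(n = n(),
            mean_height = mean(height, na.rm = TRUE))

orange_eyed
```

slice_max() keeps ties by default, so %in% is used in case more than one homeworld shares the top count.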

Problem 5. What is the standard deviation of height for all droids?

  • answer: 49.14977
starwars %>%
  filter(species == "Droid") %>%
  summarise(n = n(),
            not_NA_h = sum(!is.na(height)),
            md_height = median(height, na.rm = TRUE),
            mn_height = mean(height, na.rm = TRUE),
            sd_height = sd(height, na.rm = TRUE))
## # A tibble: 1 x 5
##       n not_NA_h md_height mn_height sd_height
##   <int>    <int>     <int>     <dbl>     <dbl>
## 1     6        5        97      131.      49.1
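
The summary above shows that only 5 of the 6 droids have a recorded height, which is why na.rm = TRUE is needed. The following sketch (droid_sd is an illustrative name) compares the result with and without it:

```r
library(dplyr)

# sd() returns NA if any input value is missing, unless na.rm = TRUE
droid_sd <- starwars %>%
  filter(species == "Droid") %>%
  summarise(sd_raw   = sd(height),
            sd_clean = sd(height, na.rm = TRUE))

droid_sd
```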