Chapter 10 The dplyr package practice (No need to submit)
10.1 starwars data
starwars is a tibble in dplyr containing 14 variables describing the features of 87 characters from the movies.
starwars
## # A tibble: 87 x 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke S~ 172 77 blond fair blue 19 male mascu~
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu~
## 3 R2-D2 96 32 <NA> white, bl~ red 33 none mascu~
## 4 Darth ~ 202 136 none white yellow 41.9 male mascu~
## 5 Leia O~ 150 49 brown light brown 19 fema~ femin~
## 6 Owen L~ 178 120 brown, grey light blue 52 male mascu~
## 7 Beru W~ 165 75 brown light blue 47 fema~ femin~
## 8 R5-D4 97 32 <NA> white, red red NA none mascu~
## 9 Biggs ~ 183 84 black light brown 24 male mascu~
## 10 Obi-Wa~ 182 77 auburn, wh~ fair blue-gray 57 male mascu~
## # ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
# similar to str() in Base R
glimpse(starwars)
## Rows: 87
## Columns: 14
## $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or~
## $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2~
## $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.~
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N~
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "~
## $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",~
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, ~
## $ sex <chr> "male", "none", "none", "male", "female", "male", "female",~
## $ gender <chr> "masculine", "masculine", "masculine", "masculine", "femini~
## $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T~
## $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma~
## $ films <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return~
## $ vehicles <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp~
## $ starships <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",~
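The dimensions reported by the printout and by glimpse() can also be checked directly. The following is a minimal sketch, assuming dplyr is installed; the object names n_rows, n_cols and list_cols are illustrative only.

```r
library(dplyr)

# Dimensions of the starwars tibble: rows are characters, columns are variables
n_rows <- nrow(starwars)
n_cols <- ncol(starwars)
c(rows = n_rows, columns = n_cols)

# List-columns (films, vehicles, starships) hold one vector per character
list_cols <- names(starwars)[sapply(starwars, is.list)]
list_cols
```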
Problem 1. How many humans are contained in starwars overall? (Hint: use count().)
- answer: 35 humans
starwars %>%
  filter(species == "Human") %>%
  count()
## # A tibble: 1 x 1
## n
## <int>
## 1 35
starwars %>% count(species)
## # A tibble: 38 x 2
## species n
## <chr> <int>
## 1 Aleena 1
## 2 Besalisk 1
## 3 Cerean 1
## 4 Chagrian 1
## 5 Clawdite 1
## 6 Droid 6
## 7 Dug 1
## 8 Ewok 1
## 9 Geonosian 1
## 10 Gungan 3
## # ... with 28 more rows
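As a cross-check on the two approaches above, the count can also be obtained by summing a logical condition inside summarise(). This is only a sketch of one alternative; n_humans is an illustrative name, and the exact count may depend on the version of the starwars data shipped with your dplyr.

```r
library(dplyr)

# Sum a logical condition: TRUE counts as 1, FALSE as 0.
# na.rm = TRUE skips characters whose species is missing (NA).
n_humans <- starwars %>%
  summarise(n = sum(species == "Human", na.rm = TRUE)) %>%
  pull(n)

n_humans
```

Unlike filter(), which silently drops rows where the condition is NA, sum() needs na.rm = TRUE to ignore them; without it the result would be NA.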
Problem 2. How many feminine humans are contained in starwars?
- answer: 9 feminine humans
starwars %>%
  group_by(gender, species) %>%
  count()
## # A tibble: 42 x 3
## # Groups: gender, species [42]
## gender species n
## <chr> <chr> <int>
## 1 feminine Clawdite 1
## 2 feminine Droid 1
## 3 feminine Human 9
## 4 feminine Kaminoan 1
## 5 feminine Mirialan 2
## 6 feminine Tholothian 1
## 7 feminine Togruta 1
## 8 feminine Twi'lek 1
## 9 masculine Aleena 1
## 10 masculine Besalisk 1
## # ... with 32 more rows
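The grouped table above lists every gender and species combination, so the answer has to be read off the matching row. A more direct sketch, assuming only the single number is needed, filters on both conditions first; feminine_humans is an illustrative name.

```r
library(dplyr)

# Keep only rows that satisfy both conditions, then count them.
# filter() drops rows where either condition evaluates to NA.
feminine_humans <- starwars %>%
  filter(species == "Human", gender == "feminine") %>%
  count()

feminine_humans
```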
Problem 3. Which homeworld do the most individuals come from?
- answer: Naboo
starwars %>%
  group_by(homeworld) %>%
  count() %>%
  arrange(desc(n))
## # A tibble: 49 x 2
## # Groups: homeworld [49]
## homeworld n
## <chr> <int>
## 1 Naboo 11
## 2 Tatooine 10
## 3 <NA> 10
## 4 Alderaan 3
## 5 Coruscant 3
## 6 Kamino 3
## 7 Corellia 2
## 8 Kashyyyk 2
## 9 Mirial 2
## 10 Ryloth 2
## # ... with 39 more rows
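Note that the table above includes a row for characters whose homeworld is missing (NA). A shorter sketch, assuming missing homeworlds should be excluded, uses the sort argument of count() and keeps only the first row; top_homeworld is an illustrative name.

```r
library(dplyr)

# Drop missing homeworlds, count per homeworld in descending order,
# and keep the first row (the most common homeworld).
top_homeworld <- starwars %>%
  filter(!is.na(homeworld)) %>%
  count(homeworld, sort = TRUE) %>%
  slice(1)

top_homeworld
```

If two homeworlds were tied for first place, slice(1) would return only one of them, so it is worth glancing at the full sorted table as well.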
Problem 4. What is the mean height of all individuals with orange eyes from the most popular homeworld?
- answer: 208.6667
starwars %>%
  filter(homeworld == "Naboo", eye_color == "orange") %>%
  summarise(mean_height = mean(height))
## # A tibble: 1 x 1
## mean_height
## <dbl>
## 1 209.
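The call above works because none of the selected heights is missing; a single NA would make mean() return NA. A slightly more defensive sketch, shown here as one possible variant, reports how many rows were selected and how many heights were actually measured; orange_naboo is an illustrative name.

```r
library(dplyr)

orange_naboo <- starwars %>%
  filter(homeworld == "Naboo", eye_color == "orange") %>%
  summarise(
    n = n(),                                  # rows selected
    n_measured = sum(!is.na(height)),         # rows with a recorded height
    mean_height = mean(height, na.rm = TRUE)  # mean ignoring missing heights
  )

orange_naboo
```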
Problem 5. What is the standard deviation of height for all droids?
- answer: 49.14977
starwars %>%
  filter(species == "Droid") %>%
  summarise(n = n(),
            not_NA_h = sum(!is.na(height)),
            md_height = median(height, na.rm = TRUE),
            mn_height = mean(height, na.rm = TRUE),
            sd_height = sd(height, na.rm = TRUE))
## # A tibble: 1 x 5
## n not_NA_h md_height mn_height sd_height
## <int> <int> <int> <dbl> <dbl>
## 1 6 5 97 131. 49.1
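The same standard deviation can be reproduced with base R subsetting, which is a useful sanity check on the pipeline. This is a sketch only; droid_heights, n_measured and sd_base are illustrative names.

```r
library(dplyr)  # loaded only for the starwars data

# which() returns the positions where the condition is TRUE,
# skipping characters whose species is NA.
droid_heights <- starwars$height[which(starwars$species == "Droid")]

# Number of droids with a recorded height
n_measured <- sum(!is.na(droid_heights))

# Sample standard deviation, ignoring missing heights
sd_base <- sd(droid_heights, na.rm = TRUE)
sd_base
```

Because sd() computes the sample standard deviation (dividing by one less than the number of observations), the divisor is based on n_measured, not on the total number of droids.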