Exercise 1 - data wrangling

Setup

This chunk of R code loads the packages that we will be using.

# package load

library(tidyverse)
library(stringr)

# data files
library(nycflights13)

Introduction

For this homework exercise, use the functions that were introduced in class to solve the following problems. You shouldn’t have to use any functions that are not covered in class! The course book should be your first point of reference as you tackle the problems.

Please write your answer to each question directly after the question text, above the line break that separates it from the next question.

In some cases, you will have to insert R code chunks, and run them to ensure that you’ve got the right result.

Use all of the R Markdown formatting you want! Bullets, bold text, etc. are welcome.

Once you have submitted your assignment, I will click the “Knit” button, so please ensure that the output renders correctly.

Questions

1. Assign a variable

Why does this code chunk not work?

stringlenght <- 10

stringlength

Look carefully! (This may seem like an exercise in pointlessness, but training your brain to notice even the tiniest difference will pay off when programming.)

Answer

This does not work because…

2. Reducing the size of your data set

Starting with the {mpg} data frame, use filter() and select() to create a single table that contains only the following records and variables:

  • compact and subcompact Volkswagens from the model year 1999

  • manufacturer, model, and mileage (both city and highway)

First, view the data table:

mpg
## # A tibble: 234 × 12
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class mpg_per_cubic_litre
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>               <dbl>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…               16.1 
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…               16.1 
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…               15.5 
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…               15   
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…                9.29
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…                9.29
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…                8.71
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…               14.4 
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…               13.9 
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…               14   
## # ℹ 224 more rows
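As a reminder of the general pattern (not the answer to this question), here is a minimal sketch of filter() followed by select(). The `pets` table and its columns are made up for illustration, and the sketch assumes the `%>%` pipe as used in class.

```r
library(dplyr)

# A small made-up table (hypothetical data, unrelated to mpg)
pets <- tibble(
  name    = c("Rex", "Tom", "Fido", "Kit", "Bub"),
  species = c("dog", "cat", "dog", "cat", "fish"),
  age     = c(3, 5, 3, 2, 1),
  weight  = c(20.5, 4.2, 18.0, 3.9, 0.1)
)

# filter() keeps rows: dogs or cats that are exactly 3 years old
# select() keeps columns: only name and weight
pets_small <- pets %>%
  filter(species == "dog" | species == "cat", age == 3) %>%
  select(name, weight)

pets_small
```

Note that conditions separated by a comma inside filter() must all be true, while `|` means "or".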

3. Selecting variables

What happens if you include the name of a variable twice in a select() call? Demonstrate using the {gapminder} data. (The package is installed in the workspace, but you will have to load it to access it…)


4. R Markdown

The following chunk of code produces a chart. When this notebook is previewed, the R code inside the chunk is also rendered.

Enter a chunk option so that the chart shows in the output, but the code does not.

(Hint: use the R Markdown cheatsheet to find the solution!)

# hide this code!
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

5. Flight data

For parts 1 to 4 of this question, use the nycflights13::flights data.

Sidebar: To see the names of all of the tables in the {nycflights13} package, use the data() function:

data(package = "nycflights13")

Pay close attention to how the variables are coded. Type ?flights into the console to access the details about this data in the “Help” pane.
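One possible way to check how the variables are coded before you start filtering is sketched below. It assumes that {dplyr} and {nycflights13} are loaded, as in the setup chunk, and it only inspects the tables.

```r
library(dplyr)
library(nycflights13)

# Compact overview: one line per variable, with its type and first few values
glimpse(flights)

# The companion lookup table that maps carrier codes to airline names
airlines
```

The variable types and units shown here, together with the descriptions in ?flights, tell you what kind of value each filter condition needs.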

Using the data in the nycflights13::flights table, find all flights that:

  1. Had an arrival delay of two or more hours

  2. Flew to Houston (there are two airports, with the codes IAH and HOU)

  3. Were operated by United, American, or Delta (you may want to review the separate airlines table to check the codes for each airline)

  4. Departed in summer (July, August, and September)

-30-

This work is licensed under a Creative Commons Attribution 4.0 International License: https://creativecommons.org/licenses/by/4.0/