5 Assignment 1 - data wrangling

Package load:

5.1 Introduction

For this homework assignment, please write your answer to each question after the question text and before the line break that precedes the next question.

In some cases, you will have to insert R code chunks, and run them to ensure that you’ve got the right result.

Use all of the R Markdown formatting you want! Bullets, bold text, etc. is welcome.

Once you have submitted your assignment, I will hit the “Knit” button, so please ensure that the output renders correctly.

5.2 Questions

1. Assign a variable

(1 mark)

Why does this code chunk not work?

stringlenght <- 10

stringlength

Look carefully! (This may seem like an exercise in pointlessness, but training your brain to notice even the tiniest difference will pay off when programming.)

2. Reducing the size of your data set

(3 marks)

Starting with the {mpg} data frame, use filter() and select() to create a single table that contains the following records and variables:

  • compact and subcompact Volkswagens from the model year 1999

  • manufacturer, model, mileage in both city and highway

mpg
## # A tibble: 234 × 12
##    manufacturer model      displ  year   cyl trans    drv     cty   hwy fl    class mpg_per_cubic_l…
##    <chr>        <chr>      <dbl> <int> <int> <chr>    <chr> <int> <int> <chr> <chr>            <dbl>
##  1 audi         a4           1.8  1999     4 auto(l5) f        18    29 p     comp…            16.1 
##  2 audi         a4           1.8  1999     4 manual(… f        21    29 p     comp…            16.1 
##  3 audi         a4           2    2008     4 manual(… f        20    31 p     comp…            15.5 
##  4 audi         a4           2    2008     4 auto(av) f        21    30 p     comp…            15   
##  5 audi         a4           2.8  1999     6 auto(l5) f        16    26 p     comp…             9.29
##  6 audi         a4           2.8  1999     6 manual(… f        18    26 p     comp…             9.29
##  7 audi         a4           3.1  2008     6 auto(av) f        18    27 p     comp…             8.71
##  8 audi         a4 quattro   1.8  1999     4 manual(… 4        18    26 p     comp…            14.4 
##  9 audi         a4 quattro   1.8  1999     4 auto(l5) 4        16    25 p     comp…            13.9 
## 10 audi         a4 quattro   2    2008     4 manual(… 4        20    28 p     comp…            14   
## # … with 224 more rows
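
As a warm-up on unrelated data, the sketch below chains filter() and select() in a single pipeline. The `pets` tibble and its columns are invented for this illustration and are not part of the assignment data; the same pattern should carry over to {mpg} once you have checked how its variables are coded.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
pets <- tibble(
  name    = c("Rex", "Tom", "Ivy", "Bo"),
  species = c("dog", "cat", "cat", "dog"),
  age     = c(3, 7, 2, 7),
  weight  = c(20.5, 4.2, 3.8, 31.0)
)

# filter() keeps rows that meet every condition listed;
# select() then keeps only the named columns
old_pets <- pets %>%
  filter(species %in% c("dog", "cat"), age == 7) %>%
  select(name, weight)

old_pets
```

Conditions separated by commas inside filter() are combined with "and", so a row must satisfy all of them to be kept.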

3. Selecting variables

(1 mark)

What happens if you include the name of a variable twice in a select() call? Demonstrate using the {gapminder} data. (The package is installed in the workspace, but you will have to load it to access it…)


4. R Markdown

(1 mark)

The following chunk of code produces a chart. When this notebook is previewed, the R code inside the chunk is also rendered.

Enter a chunk option so that the chart shows in the output, but the code does not.

(Hint: use the R Markdown cheatsheet to find the solution!)

# hide this code!
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

5. Flight data

(6 marks)

For parts 1 to 4, use the nycflights13::flights table. Pay close attention to how the variables are coded. Find all flights that:

  1. Had an arrival delay of two or more hours

  2. Flew to Houston (there are two airports, with the codes IAH or HOU)

  3. Were operated by United, American, or Delta (you may want to review the separate airlines table to check the codes for each airline)

  4. Departed in summer (July, August, and September)
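
The kinds of conditions listed above (a numeric threshold, a match against several codes, a set of months) can be sketched on unrelated toy data. The `deliveries` tibble and every column in it are hypothetical, invented for this illustration; the point is only that the way a variable is coded (minutes, short codes, month numbers) determines how the filter is written.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
deliveries <- tibble(
  id           = 1:5,
  late_minutes = c(5, 90, 45, 61, 0),
  depot        = c("YVR", "YYJ", "YVR", "YLW", "YYJ"),
  month        = c(1, 2, 12, 12, 6)
)

# The delay is stored in minutes, so "an hour or more" is 60 or more
late <- deliveries %>% filter(late_minutes >= 60)

# Two equivalent ways to match any of several codes
by_or <- deliveries %>% filter(depot == "YVR" | depot == "YYJ")
by_in <- deliveries %>% filter(depot %in% c("YVR", "YYJ"))

# Months are stored as numbers, so filter on numbers, not month names
winter <- deliveries %>% filter(month %in% c(12, 1, 2))
```

The `%in%` form tends to be easier to read and extend than a chain of `|` comparisons once there are more than two values to match.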

-30-

This work is licensed under a Creative Commons Attribution 4.0 International License: https://creativecommons.org/licenses/by/4.0/