Exercise 1 - data wrangling {#exercise1}
Introduction
For this homework exercise, use the functions that were introduced in class to solve the following problems. You shouldn’t have to use any functions that are not covered in class! The course book should be your first point of reference as you tackle the problems.
Please write your answer to each question directly below the question text, before the next question begins.
In some cases, you will have to insert R code chunks, and run them to ensure that you’ve got the right result.
Use all of the R Markdown formatting you want! Bullets, bold text, etc. are welcome.
Once you have submitted your assignment, I will be hitting the “Knit” button—please ensure that the output renders correctly.
Questions
1. Assign a variable
Why does this code chunk not work?
stringlenght <- 10
stringlength
Look carefully! (This may seem like an exercise in pointlessness, but training your brain to notice even the tiniest difference will pay off when programming.)
2. Reducing the size of your data set
Starting with the {mpg} data frame, use filter() and select() to create a single table that contains only the following records and variables:
compact and subcompact Volkswagens from the model year 1999
manufacturer, model, mileage in both city and highway
First, view the data table:
mpg
## # A tibble: 234 × 12
## manufacturer model displ year cyl trans drv cty hwy fl class mpg_per_cubic_litre
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr> <dbl>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp… 16.1
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp… 16.1
## 3 audi a4 2 2008 4 manu… f 20 31 p comp… 15.5
## 4 audi a4 2 2008 4 auto… f 21 30 p comp… 15
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp… 9.29
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp… 9.29
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp… 8.71
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp… 14.4
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp… 13.9
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp… 14
## # ℹ 224 more rows
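As a reminder of how the two functions combine, here is a minimal sketch on a different data set, so the question above is still yours to solve. It assumes the starwars table that ships with {dplyr}; the conditions and the object name droids are illustrative only.

```r
library(dplyr)

# filter() keeps rows that satisfy every condition;
# select() then keeps only the named columns.
droids <- starwars %>%
  filter(species == "Droid", height > 100) %>%
  select(name, height, mass)

droids
```

Note that filter() drops rows where a condition evaluates to NA, so characters with a missing species or height do not appear in the result.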
3. Selecting variables
What happens if you include the name of a variable twice in a select() call? Demonstrate using the {gapminder} data. (The package is installed in the workspace, but you will have to load it to access it…)
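Before experimenting, it may help to load the package and check which variables are available. The following is a small sketch, assuming {gapminder} and {dplyr} are installed as described above.

```r
library(gapminder)  # makes the gapminder data frame available
library(dplyr)

# glimpse() lists every column with its type and first few values
glimpse(gapminder)

# names() returns just the column names
names(gapminder)
```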
4. R Markdown
The following chunk of code produces a chart. When this notebook is previewed, the R code inside the chunk is also rendered.
Enter a chunk option so that the chart shows in the output, but the code does not.
(Hint: use the R Markdown cheatsheet to find the solution!)
# hide this code!
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
5. Flight data
For the questions in this section, use the data in nycflights13::flights
Sidebar: To see the names of all of the tables in the {nycflights13} package, use the data() function:
data(package = "nycflights13")
Pay close attention to how the variables are coded. Type ?flights into the console to access the details about this data in the “Help” pane.
Using the data in the nycflights13::flights table, find all flights that:
Had an arrival delay of two or more hours
Flew to Houston (there are two airports, with the codes IAH or HOU)
Were operated by United, American, or Delta (you may want to review the separate airlines table to check the codes for each airline)
Departed in summer (July, August, and September)
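The pattern behind these questions can be sketched with deliberately different conditions. The example below assumes {dplyr} and {nycflights13} are installed; the chosen airports, month, threshold, and the object name late_january are illustrative and are not the answers to the questions above.

```r
library(dplyr)
library(nycflights13)

# The airlines table maps two-letter carrier codes to airline names
airlines

# %in% matches any value in a set; >= handles thresholds.
# Illustrative conditions only: flights leaving JFK or LGA in January
# that departed at least 60 minutes late (dep_delay is in minutes).
late_january <- flights %>%
  filter(origin %in% c("JFK", "LGA"), month == 1, dep_delay >= 60)

late_january
```

Checking ?flights for the units of each variable is what tells you whether a threshold such as "one hour" should be written as 1 or as 60.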
-30-