10 Assignment 2 - data wrangling & import
Package load:
library(tidyverse) # functions using {tidyr}, {dplyr}, and {readr}
library(readxl) # to read Excel files
library(janitor) # for data cleaning
10.2 Introduction
For this homework assignment, please write your answer for each question after the question text but before the line break before the next one.
In some cases, you will have to insert R code chunks, and run them to ensure that you’ve got the right result.
Use all of the R Markdown formatting you want! Bullets, bold text, etc. is welcome. And don’t forget to consider changes to the YAML.
Once you have finished your assignment, create an HTML document by “knitting” the document using either the “Preview” or “Knit” button in the top left of the script window frame.
10.3 Questions
1. Religion and income data (USA)
(2 marks)
The table relig_income
is in {tidyr}. Load this table, and take a look at it. You can use View(relig_income)
to see the table, and to understand the contents of the table you can use the Help function ?tidyr::relig_income
(enter that code in the console).
head(tidyr::relig_income)
## # A tibble: 6 × 11
## religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k` `$100-150k` `>150k`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Agnostic 27 34 60 81 76 137 122 109 84
## 2 Atheist 12 27 37 52 35 70 73 59 74
## 3 Buddhist 27 21 30 34 33 58 62 39 53
## 4 Catholic 418 617 732 670 638 1116 949 792 633
## 5 Don’t kn… 15 14 15 11 10 35 21 17 18
## 6 Evangeli… 575 869 1064 982 881 1486 949 723 414
## # … with 1 more variable: `Don't know/refused` <dbl>
What tidy principle(s) does this table violate?
Using the functions in {tidyr}, arrange this table in such a way that it is no longer untidy.
2. Importing data: csv
(5 marks)
NOTE: Variable names with spaces and special symbols are a challenge to work with. >They require that you put the name inside a pair of back-ticks. For example, in the >dataset below, a variable called
Court Region
must be inside the back-ticks.Remember that you have the option to use the
clean_names()
function in the {janitor} >package to resolve this!
- The reference page for {janitor} is here: https://garthtarr.github.io/meatR/janitor.html
Comma delimited files are one of the most common formats used for flat files. The tidyverse has a package, {readr}, which will automatically load with the function library(tidyverse)
.
- The reference page for {readr} is here: https://readr.tidyverse.org/index.html
- Read in the file “cy-concluded-cases-5-yr-regional-report-dashboard-2014-to-2018.csv”, which has the number of concluded court cases in B.C. by a variety of dimensions.3
Assign the contents of the file to a named object (remember, name things in a descriptive way!)
Read the file again, but specify that the region code should be a character.
Now you’ve read the contents of the file, use your data manipulation skills to create a table that shows the total number of cases that were completed in each court region in 2018. Sort the table so the busiest regions are at the top.
Create a table showing the number of cases by court class completed in each region during 2018, with the region as the rows and the court class as the columns.
3. Importing data: Excel
(5 marks)
Use the {readxl} package to read in the file “bcmetal.xlsx”
This data is published by the Province of British Columbia, and shows the production of metals from mines in British Columbia from 1888 to 1988. 4
There are three sheets in the file. The sheet we are interested in is BCMETAL_PROD, metal production table for 1452 metal mines operating within the province between 1888 and 1988.
- The reference page for {readxl} is here: https://readxl.tidyverse.org/
What are the names of the sheets in this file?
Read in the contents of the “BCMETAL_PROD” sheet.
This will generate some warnings. What issues are there with the rows?
Change the options for your read function to address any issues you identified in 3.c
Use your data manipulation skills to answer the following, as separate calculations:
Based on the data here, in what year was the most gold mined in British Columbia?
What mine holds the record for the total amount of silver extracted in B.C.?
Does that mine hold the record for the total silver mined during a single year in B.C.?
-30-