10 Assignment 2 - data wrangling & import

Package load:

library(tidyverse) # functions using {tidyr}, {dplyr}, and {readr}
library(readxl)    # to read Excel files
library(janitor)   # for data cleaning

10.1 YOUR NAME HERE

10.2 Introduction

For this homework assignment, please write your answer for each question after the question text but before the line break before the next one.

In some cases, you will have to insert R code chunks, and run them to ensure that you’ve got the right result.

Use all of the R Markdown formatting you want! Bullets, bold text, etc. is welcome. And don’t forget to consider changes to the YAML.

Once you have finished your assignment, create an HTML document by “knitting” the document using either the “Preview” or “Knit” button in the top left of the script window frame.

10.3 Questions

1. Religion and income data (USA)

(2 marks)

The table relig_income is in {tidyr}. Load this table, and take a look at it. You can use View(relig_income) to see the table, and to understand the contents of the table you can use the Help function ?tidyr::relig_income (enter that code in the console).

## # A tibble: 6 × 11
##   religion  `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k` `$100-150k` `>150k`
##   <chr>       <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>      <dbl>       <dbl>   <dbl>
## 1 Agnostic       27        34        60        81        76       137        122         109      84
## 2 Atheist        12        27        37        52        35        70         73          59      74
## 3 Buddhist       27        21        30        34        33        58         62          39      53
## 4 Catholic      418       617       732       670       638      1116        949         792     633
## 5 Don’t kn…      15        14        15        11        10        35         21          17      18
## 6 Evangeli…     575       869      1064       982       881      1486        949         723     414
## # … with 1 more variable: `Don't know/refused` <dbl>
  1. What tidy principle(s) does this table violate?

  2. Using the functions in {tidyr}, arrange this table in such a way that it is no longer untidy.

2. Importing data: csv

(5 marks)

NOTE: Variable names with spaces and special symbols are a challenge to work with. >They require that you put the name inside a pair of back-ticks. For example, in the >dataset below, a variable called Court Region must be inside the back-ticks.

Remember that you have the option to use the clean_names() function in the {janitor} >package to resolve this!

Comma delimited files are one of the most common formats used for flat files. The tidyverse has a package, {readr}, which will automatically load with the function library(tidyverse).

  1. Read in the file “cy-concluded-cases-5-yr-regional-report-dashboard-2014-to-2018.csv”, which has the number of concluded court cases in B.C. by a variety of dimensions.3

Assign the contents of the file to a named object (remember, name things in a descriptive way!)

  1. Read the file again, but specify that the region code should be a character.

  2. Now you’ve read the contents of the file, use your data manipulation skills to create a table that shows the total number of cases that were completed in each court region in 2018. Sort the table so the busiest regions are at the top.

  3. Create a table showing the number of cases by court class completed in each region during 2018, with the region as the rows and the court class as the columns.

3. Importing data: Excel

(5 marks)

Use the {readxl} package to read in the file “bcmetal.xlsx”

This data is published by the Province of British Columbia, and shows the production of metals from mines in British Columbia from 1888 to 1988. 4

There are three sheets in the file. The sheet we are interested in is BCMETAL_PROD, metal production table for 1452 metal mines operating within the province between 1888 and 1988.

  1. What are the names of the sheets in this file?

  2. Read in the contents of the “BCMETAL_PROD” sheet.

  3. This will generate some warnings. What issues are there with the rows?

  4. Change the options for your read function to address any issues you identified in 3.c

  5. Use your data manipulation skills to answer the following, as separate calculations:

  • Based on the data here, in what year was the most gold mined in British Columbia?

  • What mine holds the record for the total amount of silver extracted in B.C.?

  • Does that mine hold the record for the total silver mined during a single year in B.C.?

-30-

This file by Martin Monkman is licensed under a Creative Commons Attribution 4.0 International License -->