12 R Introduction

Sections in this Module:
–Install R
–Basic Tutorial
–Data Types
–Popular Libraries
–Joining Tables

Install R and RStudio

R is free, open source software. It is a small download and uses little memory. R runs on Windows, Mac and Linux, but this course is designed for the Mac version. If you use Windows, the lessons and instructions may vary; please see me with questions. Installing R is a two-step process:

  1. Install R, the actual program

Mac

Windows

Accept all of the default settings.

  2. Install RStudio. This is the graphical interface we use to create and manage R code. Download the open source edition of RStudio Desktop and follow the prompts to install it.

Download RStudio

More detailed instructions for installing R and RStudio are here

Here’s a good overview of the program

Basic R Tutorial

An introduction to R commands, from the 2020 NICAR Conference

Source code for this lesson: remember to left-click, download, and remove the .txt extension so the file name ends in .Rmd.

Background reading: Machlis, Sharon. Practical R for Mass Communications and Journalism. Chapman & Hall/CRC The R Series. 2018. ISBN 9781138726918

Here are some free sample chapters

  Ch. 1: Introduction     
  Ch. 2: Get Started With R in a Few Easy Steps    
  Ch. 3: See How Much You Can Do in a Few Lines of Code     

Exercise: Import Arkansascovid.com data and summarize Washington County trends

Download master file data

Ch. 1 & 2 of Machlis: Key Points
Reproducible research
Repetitive tasks in modern newsrooms: employment reports, crime stats, budgets

Variable - a name that refers to an R object
Assignment operator: <-
R is case sensitive
Vector - can hold only one type of data: all integers, all strings
Dataframe - like a spreadsheet
Save files - Don’t save the workspace. If you do, all of your variables will be stored and reloaded the next time you launch RStudio, and it’s too easy to forget about previously stored variables that can interfere with later work.
Software packages: tidyverse, rio, pacman
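The points above about variables, vectors and dataframes can be sketched in a few lines of base R. This is a minimal illustration, not code from Machlis; the names county, cases, mixed and weekly, and the numbers in them, are made up for the example.

```r
# Assign values with <- ; names are case sensitive.
county <- "Washington"
exists("County")   # FALSE in a fresh session: County and county are different names

# A vector holds one type of data only.
cases <- c(120L, 135L, 160L)   # all integers
mixed <- c(1, "two", 3)        # mixing types coerces everything to character
class(cases)   # "integer"
class(mixed)   # "character"

# A dataframe is like a spreadsheet: each column is a vector of equal length.
weekly <- data.frame(
  week = c("W1", "W2", "W3"),
  cases = cases
)
nrow(weekly)   # 3
ncol(weekly)   # 2
```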

Data Types and R

Machlis: 2.4.2 Data types you’re likely to use often
Reference: Logical Operators in R
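As a quick companion to those readings, here is a small sketch of common data types and logical operators in base R. The types shown are ones that come up often, not necessarily the exact list in Machlis 2.4.2, and the debt values are invented for illustration.

```r
# Check a value's data type with class()
class(42.5)                    # "numeric"
class(42L)                     # "integer"
class("Fayetteville")          # "character"
class(TRUE)                    # "logical"
class(as.Date("2020-03-11"))   # "Date"

# Logical operators work element by element on a vector
debt <- c(9500, 12000, 15500)
over_10k <- debt > 10000                     # FALSE TRUE TRUE
mid_range <- debt > 10000 & debt < 15000     # AND: FALSE TRUE FALSE
extremes <- debt < 10000 | debt > 15000      # OR:  TRUE FALSE TRUE
not_over <- !over_10k                        # NOT: TRUE FALSE FALSE
```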

Joining Tables

We will now join two datasets on one or more common fields or columns. We’ll use inner_join from the dplyr package, which returns only the records that match in both tables.

Check out this slide deck from Charles Minshew of IRE, starting at slide #25, for more details on inner and outer joins in R.

See the sample code below: replace the two instances of “fieldname” with the correct names of columns in your two tables that match. And replace “table1” and “table2” with the names of the two data frames you are joining.

newtable <- inner_join(table1, table2, by=c("fieldname"="fieldname"))
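To see the template at work, here is a minimal sketch with two tiny made-up tables. The table names (schools, debt), the column names (unit_id, school_id) and the values are all hypothetical; they only show what happens when the matching column has a different name in each table.

```r
library(dplyr)

# Hypothetical tables that share a school ID under different column names
schools <- data.frame(
  unit_id = c(1, 2, 3),
  name = c("College A", "College B", "College C")
)
debt <- data.frame(
  school_id = c(2, 3, 4),
  median_debt = c(12000, 15500, 9000)
)

# Left of the = is the column in the first table, right is the column in the second
joined <- inner_join(schools, debt, by = c("unit_id" = "school_id"))
joined   # two rows: only IDs 2 and 3 appear in both tables
```

The joined table keeps the key column name from the first table (unit_id). IDs 1 and 4 drop out because each appears in only one table.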

Example: Use student loan data. Import the two dataframes from this link

Question #1: AR2016mini

–Create a new R Markdown document, write the R code to import the spreadsheet and that tab.
–How many rows, columns?

Question #2: AR2012mini

–Write the R code to import the spreadsheet and that tab.
–How many rows, columns?

–Identify key differences in AR2012 and AR2016.
–What is the common field?

Question #3: With your information about the tables, use this template and join the two tables

newtable <- inner_join(table1, table2, by=c("fieldname"="fieldname"))

–How many fields joined?
–How can we improve the combined dataframe?

This code below renames the two debt fields. That way we can keep the numbers straight.

# The %>% pipe and rename() come from dplyr
library(dplyr)

# rename(new_name = old_name)
AR2012 <- AR2012 %>% 
  rename(DEBT_MDN_2012 = DEBT_MDN)

AR2016 <- AR2016 %>% 
  rename(DEBT_MDN_2016 = DEBT_MDN)
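Why rename at all? A small sketch with invented stand-in tables (y2012, y2016, with a hypothetical school column) shows what dplyr does when both tables carry a column with the same name that is not part of the join.

```r
library(dplyr)

# Hypothetical stand-ins for the two years of data
y2012 <- data.frame(school = c("College A", "College B"), DEBT_MDN = c(10000, 12000))
y2016 <- data.frame(school = c("College A", "College B"), DEBT_MDN = c(11000, 15000))

# Without renaming, dplyr adds .x and .y suffixes to the clashing columns
clash <- inner_join(y2012, y2016, by = "school")
names(clash)   # "school" "DEBT_MDN.x" "DEBT_MDN.y"

# Renaming first gives column names that explain themselves
y2012 <- y2012 %>% rename(DEBT_MDN_2012 = DEBT_MDN)
y2016 <- y2016 %>% rename(DEBT_MDN_2016 = DEBT_MDN)
both <- inner_join(y2012, y2016, by = "school")
names(both)    # "school" "DEBT_MDN_2012" "DEBT_MDN_2016"
```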

Question #4: After renaming, rejoin the tables.

Records that don’t match: anti_join

We can determine which records don’t match by using the anti_join command. It returns the rows of the first table that have no match in the second.

anti_join(table1, table2, by="id")
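Here is a minimal sketch of that command on two made-up tables. The contents of table1 and table2 are invented for illustration; the point is that the order of the tables changes the answer.

```r
library(dplyr)

# Hypothetical tables sharing an id column
table1 <- data.frame(id = c(1, 2, 3), name = c("College A", "College B", "College C"))
table2 <- data.frame(id = c(2, 3, 4), debt = c(12000, 15500, 9000))

# Rows of table1 with no match in table2
only_in_1 <- anti_join(table1, table2, by = "id")
only_in_1   # id 1, College A

# Swap the order to get rows of table2 with no match in table1
only_in_2 <- anti_join(table2, table1, by = "id")
only_in_2   # id 4, debt 9000
```

Running it both ways is a quick check on which records an inner_join would drop from each side.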

Question #5: Run the anti_join command on the AR2012 and AR2016 tables.

–Identify the records that didn’t match. Why?

Do basic math - create a column to measure which schools had the highest percentage change in their DEBT_MDN between 2012 and 2016

The math formula

AR2012_16mini$PctChange <- (AR2012_16mini$DEBT_MDN_2016 - AR2012_16mini$DEBT_MDN_2012) / AR2012_16mini$DEBT_MDN_2012

Display Percentage Signs

AR2012_16mini$PctChange <- formattable::percent(AR2012_16mini$PctChange, 2, format = "f")
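The arithmetic is (new - old) / old, which gives a proportion rather than a percentage. The sketch below works it through on a tiny invented table (debt, with made-up schools and values) and shows a base R way to build a percent label with sprintf. That is an alternative for illustration, not the formattable approach used above, and it produces text, so keep the numeric column for sorting.

```r
# Hypothetical joined table with the two renamed debt columns
debt <- data.frame(
  school = c("College A", "College B"),
  DEBT_MDN_2012 = c(10000, 12000),
  DEBT_MDN_2016 = c(11000, 15000)
)

# (new - old) / old: 0.10 means a 10 percent increase
debt$PctChange <- (debt$DEBT_MDN_2016 - debt$DEBT_MDN_2012) / debt$DEBT_MDN_2012
debt$PctChange   # 0.10 0.25

# Base R label: multiply by 100 and format with two decimals and a % sign
debt$PctLabel <- sprintf("%.2f%%", debt$PctChange * 100)
debt$PctLabel    # "10.00%" "25.00%"
```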

Question #6: Build a table with just the college name, the two debt fields and the percentage change.

REVEL IN YOUR NERD POWERS