12 R Introduction
Sections in this Module:
–Install R
–Basic Tutorial
–Data Types
–Popular Libraries
–Joining Tables
Install R and RStudio
Both programs are free and open source. R is a small download and is light on memory. R runs on Windows, Mac and Linux, but this course is designed for the Mac version. If you use Windows, some lessons and instructions may differ. Please see me if you have questions. Installing R is a two-step process:
- Install R, the actual program
Accept all of the default settings.
- Install RStudio. This is the graphical interface we use to create and manage R code. Download the open source edition of RStudio Desktop and follow the prompts to install it.
More detailed instructions for installing R and RStudio here
Here’s a good overview of the program
Basic R Tutorial
An introduction to R commands, from the 2020 NICAR Conference
Source code for this lesson. Remember to left-click, download, and remove the .txt extension so the file name ends in .Rmd
Background reading: Machlis, Sharon. Practical R for Mass Communications and Journalism. Chapman & Hall/CRC The R Series. 2018. ISBN 9781138726918
Here are some free sample chapters
Ch. 1: Introduction
Ch. 2: Get Started With R in a Few Easy Steps
Ch. 3: See How Much You Can Do in a Few Lines of Code
Exercise: Import Arkansascovid.com data and summarize Washington County trends
Ch. 1 & 2 of Machlis: Key Points
–Reproducible research
–Repetitive tasks in modern newsrooms: employment reports, crime stats, budgets
–Variable: a name that refers to an R object
–Assignment operator: <-
–R is case sensitive
–Vector: can hold only one type of data, for example all integers or all strings
–Data frame: like a spreadsheet
–Saving files: don’t save the workspace, because all of your variables will be stored and reloaded the next time you launch RStudio. It’s too easy to forget about previously stored variables that can interfere with later work.
–Software packages: tidyverse, rio, pacman
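The key points about variables, vectors and data frames can be tried out in a few lines. This is a minimal sketch in base R; the object names and numbers are made up for illustration and are not from Machlis or from the course data.

```r
# Assignment with <- creates an object. Names are case sensitive.
median_debt <- 12500
Median_Debt <- 99          # a different object from median_debt

# A vector holds one type only. Mixing types coerces everything to one type.
counts <- c(3L, 7L, 12L)          # integer vector
mixed  <- c(3L, "seven", 12L)     # becomes a character vector

class(counts)   # "integer"
class(mixed)    # "character"

# A data frame is a table: each column is a vector of equal length.
schools <- data.frame(
  name = c("School A", "School B", "School C"),
  debt = c(12500, 9800, 15250)
)

nrow(schools)   # number of rows
ncol(schools)   # number of columns
```

The nrow() and ncol() calls are one way to answer the "How many rows, columns?" questions later in this module; dim(schools) returns both at once.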
Data Types and R
Machlis: 2.4.2 Data types you’re likely to use often
Reference: Logical Operators in R
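As a quick illustration of data types and logical operators, here is a small base R sketch. The values are invented, and the list of types is a common starting set rather than a copy of the list in Machlis 2.4.2.

```r
# Common data types, checked with class()
class(42.5)                    # "numeric"
class(42L)                     # "integer"
class("Fayetteville")          # "character"
class(TRUE)                    # "logical"
class(as.Date("2020-03-11"))   # "Date"

# Comparison operators return logical values, one per element
debt <- c(8000, 12500, 21000)
debt > 10000                   # FALSE TRUE TRUE

# Combine conditions element by element with & (and), | (or), ! (not)
mid_range <- debt > 10000 & debt < 20000
outside   <- debt < 10000 | debt > 20000
not_mid   <- !mid_range

# A logical vector can be used to filter another vector
debt[mid_range]                # 12500
```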
Joining Tables
We will now join two datasets together on one or more common fields or columns. We’ll do this through an inner_join (return only matching records) using the dplyr package.
Check out this slide deck from Charles Minshew of IRE, starting at slide #25, for more details on inner and outer joins in R.
In the sample code below, replace the two instances of “fieldname” with the names of the matching columns in your two tables: the first is the column in table1, the second is the column in table2. Then replace “table1” and “table2” with the names of the two data frames you are joining.
newtable <- inner_join(table1, table2, by=c("fieldname"="fieldname"))
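To see the template in action before using the real data, here is a small sketch with two made-up tables. The table names, column names and numbers are hypothetical; the only assumption is that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy tables. The key column has a different name in each one.
enrollment <- data.frame(
  unit_id  = c(101, 102, 103),
  students = c(5200, 870, 2300)
)
debt <- data.frame(
  id          = c(102, 103, 104),
  median_debt = c(9800, 15250, 11000)
)

# Left of the = is the column in the first table, right is the second table.
joined <- inner_join(enrollment, debt, by = c("unit_id" = "id"))

joined   # two rows: only 102 and 103 appear in both tables
```

School 101 is missing from debt and school 104 is missing from enrollment, so an inner join drops both.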
Example: Use student loan data. Import the two dataframes from this link
Question #1: AR2016mini
–Create a new R Markdown document and write the R code to import the spreadsheet and that tab.
–How many rows, columns?
Question #2: AR2012mini
–Write the R code to import the spreadsheet and that tab.
–How many rows, columns?
–Identify key differences in AR2012 and AR2016.
–What is the common field?
Question #3: With your information about the tables, use this template and join the two tables
newtable <- inner_join(table1, table2, by=c("fieldname"="fieldname"))
–How many fields joined?
–How can we improve the combined dataframe?
This code below renames the two debt fields. That way we can keep the numbers straight.
AR2012 <- AR2012 %>%
  rename(DEBT_MDN_2012 = DEBT_MDN)
AR2016 <- AR2016 %>%
  rename(DEBT_MDN_2016 = DEBT_MDN)
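A short sketch of why the rename helps. The two tables here are made-up stand-ins, and UNITID is an assumed key column, not necessarily the common field in the course data.

```r
library(dplyr)

# Hypothetical stand-ins for the two tables; UNITID is an assumed key column.
t2012 <- data.frame(UNITID = c(1, 2), DEBT_MDN = c(9000, 12000))
t2016 <- data.frame(UNITID = c(1, 2), DEBT_MDN = c(11000, 12500))

# Without renaming, dplyr appends .x and .y to the clashing column names.
unrenamed <- inner_join(t2012, t2016, by = "UNITID")
names(unrenamed)
# "UNITID" "DEBT_MDN.x" "DEBT_MDN.y"

# After renaming, each column says which year it came from.
t2012 <- t2012 %>% rename(DEBT_MDN_2012 = DEBT_MDN)
t2016 <- t2016 %>% rename(DEBT_MDN_2016 = DEBT_MDN)
combined <- inner_join(t2012, t2016, by = "UNITID")
names(combined)
# "UNITID" "DEBT_MDN_2012" "DEBT_MDN_2016"
```

In rename(), the new name goes on the left of the = and the existing name on the right.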
Question #4: After renaming, join the tables again.
Records that don’t match: anti_join
We can determine which records don’t match by using the anti_join command. It returns the rows of the first table that have no match in the second table.
anti_join(table1, table2, by="id")
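Because anti_join only reports unmatched rows from the first table, it is worth running it in both directions. A minimal sketch with invented tables follows; the "id" and "school" columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy tables sharing an "id" column
t2012 <- data.frame(id = c(1, 2, 3), school = c("A", "B", "C"))
t2016 <- data.frame(id = c(2, 3, 4), school = c("B", "C", "D"))

# Rows of the first table with no match in the second
only_2012 <- anti_join(t2012, t2016, by = "id")   # school A

# Swap the order to check the other direction
only_2016 <- anti_join(t2016, t2012, by = "id")   # school D
```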
Question #5: Run the anti_join command on the AR2012 and AR2016 tables.
–Identify the records that didn’t match. Why?
Do basic math: create a column to measure which schools had the highest percentage change in DEBT_MDN. The code below assumes the joined table is named AR2012_16mini.
The math formula: (new value - old value) / old value
AR2012_16mini$PctChange <- ((AR2012_16mini$DEBT_MDN_2016 - AR2012_16mini$DEBT_MDN_2012) / AR2012_16mini$DEBT_MDN_2012)
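Here is the same calculation worked on two made-up schools, so the arithmetic is easy to check by hand. The table and its values are hypothetical; only the column names follow the renamed debt fields above.

```r
# Hypothetical joined table with both years of median debt
debt <- data.frame(
  school        = c("A", "B"),
  DEBT_MDN_2012 = c(10000, 16000),
  DEBT_MDN_2016 = c(12500, 12000)
)

# (new - old) / old
debt$PctChange <- (debt$DEBT_MDN_2016 - debt$DEBT_MDN_2012) / debt$DEBT_MDN_2012

debt$PctChange                            # 0.25 -0.25

# Sort so the largest increase comes first
sorted <- debt[order(-debt$PctChange), ]
```

School A went from 10,000 to 12,500, a change of 2,500 / 10,000 = 0.25, or 25 percent. School B fell by 4,000 / 16,000 = 0.25, so its change is negative.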
Display Percentage Signs
AR2012_16mini$PctChange <- formattable::percent(AR2012_16mini$PctChange, 2, format = "f")
Question #6: Build a table with just the college name, the two debt fields and the percentage change.
REVEL IN YOUR NERD POWERS