What is R and why are you here?
Scriptability, coding, working with our data \(\rightarrow\) R
Reproducible, literate programming with all of our code, narrative, and formatted output in one place \(\rightarrow\) R Markdown
A place to do this \(\rightarrow\) RStudio
Our most important goal: Get R and RStudio running on your computer and make you aware of a powerful set of tools for all types of data analysis, visualization, statistical methods, and report creation.
I’ll be using Socrative during class to take polls to see how we’re doing. You can join our room at https://api.socrative.com/rc/aABEGN.
We will take roughly two to two and a half hours to go through the basic material. We’ll code together, import some data, clean it up a bit, summarize it, and make a graph or three. Then, I’ll have you take the same data and do a bit more on your own. You will turn in your own R Markdown file with your work. I will provide step-by-step guidance as we go.
On the left, you’ll find the section markers. Each section covers a portion of the material that we will be discussing this evening. You can always refer back to this.
- Section 1 gets you up and running in R and RStudio, our development platform. I also quickly cover markdown and R Markdown notebooks, a way to combine code, text, and graphics in one document.
- Section 2 introduces the tidyverse, a suite of tools that speak a consistent language and that make using R even easier. We’ll discuss how to import our data, clean it up, and get some basic statistics done.
- Section 3 discusses data visualization, one of the most powerful features of R. In particular, we’ll use the ggplot2 package from the tidyverse.
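To give a feel for where Sections 2 and 3 are headed, here is a minimal sketch of the clean, summarize, and graph steps. It is only an illustration: it assumes the dplyr and ggplot2 packages are installed, and it uses R's built-in `mtcars` data set as a stand-in for the data we will import in class. The object names `cars`, `mpg_summary`, and `p` are made up for this example.

```r
# Sketch only: mtcars is a built-in data set standing in for the class data,
# so there is no import step here.
library(dplyr)
library(ggplot2)

# Clean up: keep a few columns and treat the cylinder count as a category
cars <- mtcars %>%
  select(mpg, cyl, wt) %>%
  mutate(cyl = factor(cyl))

# Summarize: average miles per gallon and number of cars per cylinder group
mpg_summary <- cars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), n = n())

print(mpg_summary)

# Graph: car weight against miles per gallon, colored by cylinder group
p <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point()
print(p)
```

Don't worry if this looks unfamiliar. Each piece (the pipe, `mutate()`, `group_by()`, `ggplot()`) is covered step by step in the sections that follow.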
If you’re familiar with other programming languages, you might be wondering where the matrices, vectors, and lists are. R has them; we’re just skipping some of the core, foundational ideas to go right into the tidyverse. Even without any additional tools like the tidyverse, Base R is a powerful way to do statistics and is the preferred path for some, especially those looking for speed or wanting to combine R with other languages, like C++.
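For the curious, here is a small taste of those Base R building blocks. This is just an illustrative sketch with made-up values (the names `scores`, `m`, and `student` are hypothetical), and it needs no extra packages.

```r
# A vector: an ordered set of values that all share one type
scores <- c(90, 85, 77)
mean(scores)

# A matrix: values of one type arranged in rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

# A list: a container whose elements can have different types
student <- list(name = "Ada", scores = scores, enrolled = TRUE)
student$name
```

We won't build on these directly tonight, but the data frames we use in the tidyverse sit on top of vectors and lists.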
DataCamp has a free introduction to Base R concepts, if you’d like to see more.
We are going to look at the tidyverse first, since you can get up and running with your analysis quickly. If you want to go deeper into R, though, I do recommend going “back to basics” at some point.
The R community is helpful and friendly, with a wide range of free material available to keep learning. Much of the material that we are covering can be found at https://datasciencebox.org/index.html, a wonderful resource for learning R.
I am also, um, “borrowing” material from the folks at Software Carpentry.
RStudio has many different tutorials and getting-started videos.
My go-to book is R for Data Science, written by the chief scientist at RStudio (also a creator of the ggplot2 package) and RStudio’s Director of Learning.
You can also find a large number of cheat sheets for using different R packages on the RStudio webpage.
There are many other books and online resources out there, some very discipline-specific. Just a few examples:
A great set of free courses from R for the Rest of Us.
Using R Markdown to create publishable documents and websites by Xie, Allaire, and Grolemund. You’ll see their names a lot in the R world.
This is a brand-new site collecting data analysis and statistics code for doing work in Stata, R, and Python. It covers everything from importing your data, to machine learning, to the latest causal inference techniques.
These videos by Prof. Nick Huntington-Klein are, ostensibly, for economists, but are another great resource. If you are interested in research design, no matter your discipline, you need to read his book.
I borrowed a few examples from these class notes on medical research in R.
Many other free books can be found on the RStudio webpage.