Welcome!
What is R and why are you here?
We are going to spend our time tonight learning about R, R Markdown, and RStudio, the development environment that brings these tools together. How do these tools fit together?
- Scriptability, coding, working with our data \(\rightarrow\) R
- Reproducible, literate programming with all of our code, narrative, and formatted output in one place \(\rightarrow\) R Markdown
- A place to do this \(\rightarrow\) RStudio
Our most important goal: get R and RStudio running on your computer and introduce you to a powerful set of tools for all types of data analysis, visualization, statistical methods, and report creation.
I’ll be using Socrative during class to take polls and see how we’re doing. You can join our room at https://api.socrative.com/rc/aABEGN.
Our Time Together
We will take roughly two to two and a half hours to go through the basic material. We’ll code together, import some data, clean it up a bit, summarize it, and make a graph or three. Then, I’ll have you take the same data and do a bit more on your own. You will turn in your own R Markdown file with your work. I will provide step-by-step guidance as we go.
How These Notes Are Structured
On the left, you’ll find the section markers. Each section covers a portion of the material that we will be discussing this evening, and you can always refer back to them.
- Section 1 gets you up and running in R and RStudio, our development platform. I also quickly cover Markdown and R Markdown notebooks, a way to combine code, text, and graphics in one document.
- Section 2 introduces the tidyverse, a suite of packages that speak a consistent language and make using R even easier. We’ll discuss how to import our data, clean it up, and compute some basic statistics.
- Section 3 discusses data visualization, one of the most powerful features of R. In particular, we’ll use the ggplot2 package from the tidyverse.
What are we not doing?
If you’re familiar with other programming languages, you might be wondering where the matrices, vectors, and lists are. R has them; we’re just skipping some of the core, foundational ideas to go right into the tidyverse. Even without any additional tools like the tidyverse, base R is a powerful way to do statistics and is the preferred path for some, especially those looking for speed or wanting to combine R with other languages, like C++.
DataCamp has a free introduction to base R concepts if you’d like to see more.
We are going to look at the tidyverse first, since you can get up and running with your analysis quickly. If you want to go deeper into R, though, I do recommend going “back to basics” at some point.
Recommended Materials For After This Evening
The R community is helpful and friendly, with a wide range of free material available to keep learning. Much of the material that we are covering can be found at https://datasciencebox.org/index.html, a wonderful resource for learning R.
I am also, um, “borrowing” material from the folks at Software Carpentry.
RStudio has many different tutorials and getting-started videos.
My go-to book is R for Data Science, written by RStudio’s chief scientist (a creator of the ggplot2 package) and RStudio’s Director of Learning.
For graphics, I’d start with Prof. Kieran Healy at Duke and his great book on data visualization using ggplot2.
You can also find a large number of cheat sheets for using different R packages on the RStudio website.
There are many other books and online resources out there, some very discipline-specific. Just a few examples:
- A great set of free courses from R for the Rest of Us.
- There’s also an online ggplot2 text written by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen.
- Interested in version control with R? Check out Happy Git by Jenny Bryan, also at RStudio.
- A guide to using R Markdown to create publishable documents and websites, by Xie, Allaire, and Grolemund. You’ll see their names a lot in the R world.
- Text Mining with R, by Julia Silge, is the place to start for working with text documents. She’s another R celebrity.
- A brand-new site collecting data analysis and statistics code for doing work in Stata, R, and Python, covering everything from importing your data to machine learning to the latest causal inference techniques.
- These videos by Prof. Nick Huntington-Klein are, ostensibly, for economists, but they are another great resource. If you are interested in research design, no matter your discipline, you need to read his book.
- Speaking of research design, The Mixtape by Prof. Scott Cunningham is the one to read after Prof. Huntington-Klein’s book. It contains R code as well.
- I borrowed a few examples from these class notes on medical research in R.
- And, for when you’re ready, there’s Advanced R by the one and only Hadley Wickham.
Many other free books can be found on the RStudio website.