1 RStudio Projects
As some of you are beginners, it might be hard for you to see the point in setting up Projects and a GitHub account already. The intermediate and advanced users among you, who are not familiar with projects and GitHub yet though, might also wonder what they would need it for: working with R has gone pretty well in the past, so why should you change this running system?
I start out making points on why using Projects is useful. Then, I will provide step-by-step guidance on how to set them up. Since using GitHub is not that straight-forward, I will motivate why to use it and then link to a bigger tutorial covering the setup process (again by Jennifer Bryan, a statistic professor who also works at RStudio).
1.1 RStudio Projects
1.1.1 Motivation
Disclaimer: those things might not be entirely clear right away. However, I am deeply convinced that it is important that you use R and RStudio properly from the start. Otherwise it won’t be as easy to re-build the right habits.
If you analyze data with R, one of the first things you do is to load in the data that you want to perform your analyses on. Then, you perform your analyses on them, and save the results in the (probably) same directory.
When you load a data set into R, you might use the readr
package and do read_csv(absolute_file_path.csv)
. This becomes fairly painful if you need to read in more than one data set. Then, relative paths (i.e., where you start from a certain point in your file structure, e.g., your file folder) become more useful. How you CAN go across this is to use the setwd(absolute_file_path_to_your_directory)
function. Here, set
stands for set and wd
stands for working directory. If you are not sure about what the current working directory actually is, you can use getwd()
which is the equivalent to setwd(file_path)
. This enables you to read in a data set – if the file is in the working directory – by only using read_csv(file_name.csv)
.
However, if you have ever worked on an R project with other people in a group and exchanged scripts regularly, you may have encountered one of the big problems with this setwd(file_path)
approach: as it only takes absolute paths like this one: “/Users/felixlennert/Library/Mobile Documents/comappleCloudDocs/phd/teaching/hhs-stockholm/fall2021/scripts/”, no other person will be able to run this script without making any changes1. Just to be clear: there are no two machines which have the exact same file structure.
This is where RStudio Projects come into play: they make every file path relative. The Project file (ends with .Rproj) basically sets the working directory to the folder it is in. Hence, if you want to send your work to a peer or a teacher, just send a folder which also contains the .Rproj file and they will be able to work on your project without the hassle of pasting file paths into setwd()
commands.
1.1.2 How to create an RStudio Project?
I strongly suggest that you set up a project which is dedicated to this workshop
- In RStudio, click File >> New Project…
- A windows pops up which lets you select between “New Directory”, “Existing Directory”, and “Version Control.” The first option creates a new folder which is named after your project, the second one “associates a project with an existing working directory,” and the third one only applies to version control (like, for instance, GitHub) users. I suggest that you click “New Directory”.
- Now you need to specify the type of the project (Empty project, R package, or Shiny Web Application). In our case, you will need a “new project.” Hit it!
- The final step is to choose the folder the project will live in. If you have already created a folder which is dedicated to this course, choose this one, and let the project live in there as a sub-directory.
- When you write code for our course in the future, you first open the R project – by double-clicking the .Rproj file – and then create either a new script or open a former one (e.g., by going through the “Files” tab in the respective pane which will show the right directory already.)
1.1.3 The working directory in R
As mentioned when introducing RStudio Projects, there are two kinds of file paths you can provide R with: absolute and relative paths.
The absolute path for this script on my machine looks like this: “/Users/felixlennert/Documents/phd/teaching/data-prep_2days/data-cleaning-book/tidy-data.qmd”.
If you are on a Windows machine and copy file paths: R uses the file path separator \
as a so-called escape character – hence, it does not recognize it as a file path separator. You can address this problem by either using double back-slashes \\
or using a normal slash, /
, instead.
There is always a working directory you are in. You can obtain your working directory using getwd()
. Relative paths then just build upon it. If you want to change your working directory, use setwd()
.
Please note that I included the former two paragraphs just for the record. You should never use absolute paths, except for if you are planning to keep the same machine for the rest of your life and never change any of your file structure. You are not. Hence, please use RStudio Projects.2
If you are using RStudio Projects, your working directory defaults to the folder your .Rproj
file is stored in. So, I guess you could immediately see the merit of using RStudio projects: by using them, you can do away with all the faff of setting working directories, copying crazy long absolute paths, and, relatedly, searching for typos in them.
If you are working in RMarkdown, or quarto, its successor, the working directory is where your RMarkdown document is stored in.
1.1.4 Further links
- Hadley Wickham and Garrett Grolemund wrote an entire chapter in R4DS on Projects.
- Need more motivation? Jennifer Bryan tells you under which circumstances she would set your machine on fire: Project-oriented workflow.
- If you have created your project folder and are now unsure how to structure it, read Chris von Csefalvay’s blog post on how to do it.