Chapter 2 RStudio

RStudio is what is called an Integrated Development Environment for R. It is dependent on R but separate from it. There are several ways of using R but R Studio is arguably the most popular and convenient. Let’s have a look at it.

 

The “heart of R” is the Console window. This is where instructions are sent to R, and its responses are given. The console is, almost exclusively, the way of talking to R in RStudio.

The Information area (all of the right-hand side of RStudio) shows you useful information about the state of your project. At the moment, you can see some relevant files in the bottom pane, and an empty “Global Environment” at the top. The global environment is a virtual storage of all objects you create in R. So, for example, if you read in some data into R, this is where they will be put and where R will look for them if you tell it to manipulate or analyse the data.

Finally, the Editor is where you write more complicated scripts without having to run each command. Each script, when saved, is just a text file with some added bells and whistles. There’s nothing special about it. Indeed, if you wanted to, you could write your script in any plain text editor, save it, change its extension from .txt to .R and open it in RStudio. There is no need to do this but you could. When you run such a script file, it gets interpreted by R in a line by line fashion. This means that your data cleaning, processing, visualisation, and analysis needs to be written up in sequence otherwise R won’t be able to make sense of your script. Since the script is basically a plain text file (with some nice colours added by RStudio to improve readability), the commands you type in can be edited, added, or deleted just like if you were writing an ordinary document. You can run them again later, or build up complex commands and functions over several lines of text.

There is an important practical distinction between the Console and the Editor: In the Console, the Enter key runs the command. In the Editor, it just adds a new line. The purpose of this is to facilitate writing scripts without running each line of code. It also enables you to break down your commands over multiple lines so that you don’t end up with a line that’s hundreds of characters long. For example:

poisson_model <- glm( # R knows that an open bracket can't be the end of command...
  n_events ~ gender + scale(age) + scale(n_children) + # ...nor can a plus...
    I(SES - min(SES)) * scale(years_emp, , F), # ...or a comma
  df, family = "poisson") # closing bracket CAN be the end

The hash (#) marks everything to the right of it as comment. Comments are useful for annotating the code so that you can remember what it means when you return to your code months later (it will happen!). It also improves code readability if you’re working on a script in collaboration with others. Comments should be clear but also concise. There is no point in paragraphs of verbose commentary.

Writing (and saving) scripts has just too many advantages over coding in the console to list and it it is crucial that you learn how to do it. It will enable you to write reproducible code you can rerun whenever needed, reuse chunks of code you created for a previous project in your analysis, and, when you realise you made a mistake somewhere (when, not if, because this too will happen!), you’ll be able to edit the code and recreate your analysis in a small fraction of the time it would take you to analyse your data anew without a script. This way, if you write a command and it doesn’t do exactly what you wanted to do, you can quickly tweak it in the editor and run it again. Also, if you accidentally modify and object and mess everything up, you can just re-run your entire script up to that point and pretend nothing ever happened. Or different still, let’s say you analysed your data and wrote your report and then you realised you made a mistake, for instance forgot to exclude data you should have excluded or excluded some you shouldn’t have. Without a “paper-trail” of your analysis, this is a very unpleasant (but, sadly, not unheard of) experience. But with a saved analysis script, you can just insert an additional piece of code in the file, re-run it and laugh it off. Or do the first two, and then take a long hard look at yourself! Whatever the case, using the script editor is just very, very useful!

However, the aim is to keep the script tidy. You don’t need to put every single line of code you ever run into it. Sometimes, you just want to look at your data, using, for example View(df). This kind of command really doesn’t need to be in your script. As a general rule of thumb, use the editor for code that adds something of value to the sequence of the script (data cleaning, analysis, code generating plots, tables, etc.) and the console for one-off commands (when you want to just check something).

Here is an example of what a neat script looks like (part of the Reproducibility Project: Psychology analysis2). Compare it to your own scripts and try to find ways of improving your coding.

For useful “good practice” guidance on how to write legible code, see the Google style guide.


  1. Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.