2 Thinking Like a Computer

This chapter shows how to download and install R. It also touches upon how computers work when we are programming with R.

2.1 Introducing R

R is both a programming language and a software environment for statistical computing and graphics. It offers strong capabilities for data analysis and data processing, and is popular among data scientists and programmers alike.

R has [1]:

  • an effective data handling and storage facility
  • a suite of operators for calculations on arrays, in particular matrices
  • a large, coherent, integrated collection of intermediate tools for data analysis
  • graphical facilities for data analysis and display either directly at the computer or on hardcopy
  • a well developed, simple and effective programming language which includes conditionals, loops, user defined recursive functions and input and output facilities

In short, R is a great resource for data analysis and visualization. It provides a toolbox of statistical techniques, and it is easy to work with graphs in R.

In R, a statistical analysis is done as a series of steps. Intermediate results are stored in objects, so an analysis may run without displaying any results at all. We can later use other functions to extract the parts of the results that interest us.

In contrast, classical software, such as SPSS or Stata, immediately displays the results of an analysis. For example, if we run a series of 20 regressions in Stata, it opens 20 result windows. [2]
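The step-by-step style described above can be sketched as follows. This is only an illustration: it assumes the built-in mtcars dataset and its mpg and wt columns, and the object names fit, coefficients_only, and r_squared are our own choices.

```r
# Fit a linear regression and store the result in an object.
# Nothing is displayed at this step.
fit <- lm(mpg ~ wt, data = mtcars)

# Later, extract only the parts of the results we are interested in.
coefficients_only <- coef(fit)           # intercept and slope
r_squared <- summary(fit)$r.squared      # share of variance explained

# Display them only when we ask for them.
print(coefficients_only)
print(r_squared)
```

The fitted model stays in the object fit, so we can come back to it at any point in the session and pull out other pieces.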

2.2 Installing R

R is available for Linux, macOS, and Windows operating systems. To download R, go to the CRAN website and choose the version for your platform. CRAN stands for Comprehensive R Archive Network. It distributes R and R packages through a network of mirror servers around the world.

If we use the cloud mirror to download R, it automatically picks a server for us, so we don’t have to choose a mirror ourselves.

After we download R and install it on our computer, we can run R from the terminal if we are on a Mac or Linux system. Typing “R” at the prompt opens the R console. On Windows or Mac, the R graphical user interface is also available, which is a more convenient way to use R.
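Once the console is open, one simple way to confirm that the installation works is to ask R which version is running. This is a minimal sketch; the name version_text is our own, and the text displayed will depend on the version and platform installed.

```r
# Report the version of R that is currently running.
version_text <- R.version.string
print(version_text)

# Report the operating system this copy of R was built for.
print(R.version$os)
```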

2.3 Using RStudio

While it is perfectly possible to use R in this way, for serious coding we’ll want a more powerful text editor. We’ll get the best experience of R by using an Integrated Development Environment (IDE). RStudio is an IDE developed by the company Posit (formerly RStudio) specifically for R, and it makes coding in R easier. We can go to the RStudio website to download its free desktop version.

Note that we must have R installed on our computer before we can install RStudio, because RStudio is going to use the version of R on our computer.

One of the most popular alternatives to RStudio is Visual Studio Code (VS Code) with the R extension enabled.

IDE

Generally, an IDE is a software application that provides common tools for software development in a single graphical user interface. For instance, RStudio comes with a source code editor and features such as syntax highlighting, auto-completion, and debugging. It provides an integrated space where we can write R scripts, review objects, view command history, browse data frames, consult help documentation, and so on.

RStudio interface and features

The RStudio user interface has four primary panes: the Source pane, the Console pane, the Environment pane (containing the Environment, History, Connections, Build, and Tutorial tabs), and the Output pane (containing the Files, Plots, Packages, Help, Viewer, and Presentation tabs). [3]

The console is the place where we interactively work with our commands and view the outputs.

The > symbol is the default R prompt in the console. It marks the place where we type our commands.
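As a small illustration of working at the prompt, the lines below can be typed into the console one at a time (without the > symbol, which R shows for us). In the console, the first and last expressions each display [1] 3, while the assignment in the middle displays nothing. The object name total is just an example.

```r
# Typing an expression at the prompt and pressing Enter evaluates it.
1 + 2

# Storing the result in an object displays nothing...
total <- 1 + 2

# ...until we ask for the object by name.
total
```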

To start coding, we open a script in the source pane. What’s nice about RStudio is that it also supports other languages such as Python, SQL, Stan, and D3. We can also create an R Markdown or Quarto document to generate reproducible reports, or build interactive web applications with Shiny in RStudio.

The Environment tab shows our workspace, that is, the collection of objects currently stored in memory, including the objects we have created. We’ll come back to the concepts of objects and workspace later.
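What the Environment tab displays can also be inspected from the console. The sketch below uses made-up object names (heights_cm and mean_height) to show objects entering and leaving the workspace.

```r
# Create two objects; they now live in the workspace.
heights_cm <- c(160, 172, 181)
mean_height <- mean(heights_cm)

# List the names of the objects in the workspace.
print(ls())

# Remove one object; it disappears from the workspace
# (and from the Environment tab).
rm(heights_cm)
print(exists("heights_cm"))
```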

The History tab allows us to browse the command history of the current session. The Connections tab is where we can connect to local or remote databases.

If we have generated images, we can view these plots in the Plots tab. The Packages tab is the place where we view the installed packages, or install and update packages. The Help tab is where we can consult package documentation and vignettes.
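The information shown in the Packages tab is also available from the console. The following is a rough sketch using the stats package, which ships with R; the commented-out package name "ggplot2" is only an example and is not needed for anything in this chapter.

```r
# Names of all packages installed in the current library paths.
installed <- rownames(installed.packages())
print(head(installed))

# Check whether a given package is installed, and which version.
print("stats" %in% installed)
print(packageVersion("stats"))

# Installing a new package needs an internet connection, so it is
# left commented out here; "ggplot2" is only an example name.
# install.packages("ggplot2")

# Load an installed package so that its functions can be used.
library(stats)
```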

RStudio keyboard shortcuts

We can navigate the RStudio user interface more productively with keyboard shortcuts. The RStudio Keyboard Shortcuts page has the complete reference.

One frequently used trick is pressing the Tab key to auto-complete code. The same key can also expand a code snippet, which we then fill in interactively.

2.4 How we talk to R

Now we have R installed on our computer and RStudio open. How do we talk to R?

command, code, program

First, we send a sequence of instructions to R to specify how to perform a computation. These instructions are the commands, code, or programs that we use to talk to computers. In the end, instructions are a collection of bits that the computer understands and obeys. [4]

We type these R commands to send instructions from the keyboard. The keyboard is an input device, and input feeds information to the computer. Other forms of input include reading data from a file, for instance.
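Reading data from a file can be sketched as below. To keep the example self-contained, it first creates a small file in a temporary location; normally the file would already exist on disk. The file contents and the names input_path and scores are invented for illustration.

```r
# Create a small CSV file in a temporary location so that the
# example is self-contained.
input_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ann,90", "Bob,85"), input_path)

# Input from a file: read the data into R.
scores <- read.csv(input_path)
print(scores)
```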

In this step, we translate our natural language to R commands, so R understands what we want.

memory, storage

Next, the computer needs to carry out what we have asked R to do. The R interpreter turns our commands into machine-language instructions that the computer understands, and these are sent to the CPU for processing. The Central Processing Unit (CPU) is the active part of the computer that follows the instructions and runs the program.

The results are displayed directly on the screen. The screen is an output device, and output is the result of a computation sent to the user. Other forms of output include saving data in a file or sending data over a network.
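The two kinds of output mentioned above, to the screen and to a file, might look like this in R. The data are invented, and a temporary file is used so that nothing permanent is created.

```r
results <- data.frame(name = c("Ann", "Bob"), score = c(90, 85))

# Output to the screen.
print(results)
cat("Mean score:", mean(results$score), "\n")

# Output to a file (a temporary one in this sketch).
output_path <- tempfile(fileext = ".csv")
write.csv(results, output_path, row.names = FALSE)
```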

What we create in an R session (i.e. objects) will be stored in the memory of the computer. Memory is the temporary storage area where programs are kept when they are running; it also contains the data needed by the running program.
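To get a feel for objects occupying memory, we can ask R for the approximate size of an object. This is a small sketch; the name measurements is arbitrary, and the exact figure reported can vary between platforms.

```r
# Create an object holding 1,000 numbers; it is kept in memory.
measurements <- numeric(1000)

# Ask R roughly how much memory the object occupies.
print(object.size(measurements), units = "Kb")
```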

When we exit from R, we can save the objects we have created, which are currently held in the computer’s main memory, to permanent storage such as a hard drive.
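Moving an object from memory to permanent storage and back can be sketched with save() and load(). The object counts is made up, and a temporary file stands in for a file we would normally keep.

```r
counts <- c(3, 5, 8)

# Save the object from memory to a file on disk
# (a temporary file is used here as a stand-in).
saved_path <- tempfile(fileext = ".RData")
save(counts, file = saved_path)

# Remove it from memory, as if the session had ended.
rm(counts)

# Load it back from permanent storage into memory.
load(saved_path)
print(counts)
```

To save every object in the workspace at once rather than a single one, save.image() can be used instead of save().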

The instructions can also be stored in a script (a file with the extension “.R”) and saved in permanent storage. The next time we run the R script, it is loaded into main memory.
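Running stored instructions can be illustrated with source(), which reads a script file and executes it. In this sketch the script is written to a temporary file first so that the example runs on its own; the variable script_total is hypothetical.

```r
# Write a two-line script to a temporary ".R" file.
script_path <- tempfile(fileext = ".R")
writeLines(c("script_total <- sum(1:10)",
             "print(script_total)"), script_path)

# Run the stored instructions: R reads the file and executes each line.
source(script_path)
```

In RStudio, the same thing happens when we open a script in the Source pane and run it.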


  1. W. N. Venables, D. M. Smith, and the R Core Team. (2022). An Introduction to R.

  2. To see these differences, check the Data Analysis Examples compiled by the UCLA Statistical Consulting Group and compare the model outputs from Stata, SAS, SPSS, Mplus, and R.

  3. See the RStudio User Guide.

  4. For more on the organization of a computer and its components, see Patterson, D. A., & Hennessy, J. L. (2014). Computer Organization and Design: The Hardware/Software Interface (5th ed.), pp. 14-19.