2 A Brief Introduction to R and RStudio

R (R Core Team 2019) is a free software environment for data manipulation, statistical analyses, and graphics. It is easily extended through add-on software packages, including some for commonly used fisheries models. Our interface for working in R will be the free software RStudio, which can be run in the Windows, Macintosh, or Unix environment. There are many online resources for installing and learning R and RStudio, and even a recent book for fisheries analysis using R (Ogle 2015). We assume in this book that R and RStudio have been installed. We get started by examining a few basic R commands but will primarily illustrate the use of R through code examples in the remaining chapters. We will also describe some general methods for getting help and learning more about R.

As a first example using R, let’s estimate population size through a two-sample mark-recapture study. Consider a pond with N=100 fish. If you take a first sample (n1=50) and mark (tag) those fish, then half the fish in the pond are tagged. If you take a second sample (say n2=30), how many tagged fish (m2) would you expect in the second sample? That’s right, 15, because about half of the 30 fish should be tagged. Of course, in practice, we don’t know N but we solve for it. We begin by setting the two proportions equal: \(n_1/N=m_2/n_2\). Solving for N gives us our estimator (\(\hat{N}\), or N_hat in code): \(\hat{N} = \frac{n_1 n_2}{m_2}\).
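As a quick check, the algebra can be written out step by step using the pond numbers given above (n1=50, n2=30, and the expected m2=15). This is only a worked illustration of the estimator, not a new result:

```latex
\begin{align*}
\frac{n_1}{N} &= \frac{m_2}{n_2} \\
N \, m_2 &= n_1 n_2 \\
\hat{N} &= \frac{n_1 n_2}{m_2} = \frac{50 \times 30}{15} = 100
\end{align*}
```

The estimate recovers the true pond size because the second sample contained exactly the expected number of tagged fish.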

We can carry out the calculations using R code as follows. First, open a new RStudio window (menu commands File : New File : R Script). This Source window will contain your R script, or the sequence of R commands you intend to execute. Enter (or copy and paste) the following code into the Source window:

setwd("Q:/My Drive/FishAnalysis") # Edit to show your working directory

# Two-sample population estimate
n1 <- 26 # Number caught and marked in first sample
n2 <- 14 # Number caught and examined for marks in second sample
m2 <- 7 # Number of marked fish in second sample
N_hat <- n1*n2/m2 # Store estimate as variable N_hat
N_hat # Prints N_hat in Console window

Any text following a pound sign (#) is a comment, whether on a line by itself or following executable R code. Comments do not affect the program’s execution, but are extremely helpful in documenting your code. Taking the time to enter comments is always worthwhile, whether for your own information or when code is to be shared with others.

Next, position the cursor on the first line and execute one line at a time by clicking on the Run icon. This allows you to see the effect of each statement. The first line calls the setwd() function to establish the working directory for any files saved. Doing this routinely will ensure that files end up in the proper location. You can find the path for your working directory by using the RStudio menu command (Session : Set Working Directory : Choose Directory). That allows you to navigate to the correct location, and the resulting setwd() code will be printed to the Console. You can then copy that line of code into your script for future use.
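The working directory can also be inspected and changed entirely from code. The following is a minimal sketch using the base R functions getwd() and setwd(). The temporary folder returned by tempdir() is only a stand-in for your own project folder, and the variable name original_dir is an arbitrary choice:

```r
# Sketch: inspect and change the working directory without using the menus
original_dir <- getwd()   # Path of the current working directory
print(original_dir)

setwd(tempdir())          # tempdir() stands in for your own project folder
print(getwd())            # Confirms that the working directory has changed

setwd(original_dir)       # Restore the original working directory
```

Storing the original path before changing it makes it easy to return to where you started.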

The next three executable lines assign (<-) values to variables, which have arbitrary names. Here, we have used conventional fisheries mark-recapture names for the sample sizes and number of recaptures, but more descriptive names could be used (e.g., Mark_Sample) for clarity. Each variable is a new object that will appear in the Environment window once that line of code has been run. The next line of code calculates the estimate, which is stored as N_hat. The final line prints the value of N_hat in the Console window. The population estimate (52) makes sense because you marked 26 and half the fish in the second sample were marked.
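To illustrate the point about descriptive names, the same calculation could be wrapped in a small function. This is only a sketch: the function name two_sample_estimate and its argument names are hypothetical choices made for illustration, not part of R or any fisheries package.

```r
# Hypothetical helper; the function and argument names are illustrative only
two_sample_estimate <- function(marked_first, caught_second, recaptured_second) {
  # The estimator is undefined when no marked fish are recaptured
  stopifnot(recaptured_second > 0)
  marked_first * caught_second / recaptured_second
}

# Same values as the line-by-line example
N_hat <- two_sample_estimate(marked_first = 26,
                             caught_second = 14,
                             recaptured_second = 7)
N_hat                             # 52

# Pond example from the start of the chapter
two_sample_estimate(50, 30, 15)   # 100
```

Naming the arguments in the call makes the code longer but documents which count is which.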

After you have run this code in RStudio, be sure to save the file. This is a good general practice – never create code from scratch when you can modify existing code! It cuts down on typing errors and allows you to build on existing code when doing new but similar things. The code could be from your prior work or from a journal article or book. The ability to share R and JAGS code is very helpful in learning and working efficiently. Saving the file to a working directory specified in the code helps in keeping together all files related to a specific project. Use a descriptive name for the code (R script) so that it can be readily located in the future.

Much of the work done in R is accomplished by using functions, each of which is a set of statements that performs a specific task. Typing the name of an R function into the search box in the RStudio Help window is one quick way to get help (e.g., function defaults, arguments, examples). However, the R documentation is sometimes overly detailed and cryptic, so another good option is an online search (e.g., “R function max” or “how to load csv file R”). Another helpful alternative is the R Reference Card, which provides a brief summary of some of the more frequently used R commands. For example, the reference card lists a number of math functions such as log(x, base), which returns the log of x to the specified base. If the base is omitted, as in log(x), the function uses its default and returns the natural log (base e) of x. This is typical of R, in that functions have default arguments that need not be specified. (It can also be a bit dangerous if the default action is not what you intended!) Another important aspect of R is that the object returned by the log() function depends on the argument. Consider the following four lines of code:

x <- 2.72
log(x)
x <- c(4, 7)
log(x)

We can test the code in a new R script window. If you have worked through this section in a single session, then objects from the mark-recapture example will be in the Environment window. Start with a clean slate by clicking on the broom icon in the Environment window. Like most things in R, this can also be accomplished through menu commands (Session : Clear Workspace) or with the R code rm(list=ls()). The first R statement assigns the value 2.72 to x, and the second statement prints the natural log of 2.72 in the Console window. The third line uses the c() (combine) function to create a vector of length two. Because we used the same variable name to store the result, R replaces the original single numeric value of x with a vector containing two numbers. (It is done here as an example, but in the future, we will avoid this bad practice of reusing a variable name for a different purpose!) In the final line of code, the log() function returns a vector by applying the log() function element-by-element.

This flexibility (allowing x to be defined first as a scalar then as a vector, and allowing either type as an argument to the log() function) is a strength of R, but it also increases the risk of coding errors compared to programming languages that are more structured. The keys for success in R programming are to check and test code carefully and to reuse existing code whenever possible. The chapters that follow will provide many more examples of R code, with comments and explanation as needed as new features are introduced. As you encounter R code, you should always run it to make sure that you understand how it works and its purpose.
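As a further check on default arguments and element-by-element behavior, the following sketch extends the log() example. It uses only base R functions; the particular values are arbitrary choices for illustration.

```r
x <- c(4, 7)           # Numeric vector of length two
log(x)                 # Natural log (the default base), one result per element
log(x, base = 2)       # Same vector, but using log base 2
log(100, base = 10)    # 2, because 10 squared is 100

length(log(x))         # 2: the result has the same length as the argument
class(x)               # "numeric"
exp(log(x))            # Back-transforming recovers the original values

is.vector(2.72)        # TRUE: a single value is stored as a vector of length one
```

The last line hints at why log() accepts either kind of argument so easily: in R, a single number is simply a vector with one element.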

References

Ogle, D. H. 2015. Introductory Fisheries Analyses with R, 1st edition. Chapman and Hall/CRC, Boca Raton, Florida.
R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.