3 Getting Started with R

3.1 The set-up

We assume you have downloaded and installed R and RStudio on your machine. Before you start, create a folder where you will keep all your R-related work. Make a note of exactly where that folder is and what it is called. Mine is at C:/Users/Karin/Desktop/R_folder. This folder will be your working directory.

Next, click on the round blue RStudio icon and start RStudio. Your screen should look like this:

If it doesn’t, click on the icons in the upper right corners, the ones circled in red, and play around until it looks right.

There are four parts to your screen:

  • Top left (1) - the script or source window. This is where you type commands, create and edit R scripts, and basically do all your work. You can go back to previous lines and edit anything you wrote earlier.

  • Bottom left (2) - the console window. Think of this window as a log. You can also type and execute commands here, but it is hard to go back and edit them. Any error messages will show up here. To clear the console, click on the little broom icon.

  • Top right (3) - the environment and history window. For now, we use this window to keep track of the variables we have assigned. To clear the environment, click on the little broom icon. There are also tutorials on special topics you can run.

  • Bottom right (4) - files, plots, etc. This is where we can find our files, look up help, display plots, download extra packages, etc. If you click on Help, you can get help on any topic or command.

First, we need to make sure you are working in your working directory. To check, type getwd() in the first line of window (1) and press Ctrl+Enter (Command+Return on a Mac). You should see the result in window (2). If that line shows your working directory, great. If not, use setwd("path/to/your/directory"), with the path in quotes. Here is what that looked like for me:
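A minimal sketch of the same check in code. The folder path in the comment is the example from above; because your folder will differ, the runnable part uses tempdir() (a temporary folder R creates for every session) as a stand-in so the example works on any machine.

```r
# Print the current working directory
getwd()

# Change it. With your own folder you would write, for example,
#   setwd("C:/Users/Karin/Desktop/R_folder")
# Put the path in quotes and use forward slashes, even on Windows.
# tempdir() is used here only so the example runs anywhere.
old_wd <- setwd(tempdir())   # setwd() invisibly returns the previous directory
getwd()

# Switch back to where we started
setwd(old_wd)
```

Storing the old directory in old_wd makes it easy to return to it afterwards.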

Finally, we adjust window (4) to show the files in the working directory. Click on the cog wheel, then on Go To Working Directory.
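You can also list the files from the console with list.files(), which by default looks in the working directory. The sketch below is self-contained: the folder demo_folder and the file my_script.R are made-up names, created inside tempdir() so nothing in your own folders is touched.

```r
# Names of the files in the current working directory
list.files()

# Self-contained demo: make a folder inside tempdir(), put an empty
# file in it, and list the folder's contents
demo_dir <- file.path(tempdir(), "demo_folder")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "my_script.R"))
list.files(demo_dir)
```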

3.2 A word of caution

R is a living language, and as such it changes. When you run a script that you wrote earlier, you may get the message xxx has been deprecated, use yyy instead. Deprecated generally means the function still works for now, but is outdated and may be removed in a future version. Since there is no guarantee it will keep working, you should replace it.
Similarly, the reference sheets included in the last chapter may have been updated, so check for newer versions too.
And finally, because there are so many contributors to R, there are usually many different ways to answer a question or write a script. There is no single right way.
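To see where such a deprecation message comes from, here is a small sketch using base R's .Deprecated() function. The functions old_mean() and new_mean() are hypothetical, made up for illustration; writing your own functions is covered in the Functions section of this chapter.

```r
# Hypothetical functions, for illustration only
new_mean <- function(x) mean(x)

old_mean <- function(x) {
  .Deprecated("new_mean")   # warns that old_mean is deprecated, suggests new_mean
  new_mean(x)               # ...but still does the work
}

old_mean(c(1, 2, 3))   # returns 2, together with a deprecation warning
```

The old function keeps working for now, which is exactly why it is easy to miss the warning and keep using it.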

3.3 First calculations

Let’s start with some basic arithmetic, using R like a calculator. In the top left window, type

2*3
#> [1] 6

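(If you do not see the [1] 6 line, did you remember to press Ctrl+Enter or Command+Return?) The [1] means that the first value on this line is element number 1 of the result. Longer results wrap over several lines, and each line starts with the position of its first value, such as [12].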

You can assign the result of an operation to a variable with the assignment operator:

x <- sqrt(42)

This does not display the result. However, if you check the environment window (3), you see that you now “own” a variable x with the value 6.48074069840786. To display the value of x in the console, type x on a new line in the script window and press Ctrl+Enter (Command+Return on a Mac).

x
#> [1] 6.480741
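Note that the console shows x rounded to 7 significant digits, while the environment window shows more. R stores the full value either way. This short sketch shows two ways to see more digits or to print a value right when you assign it.

```r
x <- sqrt(42)
x                        # 7 significant digits by default: 6.480741
print(x, digits = 15)    # more digits: 6.48074069840786
(x <- sqrt(42))          # parentheses around an assignment also print the value
```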

Some of the basic operations and functions are:

3-2
#> [1] 1
4+6
#> [1] 10
4/8
#> [1] 0.5
3^4          # basic exponents
#> [1] 81
exp(1)       # e (Euler's number) to the power of 1
#> [1] 2.718282
log10(10)    # log base 10
#> [1] 1
log(exp(2))  # natural log
#> [1] 2
log(125,5)   # for bases other than 10 or e, indicate the base. This is log base 5 of 125
#> [1] 3
sin(90)      # sine of 90 radians, NOT 90 degrees: trig functions use radians
#> [1] 0.8939967
sin(pi/4)    # You can use pi to mean the constant pi
#> [1] 0.7071068
sinpi(1/4)   # Same as sin(pi/4)
#> [1] 0.7071068
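Since the trig functions expect radians, an angle in degrees has to be converted first by multiplying by pi/180. A minimal sketch; the variable names degrees and radians are just illustrative.

```r
degrees <- 90
radians <- degrees * pi / 180   # convert degrees to radians
sin(radians)                    # 1, the sine of 90 degrees
cos(180 * pi / 180)             # -1, the cosine of 180 degrees
```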

In general, function names “make sense”, and just like in Excel, once you start typing a function name, a window pops up with some syntax hints.

Be careful with undefined expressions (not a good idea anyway) and with very large or very small numbers. Have a look at these calculations:

0^0
#> [1] 1
1/0
#> [1] Inf
10^310
#> [1] Inf
10^-350
#> [1] 0
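These results follow from how R stores numbers: 0^0 returns 1 by convention, 10^310 is larger than the largest number R can store (about 1.8e308) and becomes Inf, and 10^-350 is smaller than the smallest positive number R can store and becomes 0. A few related checks are sketched below.

```r
0/0                    # NaN ("not a number")
-1/0                   # -Inf
is.finite(1/0)         # FALSE
is.nan(0/0)            # TRUE
.Machine$double.xmax   # largest number R can store, about 1.797693e+308
```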

You can also find help on the usage of a command by typing ?command. For example, ?sin opens the Help tab in the bottom right window (4) with information on the correct usage of trigonometric functions.

3.4 Variables and data types

We can define variables and assign them values with the assignment operator <-. A valid variable name contains only letters, numbers, dots (.), and underscores (_), and starts with a letter or a dot; a leading dot must not be followed by a number. Caution: R is case sensitive, so x and X are different variables. It is good practice to use names that are meaningful in the context of the work. Here, I am just calling my variable x.

x <- 3-7

As you define a variable, you will see it listed in the environment window (3). Typing a variable’s name will also give you the variable’s value.

x
#> [1] -4
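A short sketch of the naming rules above. The variable names value and Value are illustrative; make.names() is a base R function that converts text into a syntactically valid name, which shows which names R would reject.

```r
# R is case sensitive: these are two different variables
value <- 1
Value <- 2
value == Value   # FALSE

# make.names() shows how R would turn text into a valid name
make.names(c("2x", "my var", ".5a", "ok_name"))
# returns "X2x" "my.var" "X.5a" "ok_name"
```

"2x" and ".5a" get an X in front because a name may not start with a number or with a dot followed by a number, and the space in "my var" becomes a dot.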

R recognizes the following six basic data types:

  • numeric - a real number with or without decimal values, such as 3 or 2.5
  • integer - a whole number; type 5L (with an L) to store 5 as an integer
  • complex - a complex number represented as a + bi, such as 2+3i
  • character - text, written in quotes, such as "hello"
  • logical - a variable taking on only the values TRUE and FALSE
  • raw - a variable storing bytes as hex numbers instead of characters. For example, Vorwerk would be stored as 56 6F 72 77 65 72 6B. We won’t be using this type, so no worries.
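The typeof() function reports how R stores a value, one example per type from the list. Note that typeof() calls numeric values "double".

```r
typeof(3.5)                   # "double" (numeric)
typeof(5L)                    # "integer" (the L suffix makes an integer)
typeof(2 + 3i)                # "complex"
typeof("Vorwerk")             # "character"
typeof(TRUE)                  # "logical"
charToRaw("Vorwerk")          # the bytes 56 6f 72 77 65 72 6b
typeof(charToRaw("Vorwerk"))  # "raw"
```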

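The is.*() functions, such as is.numeric(), check whether a variable is of a certain type.

The as.*() functions, such as as.numeric(), convert a value to a certain type. They return a converted copy and leave the original variable unchanged, unless you assign the result back to it.

Below, this is demonstrated using a test variable. Note that is.integer(1) is FALSE: a number typed without an L is stored as numeric, not as integer.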

test_variable <- 1
is.numeric(test_variable)
#> [1] TRUE
is.integer(test_variable)
#> [1] FALSE
is.character(test_variable)
#> [1] FALSE
is.logical(test_variable)
#> [1] FALSE

as.numeric(test_variable)
#> [1] 1
str(test_variable)
#>  num 1
as.integer(test_variable)
#> [1] 1
str(test_variable)
#>  num 1
as.character(test_variable)
#> [1] "1"
str(test_variable)
#>  num 1
as.logical(test_variable)
#> [1] TRUE
str(test_variable)
#>  num 1
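As the str() lines above show, test_variable stays numeric after each as.*() call. To actually change its type, assign the converted value back. The sketch also shows that converting text only works when the text looks like a number.

```r
test_variable <- 1
as.integer(test_variable)          # returns a converted copy...
str(test_variable)                 # ...the variable itself is still num

test_variable <- as.integer(test_variable)   # assign the result back
str(test_variable)                 # now int 1

as.numeric("3.5")                  # text that looks like a number converts fine
as.numeric("three")                # NA, with a warning
```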

3.5 Sequences and strings

R allows us to easily create, store, and manipulate sequences. Here I am creating three different sequences and storing them in the variables seq1, seq2, and seq3. You will see them in your environment.

seq1 <- 0:9
seq2 <- seq(0,9,1)
seq3 <- seq(0,9,0.5)

To see the sequences in your console, type their names, either on their own or inside the print() function. There is a difference between print(seq1), which prints the value of seq1, and print("seq1"), which just prints the text "seq1".

seq1
#>  [1] 0 1 2 3 4 5 6 7 8 9
print(seq1)
#>  [1] 0 1 2 3 4 5 6 7 8 9
print("seq1")
#> [1] "seq1"
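The seq() function also accepts named arguments: by sets the step size, as in seq2 and seq3, and length.out asks for a given number of evenly spaced values instead. A minimal sketch:

```r
seq(0, 9, by = 3)              # 0 3 6 9
seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00
length(seq(0, 9, 0.5))         # 19 values, like seq3 above
```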

You can also combine values into a vector (a sequence of values) with the c() function, or create one with rep() by specifying how often something is to be repeated:

str1 <- c(0,1,2,3,4,5,6,7,8,9)
str2 <- c(0,2,5,4,3,1,8,7,9,1)
str3 <- c("1", 2, "three", 4, 9, "five")
str4 <- c(1,1.5,6)
str5 <- c(TRUE,TRUE,FALSE)
str6 <- c("TRUE","TRUE","FALSE")
str7 <- rep(1,10)
str8 <- rep(c(1,2,3),5)
str9 <- rep(c("Hi",1," :) "),c(1,2,3))
str9
#> [1] "Hi"   "1"    "1"    " :) " " :) " " :) "
str10 <- rep(c("Hi",1," :) "),3)
str10
#> [1] "Hi"   "1"    " :) " "Hi"   "1"    " :) " "Hi"   "1"   
#> [9] " :) "
str11 <- rep(c("Hi",1," :) "),c(3,3,3))
str11
#> [1] "Hi"   "Hi"   "Hi"   "1"    "1"    "1"    " :) " " :) "
#> [9] " :) "
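Besides a single number or a vector of counts, rep() has the named arguments times (repeat the whole vector) and each (repeat every element in place). A short comparison:

```r
rep(c("a", "b"), times = 3)   # "a" "b" "a" "b" "a" "b"
rep(c("a", "b"), each = 3)    # "a" "a" "a" "b" "b" "b"
```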

To access one or more elements of a vector, specify the positions you want inside square brackets [].

str5[2]
#> [1] TRUE
str10[c(1,3,6)]
#> [1] "Hi"   " :) " " :) "
str10[1:5]
#> [1] "Hi"   "1"    " :) " "Hi"   "1"

All elements of a vector are of the same type. If you combine different data types, R converts them to a common type. You can check that str3 is a character vector, because the numbers 2, 4, and 9 were converted to text, and that str4 is numeric rather than integer, because numbers typed without an L, such as 1 and 6, are stored as numeric. Also have a look at the difference between str5, which is logical, and str6, which is character.
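R converts toward the type that can hold all values, in the order logical, integer, numeric (double), complex, character. A few minimal examples:

```r
str(c(TRUE, 2))    # numeric: TRUE became 1
str(c(1L, 2.5))    # numeric: the integer 1L became a double
str(c(1, "a"))     # character: the number 1 became the text "1"
```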

You can always get information about a variable by using the str command:

str(str1)
#>  num [1:10] 0 1 2 3 4 5 6 7 8 9
is.character(str1)
#> [1] FALSE
str1 <- as.character(str1)
is.character(str1)
#> [1] TRUE

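You can do computations with vectors. Computations are done element-wise, so the vectors should have the same length, as in the examples below. (If one vector is shorter, R recycles it, that is, repeats it until it matches the longer one, and warns you if the longer length is not a multiple of the shorter.)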

x <- seq(1,10,0.5)
y <- rep(2,19)
x+y
#>  [1]  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
#> [12]  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0
x*y
#>  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
#> [19] 20
x^y
#>  [1]   1.00   2.25   4.00   6.25   9.00  12.25  16.00  20.25
#>  [9]  25.00  30.25  36.00  42.25  49.00  56.25  64.00  72.25
#> [17]  81.00  90.25 100.00
sin(x)
#>  [1]  0.84147098  0.99749499  0.90929743  0.59847214
#>  [5]  0.14112001 -0.35078323 -0.75680250 -0.97753012
#>  [9] -0.95892427 -0.70554033 -0.27941550  0.21511999
#> [13]  0.65698660  0.93799998  0.98935825  0.79848711
#> [17]  0.41211849 -0.07515112 -0.54402111
z <- 2^(1:10)   # the first ten powers of 2: 2, 4, 8, ..., 1024
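Recycling is also why y <- rep(2, 19) is not really needed: a single number is reused for every element. A short sketch of recycling with vectors of different lengths:

```r
1:6 + c(10, 20)     # 11 22 13 24 15 26: c(10, 20) is reused three times
x <- seq(1, 10, 0.5)
identical(x + 2, x + rep(2, 19))   # TRUE: the single 2 is recycled
1:5 + 1:2           # still computes, but warns: 5 is not a multiple of 2
```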

This code chunk uses == to check whether each entry of x equals the corresponding entry of y, and > to check whether it is larger. Note that the results are of type logical.

same <- x == y
str(same)
#>  logi [1:19] FALSE FALSE TRUE FALSE FALSE FALSE ...
more <- x > y

In arithmetic, logical values count as numbers: TRUE as 1 and FALSE as 0. The result of such a computation is numeric, not logical, as the str() output below shows.

more+same
#>  [1] 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
str(more)
#>  logi [1:19] FALSE FALSE FALSE TRUE TRUE TRUE ...
more <- more + 1
str(more)
#>  num [1:19] 1 1 1 2 2 2 2 2 2 2 ...
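A common use of this: sum() of a logical vector counts the TRUE values, and mean() gives their proportion. A minimal sketch with the same x and y as above:

```r
x <- seq(1, 10, 0.5)
y <- rep(2, 19)
sum(x > y)           # 16: the number of entries where x is greater than y
mean(x > y)          # the proportion of such entries, 16/19
typeof(TRUE + TRUE)  # "integer": arithmetic turns logicals into numbers
```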

3.6 Data frames

A data frame is a matrix-like structure that organizes data into rows and columns. Each column contains all the values of one variable. The columns all have the same length, but they can be of different types. The column names (the variable names) should be non-empty. Each row represents one observation of the variables.

Example: create a data frame with the initials, ages, and number of siblings for 5 people.

names <- c("CJ","MM","GK","RT","LV")
ages <- c(23,24,21,20,20)
siblings <- c(1,0,0,2,3)
people <- data.frame(names, ages, siblings)
str(people)
#> 'data.frame':    5 obs. of  3 variables:
#>  $ names   : chr  "CJ" "MM" "GK" "RT" ...
#>  $ ages    : num  23 24 21 20 20
#>  $ siblings: num  1 0 0 2 3
people
#>   names ages siblings
#> 1    CJ   23        1
#> 2    MM   24        0
#> 3    GK   21        0
#> 4    RT   20        2
#> 5    LV   20        3

Note that your data frame shows up in the environment, window (3).

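Entries in a data frame can be accessed by giving the row and column numbers inside square brackets, [row, column]. Leave the row or the column number empty to get a whole column or row: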

people[1,3]
#> [1] 1
people[1,]
#>   names ages siblings
#> 1    CJ   23        1
people[,2]
#> [1] 23 24 21 20 20
people[1:3, c(1,3)]
#>   names siblings
#> 1    CJ        1
#> 2    MM        0
#> 3    GK        0

If you prefer, you can also use the column names. This is oftentimes easier, especially when you have large data frames. Compare the following commands:

people[,1]
#> [1] "CJ" "MM" "GK" "RT" "LV"
people$names
#> [1] "CJ" "MM" "GK" "RT" "LV"
people[2,1]
#> [1] "MM"
people$names[2]
#> [1] "MM"

You can also compute new columns and add them to the data frame.

only_child <- people$siblings == 0
people <- data.frame(people, only_child)
people
#>   names ages siblings only_child
#> 1    CJ   23        1      FALSE
#> 2    MM   24        0       TRUE
#> 3    GK   21        0       TRUE
#> 4    RT   20        2      FALSE
#> 5    LV   20        3      FALSE
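A common shortcut is to create the new column directly with $, and to pick rows with a condition inside the square brackets. The sketch below rebuilds the people data frame so it runs on its own.

```r
names <- c("CJ", "MM", "GK", "RT", "LV")
ages <- c(23, 24, 21, 20, 20)
siblings <- c(1, 0, 0, 2, 3)
people <- data.frame(names, ages, siblings)

# Add a column with $, computed from an existing column
people$only_child <- people$siblings == 0

# Keep only the rows (observations) that meet a condition
people[people$ages > 20, ]
```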

3.7 A word about packages

R packages are extensions of R containing code, sample data, and documentation that you can install. You can check which packages you have available by clicking on Packages in window (4). R installs a default set of packages during installation, and we will add more packages as needed later. Packages are easy to install: click on Install in window (4) and type in the name of the package you want. To use a package, check the box next to it, or type library(packagename) in a command line.
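The same steps can be done in code. Installing needs an internet connection, so that line is commented out here; the loading step uses the tools package, which ships with R, so the sketch runs without installing anything.

```r
# Install a package from CRAN once (needs an internet connection):
# install.packages("lubridate")

# Load a package in every session that uses it
library(tools)
file_ext("analysis.R")   # "R": file_ext() comes from the tools package

search()                 # lists the packages that are currently loaded
```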

As of March 2023, there were over 19,000 packages on CRAN, the main R package repository. So, which packages should you use? Well, it depends. We will introduce some packages in this course, but in general, you will download packages that make your life easier. For example, if you need to produce a lot of high-quality graphics, you will want the ggplot2 package. If you only use graphs once in a while, or only for exploratory purposes, base R is good enough. If you do a lot of hypothesis testing on small data sets, use the infer package; if you work a lot with dates, use the lubridate package; and so on.

3.8 Assignment

A. Run the commands below. For each command, describe the output and explain what the command does.

  1. getwd()
  2. 0:9
  3. seq(0:9)
  4. seq(0,9,1)
  5. seq(0,9,2)
  6. 0:100
  7. dog1 <- 0:10
  8. dog1 <- dog1+3
  9. dog2 <- seq(0,10)
  10. dog2 <- dog2^2
  11. dog3 <- seq(0,20,2)
  12. dog4 <- seq(1,2,0.1)
  13. dog3+dog4
  14. dog1*dog2
  15. sum(dog1)
  16. dog5 <- seq(0,3,length=15)
  17. dog6 <- c(2,5,17,0,9)
  18. dog7 <- c("a","b","c")
  19. dog8 <- c(dog7, 1:5)
  20. dog9 <- rep("a",5)
  21. dog10 <- rep(c("a",1:4),5)
  22. dog11 <- rep(c("a","b","c"),c(2,2,4))
  23. dog1[5]
  24. dog1[5:9]
  25. dog1[c(1,4,5,6,9)]
  26. dog1[-c(1,4,5,6,9)]
  27. dog12 <- c(2,2,.5,6,12,3,7,4,5,6,3,10)
  28. index <- which(dog12 < 6)
  29. dog12[index]
  30. plot(dog3,dog4)
  31. str(dog10)
  32. is.numeric(dog10)

B. Write code to produce the following outputs:

  1. 1,1,1,1,1,2,2,2,2,3,3,3,4,4,5
  2. 1,1,1,2,2,2,3,3,3,4,4,4,5,5,5
  3. "1","2","3","4","5"
  4. "1","1","1","2","2","3"
  5. "1","2","3","1","2","3","1","2","3","1","2","3","1","2","3","1","2","3"
  6. "1","1","1","1","1","5","5","5","5","5","3","3","3","3","3"
  7. 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0
  8. 1, 0, 1, 0, 1, 0
  9. Convert the output from 5. to a numeric sequence
  10. Convert the output from 7. to a character sequence
  11. Convert the output from 8. to a logical sequence

C. Create a data frame containing the name, breed, and size (xs, s, m, l, xl) of dogs you know. Call this data frame dogs_I_know. Then add a logical column friendly.

3.9 Functions

Just like the functions you are familiar with from math, a function in R takes zero or more inputs (called parameters) and produces an output. Many functions are built in, but you can also define your own. You must give your function a name. The “instructions” for your function are enclosed in {}. Here is an example of a function without any input parameters:

give_me_pi <- function() {
  print(paste("pi = ", pi))
}
give_me_pi()
#> [1] "pi =  3.14159265358979"

Here is a similar function with a parameter dec. Because dec has no default value, you must pass it to the function every time you call it:

give_me_pi_digits <- function(dec) {
  print(paste("pi = ", round(pi, digits=dec)))
}
give_me_pi_digits(dec=4)
#> [1] "pi =  3.1416"
give_me_pi_digits(4)
#> [1] "pi =  3.1416"
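If you want a clear message when someone forgets a parameter without a default, you can test for it with the base R function missing(). The function give_me_pi_digits_checked() below is a hypothetical variant written for illustration.

```r
# Hypothetical variant of give_me_pi_digits() that checks for a missing dec
give_me_pi_digits_checked <- function(dec) {
  if (missing(dec)) {
    stop("please supply dec, the number of decimal places")
  }
  print(paste("pi = ", round(pi, digits = dec)))
}

give_me_pi_digits_checked(2)   # prints "pi =  3.14"
msg <- tryCatch(give_me_pi_digits_checked(),
                error = function(e) conditionMessage(e))
msg                            # "please supply dec, the number of decimal places"
```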

And here is a very similar function with a default. Note the difference in the function definition and output.

give_me_pi_digits_default <- function(dec=8) {
  print(paste("pi = ", round(pi, digits=dec)))
}
give_me_pi_digits_default(dec=4)
#> [1] "pi =  3.1416"
give_me_pi_digits_default(4)
#> [1] "pi =  3.1416"
give_me_pi_digits_default()
#> [1] "pi =  3.14159265"

You can also make a function return a value to you with return(). Note that the order in which you enter the parameters matters, unless you explicitly specify which parameter gets which value.

linear_eval <- function(m,x,b){
  y <- m*x+b
  return(y)
}
linear_eval(2,10,7)
#> [1] 27
linear_eval(m=2, x=10, b=7)
#> [1] 27
linear_eval(10,7,2)
#> [1] 72
linear_eval(x=10, b=7,m=2)
#> [1] 27
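Two further points, sketched with a hypothetical variant linear_eval2(): R automatically returns the value of the last expression, so return() is optional, and because the arithmetic inside is element-wise, the function also works when x is a vector.

```r
# Hypothetical variant of linear_eval(): no return() needed,
# the value of the last expression is returned
linear_eval2 <- function(m, x, b) {
  m * x + b
}
linear_eval2(2, 10, 7)     # 27, same as linear_eval(2, 10, 7)
linear_eval2(2, 1:3, 7)    # 9 11 13: evaluated for each value of x
```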

3.10 Assignment

  1. Write a function quad_eval that takes the parameters a, b, c, and x and returns \(ax^2+bx+c\). The default value for a should be 1.

  2. Write a function quad_form that takes the parameters a, b, c and returns a vector containing the solution(s) to \(ax^2+bx+c=0\).