4 Introduction to R

4.1 Objectives

At the end of this tutorial, you will have:

  • An understanding of and ability to use mathematical and assignment operators in R

  • The ability to use basic statistical functions in R

4.2 Setup

This chunk of R code loads the packages that we will be using.

4.3 R as a calculator

In the code chunk below, type 10 + 4, then press the green “Run Current Chunk” button:

# solution
10 + 4
## [1] 14

(The [1] is telling us that the “14”, the result of our addition, is the first element in our vector.)

4.3.1 Reference: other operators

In addition to these functions, R has a number of operators that add significantly to your code.

4.3.2 Reference: Arithmetic operators

R uses the same arithmetic operators that other programs use—you’re probably familiar with all of them:

Operator Description
+ plus
- minus
* multiplication
/ division
^ exponential

4.3.3 Your Turn A-1: math

Try some other math in the code chunk below.

Notes:

  • You can put multiple equations, each separate math equation needs to be on its own line.

  • You can leave lines blank and spaces between numbers and operators—some white space will make your code easier to read.

  • If you want to add an explanation about what your code is doing, you can add a comment that won’t be “interpreted” as instructions by starting the line with a # symbol

# This is my math experiment

4.4 The power of named variables

In R, as in other programming languages, we can create named objects, and those objects can used in subsequent calculations.

4.4.1 the assignment arrow

In R, we use an assignment arrow to create a named object.

It looks like this: <-

It is a combination of a “less than” sign, followed by a hyphen.

You can read it as:

object_name <- value

or even:

new_thing <- old_thing

In the example below, we will create an object “x” with the value 10. The object is then part of the R environment, and can be used later.

We can show the value of x in our R markdown output by putting the name in the code, as shown below.

x <- 10

x
## [1] 10

To show x in the console, we type x.

4.4.2 Your Turn: naming objects

Create a new object “y” with the value 4, and print it.

# answer
y <- 4

y
## [1] 4

In the chunk below two objects are created and added together.

x <- 10

y <- 4

x + y
## [1] 14

How would you set x equal to 20?

One solution:

x <- 20

Another solution: since we earlier assigned x a value of 10

x <- x * 2

Try the second solution in the chunk below:

# answer
x <- 10

x <- x * 2

x
## [1] 20

4.4.3 Your Turn: more math

Create an object “x” with the value of 12, an object “y” with the value of 3, and an object “z” with a value of 7.

Then divide “x” by “y”, and “x” by “z”.

Once you have your code written, run the chunk.

# answer
x <- 12
y <- 3
z <- 7

x/y
## [1] 4
# The result is 4

x/z
## [1] 1.714286
# The result is 1.714286

4.5 Strings

In R, character strings are created by putting letters, numbers, and other symbols inside quotation marks:

t <- "If it's inside quotation marks, it's a character"

Let’s print the object hello_world:

hello_world <- "Hello, world"

Note that if we put a number inside quotation marks, it’s now a character string—without quotation marks, it is a number. If we try to use a character string that looks like a number in a math equation, R throws an error:

a_number <- 10

not_a_number <- "10"
a_number + not_a_number

4.6 Lists & functions

The earlier examples have a single value in each object. Most data analysis is the summarization of multiple values—means, medians, correlations, and advanced machine learning methods are all approaches that take multiple values as inputs.

In R, a function’s syntax looks like this:

function_name(variable, perhaps_an_argument)

In R, one way to create one type of list (technically, an atomic vector) of numbers is using the c() function (“c” for “combine”). Here’s the code to create a list of numbers:

# setup chunk with bida_numbers
bida_numbers <- c(301, 302, 404, 405, 406)
bida_numbers <- c(301, 302, 404, 405, 406)

bida_numbers
## [1] 301 302 404 405 406

We can then apply a function to the object bida_numbers. For example, to calculate the mean, we use the function mean():

mean(bida_numbers)
## [1] 363.6

4.6.1 Reference: Summary functions

Some basic statistical functions

Two-variable statistical functions

Function Description
min()
max()
minimum and maximum values
mean()
median()
mode()
measures of “average”
sum() adds up the values
sd() calculates the standard deviation
var() calculates the variance
length() number of cases
summary() a list of measures of central tendency

Because R is a statistical/data science programming language, these and many more functions are built in to base R.

4.6.2 Your Turn: functions

Write the script to calculate:

  • the minimum value of bida_numbers

  • how many numbers there are in bida_numbers

# answer

min(bida_numbers)
## [1] 301
length(bida_numbers)
## [1] 5

4.6.3 Your Turn: summary()

The summary() function is handy for getting a quick snapshot of a variable.

Write the script to generate the summary of bida_numbers.

# answer

summary(bida_numbers)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   301.0   302.0   404.0   363.6   405.0   406.0

4.7 Data frames

So far we’ve only looked at objects that have one item in them. But most of your data analysis is going to use a data frame (also known as a data table) of some sort think of a typical spreadsheet of numbers, with the data in columns and rows.

We will delve into what makes a good data frame structure later in this course, but for now let’s practice with some small examples so you can see what’s happening.

In this example, we have a data frame called “anscombe_1” with 2 variables, and each variable has 11 cases in each. Let’s take a quick look so we get a sense of the structure:

anscombe_1
##    x1    y1
## 1  10  8.04
## 2   8  6.95
## 3  13  7.58
## 4   9  8.81
## 5  11  8.33
## 6  14  9.96
## 7   6  7.24
## 8   4  4.26
## 9  12 10.84
## 10  7  4.82
## 11  5  5.68

We can calculate the mean value of the variable x1 using the following code. The elements in the code are:

function(dataframe$variable)

Note that the dollar sign separates the data frame name from the variable in that data frame. (Later in the course we will see other ways to write the same code.)

mean(anscombe_1$x1)
## [1] 9

4.7.1 Your Turn: functions on data frames

Using the anscombe_1 data frame, write the script to calculate:

  • the minimum value of x1

  • the median value of y1

# answer

min(anscombe_1$x1)
## [1] 4
# answer: 4

median(anscombe_1$y1)
## [1] 7.58
# answer: 7.58

4.7.2 Some two-variable statistical functions

Earlier we saw functions that calculate summary statistics on a single variable, such as mean() and sd(). R also gives us functions to calculate the statistics on two variables, such as the correlation and the covariance.

Two-variable statistical functions

Function Description
cor(x, y) calculate the correlation between x and y
cov(x, y) calculate the covariance between x and y

And again, the statistical/data science foundation of R means that these and many more functions are built in to base R.

4.7.3 Your Turn: functions on two variables

Using the anscombe_1 data frame, write the script to calculate:

  • the correlation between x1 and y1
# answer

cor(anscombe_1$x1, anscombe_1$y1)
## [1] 0.8164205
# answer: 0.8164205

-30-