4 Introduction to R
4.1 Objectives
At the end of this tutorial, you will have:
An understanding of and ability to use mathematical and assignment operators in R
The ability to use basic statistical functions in R
4.3 R as a calculator
In the code chunk below, type 10 + 4, then press the green “Run Current Chunk” button:
# solution
10 + 4
## [1] 14
(The [1] is telling us that the “14”, the result of our addition, is the first element in our vector.)
4.3.1 Reference: other operators
R has a number of operators that you will use throughout your code.
4.3.2 Reference: Arithmetic operators
R uses the same arithmetic operators that other programs use—you’re probably familiar with all of them:
Operator | Description |
---|---|
+ | plus |
- | minus |
* | multiplication |
/ | division |
^ | exponentiation |
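As a quick illustration, each operator in the table can be tried on the same pair of numbers. The values 7 and 2 are arbitrary choices, not taken from the tutorial.

```r
# Each arithmetic operator applied to the same two (arbitrary) numbers
7 + 2   # plus: 9
7 - 2   # minus: 5
7 * 2   # multiplication: 14
7 / 2   # division: 3.5
7 ^ 2   # exponentiation: 49
```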
4.3.3 Your Turn A-1: math
Try some other math in the code chunk below.
Notes:
You can enter multiple equations, but each one needs to be on its own line.
You can leave lines blank and spaces between numbers and operators—some white space will make your code easier to read.
If you want to add an explanation about what your code is doing, you can add a comment that won’t be “interpreted” as instructions by starting the line with a # symbol.
# This is my math experiment
4.4 The power of named variables
In R, as in other programming languages, we can create named objects, and those objects can be used in subsequent calculations.
4.4.1 The assignment arrow
In R, we use an assignment arrow to create a named object.
It looks like this: <-
It is a “less than” sign followed by a hyphen.
You can read it as:
object_name <- value
or even:
new_thing <- old_thing
In the example below, we will create an object “x” with the value 10. The object is then part of the R environment, and can be used later.
We can show the value of x in our R markdown output by putting the name in the code, as shown below.
x <- 10
x
## [1] 10
To show x in the console, we type x.
4.4.2 Your Turn: naming objects
Create a new object “y” with the value 4, and print it.
# answer
y <- 4
y
## [1] 4
In the chunk below two objects are created and added together.
x <- 10
y <- 4
x + y
## [1] 14
How would you set x equal to 20?
One solution:
x <- 20
Another solution: since we earlier assigned x a value of 10, we can multiply it by 2:
x <- x * 2
Try the second solution in the chunk below:
# answer
x <- 10
x <- x * 2
x
## [1] 20
4.4.3 Your Turn: more math
Create an object “x” with the value of 12, an object “y” with the value of 3, and an object “z” with a value of 7.
Then divide “x” by “y”, and “x” by “z”.
Once you have your code written, run the chunk.
# answer
x <- 12
y <- 3
z <- 7
x/y
## [1] 4
# The result is 4
x/z
## [1] 1.714286
# The result is 1.714286
4.5 Strings
In R, character strings are created by putting letters, numbers, and other symbols inside quotation marks:
t <- "If it's inside quotation marks, it's a character"
Let’s create and print the object hello_world:
hello_world <- "Hello, world"
hello_world
## [1] "Hello, world"
Note that if we put a number inside quotation marks, it’s now a character string—without quotation marks, it is a number. If we try to use a character string that looks like a number in a math equation, R throws an error:
a_number <- 10
not_a_number <- "10"
a_number + not_a_number
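A minimal sketch of what happens here, and one possible way around it. Using tryCatch() to capture the error message and as.numeric() to convert the string are choices made for illustration; the tutorial does not cover them at this point.

```r
a_number <- 10
not_a_number <- "10"

# Capture the error message instead of stopping the script
error_message <- tryCatch(
  a_number + not_a_number,
  error = function(e) conditionMessage(e)
)
error_message

# as.numeric() converts a string that looks like a number into a number
a_number + as.numeric(not_a_number)
```

The first result is the text of the error R raises; the second is an ordinary numeric sum, because both values are numbers once the string has been converted.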
4.6 Lists & functions
The earlier examples have a single value in each object. Most data analysis is the summarization of multiple values—means, medians, correlations, and advanced machine learning methods are all approaches that take multiple values as inputs.
In R, a function’s syntax looks like this:
function_name(variable, perhaps_an_argument)
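As one concrete instance of this pattern, base R's round() takes a value and an optional digits argument. The choice of round() and the number 3.14159 are illustrative only, not part of the tutorial.

```r
# function_name(variable, perhaps_an_argument)
round(3.14159)              # no extra argument: rounds to a whole number
round(3.14159, digits = 2)  # digits sets the number of decimal places
```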
In R, one way to create a list of numbers (technically, an atomic vector) is to use the c() function (“c” for “combine”). Here’s the code to create a list of numbers:
bida_numbers <- c(301, 302, 404, 405, 406)
bida_numbers
## [1] 301 302 404 405 406
We can then apply a function to the object bida_numbers. For example, to calculate the mean, we use the function mean():
mean(bida_numbers)
## [1] 363.6
4.6.1 Reference: Summary functions
Some basic statistical functions
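Function | Description |
---|---|
min() max() | minimum and maximum values |
mean() median() | measures of “average” |
sum() | adds up the values |
sd() | calculates the standard deviation |
var() | calculates the variance |
length() | number of cases |
summary() | a set of summary statistics (minimum, quartiles, median, mean, and maximum) |
Note that base R’s mode() function returns an object’s storage mode, not its most frequent value, so it is not a measure of “average”.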
Because R is a statistical/data science programming language, these and many more functions are built in to base R.
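As a sketch of how these functions are used, each one can be applied to the bida_numbers vector created earlier. The vector is re-created here so the example runs on its own.

```r
bida_numbers <- c(301, 302, 404, 405, 406)

min(bida_numbers)      # smallest value: 301
max(bida_numbers)      # largest value: 406
median(bida_numbers)   # middle value: 404
sum(bida_numbers)      # total: 1818
length(bida_numbers)   # number of values: 5
var(bida_numbers)      # variance
sd(bida_numbers)       # standard deviation (square root of the variance)
summary(bida_numbers)  # minimum, quartiles, median, mean, and maximum
```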
4.7 Data frames
So far we’ve only looked at objects that hold a single value or a single vector of values. But most of your data analysis is going to use a data frame (also known as a data table) of some sort. Think of a typical spreadsheet of numbers, with the data in columns and rows.
We will delve into what makes a good data frame structure later in this course, but for now let’s practice with some small examples so you can see what’s happening.
In this example, we have a data frame called “anscombe_1” with 2 variables, each with 11 cases. Let’s take a quick look so we get a sense of the structure:
anscombe_1
##    x1    y1
## 1  10  8.04
## 2   8  6.95
## 3  13  7.58
## 4   9  8.81
## 5  11  8.33
## 6  14  9.96
## 7   6  7.24
## 8   4  4.26
## 9  12 10.84
## 10  7  4.82
## 11  5  5.68
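The tutorial does not show how anscombe_1 was created. Its values match the x1 and y1 columns of R's built-in anscombe dataset, so one way to build an equivalent data frame, assuming that is its origin, might be:

```r
# Assumption: anscombe_1 is the first x/y pair from the built-in anscombe data
anscombe_1 <- anscombe[, c("x1", "y1")]

str(anscombe_1)   # structure: 11 observations of 2 variables
nrow(anscombe_1)  # number of cases (rows)
ncol(anscombe_1)  # number of variables (columns)
```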
We can calculate the mean value of the variable x1 using the following code. The elements in the code are:
function(dataframe$variable)
Note that the dollar sign separates the data frame name from the variable in that data frame. (Later in the course we will see other ways to write the same code.)
mean(anscombe_1$x1)
## [1] 9
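To see why this works, it can help to look at what the dollar sign returns on its own: a plain vector, which mean() then summarizes. This sketch assumes anscombe_1 can be rebuilt from R's built-in anscombe dataset, which the tutorial does not state; x1_values is a hypothetical name used only for illustration.

```r
# Assumption: anscombe_1 matches two columns of the built-in anscombe data
anscombe_1 <- anscombe[, c("x1", "y1")]

x1_values <- anscombe_1$x1  # the dollar sign pulls out one column as a vector
x1_values
mean(x1_values)             # same result as mean(anscombe_1$x1)
```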
4.7.1 Your Turn: functions on data frames
Using the anscombe_1 data frame, write the script to calculate:
- the minimum value of x1
- the median value of y1
# answer
min(anscombe_1$x1)
## [1] 4
# answer: 4
median(anscombe_1$y1)
## [1] 7.58
# answer: 7.58
4.7.2 Some two-variable statistical functions
Earlier we saw functions that calculate summary statistics on a single variable, such as mean() and sd(). R also gives us functions to calculate the statistics on two variables, such as the correlation and the covariance.
Two-variable statistical functions
Function | Description |
---|---|
cor(x, y) | calculate the correlation between x and y |
cov(x, y) | calculate the covariance between x and y |
And again, the statistical/data science foundation of R means that these and many more functions are built in to base R.
4.7.3 Your Turn: functions on two variables
Using the anscombe_1 data frame, write the script to calculate:
- the correlation between x1 and y1
# answer
cor(anscombe_1$x1, anscombe_1$y1)
## [1] 0.8164205
# answer: 0.8164205
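The correlation and covariance functions are closely related: the correlation is the covariance divided by the product of the two standard deviations. A short check of that relationship, again assuming anscombe_1 comes from R's built-in anscombe dataset (cov_xy, cor_xy, and cor_by_hand are hypothetical names):

```r
# Assumption: anscombe_1 matches two columns of the built-in anscombe data
anscombe_1 <- anscombe[, c("x1", "y1")]

cov_xy <- cov(anscombe_1$x1, anscombe_1$y1)  # covariance of x1 and y1
cor_xy <- cor(anscombe_1$x1, anscombe_1$y1)  # correlation of x1 and y1

# Rescale the covariance by both standard deviations
cor_by_hand <- cov_xy / (sd(anscombe_1$x1) * sd(anscombe_1$y1))

cor_by_hand
cor_xy
```

Both values should agree, which shows that the correlation is the covariance rescaled to fall between -1 and 1.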
-30-