# Chapter 3 Module 1 - Introduction to R

In its simplest form, R can be used as a calculator with `+`, `-`, `/` or `*`.

``100 + 4``
``##  104``

Or

``4 * 6 - 2``
``##  22``

Create objects with `<-`, which is called the assignment operator.

``````x <- 100 + 4
x``````
``##  104``

The assignment operator `<-` can also be reversed and written as `->`:

``````100 + 4 -> x
x``````
``##  104``

You can combine values or objects into a new object with the function `c()` (c for combine). The combined result is called a vector.

``````x <- c(4, 100 + 4, 10 * 2)
x``````
``##    4 104  20``

Objects and vectors are not restricted to numerical values; you can use text in them as well.

``````text <- c("hej", "jag", "älskar", "r")
text``````
``##  "hej"    "jag"    "älskar" "r"``

However, you cannot mix numerical and text values in the same vector. If you try, R converts the numbers to text.

``````blandat <- c(1, 5, "hej", 6)
blandat``````
``##  "1"   "5"   "hej" "6"``
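To see what happens to the numbers, the sketch below inspects the class of such a mixed vector and tries to convert it back with `as.numeric()`. The object names `mixed` and `numbers` are only illustrative.

```r
# Mixing numbers and text makes R store everything as text (character)
mixed <- c(1, 5, "hej", 6)
class(mixed)
## [1] "character"

# Converting back gives NA (normally with a warning) where the text is not a number
numbers <- suppressWarnings(as.numeric(mixed))
numbers
## [1]  1  5 NA  6
```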

## 3.1 Missing values

`NA` is not zero. It is not a value at all, but a marker for a value that is missing.

``x <- c(4, NA, 2, 50)``

If we check which values are larger than two:

``x > 2 ``
``##   TRUE    NA FALSE  TRUE``

Let’s try to find all the `NA`s:

``x == NA``
``##  NA NA NA NA``

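Confusing? A comparison with `NA` always returns `NA`, because R cannot know whether two unknown values are equal: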

``````fredriks_age <- NA
markus_age <- NA
fredriks_age == markus_age``````
``##  NA``

If we want to find an `NA` or filter out `NA`s we use `is.na()` instead.

``is.na(x)``
``##  FALSE  TRUE FALSE FALSE``
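As a small sketch of the filtering step, the logical vector from `is.na()` can be negated with `!` and used inside square brackets to keep only the non-missing values. This assumes the same `x` as above; the name `x_complete` is just an illustration.

```r
x <- c(4, NA, 2, 50)

# Count the missing values (TRUE counts as 1)
sum(is.na(x))
## [1] 1

# Keep only the values that are not NA
x_complete <- x[!is.na(x)]
x_complete
## [1]  4  2 50
```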

`na.rm` is a common argument in functions.

``mean(x)``
``##  NA``

We use `na.rm = TRUE` to drop the missing values before the mean is calculated:

``mean(x, na.rm = TRUE)``
``##  18.66667``
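Other summary functions accept the same argument. A brief sketch, assuming the same vector `x` as above:

```r
x <- c(4, NA, 2, 50)

# Without na.rm a single NA makes the whole result NA
sum(x)
## [1] NA

# With na.rm = TRUE the missing value is dropped first
sum(x, na.rm = TRUE)
## [1] 56
max(x, na.rm = TRUE)
## [1] 50
```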

## 3.2 R is a functional programming language

• Functions reside in packages
• Functional programming is great for Data Science

## 3.3 Functions

Just like in Excel, R has statistical functions:

• mean()
• median()
• sd()
• …and so on

And mathematical functions:

• log()
• sin()
• cos()
• …and so on
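A short sketch of how the mathematical functions are called. They take a number or a vector as their first argument and work element by element. The vector `values` is made up for illustration.

```r
values <- c(1, 10, 100)

# Mathematical functions are applied to every element of a vector
log(values, base = 10)
## [1] 0 1 2

# They also accept a single number
cos(0)
## [1] 1
```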

## 3.4 Documentation

To access documentation about functions, i.e. how they work, you just add a question mark in front of the function that you are interested in.

``?mean()``
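Two related ways to look things up, shown as a sketch: `help()` is the function form of `?`, and `args()` prints the arguments a function accepts without opening the help page.

```r
# Same as ?mean, written as a function call
help("mean")

# Show the arguments of a function
args(sd)
## function (x, na.rm = FALSE) 
## NULL
```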

## 3.5 Exercises

• Use some of R’s statistical functions on a numerical vector

## 3.6 data.frame

data.frames are a common format when doing data science in R. A data.frame is a rectangular table with one or more columns.

``````## # A tibble: 6 x 19
##    year month   day dep_time sched_dep_time dep_delay arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>
## 1  2013     1     1      517            515         2      830
## 2  2013     1     1      533            529         4      850
## 3  2013     1     1      542            540         2      923
## 4  2013     1     1      544            545        -1     1004
## 5  2013     1     1      554            600        -6      812
## 6  2013     1     1      554            558        -4      740
## # … with 12 more variables: sched_arr_time <int>, arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## #   time_hour <dttm>``````

We can create our own data frames in R.

``data.frame(random_number = rnorm(5))``
``````##   random_number
## 1   -0.07509414
## 2    0.69395273
## 3    0.65592953
## 4    0.35942863
## 5   -0.33709797``````

If you have two vectors of the same length you can combine them into a data.frame.

``````siffror <- c(5,1,2,5)
ord <- c("vad", "var", "det", "där")

data.frame(siffror, ord)``````
``````##   siffror ord
## 1       5 vad
## 2       1 var
## 3       2 det
## 4       5 där``````
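Once a data.frame is stored in an object you can inspect it. A minimal sketch, assuming the two vectors from above; the object name `df` is only an example.

```r
siffror <- c(5, 1, 2, 5)
ord <- c("vad", "var", "det", "där")
df <- data.frame(siffror, ord)

# Number of rows
nrow(df)
## [1] 4

# Column names are taken from the vector names
names(df)
## [1] "siffror" "ord"

# A single column is returned as a vector
df$siffror
## [1] 5 1 2 5
```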

## 3.7 Packages

• To install a package from `CRAN` you use the function `install.packages("package")`.

• After installing a package you need to load it with `library(package)`.
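A common pattern, sketched here, is to install a package only when it is missing and then load it. The example uses `stats`, which ships with R, so that it runs without a download; the variable name `pkg` is illustrative and you would replace its value with the package you need.

```r
pkg <- "stats"  # illustrative; a package name from CRAN is used the same way

# Install only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() take the name from a variable
library(pkg, character.only = TRUE)
```

Installing is done once per computer, while `library()` has to be called in every new R session.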

## 3.8 Exercise

The package `tidyverse` is downloaded for you. Load it with `library()`.

## 3.9 tidyverse and friends

• tidyverse is a collection of packages for common tasks in data analysis.

• They share a common philosophy

• Easy to use

• We will focus on tidyverse

### 3.9.1 Workflow in R

• Use projects

### 3.9.2 Writing code in R

• Follow the `tidyverse style guide`

Name objects, functions and data.frames with lowercase letters and `_` between words.

``min_egna_funktion <- function(x)``

In contrast to:

``MinEgnaFunktion <- function(x)``

• You are writing code for someone else to read
• Use a space after `,` and around `=`

GOOD:

``mean(x, na.rm = TRUE)``

BAD:

``mean(x,na.rm=TRUE)``
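Put together, a function written in the recommended style could look like the sketch below. The function name `mean_without_na` is hypothetical and only illustrates the naming and spacing rules.

```r
# Lowercase name with underscores, spaces after commas and around = and <-
mean_without_na <- function(x) {
  mean(x, na.rm = TRUE)
}

mean_without_na(c(4, NA, 2, 50))
## [1] 18.66667
```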

## 3.10 When saving files

When saving files we follow the same principle: name a file `min_r_fil.R` instead of `min R fil.R`.

## 3.11 Avoid long expressions

This is harder to read:

``iris %>% group_by(Species) %>% summarise(Sepal.Length = mean(Sepal.Length), Sepal.Width = mean(Sepal.Width), Species = n_distinct(Species))``

than this:

``````iris %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width),
    Species = n_distinct(Species)
  )``````

### 3.11.1 R Markdown

• A notebook format in R
• Great for creating reports
• Great for exploratory analysis

• Open up `intro-to-r.Rmd`