# Section 3 The Basics of R

R is built around a few basic pieces. Once you understand them, more complex commands are easier to follow, since everything is built on the same foundations.

In programming terms, we can refer to the basic pieces that make
up R as **data types**.

## 3.1 Basic data types

### 3.1.1 Numbers

The **numeric** data type allows you to work with numbers. R can
do all the basic operations you’d expect: addition, subtraction,
multiplication and division.

At the most basic level, you can use R as a calculator by doing
standard operations like `+`, `-`, `/` (division), `*` (multiplication)
and `^` (power) on numeric data:

`## [1] 2`

`## [1] 7.5`

`## [1] 64`
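As a minimal sketch of calculator-style use (the numbers here are illustrative assumptions, not necessarily the inputs behind the output above):

```r
# Illustrative values only: each line is evaluated and its result printed
1 + 1        # 2
15 / 2       # 7.5
4 ^ 3        # 64
(2 + 3) * 4  # 20: parentheses control the order of operations
```

Without the parentheses in the last line, R would follow the usual order of operations and multiply before adding.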

R also has an **integer** (whole number) data type. Integers (usually)
work exactly the same as **numeric** data, so you don’t need to worry
too much about the difference for now. Integers will automatically be
converted to the more general numeric format when needed:

`## [1] 2`

`## [1] 3.1`

`## [1] 2.5`
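One way to see the automatic conversion is to check the type with `class()`. This is a small sketch with illustrative values; the `L` suffix is R's way of writing an integer literal.

```r
# A plain number is numeric; the L suffix makes it an integer
class(2)         # "numeric"
class(2L)        # "integer"

# Mixing the two converts the integer to numeric automatically
2L + 1.1         # 3.1
class(2L + 1.1)  # "numeric"

# Division returns a numeric result even for two integers
5L / 2L          # 2.5
```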

### 3.1.2 Characters (text)

The **character** data type allows you to store and manipulate
text. Character data is created by wrapping text in either single `'` or
double `"` quotes. In programming terms, we also refer to each chunk of text
as a **string**:

`## [1] "apple"`

```
# Note: this is still just one string. All the text, including
# the spaces, is contained in the same chunk of text
toupper("three bananas")
```

`## [1] "THREE BANANAS"`

`## [1] "car"`

`## [1] "forgot"`
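A few base R functions for working with strings, as a hedged sketch: the example strings are assumptions chosen for illustration, and the functions shown (`nchar()`, `substr()`, `paste0()`) are just some of the tools available.

```r
# Single and double quotes create exactly the same string
identical("apple", 'apple')   # TRUE

# Spaces count as characters within the string
nchar("three bananas")        # 13

# Extract characters 1 to 3 of a string
substr("carpet", 1, 3)        # "car"

# Join two strings together with no separator
paste0("for", "got")          # "forgot"
```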

### 3.1.3 Logical (True/False)

The **logical** data type is used to represent the True/False result
of a logical test or comparison. These are represented by the
special values of `TRUE` and `FALSE` (basically 1 and 0, with special labels
attached to them). To do logical comparisons, you can use syntax like:

`==`: equals. Note that you need a double equal sign to compare values; a single equal sign is used for assignment instead.

`## [1] FALSE`

`<`, `>`: less than, greater than

`## [1] TRUE`

`<=`, `>=`: less than or equal to, greater than or equal to

`## [1] TRUE`

`!=`: not equal to

`## [1] TRUE`

`!`: not, which reverses the result of another logical test:

`## [1] FALSE`
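The comparison operators above can be tried out directly. The values in this sketch are illustrative assumptions rather than the original inputs:

```r
1 == 2        # FALSE
3 < 5         # TRUE
5 >= 5        # TRUE
"a" != "b"    # TRUE
!(3 < 5)      # FALSE: the test is TRUE, and ! reverses it

# TRUE and FALSE behave like 1 and 0 in arithmetic
TRUE + TRUE   # 2
```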

#### 3.1.3.1 Combining logicals: AND and OR

More complex logical tests can be conducted by combining multiple tests
with the **and** `&` and **or** `|` operators.

`&` takes two logicals, e.g. `a & b`, and returns `TRUE` if both `a` *and* `b` are `TRUE`, and `FALSE` otherwise.

`## [1] TRUE`

`## [1] FALSE`

`a | b` returns `TRUE` if either `a` *or* `b` (or both) is `TRUE`, and `FALSE` otherwise.

`## [1] TRUE`

`## [1] FALSE`

It’s best to wrap each individual test in parentheses `()` to make the logic clear.
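A short sketch of combined tests, with each test wrapped in parentheses (the particular comparisons are illustrative assumptions):

```r
# AND: TRUE only when both tests are TRUE
(3 < 5) & (2 == 2)   # TRUE
(3 < 5) & (2 == 3)   # FALSE

# OR: TRUE when at least one test is TRUE
(3 > 5) | (2 == 2)   # TRUE
(3 > 5) | (2 == 3)   # FALSE
```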

## 3.2 Converting between types

Occasionally your data will be read in from a file as the wrong type.
You might be able to fix this by changing the way you read in the file,
but otherwise you should **convert** the data to the type that makes
the most sense (you might have to clean up some invalid values first).

Functions like `as.character()`, `as.numeric()` and `as.logical()` will
convert data to the relevant type. Probably the most common type conversion
you’ll have to do is when `numeric` data gets treated as text and is stored
as `character`. Numeric operations like addition won’t work until you fix
this:

`## Error in "1" + 1: non-numeric argument to binary operator`

`## [1] 2`
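A minimal sketch of the conversion functions, using made-up values. The last line shows what tends to happen with an invalid value: R returns the missing-value marker `NA` and prints a warning.

```r
# Text that looks like a number can be converted and then used in arithmetic
as.numeric("1") + 1    # 2

# Converting in the other directions
as.character(42)       # "42"
as.logical("TRUE")     # TRUE
as.logical(0)          # FALSE

# Text that isn't a valid number becomes NA, with a warning
as.numeric("ten")
```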

## 3.3 Variables: Storing Results

The results of calculations in R can be stored in **variables**: you
give a name to the results, and then when you want to look at, use
or change those results later, you access them using the same name.

You **assign** a value to a variable using either `=` or `<-` (these
are mostly equivalent, so don’t worry too much about the difference), putting
the variable name on the left-hand side and the value on the right.

`## [1] 22`

`## [1] TRUE`

```
# Changing a variable: this will overwrite the old value with the
# new one, the old value won't be available unless you've
# stored it somewhere else
scale_total = scale_total + 2
scale_total
```

`## [1] 24`

**When you assign a variable, you’re asking R to remember some data so you can
use it later.** Understanding that simple
principle will take you a long way in R programming.

The **expression** on the right hand side might be long and complex,
but as long as it’s valid R code that creates a single value,
all you’re doing is creating a single value and assigning it to
a name, just like in the simple examples above.
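For instance, a longer expression still boils down to one value being stored under one name. The variable names in this sketch (`item1`, `scale_mean`, and so on) are hypothetical:

```r
# Hypothetical questionnaire items
item1 = 4
item2 = 5
item3 = 3

# The right-hand side is evaluated first, then the single result is stored
scale_mean = (item1 + item2 + item3) / 3
scale_mean     # 4

# <- does the same job as =
is_high <- scale_mean >= 4
is_high        # TRUE
```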

Variable names in R should start with a letter (`a-zA-Z`), and
can contain letters, numbers, underscores `_` and periods `.`, so
`model3`, `get.scores` and `ANX_total` are all valid variable names.

### 3.3.1 Copying Variables

If you assign an existing value to a new variable, it will create a copy. If you change one copy, the other will stay as it was:

`## [1] 3`

`## [1] 5`

This can be useful if you want to test out some changes to your data, or create multiple different subsets of the same data.
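A small sketch of the copying behaviour described above, using hypothetical variable names:

```r
original = 3
copy = original    # copy now holds its own value of 3

copy = copy + 2    # changing the copy...
original           # 3: ...leaves the original untouched
copy               # 5
```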

## 3.4 Vectors

All the data types discussed above can be stored in **vectors**^{1}, which
are sequences with multiple elements of the same type. Vectors can be
created using the `c()` function to put together multiple elements.

`## [1] 1 2 3 10`

The number of elements in a vector is the **length**:

`## [1] 3`
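As a sketch with illustrative values, creating vectors and checking their length might look like this. The last lines show what "of the same type" means in practice: if you mix types, R converts everything to one common type.

```r
scores = c(1, 2, 3, 10)
scores              # 1 2 3 10
length(scores)      # 4

labels = c("a", "b", "c")
length(labels)      # 3

# Mixing types forces every element to a single type (here, character)
mixed = c(1, "a", TRUE)
mixed               # "1" "a" "TRUE"
class(mixed)        # "character"
```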

### 3.4.1 Calculating with vectors

R automatically applies most calculations and operations to every element
of a vector at the same time, so you work with vectors the same way
you would single values^{2}.

Adding, multiplying, or comparing a single value with a vector automatically adds, multiplies or compares the single value to every element of the vector. The result is a new vector with the same length:

`## [1] 2 3 4`

`## [1] 30 60 90`

`## [1] FALSE TRUE FALSE`

When you work with two vectors of the same length, R automatically matches up the first element of one vector with the first element of the other, the second element with the second element, and so on:

`## [1] 11 22 33`

`## [1] TRUE FALSE TRUE`
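Both cases can be sketched as follows, with assumed example vectors `x` and `y`:

```r
x = c(1, 2, 3)

# A single value is applied to every element
x + 1              # 2 3 4
x * 30             # 30 60 90
x == 2             # FALSE TRUE FALSE

# Two vectors of the same length are matched up element by element
y = c(10, 20, 30)
x + y              # 11 22 33
c(1, 5, 3) == x    # TRUE FALSE TRUE
```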

**NB: not recommended!** It’s possible to do some tricks with
vectors of different lengths, but it’s generally not needed. Usually,
you’ll either have a single value you want to work with, or vectors
that are all the same length.

`## [1] 0 2 0 4 0 6 0 8`

Most of the time, trying to work with two vectors of different lengths will just produce a warning or an error that lets you know something is wrong:

```
## Warning in c(5, 6) * c(3, 8, 4): longer object length is not a multiple of
## shorter object length
```

`## [1] 15 48 20`
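The different-length behaviour is usually called recycling: R repeats the shorter vector until it matches the longer one. A hedged sketch with illustrative values, mainly to make the rule visible rather than to recommend it:

```r
# The shorter vector c(0, 1) is repeated four times to match 1:8
1:8 * c(0, 1)                    # 0 2 0 4 0 6 0 8

# Writing the repeats out by hand gives the same result
1:8 * rep(c(0, 1), times = 4)    # 0 2 0 4 0 6 0 8
```

Spelling out the repeats with `rep()` makes the intent explicit, which is one reason relying on silent recycling is rarely necessary.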

#### 3.4.1.1 R is built around vectors

The details above are relevant when you need to do custom calculations manually. But of course, R already has plenty of built-in commands, and these are all designed to work with vectors. A lot of the time, you’ll just be feeding vectors into existing commands:

`## [1] 6`

`## [1] 14`

```
##
## Welch Two Sample t-test
##
## data: c(1, 2, 3, 4) and c(1, 2, 3, 5)
## t = -0.23355, df = 5.5846, p-value = 0.8237
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.917221 2.417221
## sample estimates:
## mean of x mean of y
## 2.50 2.75
```
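A few common built-in functions that take a whole vector as input, sketched with an assumed example vector:

```r
x = c(1, 2, 3)

sum(x)       # 6
sum(x ^ 2)   # 14: square every element, then add them up
mean(x)      # 2
sd(x)        # 1
```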

### 3.4.2 Missing values

All types of vectors allow for missing data, through the special `NA` value.

`## [1] 29 NA 14`

Generally, `NA` values will stay `NA` when you try to calculate
with the vector.

`## [1] 58 NA 28`

`## [1] "A" NA "C"`

```
# Missing values on either side of the sum will produce
# missing values in the result
c(1, NA, 3) + c(4, 5, NA)
```

`## [1] 5 NA NA`

The `is.na()` function can test which values are missing:

`## [1] FALSE TRUE FALSE`

Functions like `sum()` and `mean()` will produce a missing
result by default if *any* values in the input are missing. Use the
`na.rm = TRUE` option (short for “`NA` remove”) to ignore the missing values
and just use the values that are available:

`## [1] NA`

`## [1] 5`

Other functions in R will automatically remove missing values, but will usually warn you when they do. It’s always good to check how missing values are being treated, whatever tool you’re using.
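Putting the pieces of this section together in one sketch (the vector `ages` and its values are assumptions for illustration):

```r
ages = c(29, NA, 14)

ages * 2                    # 58 NA 28
is.na(ages)                 # FALSE TRUE FALSE

sum(ages)                   # NA
sum(ages, na.rm = TRUE)     # 43
mean(ages, na.rm = TRUE)    # 21.5

# Count the missing values: each TRUE counts as 1 when summed
sum(is.na(ages))            # 1

# Comparing with == doesn't detect missing values; use is.na() instead
ages == NA                  # NA NA NA
```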

### 3.4.3 Indexing: accessing parts of vectors

To access parts of a vector, use square brackets `[]` after the vector
and use integers to specify which parts you want to extract. E.g. to
extract the second element:

`## [1] "b"`

This is known as **indexing**. Here, 2 is the **index**.

To extract multiple elements, you can use a vector as the index. R returns a new vector, containing the elements that match up to the index.

`## [1] "b" "c" "d"`

`## [1] "b" "c" "d"`

The index can include any number between 1 and the length of the vector, in any order. You can also access the same element multiple times:

`## [1] "d" "a" "b" "c" "a" "b"`
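The indexing patterns from this section, sketched with an assumed example vector named `x`:

```r
x = c("a", "b", "c", "d")

x[2]                      # "b"
x[c(2, 3, 4)]             # "b" "c" "d"
x[2:4]                    # "b" "c" "d": 2:4 is shorthand for c(2, 3, 4)

# Any order, and the same element more than once
x[c(4, 1, 2, 3, 1, 2)]    # "d" "a" "b" "c" "a" "b"
```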

### 3.4.4 Logical indexing: filtering your data based on conditions

One of the most powerful tools in R is the ability to
access the subset of your data that meets a condition.
If you use a **logical** vector as the index for
your vector, R returns a new vector containing just the elements
where the index was `TRUE`:

`## [1] 20 30`

This means you can filter the vector using the results of a logical test:

`## [1] 20 30`

This works because the result of `x >= 20` is a logical vector
`c(TRUE, FALSE, TRUE)`, which works just like it did above. If you’re having
trouble understanding a complex R expression, you can often pull out the
individual parts and test them separately to see how they work.
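Pulling the expression apart step by step might look like the sketch below. The values in `x` are an assumption, chosen so that `x >= 20` gives `c(TRUE, FALSE, TRUE)`; `keep` is a hypothetical helper variable.

```r
x = c(20, 10, 30)

# Step 1: a logical vector written out by hand as the index
x[c(TRUE, FALSE, TRUE)]    # 20 30

# Step 2: the logical test on its own produces the same logical vector
keep = x >= 20
keep                       # TRUE FALSE TRUE

# Step 3: use the stored result, or the test itself, as the index
x[keep]                    # 20 30
x[x >= 20]                 # 20 30
```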

This kind of logical subsetting is particularly useful once you start testing based on other vectors:

```
group = c("Control", "Treatment", "Treatment", "Control")
score = c(6, 5, 7, 4)
score[group == "Treatment"]
```

`## [1] 5 7`

### 3.4.5 Changing vectors

To change part of a vector, you can index the vector on the left-hand
side of the `=`/`<-` symbol and put the replacement on the right-hand
side. You just need to make sure the replacement is either:

- A single value, or
- The same length as the part you’re replacing.

`## [1] 11 15 12 44 18`

`## [1] 11 6 7 8 18`

`## [1] 11 0 12 13 0`
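Both kinds of replacement can be sketched as follows. The starting values are assumptions, not necessarily the ones behind the output above:

```r
x = c(11, 15, 12, 13, 18)

# Replace one element with a single value
x[4] = 44
x                      # 11 15 12 44 18

# Replace three elements with a vector of the same length
x[2:4] = c(6, 7, 8)
x                      # 11 6 7 8 18

# A single value fills every position selected by a logical index
y = c(11, 15, 12, 13, 18)
y[y > 14] = 0
y                      # 11 0 12 13 0
```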

1. Actually, most things in R are vectors, even when they look like single values. Even the single elements shown above, like `3`, are just vectors with a length of 1.
2. In other programming languages, you might have to manually apply the operation to each element of the sequence, using something like a for loop.