Section 2 Week 1 - November 18, 2022

2.1 R Layout

This was our first meeting. The group was introduced to the RStudio interface, including its panes (i.e., sections of the RStudio window) and what each one is used for.

The bottom-left pane (assuming you are using the default layout) is the console, where code runs. You can type code directly into the console. For example, you can do some basic maths (type or copy and paste the following into your console):

5 + 6
# ^ indicates exponent
25^3 
# sqrt() is square root
sqrt(16)

The top-left pane is your current script. If you have one open, you will typically see a tab with a name such as ‘Untitled1’. If you don’t, you can open one by clicking File -> New File -> R Script or pressing CTRL + SHIFT + N (Windows) / CMD + SHIFT + N (Mac). Scripts are a great way to organize and save your code. We will use scripts throughout our meetings.

The bottom-right pane has a few different functions. It’s used to view files, such as those in your working directory (where R will automatically try to load or save files), view figures/plots, and get help (e.g., view documentation).

The top-right pane is the environment, which displays all objects currently available in R. For example, it could display data sets, values, functions, etc. You can create objects using the following structure: object name <- what the object is. Note that the keyboard shortcut for ‘<-’ is ALT + - (Windows)/OPTION + - (Mac). For example, if I wanted to create an object called hello that was just the number 5, I could run:

hello <- 5

Or I could make an object called ‘scores’ that is a series of numbers (a vector), ‘5, 2, 4, 2, 1’:

scores <- c(5, 2, 4, 2, 1)

Note that c() means combine or concatenate (join things together in a series) and that you must separate the items with commas.
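As a small sketch of what you can do with a vector once it exists, the block below recreates ‘scores’ and then counts, indexes, and transforms it. These particular operations are illustrative additions rather than something covered in the meeting, and ‘more_scores’ is a made-up object name.

```r
# Recreate the vector from above so this block runs on its own
scores <- c(5, 2, 4, 2, 1)

# How many values are in the vector?
length(scores)

# Pull out the first value (R starts counting at 1)
scores[1]

# Arithmetic is applied to every value at once
scores * 2

# c() can also join an existing vector with new values
more_scores <- c(scores, 3, 4)  # 'more_scores' is a hypothetical name
more_scores
```

Because every item is separated by a comma inside c(), adding or removing values later only means editing that one line.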

2.2 Variable Types

Much like SPSS, there are several variable types. We will primarily be concerned with numeric, factor, and character. Numeric is easy enough to understand. Factors are ‘words’ or ‘strings’ that are stored as numbers, with each number corresponding to a level. For example, ‘color’ might have the levels ‘red’, ‘blue’, and ‘green’. Storing a variable as a factor allows for easy comparison of colors; everyone who selects ‘red’ will be grouped at that level of the factor. Characters do not work this way, so it will be important to ensure your factors are indeed factors.
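To make the difference between characters and factors concrete, here is a minimal sketch using made-up responses to a ‘favorite color’ question. The object names ‘color_chr’ and ‘color_fct’ are hypothetical and chosen only for this illustration.

```r
# Made-up responses, stored as plain text (character)
color_chr <- c("red", "blue", "green", "red", "blue", "red")
class(color_chr)

# Convert the character vector to a factor
color_fct <- factor(color_chr)
class(color_fct)

# The levels are the unique values (sorted alphabetically by default)
levels(color_fct)

# Behind the scenes, each value is stored as the number of its level
as.integer(color_fct)

# Everyone who chose the same color is grouped at the same level
table(color_fct)
```

If a variable shows up as character when you expected a factor, wrapping it in factor() as above is one way to convert it.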

You can store factors/characters as objects as well. Let’s create an object called ‘Tyler’ that contains my last name, favorite number, and favorite color. Note: strings must be wrapped in quotation marks, " " (object names do not follow this rule).

Tyler <- c("Pritchard", 23, "Blue")
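One detail worth checking here: a vector made with c() can only hold one type, so mixing strings and a number turns the number into a string. The sketch below shows this, and also shows a list as one possible way to keep each type intact. The list and its element names are an illustrative assumption, not something we covered in the meeting.

```r
# Same object as above
Tyler <- c("Pritchard", 23, "Blue")

# The whole vector is stored as character, including the number
class(Tyler)
Tyler[2]

# To do maths with it, convert it back to a number first
as.numeric(Tyler[2]) + 1

# A list (hypothetical names) lets each element keep its own type
Tyler_list <- list(
  last_name = "Pritchard",
  favorite_number = 23,
  favorite_color = "Blue"
)
class(Tyler_list$favorite_number)
```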

2.3 Installing Packages

We discussed how R is like an iPhone. While it comes with awesome things you will use a lot, like Camera and Photos, you may want to download some extra apps. Apps are like R packages/libraries. One set of packages we will use a lot is the tidyverse. You can download it using the bottom-right pane: Packages -> Install -> tidyverse. Or, you could run the following code (note: you must put the package name in quotation marks, ' ' or " "):

install.packages('tidyverse')

Not only must you download packages, you must also load them if you want to use them. Imagine trying to take a photo on your phone: you can’t just hold it up and say ‘take photo’; you must first tap the ‘Camera’ app. Loading a package in R is like opening the app you want to use. Every time you close R, all your packages (‘apps’) are closed, so you must load the package again when you reopen R. You load a package using library(), with the package name in brackets. So:

library(tidyverse)
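If you are not sure whether a package has already been downloaded, you can ask R before installing it again. The following is a rough sketch only: ‘is_installed’ is a hypothetical helper name, built on the base R function requireNamespace().

```r
# Hypothetical helper: returns TRUE if a package is already installed
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R, so this should be TRUE
is_installed("stats")

# TRUE only if you have already run install.packages('tidyverse')
is_installed("tidyverse")
```

Installing is a one-time step per computer (like downloading an app), whereas library() is needed in every new R session.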

2.4 Functions

Functions can do so many things. We will not focus on creating functions right now, only using existing ones. You will use them all the time when using R. You can call on functions by typing their name. They typically are followed by brackets that have arguments in them. Arguments are like choices that you must specify for the function to run. Some arguments have default values, which means that you do not need to change or specify anything with them unless needed.

Functions can calculate statistics, create figures, generate data, send a Tweet, download data from Google, etc. Also, as mentioned, you can create functions to do whatever you like.
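As a quick illustration of arguments and default values, consider the built-in round() function. This example is an addition for illustration; round() has a ‘digits’ argument that defaults to 0.

```r
# Only the required argument: 'digits' falls back to its default of 0
round(3.14159)

# Override the default by naming the argument
round(3.14159, digits = 2)

# Arguments can also be given by position instead of by name
round(3.14159, 2)
```

Naming arguments (digits = 2) is usually easier to read later than relying on their position.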

Let’s use a simple function to calculate the mean of ‘scores’, the object we made above. R comes with many built-in functions, including mean(). As you type a function, the arguments may display on your screen. Otherwise, you can type a ‘?’ before the function name to have the help documentation pop up in the bottom-right pane.

?mean()

For mean(), the only thing we need to specify is the object on which we want to calculate the mean. So:

mean(scores)
## [1] 2.8

The mean of ‘scores’ is 2.8.

Another common function you may use is sd(), which calculates the standard deviation of a set of values.

sd(scores)
## [1] 1.643168
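To see what sd() is doing, here is a sketch that computes the sample standard deviation by hand (squared deviations from the mean, divided by n - 1, then square-rooted) and compares it with the built-in function. The intermediate object names are made up for this example.

```r
# Recreate the vector so this block runs on its own
scores <- c(5, 2, 4, 2, 1)

n <- length(scores)                       # number of values
deviations <- scores - mean(scores)       # distance of each value from the mean
variance <- sum(deviations^2) / (n - 1)   # sample variance
manual_sd <- sqrt(variance)               # standard deviation

manual_sd

# Check that the by-hand result agrees with sd()
all.equal(manual_sd, sd(scores))
```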

2.5 Using Data

Most of the time, you will likely not be using a single vector (which is what our object ‘scores’ is); you will be using a set of data. In R, this is typically stored as a data.frame (tibbles also exist, but we will primarily use data frames). R comes with some built-in data frames, so let’s store one in the environment.

df <- mtcars

The data frame looks something like this:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

We can quickly look at the data structure using the str() function.

str(df)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Here, you can see the names of the variables, their types, and the first few values.
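Besides str(), a few other built-in functions give a quick overview of a data frame. The sketch below is an illustrative addition that uses only base R functions on the same mtcars data.

```r
# Store the built-in data set, as above
df <- mtcars

# First six rows only (handy when the full data set is long)
head(df)

# Number of rows (observations) and columns (variables)
nrow(df)
ncol(df)

# Just the variable names
names(df)
```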

Remember the mean() function? It needs a set of values to calculate the mean of. Let’s try to calculate the mean of a variable in the data set we just put in the environment.

To call a specific variable in a data.frame, we must name the data frame, put ‘$’, and then list the variable. Let’s calculate the mean of the miles per gallon variable, ‘mpg’.

mean(df$mpg)
## [1] 20.09062
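The same ‘$’ pattern works for any variable and any function that accepts a vector. As a brief sketch (an addition for illustration), the block below also shows the double-bracket form, which is an equivalent way to pull out a single variable.

```r
# Store the built-in data set, as above
df <- mtcars

# df$mpg and df[["mpg"]] return the same vector
identical(df$mpg, df[["mpg"]])

# Mean of a different variable: horsepower
mean(df$hp)

# summary() gives the minimum, quartiles, mean, and maximum in one call
summary(df$mpg)
```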

2.6 Quick Plots

You can quickly plot with the plot() function. It will take the variables you put in the function and try to make the best type of figure. Let’s plot two variables from our data set: mpg (miles per gallon) and hp (horsepower).

plot(df$mpg, df$hp)
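The first variable given to plot() goes on the x-axis and the second on the y-axis. As a minimal sketch of tidying up the figure, the axis labels and title below are made-up text passed through plot()'s standard xlab, ylab, and main arguments.

```r
# Store the built-in data set, as above
df <- mtcars

# Scatterplot with readable axis labels and a title
plot(
  x = df$mpg,
  y = df$hp,
  xlab = "Miles per gallon (mpg)",
  ylab = "Horsepower (hp)",
  main = "Horsepower by fuel efficiency"
)
```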

2.7 Quick Correlation NHST

R also has a quick function to run a null hypothesis significance test (NHST) for a correlation: cor.test(). Let’s run a correlation on the variables from above. The help documentation (remember ?cor.test()) indicates that the only arguments we need to specify are x and y. Let x be mpg and y be hp:

cor.test(df$mpg, df$hp)
## 
##  Pearson's product-moment correlation
## 
## data:  df$mpg and df$hp
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor 
## -0.7761684

The results (r = -.78, p < .001) suggest that these data would be unlikely if the null hypothesis (a true correlation of 0) were true.
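If you want to reuse individual numbers from the test rather than read them off the printout, you can store the result as an object and pull pieces out with ‘$’. This is a sketch only; ‘result’ is a hypothetical object name, while estimate, p.value, and conf.int are components that cor.test() returns.

```r
# Store the built-in data set, as above
df <- mtcars

# Save the test output instead of only printing it
result <- cor.test(df$mpg, df$hp)

# The correlation coefficient
result$estimate

# The p-value
result$p.value

# The 95% confidence interval
result$conf.int

# cor() returns just the correlation, without the significance test
cor(df$mpg, df$hp)
```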

That’s all we covered this week!

2.8 Practice Questions

  1. Install the ‘dplyr’ package.

  2. Load the starwars data set into the environment. Give it the name ‘sw’.

  3. Calculate the mean of height of the Star Wars characters.

  • Hint: the mean() function has an argument, na.rm, that defaults to FALSE. With this default, missing values (NA) are kept and the result is NA. Change the argument so that NAs are removed.
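To see how na.rm behaves without giving away the practice answer, here is a small sketch on a made-up vector. ‘heights’ and its values are invented for this illustration and are not the Star Wars data.

```r
# Made-up heights with one missing value
heights <- c(150, 160, NA, 170)

# With the default (na.rm = FALSE), one NA makes the whole result NA
mean(heights)

# Setting na.rm = TRUE drops the NA before calculating
mean(heights, na.rm = TRUE)
```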