Section 3 Week 2 - November 25, 2022

Welcome! Today we will go over some things that we could not cover last week because of tech issues, as well as some new things.

3.1 Installing and Loading Packages

Last week we tried to install the tidyverse, but ran into trouble. Let’s try a different package and install effsize. Remember, we have two ways to install a package: the bottom-right panel, or the install.packages() function:

install.packages('effsize')

We have now downloaded the ‘app’, but we still need to click on it to open it! This is done by running the package/library with the library() function. Note: you need quotation marks (" ") around the package name to install it, but not to run it.

  • Practice Exercise - Run both the tidyverse and effsize packages.
library(tidyverse)
library(effsize)
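
If library() fails with a "there is no package called ..." error, the install most likely did not finish. A minimal sketch for checking whether a package is installed before loading it; the helper name is_installed is made up for this example, while requireNamespace() is a base R function:

```r
# Hypothetical helper: returns TRUE if a package is installed, FALSE otherwise.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")     # 'stats' ships with R, so this should be TRUE
is_installed("effsize")   # TRUE only if install.packages('effsize') worked
```

Note that the package name is in quotation marks here as well, because it is passed as text rather than as an object.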

3.2 Different Ways of Doing Things

Last week we went over how to call a variable as a stand-alone vector, such as calculating the mean of ‘scores’:

scores <- c(5, 2, 3, 4, 1)
mean(scores)
## [1] 3

We also called a variable within a dataframe. We calculated the mean of mpg in the mtcars dataset by putting a $ between the dataset and the variable:

mean(mtcars$mpg)
## [1] 20.09062

An alternative is to get descriptives for all variables at once:

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
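
If only one statistic is needed for every column, base R can also apply a single function across the whole dataset. A minimal sketch using colMeans() and sapply(), two base R functions that are not otherwise covered in these notes:

```r
# Mean of every column in mtcars (all of its columns are numeric).
col_means <- colMeans(mtcars)
round(col_means, 2)

# Standard deviation of every column: sapply() runs sd() on each column in turn.
col_sds <- sapply(mtcars, sd)
round(col_sds, 2)
```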

The tidyverse offers another approach, which is useful when you want specific descriptives for specific variables:

mtcars %>% 
  summarize(mean_mpg = mean(mpg),
            sd_mpg = sd(mpg),
            mean_cyl = mean(cyl),
            sd_cyl = sd(cyl),
            name_them_anything = mean(hp))
##   mean_mpg   sd_mpg mean_cyl   sd_cyl name_them_anything
## 1 20.09062 6.026948   6.1875 1.785922           146.6875

As another example to compare the tidyverse and base R, consider this line of code:

table(mtcars$cyl, mtcars$carb)
##    
##     1 2 3 4 6 8
##   4 5 6 0 0 0 0
##   6 2 0 0 4 1 0
##   8 0 4 3 6 0 1

Versus tidyverse:

mtcars %>% 
  select(cyl, carb) %>% 
  table()
##    carb
## cyl 1 2 3 4 6 8
##   4 5 6 0 0 0 0
##   6 2 0 0 4 1 0
##   8 0 4 3 6 0 1
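
The two versions give the same counts because %>% simply passes whatever is on its left as the first argument of the function on its right. A minimal sketch of that equivalence, assuming the dplyr package (part of the tidyverse) is installed; the object names piped and nested are made up for this example:

```r
library(dplyr)  # provides select() and the %>% pipe

# Piped version: read it left to right, one step at a time.
piped <- mtcars %>%
  select(cyl, carb) %>%
  table()

# Nested version: the same steps, read from the inside out.
nested <- table(select(mtcars, cyl, carb))

# Every cell count matches.
all(piped == nested)
```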

3.3 Creating and Changing Variables

You already know how to create things! Remember ‘thing <- what the thing is’? While we could use ‘=’ instead of ‘<-’, by convention ‘<-’ is used for making objects, while ‘=’ is used for arguments inside functions.
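
A short illustration of that convention, reusing the ‘scores’ vector from earlier (na.rm is a standard argument of mean() that tells it to ignore missing values):

```r
# '<-' creates the object 'scores'.
scores <- c(5, 2, 3, 4, 1)

# '=' sets arguments inside the function call.
mean(x = scores, na.rm = TRUE)
```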

3.4 select

Select is used to select a subset of variables. Imagine we only wanted to have the names of the Star Wars characters. Let’s make an object called ‘sw_names’. We use the select function:

sw_names <- starwars %>% 
  select(name)

Now imagine we wanted everything except the name variable. We can simply put a minus sign in front of it:

sw_no_names <- starwars %>% 
  select(-name)
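
The same idea extends to more than one variable. A minimal sketch, assuming dplyr is installed; the object names sw_basic and sw_trimmed are made up for this example:

```r
library(dplyr)  # provides starwars, select() and %>%

# Keep several variables by listing them, separated by commas.
sw_basic <- starwars %>% 
  select(name, height, mass)

# Drop several variables by wrapping them in c() with a minus sign in front.
sw_trimmed <- starwars %>% 
  select(-c(films, vehicles, starships))

names(sw_basic)
```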

3.5 filter

Filter is for subsetting data based on the values of a variable. For example, imagine we want to have only male or only female participants. We can do this in the ‘starwars’ data set using filter.

sw_male <- starwars %>% 
  filter(sex=="male")

sw_female <- starwars %>% 
  filter(sex=="female")

You may notice that it is “==” and not “=”. Remember, “=” is used to create something, while “==” checks whether two values are equal. So filter(sex=="male") is asking R to keep only the cases for which sex=="male" is TRUE.

As an example, let’s check if some values are equal:

10 == 10
## [1] TRUE
9 == 10
## [1] FALSE
100 == 10^2
## [1] TRUE
mean(c(1, 2, 3)) == 100
## [1] FALSE
mean(c(1, 2, 3)) == 2
## [1] TRUE
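
A comparison applied to a whole vector returns one TRUE or FALSE per value, and this is what filter relies on when it decides which rows to keep. A small base R sketch with made-up heights (the values and object names are for illustration only):

```r
heights <- c(172, 96, 202)   # made-up heights in cm

# One TRUE/FALSE per value.
tall <- heights > 180
tall
## [1] FALSE FALSE  TRUE

# Keep only the values where the comparison is TRUE.
heights[tall]
## [1] 202
```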

3.6 mutate

The tidyverse has an intuitive way of making new variables with mutate. We can call the data, use %>%, and then the mutate function. First, let’s make a copy of the starwars data set and call it “sw”.

sw <- starwars

You can now see we have a dataset of 87 observations and 14 variables. Let’s make a new column called “height_inches”. The new variable will be the current height (which is in cm) multiplied by 0.3937. You can do so with:

sw <- sw %>% 
  mutate(height_inches = height*0.3937)
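
mutate can also create several variables in one call, separated by commas. A minimal sketch, assuming dplyr is installed; the mass_lb variable and its conversion factor (1 kg is roughly 2.2046 lb) are additions for illustration and are not part of the original exercise:

```r
library(dplyr)  # provides starwars, mutate(), select() and %>%

sw <- starwars %>% 
  mutate(height_inches = height * 0.3937,  # cm to inches, as above
         mass_lb = mass * 2.2046)          # kg to pounds (assumed factor)

# Look at the old and new columns side by side to check the result.
sw %>% 
  select(name, height, height_inches, mass, mass_lb) %>% 
  head(3)
```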

  • Practice question 2 - Using the tidyverse and the starwars data set, find the mean mass of the characters.

  • Practice question 3 - Using the tidyverse and the starwars data set, create a contingency table of species and sex. Bonus: create a contingency table of sex for humans only. How many female versus male human characters are listed?

  • Practice question 4 - Use the tidyverse to create a new variable in the ‘sw’ dataset created above. Call it “BMI”; its value is \(\frac{kg}{m^2}\).

  • Practice question 5 - Copy, paste, and run the following code:

set.seed(10)
df <- data.frame(x=rnorm(100, 100, 15),
                 y=rbinom(100, 100, .5))

You now have a dataset called ‘df’ that has two variables, x and y. Calculate the mean of both x and y.

  • Practice question 6 - Within the dataset ‘df’, create a new variable called ‘xy’ that is the squared product of x and y (i.e., multiply them and then square the result).