Section 3 Week 2 - November 25, 2022
Welcome! Today we will go over some things that we could not cover last week because of tech issues, as well as some new things.
3.1 Installing and Loading Packages
Last week we tried to install the tidyverse, but ran into trouble. Let’s try a different package and install effsize. Remember, we have two ways to install a package: the bottom-right panel, or the install.packages() function:
install.packages('effsize')
While we have downloaded the ‘app’, we still need to click it! This is done by running (loading) the package/library using the library() function. Note: you need quotation marks (" ") around the package name to install it, but not to run it.
- Practice Exercise - Run both the tidyverse and effsize packages.
library(tidyverse)
library(effsize)
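A common pattern is to install a package only when it is missing, and then load it. The sketch below shows one way this could be done. The helper name load_or_install is hypothetical (it is not part of any package), and it is demonstrated on stats, which ships with R, so nothing is actually downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE is needed because the name is stored in a variable
  library(pkg, character.only = TRUE)
  # Report whether the package is now attached
  invisible(pkg %in% .packages())
}

# "stats" comes with R, so this only loads it
loaded <- load_or_install("stats")
loaded
```

The same call with "effsize" would install the package first if it were not already on your computer.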
3.2 Different Ways of Doing Things
Last week we went over how to call a variable as a stand-alone vector, such as calculating the mean of ‘scores’:
scores <- c(5, 2, 3, 4, 1)
mean(scores)
## [1] 3
We also called a variable within a dataframe: we calculated the mean of mpg in the mtcars dataset by putting a $ between the dataset and the variable.
mean(mtcars$mpg)
## [1] 20.09062
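The $ is not the only way to reach a variable inside a dataframe. As a small sketch using only base R, the lines below pull out mpg in three ways and confirm that they give the same mean. The object names m_dollar, m_bracket, and m_with are made up for this illustration.

```r
# Three base R ways to reach the mpg column of mtcars
m_dollar  <- mean(mtcars$mpg)        # dataset$variable
m_bracket <- mean(mtcars[["mpg"]])   # dataset[["variable"]], name in quotes
m_with    <- with(mtcars, mean(mpg)) # evaluate mean(mpg) inside mtcars

# All three values are the same
c(m_dollar, m_bracket, m_with)
```

The [[ ]] form is handy when the variable name is stored as text, because it takes the name in quotation marks.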
An alternative way to do this for all variables:
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
The tidyverse offers another way to do this, which may be useful when you want specific descriptives of specific variables:
mtcars %>%
  summarize(mean_mpg=mean(mpg),
            sd_mpg=sd(mpg),
            mean_cyl=mean(cyl),
            sd_cul=sd(cyl),
            name_them_anything = mean(hp))
## mean_mpg sd_mpg mean_cyl sd_cul name_them_anything
## 1 20.09062 6.026948 6.1875 1.785922 146.6875
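It may help to see what the %>% is doing: it takes whatever is on its left and passes it as the first argument to the function on its right. The sketch below assumes the dplyr package (part of the tidyverse) is installed, and the object names piped and nested are made up for the comparison.

```r
library(dplyr)

# With the pipe: mtcars is handed to summarize() as its first argument
piped <- mtcars %>%
  summarize(mean_mpg = mean(mpg),
            sd_mpg = sd(mpg))

# Without the pipe: the dataset is written as the first argument
nested <- summarize(mtcars,
                    mean_mpg = mean(mpg),
                    sd_mpg = sd(mpg))

# Both approaches return the same one-row dataframe
identical(piped, nested)
```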
As another example to compare the tidyverse and base R, consider this line of code:
table(mtcars$cyl, mtcars$carb)
##
## 1 2 3 4 6 8
## 4 5 6 0 0 0 0
## 6 2 0 0 4 1 0
## 8 0 4 3 6 0 1
Versus tidyverse:
mtcars %>%
  select(cyl, carb) %>%
  table()
## carb
## cyl 1 2 3 4 6 8
## 4 5 6 0 0 0 0
## 6 2 0 0 4 1 0
## 8 0 4 3 6 0 1
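One visible difference between the two outputs is that the tidyverse version labels the rows and columns (cyl and carb). As a sketch, base R can show the same labels if the arguments to table() are given names. The object names tab_plain and tab_named are made up for this illustration.

```r
# Unnamed arguments: the table has no row/column labels
tab_plain <- table(mtcars$cyl, mtcars$carb)

# Named arguments: the names become the row/column labels
tab_named <- table(cyl = mtcars$cyl, carb = mtcars$carb)

names(dimnames(tab_plain))
names(dimnames(tab_named))

# The counts themselves are the same in both tables
all(tab_plain == tab_named)
```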
3.3 Creating and Changing Variables
You kind of already know how to create things! Remember ‘thing <- what the thing is’? While we could easily use ‘=’ instead of ‘<-’, <- is typically reserved for making objects, while = is for arguments in functions.
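As a small sketch of that convention, the lines below use <- to make objects and = only inside the parentheses of a function. The object names avg and na_rm_exists are made up for this illustration.

```r
# '<-' creates an object called scores
scores <- c(5, 2, 3, 4, 1)

# '=' inside the parentheses only names the arguments of mean()
avg <- mean(x = scores, na.rm = TRUE)

# The call above did not create an object called na.rm
na_rm_exists <- exists("na.rm", inherits = FALSE)

avg
na_rm_exists
```

Keeping the two symbols for separate jobs makes it easier to see at a glance which lines create something and which only pass settings to a function.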
3.4 select
Select is used to select a subset of variables. Imagine we only wanted to have the names of the Star Wars characters. Let’s make an object called ‘sw_names’. We use the select function:
sw_names <- starwars %>%
  select(name)
Now imagine we wanted everything except the name variable. We can simply put a minus sign in front of it:
sw_no_names <- starwars %>%
  select(-name)
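select() also accepts several variables at once, separated by commas. The sketch below assumes dplyr (part of the tidyverse) is installed, since that is where the starwars data lives. The object names sw_basic and sw_trimmed are made up for this illustration.

```r
library(dplyr)

# Keep three variables
sw_basic <- starwars %>%
  select(name, height, mass)

# Drop two variables and keep everything else
sw_trimmed <- starwars %>%
  select(-name, -height)

# Compare the number of columns in each dataset
ncol(starwars)
ncol(sw_basic)
ncol(sw_trimmed)
```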
3.5 filter
Filter is for subsetting data based on values of a variable. For example, imagine we want to have only male or only female participants. We can do this in the ‘starwars’ data set using filter.
sw_male <- starwars %>%
  filter(sex=="male")
sw_female <- starwars %>%
  filter(sex=="female")
You may notice that it is “==” and not “=”. Remember, “=” is used to create something, while “==” checks if the two values are equal. So filter(sex=="male") is asking R to select only the cases for which sex=="male" results in TRUE.
As an example, let’s check if some values are equal:
10 == 10
## [1] TRUE
9 == 10
## [1] FALSE
100 == 10^2
## [1] TRUE
mean(c(1, 2, 3)) == 100
## [1] FALSE
mean(c(1, 2, 3)) == 2
## [1] TRUE
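The same check works on a whole column at once, which is what filter relies on. The sketch below uses only base R and the mtcars data; the object names is_four and four_cyl are made up for this illustration.

```r
# '==' on a column gives one TRUE/FALSE per row
is_four <- mtcars$cyl == 4
head(is_four)

# TRUE counts as 1, so sum() counts the matching rows
sum(is_four)

# Keep only the rows where the check is TRUE
four_cyl <- mtcars[is_four, ]
nrow(four_cyl)
```

The number of rows kept equals the number of TRUE values, which is the idea behind filter(cyl == 4).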
3.6 mutate
The tidyverse has an intuitive way of making new variables with mutate. We can call the data, use %>%, and then the mutate function. Let’s make a dataset that is just a copy of the starwars data set. Let’s call it “sw”.
sw <- starwars
You can now see we have a dataset of 87 observations and 14 variables. Let’s make a new column called “height_inches”. The new variable will be the current height (which is in cm), multiplied by 0.3937. You can do so with:
sw <- sw %>%
  mutate(height_inches = height*0.3937)
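For comparison with base R, the same column could also be made with $ and <-. The sketch below assumes dplyr (part of the tidyverse) is installed; the object names sw_tidy and sw_base are made up so that the two versions can be checked against each other.

```r
library(dplyr)

sw <- starwars

# tidyverse: mutate() adds the new column
sw_tidy <- sw %>%
  mutate(height_inches = height * 0.3937)

# base R: assign to a new column with $ and <-
sw_base <- sw
sw_base$height_inches <- sw_base$height * 0.3937

# Both versions hold the same values (missing heights stay missing)
all.equal(sw_tidy$height_inches, sw_base$height_inches)
```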
Practice question 2 Using the tidyverse and the starwars data set, find the mean mass of the characters.
Practice question 3 Using the tidyverse and the starwars data set, create a contingency table of species and sex. Bonus: do a contingency table for sex for humans only. How many female versus male human characters are listed?
Practice question 4 Use the tidyverse to create a new variable in the ‘sw’ dataset. Call it “BMI”; its value is \(\frac{kg}{m^2}\).
Practice question 5 Copy, paste, and run the following code:
set.seed(10)
df <- data.frame(x=rnorm(100, 100, 15),
                 y=rbinom(100, 100, .5))
You now have a dataset called ‘df’ that has two variables, x and y. Calculate the mean of both x and y.
- Practice question 6 Within the dataset ‘df’, create a new variable called ‘xy’ that is the squared product of x and y (i.e., multiply them and then square the result).