7 Tutorial 7: Drawing Random Samples
After working through Tutorial 7, you’ll…
- understand how to use R to draw random samples
For this tutorial, we’ll use the data set “Population.csv” (via Moodle/Data for R).
The data set contains a (fictional) sample sheet with a total of N = 1000 unique IDs for Instagram/TikTok posts by two UK outlets (Guardian, BBC) and two US outlets (CNN, Washington Post). The variables included are…
- the ID of each unit of analysis: PostID
- platform on which each post was published (Instagram or TikTok): Platform
- the outlet which published each post: Outlet
library(dplyr) # provides %>%, slice_sample(), mutate(), recode() and group_by()
population <- read.csv2("Population.csv", header = TRUE)
head(population)
## PostID Platform Outlet
## 1 ID01 Instagram BBC
## 2 ID02 Instagram BBC
## 3 ID03 Instagram BBC
## 4 ID04 Instagram BBC
## 5 ID05 Instagram BBC
## 6 ID06 Instagram BBC
Since we cannot code the full population of N = 1000 posts, we’ll have to draw a sample, here of n = 100 posts.
We’ll now learn how to draw simple random samples and stratified random samples.
7.1 Simple Random Sample
A simple random sample includes a subset of randomly selected cases from the population.
In R, we can draw a simple random sample of n = 100 posts from our population of N = 1000 posts by using the slice_sample() function:
simple.random.sample <- population %>%
  slice_sample(n = 100)
Important: Every time you re-run the command, R will randomly select a new sample and overwrite the old one.
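If the same sample should come back on every run, one common approach is to fix R’s random number generator with set.seed() directly before drawing. The following is only a minimal sketch: the data frame is a hypothetical stand-in for Population.csv (its outlet distribution is made up), the seed value 2024 is arbitrary, and sample.a and sample.b are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in for Population.csv, only here to keep the sketch self-contained
population <- data.frame(
  PostID = sprintf("ID%02d", 1:1000),
  Outlet = rep(c("BBC", "GUARDIAN", "CNN", "WP"), times = 250)
)

set.seed(2024) # arbitrary seed; any fixed integer works
sample.a <- population %>%
  slice_sample(n = 100)

set.seed(2024) # resetting the same seed before drawing again
sample.b <- population %>%
  slice_sample(n = 100)

# Compare the two draws
identical(sample.a, sample.b)
```

With the seed reset before each draw, both samples should contain the same posts, so the sample can be reproduced later from the script alone.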
If we already know how many coders there will be, we can assign each post to a coder right away. Let’s assume we have 4 coders, each coding 25 posts.
We’ll have to…
- create a vector called coders containing the IDs of all coders, repeated 25 times (for all 25 posts they’ll code)
- combine coders with simple.random.sample to create a single data frame
In R, this would work like so…
coders <- rep(c("Coder1", "Coder2", "Coder3", "Coder4"), 25)
simple.random.sample <- simple.random.sample %>%
  mutate(coder = coders)
head(simple.random.sample)
## PostID Platform Outlet coder
## 1 ID529 Instagram WP Coder1
## 2 ID418 Instagram WP Coder2
## 3 ID515 Instagram WP Coder3
## 4 ID67 Instagram BBC Coder4
## 5 ID569 Instagram WP Coder1
## 6 ID88 Instagram BBC Coder2
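Before handing out the sample, it may be worth checking that the workload is balanced. The sketch below assumes a hypothetical stand-in for simple.random.sample (100 made-up post IDs) and uses base R’s table() to count posts per coder; the stopifnot() guard is an addition for illustration, not part of the original workflow.

```r
library(dplyr)

# Hypothetical stand-in for the sample drawn above
simple.random.sample <- data.frame(PostID = sprintf("ID%02d", 1:100))

coders <- rep(c("Coder1", "Coder2", "Coder3", "Coder4"), 25)

# mutate() needs exactly one coder label per sampled post
stopifnot(length(coders) == nrow(simple.random.sample))

simple.random.sample <- simple.random.sample %>%
  mutate(coder = coders)

# Number of posts assigned to each coder
table(simple.random.sample$coder)
```

Because the rows of the sample are already in random order, cycling through the coder labels in a fixed order still amounts to a random assignment of posts to coders.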
We could now save this sample for data analysis to our local folder:
write.csv2(simple.random.sample, "simple_random_sample.csv", row.names = FALSE)
7.2 Stratified Sample
A stratified sample includes a subset of randomly selected cases from each of several groups (strata) within the population.
For instance, we may want to draw random samples from both UK and US outlets, e.g., 50 posts by UK outlets and 50 posts by US outlets. To do so, we’ll first have to create a variable indicating the country of origin:
population <- population %>%
  mutate(Country = recode(Outlet,
                          "BBC" = "UK",
                          "GUARDIAN" = "UK",
                          "CNN" = "US",
                          "WP" = "US"))
We then draw a stratified sample of n = 100 posts from our population of N = 1000 posts, here:
- 50 randomly selected posts from the UK
- 50 randomly selected posts from the US
We therefore first group observations by country with group_by() and then draw the sample within each group with slice_sample():
stratified.sample <- population %>%
  group_by(Country) %>%
  slice_sample(n = 50)
head(stratified.sample)
## # A tibble: 6 x 4
## # Groups: Country [1]
## PostID Platform Outlet Country
## <chr> <chr> <chr> <chr>
## 1 ID254 Instagram GUARDIAN UK
## 2 ID276 Instagram GUARDIAN UK
## 3 ID43 Instagram BBC UK
## 4 ID360 Instagram GUARDIAN UK
## 5 ID240 Instagram GUARDIAN UK
## 6 ID255 Instagram GUARDIAN UK
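Since head() only shows the first rows (all from the UK here), a quick count per stratum can confirm that both countries are represented as intended. This is a minimal sketch on a hypothetical stand-in for the population (the outlet distribution is made up); strata.sizes and proportional.sample are illustrative names. It also shows the prop argument of slice_sample() as one alternative, which draws the same share from each stratum instead of a fixed number.

```r
library(dplyr)

# Hypothetical stand-in for Population.csv with the Country variable added
population <- data.frame(
  PostID = sprintf("ID%02d", 1:1000),
  Outlet = rep(c("BBC", "GUARDIAN", "CNN", "WP"), times = 250)
) %>%
  mutate(Country = recode(Outlet,
                          "BBC" = "UK",
                          "GUARDIAN" = "UK",
                          "CNN" = "US",
                          "WP" = "US"))

# Fixed number of posts per stratum, as in the tutorial
stratified.sample <- population %>%
  group_by(Country) %>%
  slice_sample(n = 50)

# Rows per stratum
strata.sizes <- stratified.sample %>%
  ungroup() %>%
  count(Country)
strata.sizes

# Alternative: draw 10% of each stratum, so strata keep their population shares
proportional.sample <- population %>%
  group_by(Country) %>%
  slice_sample(prop = 0.1)
```

Which variant fits depends on the research question: fixed sizes make groups directly comparable, while proportional sizes mirror the composition of the population.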
Here, we could then again assign posts to coders and write out the object stratified.sample to a CSV file, as in 7.1.
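One way this could look is sketched below, again on a hypothetical stand-in for the population. Two choices here are assumptions rather than part of the tutorial: the grouping is dropped with ungroup() before adding the coder column, and the coder labels are shuffled with sample() because the stratified sample is sorted by country. The file name stratified_sample.csv is illustrative.

```r
library(dplyr)

# Hypothetical stand-in for Population.csv with the Country variable added
population <- data.frame(
  PostID = sprintf("ID%02d", 1:1000),
  Outlet = rep(c("BBC", "GUARDIAN", "CNN", "WP"), times = 250)
) %>%
  mutate(Country = recode(Outlet,
                          "BBC" = "UK",
                          "GUARDIAN" = "UK",
                          "CNN" = "US",
                          "WP" = "US"))

stratified.sample <- population %>%
  group_by(Country) %>%
  slice_sample(n = 50)

coders <- rep(c("Coder1", "Coder2", "Coder3", "Coder4"), 25)

stratified.sample <- stratified.sample %>%
  ungroup() %>%                  # drop the grouping so mutate() works on all 100 rows at once
  mutate(coder = sample(coders)) # shuffle the labels so coders are not tied to row order

# Posts per country and coder
table(stratified.sample$Country, stratified.sample$coder)

# Save the sample to the working directory (illustrative file name)
write.csv2(stratified.sample, "stratified_sample.csv", row.names = FALSE)
```

Each coder still receives 25 posts in total, but the split between UK and US posts per coder is left to chance.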
7.3 Takeaways
- Drawing simple random samples: slice_sample()
- Drawing stratified random samples: group_by(), slice_sample()
Let’s keep going: Tutorial 8: Data Analysis.