7 Tutorial 7: Drawing Random Samples

After working through Tutorial 7, you’ll…

  • understand how to use R to draw random samples

For this tutorial, we’ll use the data set “Population.csv” (via Moodle/Data for R).

The data set contains a (fictional) list of N = 1000 unique IDs of Instagram/TikTok posts by two UK outlets (Guardian, BBC) and two US outlets (CNN, Washington Post). The variables included are…

  • the ID of each unit of analysis: PostID
  • platform on which each post was published (Instagram or TikTok): Platform
  • the outlet which published each post: Outlet
library(tidyverse)  # provides %>%, slice_sample(), group_by(), mutate() and recode() used below
population <- read.csv2("Population.csv", header = TRUE)
head(population)
##   PostID  Platform Outlet
## 1   ID01 Instagram    BBC
## 2   ID02 Instagram    BBC
## 3   ID03 Instagram    BBC
## 4   ID04 Instagram    BBC
## 5   ID05 Instagram    BBC
## 6   ID06 Instagram    BBC
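Before drawing a sample, it can be worth checking that the data match this description: 1000 rows and one unique PostID per row. The sketch below assumes the dplyr package and, so that it runs on its own, builds a made-up stand-in for Population.csv (250 posts per outlet, platforms simply alternating). With the real data, skip the tibble() step and run the checks on the population object read in above.

```r
library(dplyr)

# Hypothetical stand-in for Population.csv: made-up values, NOT the real data
population <- tibble(
  PostID   = paste0("ID", 1:1000),
  Platform = rep(c("Instagram", "TikTok"), times = 500),
  Outlet   = rep(c("BBC", "GUARDIAN", "CNN", "WP"), each = 250)
)

# Number of posts (rows) in the population
nrow(population)  # 1000

# TRUE if every PostID occurs only once, i.e. each row is a distinct unit of analysis
n_distinct(population$PostID) == nrow(population)  # TRUE

# Number of posts per outlet and platform
population %>% count(Outlet, Platform)
```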

Since we cannot code all N = 1000 posts in the population, we’ll draw a sample of n = 100 posts.

We’ll now learn how to draw simple random samples and stratified random samples.

7.1 Simple Random Sample

A simple random sample is a subset of cases drawn from the population at random, so that every case has the same chance of being selected.

In R, we can draw a simple random sample of n = 100 posts from our population of N = 1000 posts with the slice_sample() function from the dplyr package:

simple.random.sample <- population %>%
  slice_sample(n = 100)

Important: Every time you re-run the command, R will randomly select a new sample and overwrite the old one.
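Because each run draws a new sample, a common practice is to fix R’s random number generator with set.seed() before sampling, so that exactly the same sample can be drawn again later. In the sketch below, the seed value 2024 is arbitrary, dplyr is assumed, and the population is a hypothetical stand-in containing only PostIDs.

```r
library(dplyr)

# Hypothetical stand-in population: 1000 made-up PostIDs
population <- tibble(PostID = paste0("ID", 1:1000))

# Fix the starting point of the random number generator, then sample
set.seed(2024)  # any integer works; 2024 is an arbitrary choice
sample_a <- population %>% slice_sample(n = 100)

# Resetting the same seed reproduces exactly the same sample...
set.seed(2024)
sample_b <- population %>% slice_sample(n = 100)
identical(sample_a$PostID, sample_b$PostID)  # TRUE

# ...whereas sampling again without resetting the seed gives a new sample
sample_c <- population %>% slice_sample(n = 100)
identical(sample_a$PostID, sample_c$PostID)
```

The same seed reproduces the same sample only under the same sampling setup: R 3.6.0, for example, changed the default algorithm behind sample(), so it can help to note the R version together with the seed.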

If we already know how many coders will take part, we can also assign each sampled post to a coder right away. Let’s assume we have 4 coders, each coding 25 posts.

We’ll have to…

  • create a vector called coders containing the names of all four coders, repeated 25 times (once for each of the 25 posts each coder will code)
  • add coders as a new column to simple.random.sample, so that everything is in a single data frame

In R, this would work like so…

coders <- rep(c("Coder1", "Coder2", "Coder3", "Coder4"), 25)
simple.random.sample <- simple.random.sample %>%
  mutate(coder = coders)
head(simple.random.sample)
##   PostID  Platform Outlet  coder
## 1  ID529 Instagram     WP Coder1
## 2  ID418 Instagram     WP Coder2
## 3  ID515 Instagram     WP Coder3
## 4   ID67 Instagram    BBC Coder4
## 5  ID569 Instagram     WP Coder1
## 6   ID88 Instagram    BBC Coder2
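Since slice_sample() returns the sampled posts in random order, cycling through the four coder labels spreads the posts randomly across coders. A short check, sketched below with dplyr and a hypothetical stand-in population, confirms that the vector fits the sample (mutate() needs one label per row) and that each coder gets 25 posts.

```r
library(dplyr)

# Hypothetical stand-in population: 1000 made-up PostIDs
population <- tibble(PostID = paste0("ID", 1:1000))

simple.random.sample <- population %>% slice_sample(n = 100)

# "Coder1", "Coder2", "Coder3", "Coder4", "Coder1", ... (100 labels in total)
coders <- rep(c("Coder1", "Coder2", "Coder3", "Coder4"), 25)

# mutate() needs exactly one label per sampled post
length(coders) == nrow(simple.random.sample)  # TRUE

simple.random.sample <- simple.random.sample %>%
  mutate(coder = coders)

# Workload per coder: 25 posts each
simple.random.sample %>% count(coder)
```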

We can now save this sample to our working directory for coding and data analysis:

write.csv2(simple.random.sample, "simple_random_sample.csv", row.names = FALSE)
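The row.names = FALSE argument matters once the file is read back in. The base-R sketch below uses a tiny, made-up data frame and temporary file paths from tempfile() to compare the column names after a round trip with and without it: by default, write.csv2() also writes R’s row numbers, which come back as an extra column named X.

```r
# Tiny made-up stand-in for the coded sample
simple.random.sample <- data.frame(
  PostID = c("ID529", "ID418", "ID67"),
  coder  = c("Coder1", "Coder2", "Coder3")
)

# Temporary file paths, so the demo does not clutter your working directory
path_without <- tempfile(fileext = ".csv")
path_with    <- tempfile(fileext = ".csv")

write.csv2(simple.random.sample, path_without, row.names = FALSE)
write.csv2(simple.random.sample, path_with)  # default: row names are written too

names(read.csv2(path_without))  # "PostID" "coder"
names(read.csv2(path_with))     # "X" "PostID" "coder"
```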

7.2 Stratified Sample

A stratified sample is drawn by dividing the population into groups (strata) and then randomly selecting cases from each group.

For instance, we may want random samples of posts by both UK and US outlets, e.g., 50 posts by UK outlets and 50 posts by US outlets. To do so, we first have to create a variable indicating each outlet’s country of origin:

population <- population %>% 
 mutate(Country=recode(Outlet, 
                       "BBC" = "UK", 
                       "GUARDIAN" = "UK",
                       "CNN" = "US",
                       "WP" = "US"))
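Before sampling by country, it is worth checking that recode() mapped every outlet as intended. For character input, recode() leaves values it does not match unchanged, so a spelling mismatch (say, "Guardian" instead of "GUARDIAN") would silently keep the outlet name in Country. A cross-tabulation makes this visible; the sketch assumes dplyr and a hypothetical stand-in containing only the Outlet column.

```r
library(dplyr)

# Hypothetical stand-in: only the Outlet column matters here
population <- tibble(Outlet = rep(c("BBC", "GUARDIAN", "CNN", "WP"), each = 250))

population <- population %>%
  mutate(Country = recode(Outlet,
                          "BBC"      = "UK",
                          "GUARDIAN" = "UK",
                          "CNN"      = "US",
                          "WP"       = "US"))

# Each outlet should appear with exactly one country
population %>% count(Outlet, Country)

# Country should contain nothing but the two intended values
unique(population$Country)  # "UK" "US"
```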

We then draw a stratified sample of n = 100 posts from our population of N = 1000 posts, here:

  • 50 randomly selected posts by UK outlets
  • 50 randomly selected posts by US outlets

We therefore first group observations by country with group_by() and then draw a random sample from each group with slice_sample():

stratified.sample <- population %>%
                  group_by(Country) %>%
                  slice_sample(n = 50)
head(stratified.sample)
## # A tibble: 6 x 4
## # Groups:   Country [1]
##   PostID Platform  Outlet   Country
##   <chr>  <chr>     <chr>    <chr>  
## 1 ID254  Instagram GUARDIAN UK     
## 2 ID276  Instagram GUARDIAN UK     
## 3 ID43   Instagram BBC      UK     
## 4 ID360  Instagram GUARDIAN UK     
## 5 ID240  Instagram GUARDIAN UK     
## 6 ID255  Instagram GUARDIAN UK
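Since head() only shows the first rows (here all from the UK stratum), counting rows per country is a quick way to confirm that both strata contributed 50 posts. The sketch assumes dplyr and a hypothetical stand-in population of 500 UK and 500 US posts.

```r
library(dplyr)

# Hypothetical stand-in population: 500 UK and 500 US posts with made-up IDs
population <- tibble(
  PostID  = paste0("ID", 1:1000),
  Country = rep(c("UK", "US"), each = 500)
)

stratified.sample <- population %>%
  group_by(Country) %>%
  slice_sample(n = 50)

# One row per stratum: 50 UK posts and 50 US posts
stratified.sample %>% count(Country)

# 100 posts in total
nrow(stratified.sample)  # 100
```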

Here, too, we could assign each post to a coder and then write the object stratified.sample to a file with write.csv2(), which creates a CSV file that can be opened in Excel.
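One pitfall when assigning coders here: stratified.sample is still grouped by Country, and on a grouped tibble mutate() works within each group of 50 rows, so a 100-element coders vector is rejected with an error. A minimal sketch, assuming dplyr and a hypothetical stand-in population, therefore calls ungroup() first.

```r
library(dplyr)

# Hypothetical stand-in population: 500 UK and 500 US posts with made-up IDs
population <- tibble(
  PostID  = paste0("ID", 1:1000),
  Country = rep(c("UK", "US"), each = 500)
)

stratified.sample <- population %>%
  group_by(Country) %>%
  slice_sample(n = 50)

coders <- rep(c("Coder1", "Coder2", "Coder3", "Coder4"), 25)

# ungroup() first: otherwise mutate() would work per group (50 rows each)
# and reject a vector of 100 labels
stratified.sample <- stratified.sample %>%
  ungroup() %>%
  mutate(coder = coders)

# Posts per coder and country
stratified.sample %>% count(coder, Country)
```

Because the rows come out ordered by country (all UK posts, then all US posts), cycling through the coders gives each coder 25 posts with a mix of UK and US posts. The result can then be saved with write.csv2() just like the simple random sample.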

7.3 Take Aways

  • Drawing random samples:
    • Simple random samples: slice_sample()
    • Stratified samples: group_by() followed by slice_sample()

Let’s keep going: Tutorial 8: Data Analysis.