5.3 Lab: Sampling & randomizing
At some point we may have a sampling frame, i.e., a list of all units in a population that can be sampled. For budget reasons we cannot run our experiment with all units, so we need to draw a random sample. Subsequently, we want to randomly assign the units in our sample to treatment and control.
Below we create an artificial population, i.e., a dataframe that contains all units in the population. We call this dataframe population.
# Load the package that provides randomNames()
library(randomNames)

# Create exemplary sampling frame
population <- data.frame(id = sample(x = 1:99999, 2000, replace = FALSE),
                         firstname = randomNames(2000, which.names = "first"),
                         lastname = randomNames(2000, which.names = "last"),
                         age = sample(x = 18:85, 2000, replace = TRUE))
# Build an email address from each unit's first and last name
population$email <- paste(population$firstname, ".",
                          population$lastname, "@example",
                          sample(1:9, 2000, replace = TRUE), ".",
                          sample(c("de", "com", "org", "es"), 2000, replace = TRUE),
                          sep = "")
The population, i.e., the sampling frame, contains 2000 units. First, we would like to draw a sample of 200 individuals among whom we conduct our experiment. Below we do so and have a look at the first six units in our sample.
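The call that draws the sample is not shown in the text. A minimal sketch of one way to do it with base R follows. It assumes a stand-in sampling frame with only id and age, so that it runs without the randomNames package, and a seed that is not part of the original. The result is stored in a dataframe called sample, the name used in the later code.

```r
# Stand-in sampling frame (hypothetical: only id and age, to keep the sketch self-contained)
set.seed(123)  # assumption: seed added for reproducibility, not in the original
population <- data.frame(id = sample(x = 1:99999, size = 2000, replace = FALSE),
                         age = sample(x = 18:85, size = 2000, replace = TRUE))

# Draw 200 row positions without replacement, then keep those rows
sampled_rows <- sample(x = seq_len(nrow(population)), size = 200, replace = FALSE)
sample <- population[sampled_rows, ]

# Look at the first six units in the sample
head(sample)
```

Naming the dataframe sample does not break later calls to the function sample(), because R looks for a function when the name is used in a call. A different name would still be easier to read.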
id | firstname | lastname | age | email |
---|---|---|---|---|
97102 | Maaiz | Park | 81 | Maaiz.Park@example8.de |
6365 | Morganna | al-Pour | 65 | Morganna.al-Pour@example2.es |
78890 | Gedion | al-Fahmy | 38 | Gedion.al-Fahmy@example3.com |
82562 | Sirmichael | Braud | 41 | Sirmichael.Braud@example7.com |
98332 | Reanne | Rodriguez | 35 | Reanne.Rodriguez@example5.es |
89963 | Sareena | Saracay Pena | 50 | Sareena.Saracay Pena@example8.com |
“Simple random assignment assigns all subjects to treatment with an equal probability by flipping a (weighted) coin for each subject.” (Source). To do so, we need a variable that contains the units’ assignment status. Accordingly, we create a variable treatment in which we store that information.
We can do that for two treatment groups.
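The call for the two-group case is not shown in the text. A minimal sketch, assuming the randomizr package is installed and using a stand-in dataframe of 200 units (both the stand-in data and the seed are assumptions for illustration): called with only N, simple_ra() assigns each unit to 0 or 1 with probability 0.5.

```r
library(randomizr)

set.seed(123)  # assumption: seed added for reproducibility, not in the original

# Stand-in for the sampled units (hypothetical: ids only)
sample <- data.frame(id = 1:200)

# Two groups: each unit is assigned 0 (control) or 1 (treatment) by a fair coin flip
sample$treatment <- simple_ra(N = nrow(sample))

# Group sizes vary from run to run because each unit is assigned independently
table(sample$treatment)
```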
id | firstname | lastname | age | email | treatment |
---|---|---|---|---|---|
97102 | Maaiz | Park | 81 | Maaiz.Park@example8.de | 1 |
6365 | Morganna | al-Pour | 65 | Morganna.al-Pour@example2.es | 0 |
78890 | Gedion | al-Fahmy | 38 | Gedion.al-Fahmy@example3.com | 1 |
82562 | Sirmichael | Braud | 41 | Sirmichael.Braud@example7.com | 1 |
98332 | Reanne | Rodriguez | 35 | Reanne.Rodriguez@example5.es | 0 |
89963 | Sareena | Saracay Pena | 50 | Sareena.Saracay Pena@example8.com | 1 |
…or for three treatment groups (the condition names T1, T2, and T3 below are assigned automatically).
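The call for the three-group case is not shown in the text either. A minimal sketch, again assuming randomizr is installed and using a stand-in dataframe and seed that are not in the original: the num_arms argument of simple_ra() sets the number of conditions, and the package supplies the default labels.

```r
library(randomizr)

set.seed(123)  # assumption: seed added for reproducibility, not in the original

# Stand-in for the sampled units (hypothetical: ids only)
sample <- data.frame(id = 1:200)

# Three groups with equal assignment probabilities; labels default to T1, T2, T3
sample$treatment <- simple_ra(N = nrow(sample), num_arms = 3)

# Count the units per condition
table(sample$treatment)
```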
id | firstname | lastname | age | email | treatment |
---|---|---|---|---|---|
97102 | Maaiz | Park | 81 | Maaiz.Park@example8.de | T3 |
6365 | Morganna | al-Pour | 65 | Morganna.al-Pour@example2.es | T3 |
78890 | Gedion | al-Fahmy | 38 | Gedion.al-Fahmy@example3.com | T2 |
82562 | Sirmichael | Braud | 41 | Sirmichael.Braud@example7.com | T1 |
98332 | Reanne | Rodriguez | 35 | Reanne.Rodriguez@example5.es | T2 |
89963 | Sareena | Saracay Pena | 50 | Sareena.Saracay Pena@example8.com | T3 |
Finally, here is an example with four treatment groups, where we set the assignment probabilities ourselves and name the values of the treatment variable (via a character vector).
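# simple_ra() comes from the randomizr package
library(randomizr)

# Four conditions with equal probabilities and custom labels
sample$treatment <- simple_ra(N = nrow(sample),
                              prob_each = c(.25, .25, .25, .25),
                              conditions = c("treatment1",
                                             "treatment2",
                                             "treatment3",
                                             "control"))
head(sample)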
id | firstname | lastname | age | email | treatment |
---|---|---|---|---|---|
97102 | Maaiz | Park | 81 | Maaiz.Park@example8.de | control |
6365 | Morganna | al-Pour | 65 | Morganna.al-Pour@example2.es | treatment3 |
78890 | Gedion | al-Fahmy | 38 | Gedion.al-Fahmy@example3.com | control |
82562 | Sirmichael | Braud | 41 | Sirmichael.Braud@example7.com | treatment1 |
98332 | Reanne | Rodriguez | 35 | Reanne.Rodriguez@example5.es | control |
89963 | Sareena | Saracay Pena | 50 | Sareena.Saracay Pena@example8.com | control |
Once we have randomly assigned our sample units to treatment groups, we can conduct our experiment and administer the real treatments based on our dataframe. For instance, we might send the groups different versions of a questionnaire.
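One way to act on the assignment is to split the dataframe into one contact list per condition. The sketch below is an illustration under stated assumptions: the stand-in data, the placeholder email addresses, and the seed are hypothetical, and the treatment column is mimicked with base R's sample() so that the block runs without randomizr.

```r
set.seed(123)  # assumption: seed added for reproducibility, not in the original

# Stand-in sample with hypothetical ids and placeholder email addresses
conditions <- c("treatment1", "treatment2", "treatment3", "control")
sample <- data.frame(id = 1:200,
                     email = paste0("unit", 1:200, "@example.com"))

# Stand-in for the randomizr assignment: one independent draw per unit
sample$treatment <- base::sample(conditions, size = nrow(sample), replace = TRUE)

# One dataframe per condition, e.g. to send each group its questionnaire version
groups <- split(sample, sample$treatment)

# Number of units in each contact list
sapply(groups, nrow)
```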
The randomizr package contains further useful functions and arguments, and there are other forms of random assignment (???). See the randomizr vignette.
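As one illustration of another form of random assignment, the sketch below contrasts simple_ra() with complete_ra() from randomizr. Instead of flipping a coin per unit, complete_ra() fixes how many units end up in each condition. The N of 200 and the seed are assumptions for illustration.

```r
library(randomizr)

set.seed(123)  # assumption: seed added for reproducibility, not in the original
N <- 200       # stand-in sample size

# Simple random assignment: independent coin flips, so group sizes can be unequal
z_simple <- simple_ra(N = N)

# Complete random assignment: with the defaults, exactly half of the units are treated
z_complete <- complete_ra(N = N)

table(z_simple)
table(z_complete)
```

Fixing the group sizes this way avoids ending up with very unbalanced groups by chance, which can happen under simple random assignment in small samples.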