The following exercises practice the essential dplyr commands on joining 2 tables.
8.4.1 Exercise 1
all.equal(left_join(data_t1, data_t2), right_join(data_t2, data_t1))
all.equal(full_join(data_t1, data_t2), inner_join(data_t2, data_t1))
all.equal(data_t1, semi_join(data_t1, data_t2))
all.equal(t1, full_join(semi_join(t1, t3), anti_join(t1, t3)))
- We load data_t1 and data_t2 (from the ds4psy package or the CSV files online) and arrange them to re-create
# Load data: data_t1 <- ds4psy::data_t1 # from ds4psy package data_t2 <- ds4psy::data_t2 # from ds4psy package # data_t1 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/data_t1.csv") # online # data_t2 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/data_t2.csv") # online # Copy data: t1 <- data_t1 t2 <- data_t2 # Arrange tables: t1 <- t1 %>% arrange(name, gender) t2 <- t2 %>% arrange(name, gender) # Inspect data: # t1 # t2
# Re-load tables t3 and t4 (from online source): t3 <- ds4psy::t3 # from ds4psy package t4 <- ds4psy::t4 # from ds4psy package # t3 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/t3.csv") # online # t4 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/t4.csv") # online # Inspect data: # t3 # t4
We now can evaluate the statements of this exercise and explain the results returned…
8.4.2 Exercise 2
Join these tables into one table that contains the same information as
Use tidyr and dplyr commands to transform the resulting join into
8.4.3 Exercise 3
This exercise uses the following data files:
ds4psy::data_t3(also available as CSV file from http://rpository.com/ds4psy/data/data_t3.csv)
ds4psy::data_t4(also available as CSV file from http://rpository.com/ds4psy/data/data_t4.csv)
- Import both tables of data (as
data_t4) and check out each table individually, before comparing them to each other. Do both tables contain the same cases and variables?
- Join both tables by using only the
data_t4as keys. Why is this difficult or impossible?
- Use a different command to join both tables into a 20 x 6 table that contains all data in 1 table.
8.4.4 Exercise 4
So far, we always distinguished between 3 main data files:
ds4psy::posPsy_p_info(also available as
posPsy_participants.csv: Original participant data, 295 x 6 variables:
ds4psy::posPsy_long(also available as
posPsy_AHI_CESD_corrected.csv: Corrected version of dependent measures in long format, 990 x 50 variables:
ds4psy::posPsy_wide(also available as
posPsy_data_wide.csv: Corrected version of all data joined in wide format, 295 x 294 variables:
Different measurement occasions are suffixed by .0, .1, …, .5.
Given your knowledge of dplyr and tidyr, we can now transform
posPsy_AHI_CESD_corrected.csv) and combine its parts with
posPsy_participants.csv) to obtain
posPsy_data_wide.csv). Do this in the following steps:
- Import the data files
data_wide, and inspect the dimensions of these tables.
# Load data: p_info <- ds4psy::posPsy_p_info # from ds4psy package AHI_CESD <- ds4psy::posPsy_long data_wide <- ds4psy::posPsy_wide # p_info <- readr::read_csv("http://rpository.com/ds4psy/data/posPsy_participants.csv") # online # AHI_CESD <- readr::read_csv("http://rpository.com/ds4psy/data/posPsy_AHI_CESD_corrected.csv") # data_wide <- readr::read_csv("http://rpository.com/ds4psy/data/posPsy_data_wide.csv") # Check: dim(p_info) # 295 x 6 #>  295 6 dim(AHI_CESD) # 990 x 50 #>  990 50 dim(data_wide) # 295 x 294 #>  295 294
- Create 6 individual tibbles that contains the data of each participant (1 row per
id) seperately for each
occasion(0 to 5).
- Combine the variables of the 6 tibbles into one wide block (in wide format) and compare your resulting table with data_wide.
Hint: As you cannot combine all 6 blocks in one command, they have to be combined in a stepwise fashion.
- As you might have noticed the number of columns of our newly created data set and the data in wide format (
data_wide) are not the same. That is because:
information on each participants’ intervention group is included 6 times in our created data;
data_widealso includes the information on the participants (contained in
Exclude any redundant variables on the
intervention group (keeping this variable only once) and include the participant information from
p_info to create a new table
complete_data. Check if and ensure that
complete_data is equal to
- How would you change the steps to join the 6 blocks (in 3.) if you only wanted to keep the participants who took part on all 6 possible occasions? How many of them exist?
- Use a dplyr summary table on the data in
AHI_CESDto determine the
idof all participants that took part in all 6 occasions. Then use a filter join on
AHI_CESDto keep only the (rows of) data of these participants.
8.4.5 Exercise 5
This exercise relies on the raw data for the false positive psychology (see Section B.2 of Appendix B for details on the data and corresponding information). The original data can be obtained at https://openpsychologydata.metajnl.com/articles/10.5334/jopd.aa/ or https://zenodo.org/record/7664.
- Download the 2 raw data files and load them into R as
study_2(fixing any obvious errors).
- As both data files contain different participants, but the same variables about each participant, we want to save all data in one table. Combine both datasets, but mark each dataset by a
studyvariable (that has a value of either 1 or 2).
- Identify individuals and arrange the dataset: Add a numeric
IDvariable identifying each individual participant and sort the dataset in an intuitive way.
Splitting and joining
- The following code splits the data of
falsePosPsyinto 2 parts (
# Import the Dataset falsePosPsy <- read_csv(file = "http://rpository.com/ds4psy/data/falsePosPsy_all.csv") # online # Create 2 different datasets: pt_1 <- falsePosPsy %>% select(study, ID, aged, aged365, female, dad, mom) pt_2 <- falsePosPsy %>% select(ID, potato, when64, kalimba, cond, root, bird, political, quarterback, olddays, feelold, computer, diner)
- Re-create the full data set (from
pt_2) in at least 3 different ways (and verify that each of them yields the same data as
This concludes our exercises on using dplyr commands for joining 2 tables.