6.4 Exercises


The following exercises practice skills in navigating local directories and using essential readr commands for importing and writing data.

6.4.1 Exercise 1

6.4.2 Exercise 2

Parsing dates and numbers

Look at your ID card and type your birthday as a string as it’s written on the card (including any spaces or punctuation symbols). For instance, if you were Erika Mustermann (see https://de.wikipedia.org/wiki/Personalausweis_(Deutschland)) you would write the character string “12.08.1964”.

  1. Use an appropriate parse_ command to read this character string into R.

  2. Now write out the date in German (i.e., “12. August 1964”) and use another command to parse this string into R.

  3. Use Google Translate to translate this character string into French, Italian, and Spanish and use appropriate R commands to parse these strings into R.

Hint: Consult vignette("locales") for specifying languages.

  4. Use a parse_ command (with an appropriate locale) to parse the following character strings into the desired data format:
  • "US$1,099.95" as a number;
  • "EUR1.099,95" as a number.
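
As a starting point, the following minimal sketch shows the kind of readr calls this exercise is after. It uses the sample date from the text; the format strings are assumptions about how the date is laid out and will need adjusting for your own card and for other languages.

```r
library(readr)

# Numeric date as on the sample ID card (day.month.year):
d_num <- parse_date("12.08.1964", format = "%d.%m.%Y")

# Spelled-out German date: %B matches a full month name,
# so a German locale is needed to recognise "August":
d_de <- parse_date("12. August 1964", format = "%d. %B %Y",
                   locale = locale("de"))

# Numbers: parse_number() skips non-numeric prefixes such as "US$".
# The default locale treats "," as grouping mark and "." as decimal mark:
n_us <- parse_number("US$1,099.95")

# For the European notation, we swap both marks in the locale:
n_eu <- parse_number("EUR1.099,95",
                     locale = locale(decimal_mark = ",", grouping_mark = "."))
```

Both date objects should represent the same day, and both numbers the same value. For French, Italian, and Spanish, only the string and the language code passed to locale() should need to change.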

6.4.3 Exercise 3

A read-write-read cycle

  1. Read the data in file http://rpository.com/ds4psy/data/data_2.dat into an R object data_2, but use the command read_delim() rather than read_fwf() (as above).

Hint: The variable names should be the same as above, but inspect the file to see its delimiter.

  2. Store the data file as data_2.csv (a csv file that includes variable names) into a directory that is not your current working directory.

  3. Now use a command to re-read the file data_2.csv back into an object data_2b and use the all.equal() function to verify that data_2 and data_2b are equal.
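
The read-write-read cycle can be sketched on a small stand-in data set. Everything below is an assumption made for illustration: the data frame demo, the "$" delimiter, and the file names are hypothetical and do not describe the contents of data_2.dat, which you still need to inspect yourself.

```r
library(readr)

# Hypothetical stand-in data (NOT the contents of data_2.dat):
demo <- data.frame(initials = c("AB", "CD"),
                   age = c(21, 35),
                   stringsAsFactors = FALSE)

# A directory that differs from the current working directory:
out_dir <- file.path(tempdir(), "export")
dir.create(out_dir, showWarnings = FALSE)

# Create a delimited file and read it with read_delim():
dat_file <- file.path(out_dir, "demo.dat")
write_delim(demo, dat_file, delim = "$")
data_a <- read_delim(dat_file, delim = "$", col_types = cols())

# Write as csv (variable names are included by default) and re-read:
csv_file <- file.path(out_dir, "demo.csv")
write_csv(data_a, csv_file)
data_b <- read_csv(csv_file, col_types = cols())

# Compare the values; attributes such as the stored column
# specification may differ between the two import functions:
same <- all.equal(data_a, data_b, check.attributes = FALSE)
```

Setting check.attributes = FALSE is a design choice of this sketch: it compares the stored values only. If all.equal() reports differences without it, check whether they concern the data or merely the attributes.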

6.4.4 Exercise 4

Reading odd data

The following data files are variants of the data at http://rpository.com/ds4psy/data/falsePosPsy_all.csv:

(See Section B.2 of Appendix B for details on the data and corresponding articles.)

Hint: Defining the file paths as R objects saves you from typing them repeatedly later.

  1. Inspect file ex1.dat and read it in two ways (using the base R function read.csv() and the appropriate variant of read_csv()). How do the two resulting data objects differ from each other?

  2. Inspect and import the dataset ex2.dat using appropriate command(s).

  3. Inspect and import the dataset ex3.dat using appropriate command(s).

  4. Inspect and import the dataset ex4.dat using appropriate command(s). Specifically, note the encoding of the age variable (aged365) and check whether you can compute participants’ average age (in years) after importing the data.
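
To see what kinds of differences to look for in the first task, here is a minimal sketch that reads the same small file with read.csv() and with read_csv(). The three-line file is a made-up example (not one of the ex*.dat files) and is written to a temporary location.

```r
library(readr)

# Write a tiny, hypothetical csv file to a temporary path:
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,cond,age",
             "1,control,21.5",
             "2,potato,19.0"), tmp)

# Storing the path in an object (tmp) lets us reuse it below:
df_base  <- read.csv(tmp)                     # base R import
df_readr <- read_csv(tmp, col_types = cols()) # readr import

# Compare the classes of the objects and of their columns:
class(df_base)
class(df_readr)
sapply(df_base, class)
sapply(df_readr, class)
```

Typical differences concern the class of the resulting object (data frame vs. tibble) and the column types chosen by each function.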

6.4.5 Exercise 5

Writing data

In Exercise 4 of the previous chapter on tibbles (see Section 5.4.4 of Chapter 5), we created the following summary tibble in different ways (either directly entering it by using tibble commands, or by using dplyr commands to obtain a summary table from the raw data):

Table 6.1: Age-related data from Simmons et al. (2011) (see Exercise 4 of the tibbles chapter).
cond     n   mn_ag  mi_ag  mx_ag  fl_vyng  fl_yng  fl_mid  fl_old  fl_vold
64       25  21.09  18.30  38.24  0        13      10      2       0
control  22  20.80  18.53  27.23  3        15      3       1       0
potato   31  20.60  18.18  27.37  1        17      11      2       0

(See Section B.2 of Appendix B for details on the data and corresponding articles.)

Imagine that you want to send this table to a friend who, due to excessive demand for our course, was unable to secure a spot in it and ended up in a course on the “History of data science”, whose members are encouraged to experiment with software products like MS Excel and SPSS.

  1. Assuming that your friend is currently located in Troy, NY (i.e., in the USA), export the summary as a file that your friend can read with her software.

  2. Read back your file and verify that it contains the same information as your original summary.

  3. Now repeat both steps (i.e., writing and re-reading the summary data) under the assumption that your friend is located in Berlin, Germany.
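
One possible approach is sketched below. It assumes that a plain csv file suits the US setting and that the semicolon-separated variant with decimal commas (write_csv2() and read_csv2()) suits the German setting. The object age_sum re-enters only the first five columns of Table 6.1 by hand, and the file names are hypothetical.

```r
library(readr)

# First five columns of Table 6.1, re-entered by hand:
age_sum <- data.frame(
  cond  = c("64", "control", "potato"),
  n     = c(25, 22, 31),
  mn_ag = c(21.09, 20.80, 20.60),
  mi_ag = c(18.30, 18.53, 18.18),
  mx_ag = c(38.24, 27.23, 27.37),
  stringsAsFactors = FALSE
)

us_file <- file.path(tempdir(), "age_sum_us.csv")
de_file <- file.path(tempdir(), "age_sum_de.csv")

# US conventions: "," separates columns, "." is the decimal mark:
write_csv(age_sum, us_file)
us_back <- read_csv(us_file, col_types = cols())

# German conventions: ";" separates columns, "," is the decimal mark:
write_csv2(age_sum, de_file)
de_back <- read_csv2(de_file, col_types = cols())

# Compare the values of each re-read table with the original:
us_ok <- all.equal(age_sum, us_back, check.attributes = FALSE)
de_ok <- all.equal(age_sum, de_back, check.attributes = FALSE)
```

readr also provides write_excel_csv() and write_excel_csv2(), which may be worth trying if the file is to be opened in MS Excel.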

6.4.6 Exercise 6

Variants of p_info

In this exercise, we re-visit the participant data on positive psychology interventions that we have analyzed before and try to parse some variants of this data. (See Section B.1 of Appendix B for details on the data.)

  1. Load the data at http://rpository.com/ds4psy/data/posPsy_participants.csv into an R object p_info and compute participants’ mean age by intervention, by sex, and by level of education (educ).

  2. Download the file p_info_2.dat (located at http://rpository.com/ds4psy/data/p_info_2.dat) into a local directory (called data) and import it from there into an R object p_info_2.
    (Hint: Inspect the file prior to loading it: What is different in this file?)

  3. Recompute the mean age by intervention, by sex, and by level of education (educ). Are they the same as before?
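
The grouped means asked for here can be computed with dplyr, as in the following minimal sketch. The tibble p_demo is a hypothetical stand-in with made-up values (not the actual participant data); only the variable names follow the exercise text.

```r
library(dplyr)

# Hypothetical stand-in rows (NOT the actual participant data):
p_demo <- tibble(
  intervention = c(1, 1, 2, 2),
  sex          = c(1, 2, 1, 2),
  educ         = c(3, 3, 4, 5),
  age          = c(30, 40, 50, 60)
)

# Mean age by intervention:
age_by_iv <- p_demo %>%
  group_by(intervention) %>%
  summarise(mn_age = mean(age))

# Mean age by sex:
age_by_sex <- p_demo %>%
  group_by(sex) %>%
  summarise(mn_age = mean(age))

# Mean age by level of education:
age_by_educ <- p_demo %>%
  group_by(educ) %>%
  summarise(mn_age = mean(age))
```

Applying the same pipes to p_info and p_info_2 yields summary tables that can be compared with all.equal().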

This concludes our set of exercises on importing data.