## Welcome

**Welcome** to this book and to this course! Before we start exploring R and the essentials of data science from a tidyverse perspective, this introductory chapter provides the current course coordinates, clarifies its contents and key concepts (e.g., the relation between data science and statistics), spells out background assumptions and constraints, and provides pointers to required software and related resources.

### Coordinates

The materials in this book support the following course:

- PSY-15150, at the University of Konstanz by Hansjörg Neth (h.neth@uni.kn, SPDS, office D507).
- Summer 2020: Mondays, 13:30–15:00, D522.^{1}
- Course materials: Data Science for Psychologists (Ebook at https://bookdown.org/hneth/ds4psy/) | R package ds4psy
- Online platforms: ZeUS | Ilias

### Description

Data analysis in psychology has a flavor of its own

— but one much more due to psychologists than to their science.

John W. Tukey (1969, p. 83)

#### Abstract

This course, *Data Science for Psychologists*, provides an introduction to R (R Core Team, 2020) and conveys fundamental skills of data literacy and the basics of data science.^{2} It is suited for beginners and experienced students and contains three parts:

First, we introduce key concepts and commands of the R programming language for statistical computing. This includes working with the RStudio environment and writing reproducible research documents with R Markdown.

Second, by working with different forms and types of data, the course provides a basic introduction to *data literacy*. Although later sessions add some basic elements of *computer programming* (e.g., writing functions and loops), our main focus is on making sense of data (e.g., by creating summary tables and visualizations).

Third, regular exercises with real datasets explore the tools of the so-called tidyverse (Wickham, 2019b; Wickham et al., 2019), including the R packages **dplyr**, **ggplot2**, **tibble**, and **tidyr**.
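As a minimal sketch of what creating a summary table with tidyverse tools can look like, the following R code builds a small tibble and summarises it with dplyr. The data values and the names `reaction_times`, `rt_ms`, and `rt_summary` are invented for illustration and are not taken from the course or the ds4psy package.

```r
# Toy data, invented purely for illustration (not from the course or ds4psy).
library(tibble)
library(dplyr)

reaction_times <- tibble(
  participant = c("p1", "p2", "p3", "p4"),
  condition   = c("control", "control", "treatment", "treatment"),
  rt_ms       = c(510, 530, 450, 470)
)

# Summary table: number of participants and mean reaction time per condition
rt_summary <- reaction_times %>%
  group_by(condition) %>%
  summarise(n = n(), mean_rt_ms = mean(rt_ms))

print(rt_summary)
```

The same pattern (group the data, then summarise each group) carries over to larger, real datasets; only the column names and summary functions change.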

This book and course are supported by the R package ds4psy (Neth, 2020), which contains datasets and functions used in examples and exercises. Both book and course were originally based on parts of the popular textbook R for Data Science (Wickham & Grolemund, 2017), but the topics and datasets used in the course are geared more specifically to the interests and needs of psychologists and social scientists (involving people-related data, e.g., of patients or experimental participants).

Completing this course enables students to understand, transform, analyze, and visualize data in a variety of ways. While this course does not deal with *statistical testing* and only scratches the surface of *computer programming*, it teaches *reproducible research practices* and covers fundamental knowledge and skills of *data science*.

Enrolling in this course assumes no prior knowledge of programming or statistics, but requires motivation for weekly readings and for regularly solving exercises. Grades are determined by submitted solutions to exercises and a final project or exam.

#### Background

Students of psychology and other social sciences are trained to analyze data. But the data they learn to work with (e.g., in courses on statistics and empirical research methods) is typically provided to them and structured in a — mostly rectangular and often *tidy* (Wickham, 2014b) — format that includes and presupposes many steps of data processing regarding the aggregation and spatial layout of variables. When beginning to collect data from real sources, most students struggle with these pre-processing steps which — even for experienced data scientists — tend to require more time and effort than choosing and conducting statistical tests. This course develops the foundations of data analysis that allow students to collect data from real-world sources and transform such data into a shape that allows conducting reproducible research and answering scientific questions.

While there are many good introductions to data science, like R for Data Science (Wickham & Grolemund, 2017), they typically do not cater to the special background and needs (and often the anxieties and reservations) of *psychology* students. As social scientists are not computer scientists, we introduce new concepts and commands without assuming a mathematical or computational background. Our data and examples typically involve people and questions currently of interest in scientific psychology. Adopting a task-oriented perspective, we begin with a specific problem and then solve it with some combination of data collection, manipulation, modeling, and visualization.

### Goals

Our main goal is to develop a set of useful skills in analyzing real-world data and conducting reproducible research. Upon completing this course, you will be able to read, transform, analyze, and visualize data of various types. Many interactive exercises allow students to check their understanding, monitor their progress, and practice their skills.

### Requirements

The course requires attending class, continuous preparation (by working through the current chapter *before* attending class), solving and submitting weekly programming assignments (formerly abbreviated as WPAs, see Assessment), and completing a final exam OR data science project.

This course assumes some basic familiarity with statistics and the R programming language, but enthusiastic programming novices are also welcome.

### Assessment

A. Submitting solutions to *weekly programming assignments* (WPAs) on Ilias by *Thursday* of the same week (by 23:59) in at least 10 out of 12 weeks.

B. Final assessment:

*Data science project*: See Appendix C for guidelines and scope. (Final projects can be thesis-related; contents to be discussed with instructor.)^{3}

Final *grades* are based on course participation (including regular submission of exercises) (A: 33%) and the final exam/project (B: 67%).

### References

Neth, H. (2020). *ds4psy: Data science for psychologists*. Retrieved from https://CRAN.R-project.org/package=ds4psy

R Core Team. (2020). *R: A language and environment for statistical computing*. Retrieved from https://www.R-project.org

Tukey, J. W. (1969). Analyzing data: Sanctification or detective work? *American Psychologist*, *24*(2), 83–91. https://doi.org/10.1037/h0027108

Wickham, H. (2014b). Tidy data. *Journal of Statistical Software*, *59*(10), 1–23. https://doi.org/10.18637/jss.v059.i10

Wickham, H. (2019b). *tidyverse: Easily install and load the ’tidyverse’*. Retrieved from https://CRAN.R-project.org/package=tidyverse

Wickham, H., & Grolemund, G. (2017). *R for data science: Import, tidy, transform, visualize, and model data*. Retrieved from http://r4ds.had.co.nz

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. *Journal of Open Source Software*, *4*(43), 1686. https://doi.org/10.21105/joss.01686

^{1} During the **Covid-19** pandemic, the organisational details of this course at the University of Konstanz are managed via Ilias. However, all course materials remain available here and are free to use for everyone interested.

^{2} A few years ago, a course like this would first justify its use of R by its availability, flexibility, and increasing popularity. Today, R and the buzzwords *data literacy* and *data science* are so popular that we can skip this part. In fact, *not* using R or *not* knowing about data science would increasingly call for an explanation.

^{3} Data science projects are to be completed and submitted by 2020-08-01.