Welcome to this book and to this course! Before we start exploring R and the essentials of data science from a tidyverse perspective, this introductory chapter provides the current course coordinates, clarifies its contents and key concepts (e.g., the relation between data science and statistics), spells out background assumptions and constraints, and provides pointers to required software and additional resources.
The materials in this book support the following course:
- PSY-16620, at the University of Konstanz, by Hansjörg Neth (email@example.com, SPDS, office D507).
- Autumn 2023: Mondays, 13:30–15:00, D435.
Data analysis in psychology has a flavor of its own
— but one much more due to psychologists than to their science.
John W. Tukey (1969, p. 83)
The book and course Data Science for Psychologists provides an introduction to data science in R and conveys fundamental skills of data literacy and conducting reproducible research. The curriculum is tailored to the needs of students in psychology, but is also suited for students of the humanities and other biological or social sciences. It is targeted at advanced undergraduate students and structured into four parts:
1. Introducing key concepts and commands of the R programming language for statistical computing (R Core Team, 2024).[1] This includes working with the RStudio IDE and creating reproducible research documents with R Markdown.
2. Exploring, transforming, and visualizing data of various shapes and types. In this course, cleaning, wrangling, and summarizing data will involve key tools of the so-called tidyverse (Wickham et al., 2019), including the R packages dplyr, ggplot2, tibble, and tidyr.
3. Gaining a deeper understanding of some important data types (e.g., acquiring the skills and tools for handling text and time-related data).
4. Providing a glimpse of elementary concepts of computer programming (including functions, conditionals, and iterative execution).
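The elementary programming concepts named above (functions, conditionals, and iterative execution) can be sketched in a few lines of base R. This is only an illustrative sketch: the function name `classify_rt`, the 500 ms cutoff, and the response times are made up for this example and are not taken from the book or the ds4psy package.

```r
# Hypothetical helper (not part of ds4psy): label a response time (in ms).
# The default cutoff of 500 ms is an arbitrary value chosen for illustration.
classify_rt <- function(rt, cutoff = 500) {
  # Conditional: choose a label depending on the value of rt.
  if (rt < cutoff) {
    "fast"
  } else {
    "slow"
  }
}

# Made-up response times (illustrative data only):
rts <- c(320, 480, 515, 760)

# Iteration: apply the function to each element and store the result.
labels <- character(length(rts))
for (i in seq_along(rts)) {
  labels[i] <- classify_rt(rts[i])
}

labels
```

The explicit `for` loop is used here only to make the iteration visible; in practice, R code often replaces such loops with vectorized functions.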
Working through the textbook (available at https://bookdown.org/hneth/ds4psy/) enables students to analyze, summarize, and understand data in a variety of ways. Importantly, our main focus is on making sense of data (by exploring, transforming, summarizing, and visualizing it) rather than on statistical testing. Although all chapters involve typing computer code to solve data-related tasks, only the final chapters introduce computer programming as a topic in its own right.
The book and course contain engaging examples from the behavioral sciences and are supported by the R package ds4psy (Neth, 2023b), which provides all datasets and additional functions for data generation and manipulation. A large variety of exercises and solutions allows students to check their understanding, monitor their progress, and practice their skills.
Students of psychology and other social sciences are trained to analyze data. But the data they learn to work with (e.g., in courses on statistics and empirical research methods) is typically provided to them in a mostly rectangular and often tidy (Wickham, 2014b) format that presupposes many steps of data processing regarding the aggregation and spatial layout of variables. When beginning to collect data from real sources, most students struggle with these pre-processing steps, which, even for experienced data scientists, tend to require more time and effort than choosing and conducting statistical tests. This course develops the foundations of data analysis that allow students to collect data from real-world sources and transform such data into a shape that allows conducting reproducible research and answering scientific questions.
While there are many good introductions to data science, such as R for Data Science (Wickham & Grolemund, 2017), they typically do not cater to the special background and needs (and often the anxieties and reservations) of psychology students. As social scientists are not computer scientists, we introduce new concepts and commands without assuming a mathematical or computational background. Our data and examples typically involve people and questions currently of interest in scientific psychology. Adopting a task-oriented perspective, we begin with a specific problem and then solve it with some combination of data collection, manipulation, modeling, and visualization.
Our main goal is to develop useful skills for understanding and dealing with real-world data. Upon completing this course, students will be able to read, transform, analyze, and visualize data of different shapes and types using a variety of tools. While this course does not deal with statistical testing and only scratches the surface of computer programming, it teaches reproducible research practices and covers fundamental data science skills.
This book is targeted at advanced undergraduate students (close to completing their BSc thesis) in psychology, the humanities, or related sciences. This audience typically has a basic familiarity with quantitative research methods, but little or no background in computer programming.
Key elements for succeeding in this course are curiosity about making sense of data and the motivation to keep up with regular readings and exercises (see clarifications below).
Course requirements are regular attendance and preparation (by working through the current chapter before each session), solving and submitting weekly programming assignments, and passing a final exam or data science project.
Some basic familiarity with statistics and R is beneficial, but enthusiastic novices are also welcome.
Student performance is evaluated on the basis of two components:
A. Solutions to weekly programming assignments: To be submitted (on Ilias) by 23:59 on Thursday of the same week, in at least 10 out of 12 weeks.
B. Final assessment:
- Final exam (90 mins, open book); or
- Data science project: See Appendix C for guidelines and scope. (Final projects can be thesis-related; contents to be discussed with instructor.)[2]
Final grades are based on course participation, including regular submission of exercises (A: 33%), and on the final exam or project (B: 67%).
[1] A few years ago, a course like this would first justify its use of R by its availability, flexibility, and increasing popularity. Today, R and the buzzwords data literacy and data science are so popular that we can skip this part. In fact, not using R or not knowing about data science would increasingly call for an explanation.
[2] Data science projects are to be completed and submitted by Friday, March 1, 2024.