Welcome to this book and to this course! Before we start exploring R and the essentials of data science from a tidyverse perspective, this introductory chapter provides the current course coordinates, clarifies its contents and key concepts (e.g., the relation between data science and statistics), spells out background assumptions and constraints, and provides pointers to required software and related resources.
The materials in this book support the following course:
Data analysis in psychology has a flavor of its own
— but one much more due to psychologists than to their science.
John W. Tukey (1969, p. 83)
This course Data science for psychologists provides an introduction to R (R Core Team, 2020) and conveys fundamental skills of data literacy and the basics of data science.2 It is suited for beginners and experienced students and contains 3 parts:
First, we introduce key concepts and commands of the R programming language for statistical computing. This includes working with the RStudio environment and writing reproducible research documents with R Markdown.
By working with different forms and types of data, the course provides a basic introduction to data literacy. Although later sessions add some basic elements of computer programming (e.g., writing functions and loops), our main focus is on making sense of data (e.g., by creating summary tables and visualizations).
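As a rough illustration of what "making sense of data" by a summary table can look like, the following base R sketch computes mean reaction times per condition. The data frame `rt_data`, its column names, and all values are hypothetical and invented for this example; they are not taken from the course materials or the ds4psy package.

```r
# Hypothetical data (invented for illustration):
# reaction times (in ms) of six participants in two conditions
rt_data <- data.frame(
  id        = 1:6,
  condition = c("control", "control", "control",
                "treatment", "treatment", "treatment"),
  rt        = c(520, 480, 500, 450, 470, 430)
)

# Summary table: mean reaction time per condition
rt_means <- aggregate(rt ~ condition, data = rt_data, FUN = mean)
print(rt_means)
```

The course itself takes a tidyverse perspective, so the same summary would typically be expressed with tidyverse functions there; base R is used here only to keep the sketch free of package dependencies.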
This book and course are supported by the R package ds4psy (Neth, 2020), which contains datasets and functions used in examples and exercises. Both book and course were originally based on parts of the popular textbook R for Data Science (Wickham & Grolemund, 2017), but the topics and datasets used in the course are geared more specifically to the interests and needs of psychologists and social scientists (involving people-related data, e.g., of patients or experimental participants).
Completing this course enables students to understand, transform, analyze, and visualize data in a variety of ways. While this course does not deal with statistical testing and only scratches the surface of computer programming, it teaches reproducible research practices and covers fundamental knowledge and skills of data science.
Enrolling in this course assumes no prior knowledge of programming or statistics, but does require motivation for weekly readings and for regularly solving exercises. Grades are determined by submitted solutions to exercises and a final project or exam.
Students of psychology and other social sciences are trained to analyze data. But the data they learn to work with (e.g., in courses on statistics and empirical research methods) is typically provided to them and structured in a — mostly rectangular and often tidy (Wickham, 2014b) — format that includes and presupposes many steps of data processing regarding the aggregation and spatial layout of variables. When beginning to collect data from real sources, most students struggle with these pre-processing steps which — even for experienced data scientists — tend to require more time and effort than choosing and conducting statistical tests. This course develops the foundations of data analysis that allow students to collect data from real-world sources and transform such data into a shape that allows conducting reproducible research and answering scientific questions.
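To make the idea of transforming data into a tidy shape more concrete, here is a minimal base R sketch that reshapes a small table from a wide layout (one column per measurement occasion) into a long layout (one row per observation). The data frame `wide`, its column names, and its values are hypothetical and serve only as an illustration.

```r
# Hypothetical wide-format data (invented for illustration):
# one row per participant, one column per measurement occasion
wide <- data.frame(
  id   = c("p1", "p2", "p3"),
  pre  = c(10, 12, 9),
  post = c(14, 13, 11)
)

# Reshape into a long (tidy) format: one row per observation,
# with the measurement occasion stored as a variable
long <- data.frame(
  id    = rep(wide$id, times = 2),
  time  = rep(c("pre", "post"), each = nrow(wide)),
  score = c(wide$pre, wide$post)
)
print(long)
```

In the long format, every variable (participant, occasion, score) has its own column, which is the layout that most summary and plotting functions expect. Within the tidyverse, this kind of reshaping is usually done with dedicated functions of the tidyr package rather than by hand.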
While there are many good introductions to data science, such as R for Data Science (Wickham & Grolemund, 2017), they typically do not cater to the special background and needs (and often the anxieties and reservations) of psychology students. As social scientists are not computer scientists, we introduce new concepts and commands without assuming a mathematical or computational background. Our data and examples typically involve people and questions currently of interest in scientific psychology. Adopting a task-oriented perspective, we begin with a specific problem and then solve it with some combination of data collection, manipulation, modeling, and visualization.
Our main goal is to develop a set of useful skills in analyzing real-world data and conducting reproducible research. Upon completing this course, you will be able to read, transform, analyze, and visualize data of various types. Many interactive exercises allow students to check their understanding, monitor their progress, and practice their skills.
Course requirements are attending class, continuous preparation (by working through the current chapter before attending class), solving and submitting weekly programming assignments (formerly abbreviated as WPAs, see Assessment), and a final exam or data science project.
This course assumes some basic familiarity with statistics and the R programming language, but enthusiastic programming novices are also welcome.
A. Submitting solutions to weekly programming assignments (WPAs) on Ilias by Thursday of the same week (by 23:59) in at least 10 of the 12 weeks.
B. Final assessment:
- Data science project: See Appendix C for guidelines and scope. (Final projects can be thesis-related; contents to be discussed with instructor.)3
Final grades are based on course participation, including regular submission of exercises (A: 33%), and the final exam or project (B: 67%).
Neth, H. (2020). ds4psy: Data science for psychologists. Retrieved from https://CRAN.R-project.org/package=ds4psy
R Core Team. (2020). R: A language and environment for statistical computing. Retrieved from https://www.R-project.org
Tukey, J. W. (1969). Analyzing data: Sanctification or detective work? American Psychologist, 24(2), 83–91. https://doi.org/10.1037/h0027108
Wickham, H. (2014b). Tidy data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10
Wickham, H. (2019b). tidyverse: Easily install and load the ’tidyverse’. Retrieved from https://CRAN.R-project.org/package=tidyverse
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
During the Covid-19 pandemic, the organisational details of this course at the University of Konstanz are managed via Ilias. However, all course materials remain available here and are free to use for everyone interested.
A few years ago, a course like this would first justify its use of R by its availability, flexibility, and increasing popularity. Today, R and the buzzwords data literacy and data science are so popular that we can skip this part. In fact, not using R or not knowing about data science would increasingly call for an explanation.
Data science projects are to be completed and submitted by 2020-08-01.