About

About ds4psy

Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid.

(Wasserman, 2004, p. vii/viii)

This book neither covers machine learning nor teaches statistics. Instead, it provides a foundation for dealing with data in scientific psychology and related contexts. While it is possible to study statistics as a mathematical discipline, actually using data to answer real-world questions (i.e., collecting, transforming, and evaluating data) requires all of the methods and skills introduced here.

When first starting to teach this material (in 2016), I elaborated upon chapters of R for Data Science (Wickham & Grolemund, 2017). As most of my students have no programming background, they struggled with this text and were confused by the mix of base R and tidyverse concepts. Hence, I began selecting and summarizing essential topics and tailored my examples and exercises to the needs of students in psychology and social sciences. To provide some foundations beyond the tidyverse, I added base R sections to complement the perspective embraced by my original source. By now, most chapters here contain a mix of summaries and new materials, and can be studied independently or as parts of a larger curriculum.

Audience

The book was conceived with psychologists in mind, but its materials and examples are meant to engage students and researchers from various backgrounds and invite them to apply computational tools to solve challenging problems in their fields. Hopefully, readers will welcome the numerous summaries, cross-references, and links to further resources, and find the large number of practice tasks and exercises entertaining and enlightening.

Contents and curricula

This book can be used to teach a variety of introductory courses on data science for students of psychology and related fields. The term data science here is mainly a placeholder for “methods, skills, and tools dealing with data”. As this book focuses on the basics (i.e., data types and shapes, and ways of loading, transforming, summarizing, and visualizing data), it does not cover topics like machine learning, modeling, or statistics. This may initially disappoint some students, but the above quote by Wasserman (2004) also applies to these applications: When other introductory texts promise them, they rarely deliver a sound introduction that is suited to novices. And whoever aims to develop statistical models or advance machine learning must first possess a solid understanding of the basics to successfully venture into new territory.

The chapters of this book can be arranged into a variety of courses. Depending on your goals, student needs, and the length of the teaching period, some of the more specialized topics (e.g., Chapter 9 on Strings of text, Chapter 10 on Dates and times, or Appendix E on regular expressions) can be skipped, or used as the basis of a more advanced curriculum. Similarly, combining the chapters of Part I and Part IV with some appendices (e.g., Appendix D on using colors in R and Appendix F on using R Markdown) provides a general introduction to data literacy and reproducible research in R, without focusing on particular packages or data wrangling tasks. More specialized courses on statistics or modeling could use only Chapter 1 to introduce Basic R concepts and commands.

More advanced curricula could supplement topics from this book with materials from Introduction to Data Science (Neth, 2025); see, for instance, its chapters on Modeling and Shiny applications, and its Appendix B on Developing R packages.

Providing feedback

As data science is a dynamic field and this text is being revised on a regular basis, it is likely that the current version contains some typos and mistakes. Please feel free to contact me (h.neth at uni.kn) to report errors, suggest improvements, or share any other feedback or observations.

Linking and citing

Feel free to adopt this book or parts of it for your own purposes, but please acknowledge its use in your own work.

To cite this book or the corresponding R package ds4psy (Neth, 2023) in derivations or publications, please use:

Neth, H. (2023). ds4psy: Data Science for Psychologists.
Social Psychology and Decision Sciences, University of Konstanz, Germany.
Textbook and R package (version 1.0.0, September 15, 2023).
Retrieved from https://bookdown.org/hneth/ds4psy/. doi: 10.5281/zenodo.7229812

A BibTeX entry for LaTeX users is:

@Manual{,
  title = {ds4psy: Data Science for Psychologists},
  author = {Hansjörg Neth},
  year = {2023},
  organization = {Social Psychology and Decision Sciences, University of Konstanz},
  address = {Konstanz, Germany},
  note = {R package (version 1.0.0, September 15, 2023); Textbook at <https://bookdown.org/hneth/ds4psy/>.},
  url = {https://CRAN.R-project.org/package=ds4psy},
  doi = {10.5281/zenodo.7229812},
}
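
If the ds4psy package is installed, its citation information can also be queried from within R. The following is a minimal sketch that uses only base R functions (`requireNamespace()`, `citation()`, and `toBibtex()` from utils). The helper name `get_citation()` is hypothetical and not part of ds4psy, and the printed details depend on the installed package version, so they may differ from the entry above.

```r
# Hypothetical helper (not part of ds4psy): returns the citation object
# of an installed package, or NULL if the package is not available.
get_citation <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; see install.packages().")
    return(NULL)
  }
  utils::citation(pkg)
}

cit <- get_citation("ds4psy")

if (!is.null(cit)) {
  print(cit, style = "text")   # formatted reference
  print(utils::toBibtex(cit))  # BibTeX entry for LaTeX users
}
```

Returning `NULL` rather than raising an error keeps the sketch usable in scripts that run on machines without the package.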

Online links:

The URL of the R package ds4psy is https://CRAN.R-project.org/package=ds4psy.
As the structure of the book’s chapters and sections may change, links should only use the base URL https://bookdown.org/hneth/ds4psy/.

License

Data science for psychologists (ds4psy) by Hansjörg Neth is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The HTML version of this book uses Google Analytics to evaluate the popularity of its different chapters. The website does not collect any personal data of individual users.

Colophon

This book was generated using R version 4.5.0 (2025-04-11) and the following packages:

bardr (0.0.9), base (4.5.0), BayesFactor (0.9.12.4.7), bookdown (0.43), circlize (0.4.16), coda (0.19.4.1), datasets (4.5.0), devtools (2.4.5), dplyr (1.1.4), ds4psy (1.0.0.9018), forcats (1.0.0), ggplot2 (3.5.2), graphics (4.5.0), grDevices (4.5.0), here (1.0.1), jpeg (0.1.11), knitr (1.50), lubridate (1.9.4), magrittr (2.0.3), Matrix (1.7.3), methods (4.5.0), purrr (1.0.4), RColorBrewer (1.1.3), readr (2.1.5), rmarkdown (2.29), stats (4.5.0), stringr (1.5.1), tibble (3.3.0), tidyr (1.3.1), tidyverse (2.0.0), timechange (0.3.0), unicol (0.3.0.9004), unikn (1.0.0.9002), usethis (3.1.0), utils (4.5.0), viridis (0.6.5), viridisLite (0.4.2), yarrr (0.1.5).
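
A list in this format can be produced for any R installation. The following sketch is not necessarily how the list above was generated. It assumes a hypothetical helper `pkg_versions()` built on base R's `requireNamespace()` and `packageVersion()`, and it silently skips packages that are not installed.

```r
# Hypothetical helper: format installed package versions as "name (version)".
# Note: requireNamespace() loads a package's namespace as a side effect.
pkg_versions <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  installed <- pkgs[ok]
  if (length(installed) == 0) return("")
  vers <- vapply(installed,
                 function(p) as.character(utils::packageVersion(p)),
                 character(1))
  paste0(installed, " (", vers, ")", collapse = ", ")
}

cat(R.version.string, "\n")  # version of the running R installation
cat(pkg_versions(c("base", "stats", "utils", "ds4psy")), "\n")
```

The versions shown reflect the local installation and will usually differ from those listed in this colophon.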

Thanks to all package authors and the R community for making this book possible!

[85_coda.Rmd updated on 2025-07-11 by hn.]

References

Neth, H. (2023). ds4psy: Data science for psychologists. https://doi.org/10.5281/zenodo.7229812

Neth, H. (2025). Introduction to data science. Retrieved from https://bookdown.org/hneth/i2ds/

Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York, NY: Springer.

Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz