Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid.
(Wasserman, 2004, p. vii/viii)
This book is not on statistics, but covers the foundations of dealing with data in psychology and a variety of other contexts. Although it is possible to study statistics as a mathematical discipline, actually collecting and analyzing data requires some of the methods and skills conveyed here. Originally, I merely started to elaborate upon chapters of R for Data Science (Wickham & Grolemund, 2017). As my students struggled with this text, I began selecting and summarizing essential concepts and commands and adapted the examples and exercises to the needs of students in psychology and social sciences. And as R reaches far beyond the tidyverse, I added base R sections and solutions to complement the perspective embraced by the original source. By now, most chapters here contain a mix of summaries and new materials, and can be studied independently or as parts of a larger curriculum.
Contents and audience
This book allows teaching a variety of introductory courses on data science for students of psychology and related fields. The term data science here is mainly a placeholder for “methods, skills, and tools dealing with data.” As this book focuses on the basics (i.e., data types and shapes, and ways of loading, transforming, summarizing, and visualizing data), it does not cover topics like machine learning, modeling, or statistics. This may initially disappoint some students, but the above quote by Wasserman (2004) also applies to these applications: When other introductory texts promise them, they rarely deliver a sound introduction that is suited to novices. And whoever aims to develop statistical models or advance machine learning needs a solid understanding of the basics to successfully venture into new territory.
The chapters of this book can be arranged into a variety of courses. Depending on your goals, student needs, and the length of the teaching period, some of the more specialized topics (e.g., Chapter 9 on Strings of text, Chapter 10 on Dates and times, or Appendix E on regular expressions) can be skipped, or used as the basis of a more advanced curriculum. Similarly, combining the chapters of Part I and Part IV with some appendices (e.g., Appendix D on using colors in R and Appendix F on using R Markdown) provides a general introduction to data literacy and reproducible research in R, without focusing on particular packages or data wrangling tasks. More specialized courses on statistics or modeling could use only Chapter 1 to introduce Basic R concepts and commands.
The book was conceived with psychologists in mind, but its materials and examples aim to engage students and researchers from various backgrounds and invite them to apply computational tools to solve challenging problems in their fields. Hopefully, readers will welcome the numerous summaries, cross-references, and links to further resources, and find the large number of practice tasks and exercises entertaining and enlightening.
As data science is a dynamic field and this text is being revised on a regular basis, it is likely that the current version contains some typos and mistakes.
Please feel free to contact me (as uni.kn) to report errors, possible improvements, or any other feedback or observations that you are willing to share.
Data science for psychologists (ds4psy) by Hansjörg Neth is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The HTML version of this book uses Google Analytics to evaluate the popularity of its different chapters. The website does not collect any personal data of individual users.
This book was generated using R version 4.0.5 (2021-03-31) and the following packages:
- base (4.0.5), BayesFactor (0.9.12.4.2), bookdown (0.21), circlize (0.4.12), coda (0.19.4), datasets (4.0.5), devtools (2.4.0), dplyr (1.0.5), ds4psy (0.6.0), forcats (0.5.1), ggplot2 (3.3.3), graphics (4.0.5), grDevices (4.0.5), here (1.0.1), jpeg (0.1.8.1), knitr (1.31), lubridate (1.7.10), magrittr (2.0.1), Matrix (1.3.2), methods (4.0.5), purrr (0.3.4), RColorBrewer (1.1.2), readr (1.4.0), rmarkdown (2.7), stats (4.0.5), stringr (1.4.0), tibble (3.1.0), tidyr (1.1.3), tidyverse (1.3.0), unikn (0.4.0), usethis (2.0.1), utils (4.0.5), viridis (0.5.1), viridisLite (0.3.0), yarrr (0.1.5).
Thanks to all package authors and the R community for making this book possible!
[85_coda.Rmd updated on 2021-06-15 15:19:01 by hn.]