Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid.
(Wasserman, 2004, p. vii/viii)
This book neither covers machine learning nor does it teach statistics. Instead, it provides a foundation for dealing with data in scientific psychology and related contexts. While it is possible to study statistics as a mathematical discipline, actually using data for answering real-world questions (i.e., collecting, transforming, and evaluating data) requires all of the methods and skills introduced here.
When first starting to teach this material (in 2016), I elaborated upon chapters of R for Data Science (Wickham & Grolemund, 2017). As most of my students have no programming background, they struggled with this text and were confused by the mix of base R and tidyverse concepts. Hence, I began selecting and summarizing essential topics and tailored my examples and exercises to the needs of students in psychology and social sciences. To provide some foundations beyond the tidyverse, I added base R sections to complement the perspective embraced by my original source. By now, most chapters here contain a mix of summaries and new materials, and can be studied independently or as parts of a larger curriculum.
This book can be used to teach a variety of introductory courses on data science for students of psychology and related fields. The term data science here is mainly a placeholder for “methods, skills, and tools dealing with data”. As this book focuses on the basics (i.e., data types and shapes, and ways of loading, transforming, summarizing, and visualizing data), it does not cover topics like machine learning, modeling, or statistics. This may initially disappoint some students, but the above quote by Wasserman (2004) also applies to these applications: When other introductory texts promise them, they rarely deliver a sound introduction that is suited to novices. And whoever aims to develop statistical models or advance machine learning must first possess a solid understanding of the basics to successfully venture into new territory.
The chapters of this book can be arranged into a variety of courses. Depending on your goals, student needs, and the length of the teaching period, some of the more specialized topics (e.g., Chapter 9 on Strings of text, Chapter 10 on Dates and times, or Appendix E on regular expressions) can be skipped, or used as the basis of a more advanced curriculum. Similarly, combining the chapters of Part I and Part IV with some appendices (e.g., Appendix D on using colors in R and Appendix F on using R Markdown) provides a general introduction to data literacy and reproducible research in R, without focusing on particular packages or data wrangling tasks. More specialized courses on statistics or modeling could use only Chapter 1 to introduce Basic R concepts and commands.
The book was conceived with psychologists in mind, but its materials and examples are intended to engage students and researchers from various backgrounds and to invite them to apply computational tools to challenging problems in their fields. Hopefully, readers will welcome the numerous summaries, cross-references, and links to further resources, and find the large number of practice tasks and exercises entertaining and enlightening.
As data science is a dynamic field and this text is being revised on a regular basis, it is likely that the current version contains some typos and mistakes.
Please feel free to contact me (as uni.kn) to report errors, possible improvements, or any other feedback or observations that you are willing to share.
Data science for psychologists (ds4psy) by Hansjörg Neth is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The HTML version of this book uses Google Analytics to evaluate the popularity of its different chapters. The website does not collect any personal data of individual users.
This book was generated using R version 4.3.1 (2023-06-16) and the following packages:
- bardr (0.0.9), base (4.3.1), BayesFactor (0.9.12.4.5), bookdown (0.36), circlize (0.4.15), coda (0.19.4), datasets (4.3.1), devtools (2.4.5), dplyr (1.1.4), ds4psy (1.0.0), forcats (1.0.0), ggplot2 (3.4.4), graphics (4.3.1), grDevices (4.3.1), here (1.0.1), jpeg (0.1.10), knitr (1.45), lubridate (1.9.3), magrittr (2.0.3), Matrix (1.6.3), methods (4.3.1), purrr (1.0.2), RColorBrewer (1.1.3), readr (2.1.4), rmarkdown (2.25), stats (4.3.1), stringr (1.5.1), tibble (3.2.1), tidyr (1.3.0), tidyverse (2.0.0), timechange (0.2.0), unicol (0.2.0.9012), unikn (0.9.0.9006), usethis (2.2.2), utils (4.3.1), viridis (0.6.4), viridisLite (0.4.2), yarrr (0.1.5).
Thanks to all package authors and the R community for making this book possible!
[85_coda.Rmd updated on 2023-11-27 13:19:23.194929 by hn.]