We conclude each chapter with links to additional resources. In this introduction, these are pointers to the materials and software requirements of this book, as well as related resources on R (R Core Team, 2024) and the tidyverse (Wickham et al., 2019).
Resources related to this book and course at the University of Konstanz, 2023:
- Neth, H. (2023). ds4psy: Data Science for Psychologists.
Social Psychology and Decision Sciences, University of Konstanz, Germany.
Textbook and R package (version 1.0.0, September 15, 2023).
Retrieved from https://bookdown.org/hneth/ds4psy/. doi 10.5281/zenodo.7229812
Working through this book assumes an installation of three types of software programs:
An R engine: The R project for statistical computing is the origin of all things R. A current distribution of R for your machine (e.g., R version 4.3.2 of 2023-10-31) can be downloaded from one of its mirrors.
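To check which version of R is installed on a machine, the following can be evaluated in the R console. This is a minimal sketch using only base R functions. The reference version 4.3.2 merely echoes the example mentioned above and is an assumption, not a stated requirement of this book.

```r
# Show the version string of the running R engine:
R.version.string

# getRversion() returns the version as an object that can be compared:
current <- getRversion()

# Compare against a reference version (here 4.3.2, an assumed example):
is_recent <- current >= "4.3.2"
is_recent
```

If `is_recent` is `FALSE`, downloading a newer distribution from a mirror is probably worthwhile.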
As the R language, every R package, and every R function are extensively documented, the best strategy for answering a question is to consult an official reference (rather than searching the internet). While the official references of the R language can initially be intimidating, they are the most authoritative and often the fastest way of finding answers:
Most questions concerning details of R can be settled by reading the R Language Definition that is available from the Help page of any R system.
The details of particular functions are best resolved by studying the function’s documentation. For a function named foo, its documentation can be shown by evaluating ?foo. Even when some of the documentation may be hard to understand, working through the Examples is usually helpful.
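As foo is only a placeholder name, the following sketch uses the existing base R function mean() to illustrate how documentation can be consulted from the Console. All functions shown are part of base R or its utils package. Which function to look up is, of course, an arbitrary choice.

```r
# Open the documentation of a function (here: mean() from base R):
?mean
help("mean")     # equivalent to ?mean

# Run the code from the Examples section of the documentation:
example("mean")

# Inspect a function's arguments without opening its help page:
args(mean)
```

Evaluating example() prints each line of the Examples section together with its result, which makes it easy to see what a function does before reading the full documentation.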
The distinctions between R, R packages, and RStudio are somewhat confusing at first and will be explained in more detail in Chapter 1: Basic R concepts and commands (see Section 1.1.3). At this point, it is good to know that we can interact with R and manage our R packages within the RStudio IDE. Given the large variety of functions and levels, this interface is divided into many sub-windows that can be arranged and expanded in various ways. To get started, we only need to distinguish between the main Editor window (typically located on the top left), the Console (for entering R commands), and a few auxiliary windows that may display outputs (e.g., a Viewer for showing visualizations) and provide information on our current Environment or the Packages available on our computer. Another useful window is Help: Although its main page mostly provides links to online materials, every R package contains detailed documentation and examples of its functions that can be browsed in this window.
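Packages can also be managed from the Console rather than the Packages window. The following minimal sketch checks which packages are missing, installs them once, and loads one of them. It assumes that the packages tidyverse and ds4psy can be installed from CRAN, and it only installs anything in an interactive session.

```r
# Packages used as examples here (assumed to be available on CRAN):
pkgs <- c("tidyverse", "ds4psy")

# Determine which of these are not yet installed on this machine:
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install missing packages once (only in interactive sessions):
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load an installed package in every session that uses it:
if (requireNamespace("ds4psy", quietly = TRUE)) {
  library(ds4psy)
}
```

The distinction to note is that install.packages() is needed only once per machine, whereas library() must be evaluated in every new R session.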
Figure 0.4 shows the Posit cheatsheets on the RStudio IDE and illustrates that there are dozens of other functions available. As you get more experienced, you will discover lots of nifty features and shortcuts. Especially foldable sections and keyboard shortcuts (see Alt + Shift + K for an overview) can make your life in R a lot easier.
But don’t let the abundance of options overwhelm you — I have yet to meet a person who needs or uses all of them.
A useful feature of the RStudio IDE is that collections of files can be combined into projects. For instance, it makes sense to store everything related to this course in a dedicated directory on your hard drive (e.g., in a folder “ds4psy”) and create an RStudio project (also named ds4psy) that uses this directory as its root. An immediate benefit of using projects is that your entire workflow gets more organized.[7]
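One practical consequence of using a project is that file paths can be written relative to the project root. The sketch below simulates a project directory inside a temporary folder, so that it can be run anywhere. The folder name ds4psy follows the example above, whereas the subfolder data and the file scores.csv are hypothetical.

```r
# Simulate a project root (in practice: the folder of the RStudio project):
project_root <- file.path(tempdir(), "ds4psy")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Store the current working directory and move to the project root
# (RStudio sets the working directory when a project is opened):
old_wd <- setwd(project_root)

# Write and read a small data file using a path relative to the root:
scores <- data.frame(id = 1:3, score = c(10, 20, 30))
write.csv(scores, file.path("data", "scores.csv"), row.names = FALSE)
scores_2 <- read.csv(file.path("data", "scores.csv"))

# Restore the previous working directory:
setwd(old_wd)
```

Because the code contains no absolute paths, the entire project folder can be moved or shared without breaking it.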
R Markdown allows weaving text and code into reproducible research documents. For quick instructions on combining text and code, see Appendix F, or read the more detailed introduction of Chapter 27: R Markdown of the r4ds textbook. Alternatively, just start with one of the following templates:
A typical R Markdown document consists of three distinct parts:
- A header for setting global document options;
- Text that may contain headings, paragraphs, and itemized lists; and
- Code chunks that contain and evaluate R code.
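The three parts listed above can be illustrated by creating a small R Markdown file from within R. This is only a sketch: the title, text, and chunk contents are hypothetical, and the file is written to a temporary folder. In practice, such a document would normally be typed directly into the RStudio Editor.

```r
# Three backticks delimit code chunks in R Markdown; they are built here
# so that they do not interfere with the formatting of this example:
fence <- strrep("`", 3)

rmd_lines <- c(
  # 1. Header for setting global document options:
  "---",
  "title: \"My first R Markdown document\"",
  "output: html_document",
  "---",
  "",
  # 2. Text with a heading, a paragraph, and an itemized list:
  "# A heading",
  "",
  "Some text in a paragraph, followed by a list:",
  "",
  "- first item",
  "- second item",
  "",
  # 3. A code chunk that contains and evaluates R code:
  paste0(fence, "{r}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence
)

# Save the document with the file extension .Rmd:
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)
```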
When using R Markdown (typically saved with the file extension .Rmd), you can generate various output formats to show and transfer your work. I recommend generating output documents in HTML format (i.e., .html files), as they can easily be exchanged and shown on most devices and platforms.
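In RStudio, output documents are usually generated by clicking the Knit button of the Editor window. The same can be achieved from the Console with the render() function of the rmarkdown package. The following sketch assumes that rmarkdown and pandoc may or may not be installed and only renders when both are available. The file name and document contents are hypothetical.

```r
# A minimal R Markdown document, saved as an .Rmd file in a temporary folder:
rmd_file <- file.path(tempdir(), "minimal.Rmd")
writeLines(c("---", "title: \"Minimal example\"", "---", "",
             "Hello, R Markdown."), rmd_file)

# Rendering requires the rmarkdown package and pandoc:
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # Generate an .html file and store its path:
  html_file <- rmarkdown::render(rmd_file, output_format = "html_document",
                                 quiet = TRUE)
} else {
  html_file <- NA_character_  # nothing was rendered
}
```

When rendering succeeds, `html_file` contains the path of the resulting .html file, which can be opened in any web browser.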
Fortunately, the range of commands required to benefit from R Markdown is very limited. For instance, the commands in the help file Help > Markdown Quick Reference of RStudio provide a good start for creating beautiful and functional documents. Beyond these basics, the R Markdown Cheatsheet (also available in RStudio by selecting Help > Cheatsheets > R Markdown Cheat Sheet) provides a more comprehensive overview of R Markdown functionality and commands.
- Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Sebastopol, CA: O’Reilly Media, Inc. [Available at http://r4ds.had.co.nz.]
The updated and expanded 2nd edition is:
- Wickham, H., Cetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science: Import, tidy, transform, visualize, and model data (2nd edition). Sebastopol, CA: O’Reilly Media, Inc. [Available at https://r4ds.hadley.nz/.]
There are many other excellent books (and even more fragmentary and bad books) on data science in R for various audiences. Here are some recommendations for finding additional texts and courses on learning data science or statistics with R:
Bookdown.org is a major catalyst for data science in R, as it provides many great books on various topics at no charge. The archive page contains books on an even wider selection of topics. Due to the grass-roots nature of the site, many books are unfinished and of low quality. However, there are also many excellent ones. Some easy recommendations include:
The Art of Data Science (by Roger D. Peng and Elizabeth Matsui) is a thoughtful introduction to the principles behind data science.
Hands-On Programming with R (by Garrett Grolemund) provides a solid introduction to R.
R does many things beyond statistics. But as R was designed as a programming language for statistics, many textbooks approach R from this angle. Available examples include:
Learning statistics with R (by D.J. Navarro) is an excellent starting point for psychology students wanting to learn more about statistics.
Answering questions with data (by Matthew J.C. Crump et al.) is a free textbook teaching introductory statistics for undergraduates in Psychology (with lots of additional material).
Reproducible statistics for psychologists with R (by Matthew J.C. Crump et al.) is a series of labs/tutorials for a two-semester graduate-level statistics sequence in psychology.
R you Ready for R? (by Wade Roberts) does not teach statistics from scratch, but provides helpful recipes for conducting particular analyses.
Statistical Inference via Data Science (by Chester Ismay and Albert Y. Kim) teaches statistical inference from a tidyverse perspective.
Online information on R is abundant, but can be hard to navigate. Useful starting points include:
Intro2R provides a gentle 3-day introduction to R.
Quick-R (by Robert Kabacoff) is a popular website on R programming that also provides many pointers for using R in statistics.
R-bloggers collects blog posts on R.
The Simply statistics blog (by Rafa Irizarry, Roger Peng, and Jeff Leek) provides insightful and inspiring articles on many data science topics.
The Win vector blog (by John Mount and Nina Zumel) provides noteworthy observations on particular problems and data science in general.
The Learning Machines blog (by Holger K. von Jouanne-Diedrich) contains many articles worth reading on using R for modeling and machine learning.
Towards data science provides background articles on current data science issues.
Other R courses and exercises include:
An introduction to statistical learning (by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani) provides an introduction to statistical learning methods with applications in R (and a corresponding ISLR package).
Computing for the social sciences is a course (taught by Benjamin Soltoff) that is part of a Masters program in computational social science. The syllabus is more advanced than this course (and its pace much faster). But as the materials are of very high quality, they are a great way to explore additional topics.
fasteR: Fast lane to learning R (by Norman Matloff) is aimed at those who seek a quick, painless entree to the world of R.
R-exercises provides categorized sets of exercises to help people develop their R programming skills.
Other helpful links that do not fit into the above categories include:
Posit cheatsheets provide visual summaries of many task domains and packages.
Automatic Help for R provides pointers and tools for teaching and managing R courses.
What they forgot to teach you about R is a book in the making (by Jennifer Bryan and Jim Hester) that provides many practical tips (e.g., regarding R maintenance, file names and paths, and workflow).
[index.Rmd updated on 2024-02-22 14:31:44.153036 by hn.]
[7] See the introductory chapters of R for Data Science (Wickham & Grolemund, 2017) for short but helpful instructions on organizing your workflow with RStudio, especially the even-numbered chapters on basics (Chapter 4), scripts (Chapter 6), and projects (Chapter 8).