Modern R with the tidyverse

by Bruno Rodrigues


This book will teach you how to use R to solve your statistical, data science and machine learning problems. Importing data, computing descriptive statistics, running regressions (or more complex machine learning models) and generating reports are some of the topics covered. No previous experience with R is needed. […] This book is still being written. Chapters 1 to 6 are almost ready. Chapter 7 is outdated, but the key messages are still useful. Chapters 8 and 9 are quite complete too. Chapters 10 and 11 are empty for now. Some exercises might be in the wrong place too. If you already like what you …



by 王敏杰


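Scientific Research information service using R […] I have been giving a series of R lectures at the library for about a year and a half, and along the way I came up with the idea of writing a book with R. On the one hand, I want to give students worked examples for learning R; on the other, I want to offer some research information services to the researchers at our university. If this effort lets teaching and learning reinforce each other, and leads to better practice and application of data science, it will have been a worthwhile attempt. Unfortunately my time and energy are limited, and the writing is progressing slowly. The book is organized as follows: Chapter 1 briefly introduces data science and R; Chapter 2 introduces the research information dataset and uses the tidyverse package for statistics and data visualization; Chapter 3 surveys how correspondence addresses are used in research papers and gives recommendations for writing them consistently; Chapter 4 describes each school's contribution to the ESI disciplines, as well as journals' contributions to citations; Chapter 5 analyzes our researchers' journal selection tendencies based on the Chinese Academy of Sciences JCR journal partitions; Chapter 6 …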


Statistical Rethinking with brms, ggplot2, and the tidyverse

by A Solomon Kurz


This project is an attempt to re-express the code in McElreath’s textbook. His models are re-fit in brms, plots are redone with ggplot2, and the general data wrangling code predominantly follows the tidyverse style. […] I love McElreath’s Statistical Rethinking text. It’s the entry-level textbook for applied researchers I spent a couple years looking for. McElreath’s freely-available lectures on the book are really great, too. However, I’ve come to prefer using Bürkner’s brms package when doing Bayesian regression in R. It’s just spectacular. I also prefer plotting with Wickham’s ggplot2, …


Data Processing & Visualization

by Michael Clark

The focus of this document is on common data processing and exploration techniques in R, especially as a prelude to visualization. The first part of the document will cover data structures, the dplyr and tidyverse packages, which enhance and facilitate the sorts of operations that typically arise when dealing with data, including faster I/O and grouped operations. For visualization, the focus will be on using ggplot2 and other packages that allow for interactivity. Basic programming concepts and techniques are also introduced, and exercises may be found in the document as well. In addition, the demonstrations of the data processing section are available in Python via Jupyter notebooks.


recoding Introduction to Mediation, Moderation, and Conditional Process Analysis

by A Solomon Kurz


This project is an effort to connect Hayes’s conditional process analysis work with the Bayesian paradigm. Herein I refit his models with my favorite R package for Bayesian regression, Bürkner’s brms. I use syntax based on sensibilities from the tidyverse and plot with Wickham’s ggplot2. […] Andrew Hayes’s Introduction to Mediation, Moderation, and Conditional Process Analysis text, the second edition of which just came out, has become a staple in social science graduate education. Both editions of his text have been from a frequentist OLS perspective. This project is an effort to …


An Incomplete Solutions Guide to the NIST/SEMATECH e-Handbook of Statistical Methods

by Ray Hoobler


Analysis of case studies and exercises with a focus on using the tidyverse and ggplot2. This handbook was created using the bookdown package in RStudio. The output format for this example is bookdown::gitbook. […] Exploratory Data Analysis (EDA) is a philosophy on how to work with data, and for many applications, the workflow is better suited for most working scientists and engineers. As scientists, we are trained to formulate a hypothesis and design a series of experiments that will allow us to test the hypothesis effectively. Unfortunately, most data doesn’t come from carefully controlled …


Simulation And The James-Stein Estimator In R

by Alex Hallam


Simple Simulation and the James-Stein Estimator […] This is the website for “Simulation And The James-Stein Estimator In R”. This technical document is short, covering some common ways to generate data and exploring the James-Stein Estimator. This will teach you how to run simulations to observe the properties of the James-Stein Estimator in R, specifically using the tidyverse. You’ll learn how to generate data to prove theoretical results. In the computer age of statistics the data scientist has the power of machines to run simulations for testing a method before putting it into …


Data Science and Visualizations with R

by Jonathan Wong


Data Science and Visualizations with R […] This is a course on the use of tidyverse packages. The tidyverse provides a complete suite of modern data-handling tools. It is an essential toolbox for any data scientist using R. The tidyverse package is designed to be easy to install. This course will dive into using the tidyverse. It will assume you have already installed R and RStudio and have some familiarity with how to use RStudio. This book will use the nycflights13 dataset. This package contains information about all flights that departed from NYC in 2013: 336,776 flights with 16 variables. To …


Tidyverse Cookbook

by Malte Grosser


Simple cookbook for functions and idioms within the scope of the tidyverse. […] The basic idea of this book is to provide documentation of the tidyverse written in a solution-driven cookbook style. As an extra, I would like to provide similar solutions based on base R functionality. Some reasons to write this book: One strength of the tidyverse is that it hides a lot of quirks that base R has and passes on to the many packages that rely on it. This allows you to stick to a specific workflow from the point you enter the tidyverse until you leave it. This is why I highly recommend to head your …


Spreadsheet Munging Strategies

by Duncan Garmonsway


Spreadsheet Munging Strategies […] This is a work-in-progress book about getting data out of spreadsheets, no matter how peculiar. The book is designed primarily for R users who have to extract data from spreadsheets and who are already familiar with the tidyverse. It has a cookbook structure, and can be used as a reference, but readers who begin in the middle might have to work backwards from time to time. R packages that feature heavily are tidyxl and unpivotr. Both are much more complicated than readxl, and that’s the point. Tidyxl and unpivotr give you more power and complexity when you need …