Introduction to Data Science

by Rafael A. Irizarry


This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression and machine learning and helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with UNIX/Linux shell, version control with GitHub, and reproducible document preparation with R markdown.


Statistical Rethinking with brms, ggplot2, and the tidyverse

by A Solomon Kurz


This project is an attempt to re-express the code in McElreath’s textbook. His models are re-fit in brms, plots are redone with ggplot2, and the general data wrangling code predominantly follows the tidyverse style. […] I love McElreath’s Statistical Rethinking text. It’s the entry-level textbook for applied researchers I spent a couple years looking for. McElreath’s freely-available lectures on the book are really great, too. However, I’ve come to prefer using Bürkner’s brms package when doing Bayesian regression in R. It’s just spectacular. I also prefer plotting with Wickham’s ggplot2, …


Data Processing & Visualization

by Michael Clark


The focus of this document is on common data processing and exploration techniques in R, especially as a prelude to visualization. The first part of the document will cover data structures, the dplyr and tidyverse packages, which enhance and facilitate the sorts of operations that typically arise when dealing with data, including faster I/O and grouped operations. For visualization, the focus will be on using ggplot2 and other packages that allow for interactivity. Basic programming concepts and techniques are introduced as well, and exercises may be found in the document. The demonstrations of the data processing section are also available in Python via Jupyter notebooks.


Data Visualization with R

by Rob Kabacoff


A guide to creating modern data visualizations with R. Starting with data preparation, topics include how to create effective univariate, bivariate, and multivariate graphs. Specialized graphs are also covered, including geographic maps, the display of change over time, flow diagrams, interactive graphs, and graphs that help interpret statistical models. The focus is on the 45 most popular graph types. The guide also includes detailed instructions on how to customize graphs, and ends with a chapter on graphing best practices. Although strongly based on the ggplot2 package, other approaches are included as well.


recoding Introduction to Mediation, Moderation, and Conditional Process Analysis

by A Solomon Kurz


This project is an effort to connect Hayes’s conditional process analysis work with the Bayesian paradigm. Herein I refit his models with my favorite R package for Bayesian regression, Bürkner’s brms. I use syntax based on sensibilities from the tidyverse and plot with Wickham’s ggplot2. […] Andrew Hayes’s Introduction to Mediation, Moderation, and Conditional Process Analysis text, the second edition of which just came out, has become a staple in social science graduate education. Both editions of his text have been from a frequentist OLS perspective. This project is an effort to …


An Incomplete Solutions Guide to the NIST/SEMATECH e-Handbook of Statistical Methods

by Ray Hoobler


Analysis of case studies and exercises with a focus on using the tidyverse and ggplot2. This handbook was created using the bookdown package in RStudio. The output format for this example is bookdown::gitbook. […] Exploratory Data Analysis (EDA) is a philosophy on how to work with data, and for many applications, the workflow is better suited for most working scientists and engineers. As scientists, we are trained to formulate a hypothesis and design a series of experiments that will allow us to test the hypothesis effectively. Unfortunately, most data doesn’t come from carefully controlled …


Meu log de leitura de R for Data Science (My Reading Log of R for Data Science)

by Marcos V. C. Vital - LEQ-UFAL


My reading log of R for Data Science […] If anyone can be considered a “pop star” of R, it would be Hadley Wickham: the guy is responsible for ggplot2 and dplyr, which are some of the most popular R packages! But those are precisely the packages I almost never use… :( Let me explain. I have been an R user for many years (I did the math in my head as I write this, and if I'm not mistaken, now in 2018 it would be some 13 or 14 years!), so it has been quite a while since I learned how to solve (and teach) certain things. So far so good. It so happens that Hadley brought a …


ggplot2 介紹 (Introduction to ggplot2)

by 林茂廷老師


Introduction to ggplot2 […] When pasting code on hypothes.is, please post it as in the following example: ggplot2 cheatsheet Computing for the Social Sciences, U.Chicago. ggplot2 part of the …



by @kazutan


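This is a ggplot2 reverse-lookup collection. […] It gathers the ggplot2 reverse-lookup articles published on Qiita. So far, it compiles the 12 articles written by @kazutan. If you have any comments, please open an issue on the GitHub repository below or contact @kazutan on Twitter. …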


plotly for R

by Carson Sievert


An overview of the R package plotly […] This website explains and partially documents the R package plotly, a high-level interface to the open source JavaScript graphing library plotly.js (which powers The R package already has numerous examples and documentation on and, but this website provides more of a cohesive narrative to help explain fundamental concepts and recent developments. By reading from start to finish, readers new to R and plotly should be able to get up and running fairly quickly. That being said, advanced R and plotly …