# Data Science

# Modern R with the tidyverse

## by Bruno Rodrigues

This book will teach you how to use R to solve your statistical, data science and machine learning problems. Importing data, computing descriptive statistics, running regressions (or more complex machine learning models) and generating reports are some of the topics covered. No previous experience with R is needed. […] This book is still being written. Chapters 1 to 6 are almost ready. Chapter 7 is outdated, but the key messages are still useful. Chapters 8 and 9 are quite complete too. Chapters 10 and 11 are empty for now. Some exercises might be in the wrong place too. If you already like what you … Read more →

# ntpu-data-visualization.utf8.md

## by tpemartin

Economic Data Visualization […] This course is designed to develop the skill of efficient graphic language, where efficiency is defined as the data information delivery that is self-contained, concise, and non-distorting. The programming language is mainly based on R, with a little bit of JavaScript toward the end. Though no computer programming knowledge is required, basic R knowledge will help (the ebook, R for Data Science, would be a good start). By the end of the course, students who learn well should be able to … Read more →

# R for Data Science Solutions

## by Jeffrey B. Arnold

This contains the solutions to the exercises in the book, R for Data Science, by Garrett Grolemund and Hadley Wickham. […] This contains solutions to the exercises in R for Data Science, by Hadley Wickham and Garrett Grolemund (Wickham and Grolemund 2017). The website for that book is r4ds.had.co.nz, and a physical copy is published by O’Reilly and available from Amazon. This work is licensed under a Creative Commons Attribution 4.0 International License. Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 1st ed. O’Reilly … Read more →

# A First Course on Statistical Inference

## by Isabel Molina Peralta and Eduardo García Portugués

Notes for Statistical Inference. MSc in Statistics for Data Science. Carlos III University of Madrid. […] Definition 1.1 (Random experiment) A random experiment is an experiment with the following properties: The following concepts are associated with a random experiment: Example 1.1 The next experiments are random experiments: A probability function is defined as a mapping of subsets (events) of the sample space $\Omega$ to elements in $[0,1]$. Therefore, it is convenient to count on a “good” structure for these subsets, which will provide “good” properties to the probability … Read more →

# Introduction to Data Science

## by Rafael A. Irizarry

This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression and machine learning and helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with UNIX/Linux shell, version control with GitHub, and reproducible document preparation with R markdown. Read more →

# Notes for ST463/ST683 Linear Models 1

## by Katarina Domijan, Catherine Hurley

These are the notes for the ST463/ST683 Linear Models 1 course offered by the Mathematics and Statistics Department at Maynooth University. This module is offered as part of the MSc in Data Science and Data Analytics. It is an introductory course for students who have a basic background in statistics, data analysis, R programming and linear algebra (matrices). […] There are many good resources, e.g. Weisberg (2005), Fox (2005), Fox (2016), Ramsey and Schafer (2002), Draper and Smith (1966). We will use Minitab and R (R Core Team 2017). To create this document, I am using the bookdown package … Read more →

# Population Health Data Science with R

## by Tomás J. Aragón

Population health data science (PHDS) is the art and science of transforming data into actionable knowledge to improve health. R is an open source programming environment for statistical computing and graphics. PHDS is captured by four words: describe, predict, discover, and advise. […] We are writing this book to introduce R, a programming language and environment for statistical computing and graphics, to public health epidemiologists, health care data analysts, data scientists, statisticians, and others conducting population health analyses. Recent graduates come prepared with a solid … Read more →

# STAT160 R/RStudio Companion

## by Statistics/Data Science at St. John Fisher College

Companion document to Introduction to Statistical Investigations using R/RStudio. […] This companion is for use in STAT160 (Introduction to Data Science). The textbook for the course is Introduction to Statistical Investigations (Tintle et al.). Through in-class and homework assignments, students will learn to use R and RStudio. In this companion, we will review the commands and functions students will need to perform statistical analysis and generate statistical … Read more →

# Data Science at the Command Line

## by Jeroen Janssens

This is the website for Data Science at the Command Line, published by O’Reilly (first edition, October 2014). This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, whether you’re on Windows, macOS, or Linux, author Jeroen Janssens has developed a Docker image packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible … Read more →

# Data Science con R: Fundamentos y Aplicaciones

## by BEST: Behavioral Economics & Data Science Team

The best data science book in Spanish, free and open. […] Understanding the behavior of people or of society is a fascinating subject. Some centuries ago, the prophet Isaiah wrote: Behavioral economics addresses this subject from a scientific perspective, drawing on psychology and economics. Remember each Rmd file contains one and only one chapter, and a chapter is defined by the first-level heading #. To compile this example to PDF, you need XeLaTeX. You are recommended to install TinyTeX (which includes XeLaTeX): https://yihui.name/tinytex/. … Read more →

# An Introduction to Statistical and Data Sciences via R

## by Chester Ismay and Albert Y. Kim

An open-source and fully-reproducible electronic textbook bridging the gap between traditional introductory statistics and data science courses. […] Help! I’m new to R and RStudio and I need to learn about them! However, I’m completely new to coding! What do I do? If you’re asking yourself this question, then you’ve come to the right place! Start with our Introduction for Students. This is version 0.4.0 of ModernDive published on July 21, 2018. For previous versions of ModernDive, see Section 1.5. This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding … Read more →

# Meu log de leitura de R for Data Science

## by Marcos V. C. Vital - LEQ-UFAL

My reading log of R for Data Science […] If anyone can be considered a “pop star” of R, it would be Hadley Wickham: the guy is responsible for ggplot2 and dplyr, which are some of the most popular R packages! But those are precisely the packages I hardly use… :( Let me explain. I have been an R user for many years (I did the math in my head while writing this, and if I am not mistaken, now in 2018 it would be some 13 or 14 years!), so it has been quite a while since I learned how to solve (and teach) certain things. So far, so good. It turns out that Hadley brought a … Read more →

# Math 390.4: Data Science with R

## by The Queens College Collective Consciousness

This is a minimal example of using the bookdown package to write a book. The output format for this example is bookdown::gitbook. […] This is a book published using the R Markdown language. R Markdown supports LaTeX, so you can make pretty equations like Professor Kapelner likes: $a^2 + b^2 = c^2$. To type inline LaTeX, just surround your code with dollar signs. That was published like this: `$a^2 + b^2 = c^2$` You can edit the markdown for this book from RStudio just like you would edit a regular R Markdown (.Rmd) file. Here’s a picture of what it looks like as I edit this book and the R … Read more →

# Selected Solutions to R4DS Exercises

## by Chunji Wang

This book provides selected solutions to the exercises in the wonderful book R for Data Science by Hadley Wickham. […] This is the website for “Selected Solutions to R4DS Exercises”. This is a joint adventure between Chunji Wang, Ron, Luna, Zhiyin, Chengcheng…. We started the “R4DS Study Club” on Sep 22nd, 2017. If you want to join us, please contact us! The chapter labels in this book are the same as in the original R4DS book; go to the corresponding chapter for solutions. You might need to read the beginning of the chapter to load some packages or create some variables that are … Read more →

# Mastering Software Development in R

## by Roger D. Peng, Sean Kross, and Brooke Anderson

The book covers R software development for building data science tools. As the field of data science evolves, it has become clear that software development skills are essential for producing useful data science results and products. You will obtain rigorous training in the R language, including the skills for handling complex data, building R packages and developing custom data visualizations. You will learn modern software development practices to build tools that are highly reusable, modular, and suitable for use in a team-based environment or a community of developers. Read more →

# Course Notes for IS 6489, Statistics and Predictive Analytics

## by Jeff Webb

Course notes for IS 6489. […] These are the course notes for IS 6489, Statistics and Predictive Analytics, offered through the Information Systems (IS) department in the University of Utah’s David Eccles School of Business. This is an exciting time for data analysis! The field has undergone a revolution in the last 15 years with increases in computing power and the availability of “big data” from web-based systems of data collection. “Data science” is the umbrella term that describes the result of this revolution—a new discipline at the intersection of many traditional fields such as … Read more →

# ModernDive

## by Chester Ismay and Albert Y. Kim STARRING FRANK MCGRADE

An open-source and fully-reproducible electronic textbook bridging the gap between traditional introductory statistics and data science courses. […] Help! I’m new to R and RStudio and I need to learn about them! However, I’m completely new to coding! What do I do? If you’re asking yourself this question, then you’ve come to the right place! Start with our Introduction for Students. This is version 0.2.0 of ModernDive published on August 02, 2017. For previous versions of ModernDive, see Section 1.4. This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding … Read more →

# Data Science in Educational Research

## by Joshua M. Rosenberg

This is an introduction and tutorial for data science in educational research. … Read more →

# Data Science and Visualizations with R

## by Jonathan Wong

Data Science and Visualizations with R […] This is a course on the use of tidyverse packages. The tidyverse provides a complete suite of modern data-handling tools. It is an essential toolbox for any data scientist using R. The tidyverse package is designed to be easy to install. This course will dive into using the tidyverse. It will assume you have already installed R and RStudio and have some familiarity with how to use RStudio. This book will use the nycflights13 dataset. This package contains information about all flights that departed from NYC in 2013: 336,776 flights with 16 variables. To … Read more →

# The Art of Data Science

## by Roger D. Peng and Elizabeth Matsui


# R Programming for Data Science

## by Roger D. Peng

The R programming language has become the de facto programming language for data science. Its flexibility, power, sophistication, and expressiveness have made it an invaluable tool for data scientists around the world. This book is about the fundamentals of R programming. You will get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code. With the fundamentals provided in this book, you will have a solid foundation on which to build your data science toolbox. Read more →

# A list of R conferences and meetings

## by csgillespie

A list of R conferences and meetings. […] This site attempts to list R conferences and local useR groups. Please feel free to add any missing group or conference. In particular, most of the associated twitter names are missing. There are currently 263 R user groups and events. To propose a change, just click the pencil icon in the top left hand corner. We also maintain a corresponding list of Data Science conferences and events. The html files for this document live in the docs/ directory of the repository. Travis creates the html files from the .Rmd files and commits them to the docs/ … Read more →

# Scalable Machine Learning and Data Science with Microsoft R Server and Spark

## by Ali Zaidi, Machine Learning and Data Science, Microsoft

These are (tentatively) rough notes showcasing some tips on conducting large-scale data analysis with R, Spark, and Microsoft R Server. The focus is primarily on machine learning with the Azure HDInsight platform, but the notes also review other in-memory, large-scale data analysis platforms, such as R Services with SQL Server 2016, and discuss how to use BI tools such as Power BI and Shiny for dynamic reporting and report generation. Read more →

# Data Science Live Book

## by Pablo Casas

An intuitive and practical approach to data analysis, data preparation and machine learning, suitable for all ages! […] This book is now available at Amazon. Check it out! 📗 🚀. Link to the black & white version, also available in full color. It can be shipped to over 100 countries. 🌎 The book will facilitate the understanding of common issues that arise when doing data analysis and machine learning. Building a predictive model is as difficult as one line of R code: That’s it. But data is dirty in practice. We need to sculpt it, just like an artist does, to expose its information in order … Read more →

# Hands-On Programming with R

## by Garrett Grolemund

This book will teach you how to program in R, with hands-on examples. I wrote it for non-programmers to provide a friendly introduction to the R language. You’ll learn how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions, and use all of R’s programming tools. Throughout the book, you’ll use your newfound skills to solve practical data science problems. Read more →

# R for Data Science

## by Garrett Grolemund, Hadley Wickham

This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data. Read more →