Data Science
Linear Algebra
by Bakti Siregar, M.Sc., CDS
Linear Algebra is a branch of mathematics that plays a fundamental role in various fields, ranging from physics and engineering to economics and computer science. In recent decades, the advancements in technology and data science have further emphasized the importance of Linear Algebra, particularly in finance, business, and machine learning. This book is designed to bridge the understanding of the basic theories of Linear Algebra with its applications in modern contexts, where data analysis and decision optimization are increasingly essential for strategic decision-making. In the world of …
Clinical Biostatistics
by Leonhard Held, with contributions from Charlotte Micheloud, Lisa Hofer, Stefanie von Felten, Samuel Pawel
Based on the lecture notes from STA404: Clinical Biostatistics. […] “Medicine is a science of uncertainty and an art of probability.” William Osler (1849-1919). Biostatistics is a fundamental discipline at the core of modern health data science (Lee et al., 2019). It is the science of managing medical uncertainty, and biostatistical methods play a key role in the scientific assessment of the main areas of medical practice: Leonhard Held …
Geocomputation with R
by Robin Lovelace, Jakub Nowosad, Jannes Muenchow
Geocomputation with R is for people who want to analyze, visualize and model geographic data with open source software. It is based on R, a statistical programming language that has powerful data processing, visualization, and geospatial capabilities. The book equips you with the knowledge and skills to tackle a wide range of issues manifested in geographic data, including those with scientific, societal, and environmental implications. This book will interest people from many backgrounds, especially Geographic Information Systems (GIS) users interested in applying their domain-specific knowledge in a powerful open source language for data science, and R users interested in extending their skills to handle spatial data.
Introduction to Data Science
by Hansjörg Neth
This book provides a gentle introduction to data science for students of any discipline with little or no background in data analysis or computer programming. Based on notions of representation, measurement, and modeling, we examine key data types (e.g., logicals, numbers, text) and learn to clean, summarize, transform, and visualize (rectangular) data. By reflecting on the relations between representations, tasks, and tools, the course promotes data literacy and cultivates reproducible research practices that precede and enable practical uses of programming or statistics. This book is still being written and revised. It currently serves as a scaffold for a curriculum that will be filled with content as we go along.
dbt and BigQuery: an action oriented approach
by Samuel Gachuhi Ngugi
This book is your manual to using dbt with BigQuery. […] Samuel Gachuhi is a geographer who by fate found himself in the programming world. He holds a certificate in data science and machine learning and another in deep learning with TensorFlow from Udemy. He was motivated to write this book on dbt after noticing that most texts on dbt were written in a manner comprehensible only to software engineers. Believing that knowledge should be conveyed in a manner that is understandable by all, he sought to write this book in a less technical manner and to infuse it with humour since …
Producing and Using Data in Cognitive Science
by Daniel Nettle
Producing and Using Data in Cognitive Science […] Welcome to the course ‘Producing and using data in Cognitive Science’. This course covers what would traditionally be included in a statistics course, plus some of a research methods course, plus some of a data science course. The basic idea is the following: to answer scientific questions in cognitive science, we have to: produce the right data, the data that have the best chance of answering our question; handle those data right, organizing them, manipulating them in ways that are transparent, storing them permanently and making them …
A Guide on Data Analysis
by Mike Nguyen
This is a guide on how to conduct data analysis in the field of data science, statistics, or machine learning. […] The intended audience includes those with little to no experience in statistics, econometrics, or data science, as well as individuals with a budding interest in these fields who are eager to deepen their knowledge. While my primary domain of interest is marketing, the principles and methods discussed in this book are universally applicable to any discipline that employs scientific methods or data analysis. I hope this book provides a valuable starting point for aspiring …
Advanced Statistical Modelling
by Dr. S. Jackson
These are the course notes for the Machine Learning module of Durham University’s Masters of Data Science course. […] Welcome to the material for the first term of the module Advanced Statistical Modelling MATH3411 at Durham University. These pages will update as the course progresses, consisting of relevant lecture notes, practical demonstrations (in R), exercise sheets and practical sessions. I would recommend that you use the html version of these notes (they have been designed for use in this way), however, there is also a pdf version of these notes, which will also be updated as the …
Data Science for Psychologists
by Hansjörg Neth
This book provides an introduction to data science that is tailored to the needs of students in psychology, but is also suitable for students of the humanities and other biological or social sciences. This audience typically has some knowledge of statistics, but rarely an idea of how data is prepared for statistical testing. By using various data types and working with many examples, we teach strategies and tools for reshaping, summarizing, and visualizing data. By keeping our eyes open for the perils of misleading representations, the book fosters fundamental skills of data literacy and cultivates reproducible research practices that enable and precede any practical use of statistics.
Environmental Data Science Addenda
by Jerry Davis, SFSU Institute for Geographic Information Science
Addenda to Introduction to Environmental Data Science, including case studies, extending methods into more experimental areas, and guides for building packages and RMarkdown documents (essentials based on experience and methods from various sources). […] The purpose of this bookdown book is to provide Addenda to Introduction to Environmental Data Science, at https://bookdown.org/igisc/EnvDataSci/ to include case studies, extended and experimental methods, and guides for building packages and RMarkdown documents. It’ll serve as a sandbox for exploring methods, some of which will make it into …
Introduction to Environmental Data Science
by Jerry Davis, SFSU Institute for Geographic Information Science
Background, methods and exercises for using R for environmental data science. The focus is on applying the R language and various libraries for data abstraction, transformation, data analysis, spatial data/mapping, statistical modeling, and time series, applied to environmental research. Applies exploratory data analysis methods and tidyverse approaches in R, and includes contributed chapters presenting research applications, with associated data and code packages.
STA 444/5 - Introductory Data Science using R
by Dr. Robert Buscaglia
STA 444/5 - Introductory Data Science using R […] This book is intended for use during the STA 444/445 courses at Northern Arizona University. The book is broken into two sections based on the related course material. The STA 444 section covers basic introductory content for getting started with statistical programming in R. This course is intended for students of all backgrounds and is an important complement to courses such as STA 570 (Statistical Methods I) and STA 471 (Regression Analysis). The first section covers details to allow students to work on basic statistical programming while …
Data Science 2
by Mark Trede
In economics, randomness and uncertainty play an important role. On the one hand, this is because economic theory makes descriptive and normative statements about how economic agents behave under uncertainty and how they should behave rationally. On the other hand, it is because economic models are to be fitted to reality using statistical methods, or economic theories are to be tested against empirical observations. In the module Data Science 2, you will learn how to deal with randomness and uncertainty. The module can be …
r4ds-ggplot2
by ggiaever
The published version for this module can be found on my bookdown site 2024_r4ds-ggplot2. This module is based on the Data visualization chapter in the 2nd edition of Hadley Wickham’s book “R for Data Science” (see …
Statistical Inference via Data Science
by Chester Ismay and Albert Y. Kim
An open-source and fully-reproducible electronic textbook for teaching statistical inference using tidyverse data science tools. […] This is the website for Statistical Inference via Data Science: A ModernDive into R and the Tidyverse! Visit the GitHub repository for this site and find the book on Amazon. You can also purchase it at CRC Press using discount code ADC24. This work by Chester Ismay and Albert Y. Kim is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International …
CS5702 Modern Data Book
by Martin Shepperd
This is a draft course-book for the MSc Data Science Analytics module CS5702 Modern data […] This book cover image was generated using R and the famous cars dataset. Courtesy of Giora Simchoni’s fun R package {kandinsky} which generates random images from datasets in the style of the painter …
Introduction to R
by Jena University Hospital, Institute of Medical Statistics, Computer and Data Sciences, Julia Palm (julia.palm@med.uni-jena.de)
Accompanying IMSID course […] This instruction manual belongs to the course Introduction to R which is taught at the Institute of Medical Statistics, Computer and Data Sciences at Jena University Hospital. Each chapter belongs to one of the five course dates. It is written in a way that should allow you to reproduce the entire course by yourself on your personal computer. There are a lot of code examples in this instruction manual. You can generally recognize a piece of R code in this document by the grey highlighting. If the code returns a result, the result is displayed directly below the …
r4ds-ggplot2
by ggiaever
r4ds-ggplot2 […] This website uses the ggplot visualization sections of the 2nd edition of Hadley Wickham’s book “R for Data Science”. …
Everyday-R: Practical R for Data Science
by Brian Jungmin Park
This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. [...] Note: this book is a work in progress. All source code for this project is available on my GitHub, which is linked in 1.4. This book serves as a collection of R Markdown files that help users learn the practical syntax and usage of R for data science. Mainly, code snippets and workflows aimed at tackling everyday tasks in data science will be covered, including data cleaning, data wrangling, iterations, machine ...
Data Science
by Tiffany Timbers, Trevor Campbell, and Melissa Lee
This is a textbook for teaching a first introduction to data science. […] Data Science This is the website for Data Science: A First Introduction. You can read the web version of the book on this site. Click a section in the table of contents on the left side of the page to navigate to it. If you are on a mobile device, you may need to open the table of contents first by clicking the menu button on the top left of the page. You can purchase a PDF or print copy of the book on the CRC Press website or on Amazon. For the Python version of the textbook, visit https://python.datasciencebook.ca. This …
Product Data Science Interview: The Bar to Get Hired
by Free sample of 34 curated questions and answers that will help you succeed in FAANG-style interviews for product data science
Product Data Science Interview: The Bar to Get Hired […] Product data scientists (PDS) primarily work on metrics, experiments, and strategy. Their impact is measured by how effectively they can influence company direction through data. Successful product data scientists need a solid foundation in statistics and A/B testing, strong intuition and product sense, and excellent communication skills. A typical interview loop (onsite) for a product data science role will cover the following topics: This free sample of the complete book provides an introduction to the types of questions you might …
Data Science in Action
by Kristopher Pruitt
Data Science in Action […] This textbook is currently in DRAFT form and will be updated frequently. The objective of this textbook is to provide an approachable introduction to the knowledge, skills, and abilities of modern data scientists. Data-driven problem solving need not be restricted to the realm of advanced mathematics or expert computer programming. Data science can and should be practiced by all. In this text, we unveil the methods and tools applied by data scientists to solve real-world problems in a variety of domains. However, the content is accessible for anyone with the …
Data Science 1
by Mark Trede
Economics is an empirical science. It is always about the real world. For us to learn something about the real world, we need data. In the module Data Science 1, you will learn how to work with data. The knowledge from this module will be valuable to you both during your studies and beyond. This applies not only to work in academia but also in practice. Of course, data science is not possible without the use of computers. In this module, you will learn how, with the help of computer software, the data …
Modelling Space and Time with GAMS: spatially and temporally varying coefficient models
by Lex Comber
This is a workshop introducing GGP-GAMs as a method for undertaking spatially and temporally varying coefficient models […] GAMs (Generalized Additive Models) are emerging as the go-to approach for all kinds of data science activities. GAMs perform as well as or better than most machine learning models, and they are relatively fast. They are powerful and quick, but critically they offer a middle ground between overly simple but interpretable standard statistical approaches and efficient but opaque machine learning algorithms, where it is difficult to understand how one variable relates to an outcome. …
Machine Learning
by Dr. S. Jackson
These are the course notes for the Machine Learning module of Durham University’s Masters of Data Science course. […] Welcome to the material for the first half of the Machine Learning module MATH42815 of the Masters of Data Science course at Durham University. These pages will update as the course progresses, and consist of relevant lecture notes, practical demonstrations (in R) and practical workshop sessions. I would recommend that you use the html version of these notes (they have been designed for use in this way), however, there is also a pdf version of these notes. If you would like …
Exploratory Data Analysis and Visualization
by Luis Alvarez
This book studies exploratory data analysis and data visualization in the context of a university degree in Data Sciences. […] This document is an English translation of the book Análisis Exploratorio de Datos y Visualización, it covers the contents of an introductory course on exploratory data analysis and visualization in a university degree in Data Sciences. Exploratory data analysis is a very broad field, and it is not possible to teach all its aspects in depth in a single course. This course, of an introductory nature, aims to provide a solid foundation in the most important tools in …
Financial Data Science
by Prof. Dr. Ryan Riordan & Teaching Assistants
This bookdown contains the teaching materials for the project course Financial Data Science at the LMU Munich. […] Here you will find the course pages for the project course Financial Data Science. The project course is offered regularly in the winter and summer terms and aims to provide in-depth knowledge about the programming language Python and its most important libraries for data analysis. Each summer term, the course is taught in cooperation with the Institute for Finance & Banking and consists of two parts. Each winter term, the course extends the introduction of programming language …
Guide on Academic Writing
by Prof. Dr. Ryan Riordan & Teaching Assistants
This bookdown contains the teaching materials for the project course Financial Data Science at the LMU Munich. The files have been set up by Lisa Kaminski. [...] Here you will find supporting material on how to write academically. This guide is a generalized framework for seminar reports and bachelor's and master's theses. Disclaimer: The following guidelines should not be seen as a static set of immutable rules, but rather as a thorough and general guideline. This guide is subject to constant revision and extension! If you come across missing subjects, redundancies or inconsistencies, we are ...
Notes for Nonparametric Statistics
by Eduardo García-Portugués
Notes for Nonparametric Statistics. MSc in Statistics for Data Science. Carlos III University of Madrid. [...] Welcome to the notes for Nonparametric Statistics. The course is part of the MSc in Statistics for Data Science from Carlos III University of Madrid. The course is designed to have, roughly, one session per main topic in the syllabus. The schedule is tight due to time constraints, which will inevitably make the treatment of certain methods somewhat superficial. Nevertheless, the course will hopefully give you a respectable panoramic view of different available topics on ...
A First Course on Statistical Inference
by Isabel Molina Peralta and Eduardo García-Portugués
Notes for Statistical Inference. MSc in Statistics for Data Science. Carlos III University of Madrid. [...] Welcome to the notes for Statistical Inference. The course is part of the MSc in Statistics for Data Science from Carlos III University of Madrid. The course is designed to have, roughly, one session per main topic in the syllabus. The schedule is tight due to time constraints, which will inevitably make the exposition of certain methods somewhat superficial. Nevertheless, the course and exercises will hopefully give you a respectable panoramic view of the fundamentals of ...
The Data Preparation Journey
by Martin Monkman
Before you can analyze your data, you need to ensure that it is clean and tidy. […] Welcome to The Data Preparation Journey: Finding Your Way With R, a book published with CRC Press as part of The Data Science Series. This is a work-in-progress; the most recent update is 2024-02-25. It is routinely noted that the Pareto principle applies to data science—80% of one’s time is spent on data collection and preparation, and the remaining 20% on the “fun stuff” like modelling, data visualization, and communication. There is no shortage of material—textbooks, journal articles, blog posts, online …
GEOG5917 Big Data & Consumer Analytics - RStudio Practicals
by Lex Comber
This contains materials to support the University of Leeds GEOG5917 module, delivered by Lex Comber […] This is an on-line book written to support the practicals for the GEOG5917 Big Data and Consumer Analytics module, delivered by Lex Comber of the School of Geography, from the University of Leeds. A real book was written based on the materials developed for this module: Geographical Data Science and Spatial Data Analysis: An Introduction in R (Comber and Brunsdon 2021 - link here) and the module also draws from An Introduction to Spatial Analysis and Mapping in R (Brunsdon and Comber 2018 …
Statistics 240 Course Notes
by Bret Larget
This book contains case studies and course notes for STAT 240, Introduction to Data Modeling, at the University of Wisconsin, including instruction for many tidyverse packages […] Statistics 240 is a first course in data science and statistical modeling at the University of Wisconsin - Madison. The course aims to enable you, the student in the course, to gain insight into real-world problems from messy data using methods of data science. These notes chart an initial path for you to gain the knowledge and skills needed to become a data scientist. The structure of the course is to present a series …
Introduction to Inferential Statistics
by Dr. Marc Trussler
Class notes for PSCI-1801 […] Current as of 2024-01-05. Lecture: MW 12-1:30pm (MCNB 309). Dr. Marc Trussler, trussler@sas.upenn.edu, Fox-Fels Hall 32 (3814 Walnut Street), Office Hours: M 9-11am. TA: Dylan Radley, dradley@sas.upenn.edu, Fox-Fels Hall 35 (3814 Walnut Street), Office Hours: Tuesday 11-12, Tuesday 3-4, Thursday 12-1. The first step of many data science sequences is to learn a great deal about how to work with individual data sets: cleaning, tidying, merging, describing and visualizing data. These are crucial skills in data analytics, but describing a data set is not our ultimate goal. The …
Data Science with R: A Resource Compendium
by Martin Monkman
A modest and very incomplete listing of resources for tackling data science problems in R. […] Draft. This book grew out of my ever-growing collection of reference materials that was saved as an expanding array of markdown files in a GitHub repo. By assembling it as a book, I hope that it will be more accessible and useful to other R users. The author would like to acknowledge everyone who has contributed to the books, articles, blog posts, and R packages cited within. License: This work by Martin Monkman is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada …
STA 265 Notes (Methods of Statistics and Data Science)
by Christopher Mecklin
These are notes for STA 265 at Murray State University for students in Dr. Christopher Mecklin’s class. […] This chapter includes both material from the textbook and material that I have added to aid you in using R and RStudio for statistical and data science tasks. To both illustrate the 4-Step Process of statistical modeling and review the two-sample t-test, I will test whether there was a statistically significant difference between two different sections of a class on their final exam, where one section took a final on Monday morning and the other one on Friday …
381M Course Bookdown
by Josephine Lukito
This is a textbook for the course J381M at UT-Austin. […] Welcome to the J381M Textbook! In this course, we will learn how to use R for Computational Communication Research and Data Science, focusing on skills such as data wrangling, basic statistics, data visualization, data collection, NLP, network analysis, and machine learning. This is a survey course that is meant to give you a taste of data science. In truth, many of these topics are rich enough to warrant full courses. This textbook is best paired with the J381M course materials, including lectures, readings, and course assignments. …
Advanced R Course
by Florian Privé
This contains materials for the Advanced R course of the doctoral school of Grenoble, France. […] This material is licensed under the Creative Commons Attribution-ShareAlike 3.0 License. Florian Privé is a researcher in predictive human genetics, fond of Data Science and an R(cpp) enthusiast. He is also the founder and former organizer of the Grenoble R user group. You can find him on Twitter and GitHub as @privefl and on Stack Overflow as F. Privé. …
Analytics for a Changing Climate: Introduction to Social Data Science
by Stanford Summer Course 2023 | Instructor: Tyler McDaniel, Sociology, tylermc@stanford.edu
This will serve as a course reader for SOC 128D, Summer 2023. […] Office Hours: Fridays and Mondays, 11:00am-12:30pm https://calendly.com/tylermcdaniel/tyler-s-office-hours Course Description: Data science has rapidly gained recognition within the social sciences because it offers powerful new ways to ask questions about social systems and problems. This course will examine how tools from data science can be used to analyze pressing issues relating to disaster, inequality, and scarcity in the Anthropocene (the current period in which humans are the primary driver of planetary changes). We …
Introduction to R for Health Data Science
by Statistics Team
Introduction to R course, as used on MSc Health Data Science […] As a Health Data Scientist, it is vitally important that you have a firm understanding of a statistical programming language, and that you can work in a clear, reproducible fashion. This course will provide you with the baseline skills to use R for health data science. Get you ‘up and running’ using R and RStudio on your machine. Introduce the basics of programming in R (a key skill for a health data scientist). Introduce good practices of workflows and reproducibility in data science. Enable you to develop your skills …
R/RStudio Companion
by Statistics/Data Science
Companion document to Introduction to Statistical Investigations using R/RStudio. […] This companion was designed for use in STAT160 (Introduction to Data Science); however, it could be used for any intro-level data science course. The textbook for the course is Introduction to Statistical Investigations (Tintle et al.). Through in-class and homework assignments, students will learn to use R and RStudio. In this companion, we will review the commands and functions students will need to perform statistical analysis and generate statistical …
Targeted Learning in R
by Mark van der Laan, Jeremy Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, Alan Hubbard
An open source handbook for causal machine learning and data science with the Targeted Learning framework using the tlverse software ecosystem. […] Targeted Learning in R: Causal Data Science with the tlverse Software Ecosystem is a fully reproducible, open source, electronic handbook for applying Targeted Learning methodology in practice using the software stack provided by the tlverse ecosystem. This work is in a draft phase and is publicly available to solicit input from the community. To view or contribute, visit the GitHub repository. The contents of this handbook are meant to serve as a …
MLFE R labs (2023 ed.)
by Prof. Michela Cameletti & Tutor Rasoul Samei
Notes for the R labs of the MLFE course @ Unibg […] You are reading the lecture notes of the R labs for the Machine learning for Economics (MLFE) course at University of Bergamo (academic year 2022/23). The MLFE course is the second module of the Coding for Data Science course. The MLFE R labs are designed for students who already have some experience with R programming thanks to the first module of the Coding and Machine Learning course. Click here and here to access the R lab notes of the first module regarding introduction to R language and the tidyverse package. Enjoy the journey! …
AI and Machine Learning For Finance 2022/23
by Michela Cameletti
Notes for the R labs of the AIMLFF course @Unibg […] You are reading the lecture notes of the R lectures for the AI and Machine Learning for Finance (AIMLFF) course at University of Bergamo (academic year 2022/23). See here for more details. In these notes, the R programming language for data science will be introduced (with respect to data manipulation, data visualization and communication, and implementation of machine learning methods). For this part I suggest the following online book: Enjoy the journey! In the following lecture notes, this font (with grey background) represents R code. The …
Curso: R para análisis de datos
by Diana García Cortés
Course content: R para análisis de datos (R for data analysis) […] This book contains the notes developed to accompany the course “R para análisis de datos”. It is intended as a compendium of notes on the topics covered, so that participants have supporting and reference material. It is not intended to be an exhaustive source on the use of the tidyverse packages, since excellent resources already exist for that, such as R for Data Science [1] and its Spanish version, R para ciencia de datos [2]. This content is available in …
Financial Data Science
by Prof. Dr. Ryan Riordan & Teaching Assistants
This bookdown contains the teaching materials for the project course Financial Data Science at the LMU Munich. The files have been set up by Lisa Kaminski. [...] Here you will find the course pages for the project course Financial Data Science. The course is offered regularly in the summer term and aims to provide in-depth knowledge about the programming language Python and its most important libraries for data analysis. Furthermore, the course introduces the topic of database management and the process of retrieving, aggregating and manipulating data using SQL. Students will learn to ...
STM1001: Introduction to Machine Learning in R
STM1001 Machine Learning (Data Science Stream) […] Welcome to the final content supplement for the Data Science stream of STM1001. Throughout the semester, as we cover different aspects of statistics and data science, supplementary documents such as this one will be used to enhance your learning experience. This document contains material to support your learning as you complete Computer Labs 9B, 10B and 11B of the Data Science stream. We recommend that you take a few minutes to browse the different sections in this document before Computer Lab 9B. We suggest that you aim to read through …
Geographic Data Science with R: Visualizing and Analyzing Environmental Change
by Michael C. Wimberly
A book example for a Chapman & Hall book. […] We live in a time of unprecedented environmental change, driven by the effects of fossil fuels on the Earth’s climate and the expanding footprint of human land use. To mitigate and adapt to these changes, there is a need to understand their myriad impacts on human and natural systems. Achieving this goal requires geospatial data on a variety of environmental factors, including climate, vegetation, biodiversity, soils, terrain, water, and human populations. Consistent monitoring is also necessary to identify where changes are occurring and …
Designing and Building Data Science Solutions
by Jonathan Leslie, Neri Van Otten
Data science, machine learning and artificial intelligence (AI) can have game-changing impacts for businesses, empowering them to increase operational efficiency, improve the quality of their services and understand their customers better. Yet for these benefits to be realised, data science initiatives must be designed and executed in a sensible way. Often these projects, while successful from a scientific standpoint, miss the mark in terms of business impact. Many business leaders are left feeling unsettled, balancing the need for innovation and the adoption of revolutionary technologies with an uncomfortable degree of uncertainty and risk of failure. For the data scientist the situation can be equally unnerving, with uncertainties about how to deliver a successful project when the path is not clear. Yet, these uncertainties and risks – for the business leader and the data scientist alike – can be controlled and managed if approached in a sensible manner. Your authors have designed and delivered hundreds of projects across a wide range of industries. We have made many mistakes, and in the process we have learned what works well and where the common pitfalls lie. We wrote this book to share our experiences in hopes that it will help the reader – whether a data science practitioner or a business leader – reduce these risks and design projects that have the greatest chance of success. Much of the content in this guide is derived from lessons we have given to our students. Here we have gathered, organised and expanded on those bits of advice to serve as a resource for anyone considering embarking on a data science journey. We share our approach to data science projects, addressing topics such as alignment to business imperatives, project design, project delivery and evaluation of success. 
Data science can be an exciting, invigorating field, and for the business leader, it can bring about revolutionary changes to an organisation that can come with huge returns on investment and value added. For the data scientist, designing and delivering successful projects is rewarding, stimulating and tremendously gratifying. We hope this guide gives you the confidence to understand the risks and approach your project in a sensible way. Read more →
STM1001: Foundational Biology for Analyses of Biological Data
STM1001 Biology (Science/Health/Data Science Streams) […] Welcome to another content supplement for the Science, Health and Data Science streams of STM1001. Throughout the semester, as we cover different aspects of statistics and data science, supplementary documents such as this one will be used to enhance your learning experience. This document contains material to support your learning as you complete Computer Lab 8B of the Science, Health or Data Science streams. We recommend that you take a few minutes to browse the different sections in this document (just click on the sections in the … Read more →
STM1001: Data Visualisation in R
STM1001 Data Visualisation (Data Science Stream) […] Welcome to the first content supplement for the Data Science stream of STM1001. Throughout the semester, as we cover different aspects of data science, supplementary documents such as this one will be used to enhance your learning experience. This document contains material to support your learning as you complete Computer Labs 2B, 3B and 4B of the Data Science stream. We recommend that you take a few minutes to browse the different sections in this document (just click on the sections in the menu bar to your left). You don’t need … Read more →
Modern R with the tidyverse
by Bruno Rodrigues
This book will teach you how to use R to solve your statistical, data science and machine learning problems. Importing data, computing descriptive statistics, running regressions (or more complex machine learning models) and generating reports are some of the topics covered. No previous experience with R is needed. […] I have been working on this on and off for the past 4 years or so. In 2022, I updated the contents of the book to reflect updates introduced with R 4.1 and in several packages (especially those from the {tidyverse}). I have also cut some content that I think is not that … Read more →
Biometry
by Pleuni Pennings and Kevin Magnaye
Course notes for Biometry. […] You belong in this course and in the field of data science! We are excited to learn with each and every one of you. We are here to support your success. We have no doubt that you will do great things with the data science skills you learn in this course because of who you are as a person and the values you bring with you from your culture, family, and life experiences. We want to invite you to bring your whole self into our data science learning community. Each of you brings cultural assets and personal perspectives that will allow you to make unique … Read more →
Introduction to R for Data Science: A LISA 2020 Guidebook
by Jacob D. Holster
Introduction to R for Data Science: A LISA 2020 Guidebook […] Data science is emerging as a vital skill for researchers, analysts, librarians, and others who deal with data in their personal and professional work. In essence, data science is the application of the scientific method to data for the purpose of understanding the world we live in. More specifically, data science tasks emerge from an interdisciplinary amalgam of statistical analysis, computer science, and social science research conventions. Although other programming languages such as Python exceed R in general popularity, R … Read more →
R Programming for Data Science
by Roger D. Peng
The R programming language has become the de facto programming language for data science. Its flexibility, power, sophistication, and expressiveness have made it an invaluable tool for data scientists around the world. This book is about the fundamentals of R programming. You will get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code. With the fundamentals provided in this book, you will have a solid foundation on which to build your data science toolbox. Read more →
MLFE R labs (2022 ed.)
by Prof. Michela Cameletti & Tutor Marco Villa
Notes for the R labs of the MLFE course @ Unibg […] You are reading the lecture notes of the R labs for the Machine learning for Economics (MLFE) course at University of Bergamo (academic year 2021/22). The MLFE course is the second module of the Coding and Machine Learning course. The MLFE R labs are designed for students who already have some experience with R programming thanks to the first module of that course. Click here and here to access the R lab notes of the first module regarding introduction to R language and the tidyverse package. Enjoy the journey! … Read more →
AI and Machine Learning For Finance 2021/22
by Michela Cameletti
Notes for the R labs of the AIMLFF course @ Unibg […] You are reading the lecture notes of the R lectures for the AI and Machine Learning for Finance (AIMLFF) course at University of Bergamo (academic year 2021/22). See here for more details. In these notes, the R programming language for data science will be introduced (with respect to data manipulation, data visualization and communication, and implementation of machine learning methods). For this part I suggest the following online book: Enjoy the journey! In the following lecture notes, this font (with grey background) represents R code. The … Read more →
Úvod do analýzy údajov pomocou R
by Tomáš Bacigál
Basics of the R language and an introduction to Data Science: exploratory analysis, data transformation (dplyr), visualization (ggplot2), data cleaning (tidyr), interactive graphics (htmlwidgets, shiny, …), communication (RMarkdown), efficient programming (parallel, Rcpp, RSQLite). […] I don’t think anyone actually believes that R is designed to make everyone happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order a pizza, I still have to pick up the telephone. (Roger D. Peng, 2004) This quote captures the versatility of the software tool R. As with many "open … Read more →
DATA 3320 Data Science Methodology and Applications
by Brian Fischer
Hello. This is your course reader for DATA 3320 Data Science Methodology and Applications. Use the chapters of the reader to guide your work through each project. You will find the associated R Markdown file for each project on … Read more →
R for Data Science - 한국어
by Hadley Wickham, Garrett Grolemund; translated by 김설기, 최혜민
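A book created with bookdown. […] This is the Korean translation (translated by 김설기 and 최혜민) of Hadley Wickham and Garrett Grolemund’s book “R for Data Science”. The original web book is currently being updated to the 2nd edition, and this Korean web book is being updated accordingly. This web book was written with RMarkdown and bookdown, and its source code is at https://github.com/sulgik/r4ds. The English exercise solutions are also available. The print book (1st edition) can be purchased from Aladin and other … Read more →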
An(other) introduction to R
by Felix Lennert
This is a gentle introduction to R and the basic usage of some tidyverse packages (dplyr, tidyr, ggplot2, forcats, stringr) for data manipulation and visualization. […] Dear student, in the following, you will receive a gentle introduction to R and how you can use it to work with data. This tutorial was heavily inspired by Richard Cotton’s “Learning R” (Cotton 2013) and Hadley Wickham’s and Garrett Grolemund’s “R for Data Science” (abbreviated as R4DS). The latter can be found online (Wickham and Grolemund 2016). We will not immediately start out with the packages from the tidyverse … Read more →
STM1001: Introduction to Bioinformatics in R
STM1001 Bioinformatics (Science/Health Science/Data Science Modules) […] Welcome to another content supplement for the STM1001 Science, Health Science and Data Science modules. Throughout the semester, as we cover different aspects of statistics and data science, supplementary documents such as this one will be used to enhance your learning experience. This document contains material to support your learning as you complete Computer Labs 7B and 8B of the Science, Health Science or Data Science modules. We recommend that you take a few minutes to browse the different sections in this … Read more →
Practical Data Skills
by Introduction To Data Science
Practical Data Skills […] The purpose of this book is to provide practical data science skills to managers and business analysts. The focus is helping the reader develop pragmatic skills they can apply within their organizations to extract value from data. This book will not provide a complete and rigorous overview of data science, statistics, or computer programming, but it will help the reader quickly learn how to process and analyze data in the R programming language. The book assumes nothing more than a high school level background in mathematics - it requires no prior knowledge of … Read more →
Painting the Malaysian Covid Public Data
by Azman Hussin and Wan M Hasni
The book is designed primarily for data science and R beginners who want to learn exploratory data analysis (EDA) through visualization in a practical way by working on actual data related to a real problem. We continue to stress these themes in the book: EDA, visualization, actual data, and learning by solving problems (#learnbydoing). We envisage that the book will only have an online version because of the dynamic nature of the problems related to Covid and the increasing data. The Covid pandemic should be of concern to all. Everyone is affected through being infected, constrained by … Read more →
Data Analytics: A Small Data Approach
by Shuai Huang & Houtao Deng
This book is suitable for an introductory course in data analytics to help students understand some of the main statistical learning models, such as linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods including the support vector machine and kernel regression, etc. Data science practice is a process that should be told as a story, rather than a one-time implementation of one single model. This process is a main focus of this book, with many course materials about exploratory data analysis, residual analysis, and flowcharts to develop and validate models and data pipelines. Read more →
Coding for Data Science 2021/22 - R part
by Michela Cameletti
Notes for the R labs of the C4DS course @ Unibg […] You are reading the lecture notes of the R lectures for the Coding for Data Science (C4DS) course at University of Bergamo (academic year 2021/22). C4DS is the first module of the course named Coding and Machine Learning (see here for more details). The C4DS R lectures are designed for students who already have a programming background thanks to the first part of the C4DS course dedicated to Python. In this part of the module we will introduce R programming language for data science (including data manipulation, data visualization and … Read more →
Data Science in R: A Gentle Introduction
by James Scott
A gentle introduction to data science in R. […] Hello and welcome! This online book is structured as a series of walk-through lessons in R that will have you doing real data science in no time. It covers both the core ideas of data science as well as the concrete software skills that will help you translate those ideas into practice. Many of these lessons operate on the premise of “mimic first, understand later.” That is, I’ll introduce bits of R code that do something interesting and ask you to mimic them word for word to see what they do, without necessarily understanding the details at … Read more →
Quantitative Analysis with R
by Brian Wood
A book created with bookdown. […] This is a book about quantitative analysis using R. The target audience is students in the biological or social sciences learning R and seeking to build professional data science skills, including computer science fundamentals and … Read more →
Statistics for Data Science Notes
by Andrew Sage - Stat 255: Lawrence University
This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. [...] We consider a dataset with prices (in $ US) and other information on 53,940 round cut diamonds. The first 6 rows are shown below. The dataset includes both: What do we notice about the relationship between price and cut? Is this surprising? Next, we examine a histogram, displaying price, cut, and carat size. How does the information in this plot help explain the surprising result we saw in the boxplot? Next, we use a ... Read more →
R @ Ewha (Sunbok Lee)
by Sunbok Lee
R @ Ewha (Sunbok Lee) […] Hi everyone, welcome to the course. This is the introduction to R course at Ewha Womans University. R is a great programming language for statistical analysis and data science. I hope you enjoy R in this course and find many useful applications for your own field. This course is designed for social science students who don’t have any programming background. In this lecture note, this font represents R commands, variable names, and package names. In order to maximize your learning in this semester, you should read the weekly reading assignment in our … Read more →
Statistics for Data Science R Code Guide
by Andrew Sage - Stat 255, Lawrence University
This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. […] This guide provides details and examples on using R to perform the kinds of statistical analyses that we’ll use in STAT 255. You may use it as a template, as you write code for your assignments. If you want to work with R from your own computer, you can install it for free using the directions below. This will allow you to work on your assignments whenever and wherever you would like. Mac: Windows: The following chapters walk … Read more →
R Training for SSDS
by Amy L Johnson
R Training for SSDS […] This document provides an introduction to R. It was created for use by Stanford Library’s Software Services and Data Science (SSDS) group. Most of this introduction was pulled from the book R for Data Science: https://r4ds.had.co.nz/. Some material was also adapted from the Wellesley College Quantitative Analysis Institute edX R-training, available at https://www.wellesley.edu/qai/onlineresources#Online%20Modules. … Read more →
KIM EUN SEO Quiz3
by kimeunseo
This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. [...] Hi everyone, welcome to the course. This is the introduction to R course at Ewha Womans University. R is a great programming language for statistical analysis and data science. I hope you enjoy R in this course and find many useful applications for your own field. This course is designed for social science students who don’t have any programming background. In this lecture note, this font represents R commands, ... Read more →
2021 REU Data Science Training
by Haoqi Wang
2021 REU Data Science Training […] Knowledge Gained: R, data wrangling, data visualization Main materials: Other resources Software: R Weekly Time-Commitment: 3-6 hrs of independent asynchronous work by students supplemented with ~1-3 hours of grad student led synchronous support. Asynchronous work: working through video tutorials Synchronous … Read more →
Data Science Boot Camp
by Arthur Small, Principal Scientist
Course materials for the Data Science Boot Camp, Weldon Cooper Center for Public Service, University of Virginia June 8-10, 2021 […] This boot camp is designed to help research assistants rapidly become productive doing data science as members of the Cooper Center team. The mini-course offers introductory training in how to do data science as a member of a team. It also provides an orientation to the projects, resources, and house styles that are specific to the Cooper Center. R4DS: R for Data Science by Hadley Wickham and Garrett Grolemund. An excellent introduction, available for free … Read more →
A Crash Course in Geographic Information Systems (GIS) using R
by Michael Branion-Calles
A Crash Course in Geographic Information Systems (GIS) using R […] There is an assumption of some previous experience in R with this tutorial. If you have not used R before I would start with Chapter 1 of the free and excellent textbook R for Data Science. The GIS operations in R from the sf package are designed to integrate well with the tidyverse suite of R packages. We will make use of some basic functionality from the dplyr package and will be using pipes (%>%) to sequence multiple operations. If you are unfamiliar with dplyr and pipes I would go through the base vignette before … Read more →
ADVANCED REGRESSION AND PREDICTION: MACHINE LEARNING TOOLS
by Ilán F. Carretero Juchnowicz
This is a bookdown in which the second part of the project of the subject advanced regression and prediction of the Master’s Degree in Statistics for Data Science has been carried out […] Machine Learning (ML) techniques are currently applied in countless fields to obtain knowledge from data. Among these fields, one that stands out today is the emergence and effect of coronavirus disease (COVID-19) on all aspects of society. That is why this second part of the advanced regression and prediction course aims to use the techniques learned during the practical and … Read more →
R Coding for Data Science - 2020/21
by Michela Cameletti
Notes for the R labs of the R4CDS course @ Unibg […] You are reading the lecture notes of the R lectures for the Coding for Data Science (C4DS) course at University of Bergamo (academic year 2020/21). R is a great programming language especially designed for statistical analysis and data visualisation. The C4DS R lectures are designed for students who already have a programming background thanks to the first part of the C4DS course dedicated to Python. In the 5 lectures dedicated to R I will present you the basics of R for data manipulation, analysis and plotting. Enjoy the journey! In the … Read more →
JavaScript for R
by John Coene
Invite JavaScript into your Data Science workflow. […] This is the online version of JavaScript for R, a book currently under development and intended for release as part of the R series by CRC Press. The R programming language has seen the integration of many languages: C, C++, and Python, to name a few, can be seamlessly embedded into R so one can conveniently call code written in other languages from the R console. Little known to many, R works just as well with JavaScript; this book delves into the various ways both languages can work together. The ultimate aim of this work is to demonstrate … Read more →
STA 444/5 - Introductory Data Science using R
by Derek L. Sonderegger
STA 444/5 - Introductory Data Science using R […] This book is intended to provide students with a resource for learning R while using it during an introductory statistics course. The Introduction section covers common issues that students in a typical statistics course will encounter and provides simple examples and does not attempt to be exhaustive. The Deeper Details section addresses issues that commonly arise in many data wrangling situations and is intended to give students a deep enough understanding of R that they will be able to use it as their primary computing resource … Read more →
Data Science and Econometrics for NBA Analytics
by KP
Data Science and Econometrics for NBA Analytics […] The 1960s was a period of time where oil became the most valuable and productivity augmenting resource for companies to extract, prompting companies to engage in a race to extract as much oil as possible without any regard for the environmental and social consequences. However, recent times have seen data replace oil as the most valuable resource, even for sports organizations. Analytics have a major place in today’s sports world. At some level, every sports organization relies on data and analytics for team development, salary structure, … Read more →
R for Data Science
by Shaoshuang Wen, University of South Carolina
R for Data Science […] If you see any mistakes or have any suggestions, please do shoot me an email at wenshaoshuang@gmail.com. … Read more →
Do A Data Science Project in 10 Days
by Gangmin Li
This is a data science project practice book. It was initially written for my Big Data course to help students run a quick data analytics project and to understand the data analytics process, the typical tasks, and the methods, techniques and algorithms needed to accomplish these tasks. During COVID-19, the university adopted online teaching, so students could not access the university labs and HPC facilities. Gaining experience of doing a data science project became a matter of individual self-learning in isolation. This book aims to help them read through it and follow the instructions to complete the sample project by themselves. However, it has also been requested by many other students who want to learn about data analytics, machine learning and particularly practical issues, and to gain experience and confidence in doing data analysis. So it is aimed at beginners with little knowledge of data science. The format for this book is bookdown::gitbook. Read more →
Statistical Learning Inmas workshop
by Amelia McNamara
A three-day workshop introducing data science skills, including statistical modeling […] Hone your statistical, data, and computing literacy. Instead of covering all statistical modeling & inference techniques in 2 days (impossible!), focus on a couple of foundational & generalizable tools: linear regression & simple classification. In doing so, we’ll bypass topics in traditional stat intros. Favor applications using real data over theory so that you walk away with a sophisticated set of tools with real applications. Play around with the RStudio software. In doing so, focus on the patterns … Read more →
Applied Genetics, Genomics, and Data Science for the Swine Industry: A Ph.D. Student’s Perspective
by Caleb J. Grohmann
Experiments using advanced analytical methods for description, prediction, and prescription of data and outcomes within the United States commercial swine industry. […] This book is meant to document, in an extremely detailed fashion, every research project, experiment, and analysis conducted during my PhD program. The main goal of this book is to provide a seamless interface in which to combine technical and physical programming methods with the results of each analysis and subsequent discussion of these results. The book will be divided into parts, and each part encompasses all analyses … Read more →
Data Science for Human-Centered Product Design
by Travis Kassab
Data Science for Human-Centered Product Design […] This is a data science tutorial with seven open-source projects that show how statistics and machine learning can be applied to user survey data. The purpose is not to prescribe techniques, but to demonstrate the use of data science in the context of product design. I’ve compiled what I know on the topic, and hope readers adopt some of these techniques and use them in concert with qualitative research and entrepreneurial thinking to build better products. Let’s quickly preview the seven different use-cases. First, we must develop an … Read more →
R for Everyone (Advanced Analytics and Graphics) and LaTeX
by Shaoshuang Wen, University of South Carolina
R for Everyone (Advanced Analytics and Graphics) and LaTeX […] This is an introduction to R and LaTeX. In compiling this document, several sources have been consulted, including Jared P. Lander’s R for Everyone, Hadley Wickham and Garrett Grolemund’s R for Data Science, Dr. Yuleng Zeng’s website, Dr. Timothy M. Peterson’s website, Harvard’s Math Prefresher, and the course offered by DataCamp. Chapter 3 and Chapter 4 are mostly borrowed from Dr. Yuleng Zeng’s website resources. Install the following applications: Finally, this document is to be used in-class only. As I (will) mention several … Read more →
R for Health Data Science
by Ewen Harrison and Riinu Pius
An introductory book for health data science using R. […] This is the electronic version of the HealthyR book published by Chapman & Hall/CRC. HealthyR resources: healthyr.surgicalinformatics.org Example datasets used in the book can be downloaded here. Version 1.0.1 It is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States … Read more →
Mastering Software Development in R
by Roger D. Peng, Sean Kross, and Brooke Anderson
The book covers R software development for building data science tools. As the field of data science evolves, it has become clear that software development skills are essential for producing useful data science results and products. You will obtain rigorous training in the R language, including the skills for handling complex data, building R packages and developing custom data visualizations. You will learn modern software development practices to build tools that are highly reusable, modular, and suitable for use in a team-based environment or a community of developers. Read more →
Introduction to Data Science
by Ron Sarafian
Class notes for the BGU course - Introduction to Data Science. […] This book accompanies the course I give at Ben-Gurion University, named “Introduction to Data Science”. This is an introductory-level, hands-on focused course, designed for students with a basic background in statistics and econometrics, and without programming experience. It introduces students to different tools needed for building a data science pipeline, including data processing, analysis, visualization and modeling. The course is taught in the R environment. Many of the contents in this book are taken from BGU’s “R” course, … Read more →
Kursmaterial Introduktion till R - Certifierad Data Scientist
by Filip Wästberg & Torbjörn Sjöberg, Solita
Here you will find course material and exercises for the first block of R exercises. […] You do not need any particular prior knowledge to use this material. The exercises and their structure follow the book R for Data Science by Hadley Wickham and Garrett Grolemund, which is available for free. That book is an excellent in-depth complement to this … Read more →
Kursmaterial till Certifierad Data Scientist
by Filip Wästberg, Solita
This document contains course material and exercises for the first block of R exercises. […] You do not need any particular prior knowledge to use this material. The exercises and their structure follow the book R for Data Science by Hadley Wickham and Garrett Grolemund, which is available for free. That book is an excellent in-depth complement to this … Read more →
Practical Data Science
by Michael Clark
The focus of this document is on data science tools and techniques in R, including basic programming knowledge, visualization practices, modeling, and more, along with exercises to practice further. In addition, demonstrations of most of the content in Python are available via Jupyter notebooks. […] Michael Clark https://m-clark.github.io/ … Read more →
My Data Science Notes
by Michael Foley
This is a compendium of notes from classes, tutorials, etc. that I reference from time to time. […] These notes are pulled from various classes, tutorials, books, etc. and are intended for my own consumption. If you are finding this on the internet, I hope it is useful to you, but you should know that I am just a student and there’s a good chance whatever you’re reading here is … Read more →
R for Data Science: Exercise Solutions
by Jeffrey B. Arnold
Solutions to the exercises in “R for Data Science” by Garrett Grolemund and Hadley Wickham. […] If you find any typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations: opening an issue or submitting a pull request on GitHub, or adding an annotation using hypothes.is. To add an annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the in the upper right-hand corner of the page. This book contains the exercise solutions for the book R for Data … Read more →
Data Science con R
by Mg. Daniel Paredes Inilupu
Theory and practice for entering the world of data analysis and prediction with R, by Daniel Paredes […] This book was created from notes that made learning easier for me, and it ranges from the most basic topics up to an intermediate-advanced level. I have devoted more than 700 hours to creating this book. You can recognize this effort by buying the PDF version of this book on Leanpub. In doing so, you will not only get access to all future updates but also, for 3 months, be able to ask the author questions directly about the book’s topics or how to apply them at work. The web version on … Read more →
Ready for R: Notebook Reference
by Aaron Coyner and Ted Laderas
This is a minimal example of using the bookdown package to write a book. The output format for this example is bookdown::gitbook. […] This course introduces you to R by working through common tasks in data science: importing, manipulating, and visualizing data. R is a statistical and programming computer language widely used for a variety of applications. Before proceeding with these training materials, please ensure you have an RStudio.cloud account and can see the workspace. This is a searchable website that serves as a reference for the Ready for R course. This gitbook is not meant to be a … Read more →
R for data science: tidyverse and beyond
by Maxine
R for data science: tidyverse and beyond […] Personal notes on R for Data Science (Wickham and Grolemund 2016), updated whenever I get around to it. Any suggestions: https://github.com/enixam/rfordatascience/issues or 565702994@qq.com tidyverse … Read more →
Panduan Menyusun Database Menggunakan Microsoft Access
by Technaut
This book is a guide used by participants in the Data Science untuk Bisnis (Data Science for Business) class to build a sales database using Microsoft Access. […] This book is supplementary reading for participants in the Data Science for Business class. It gives a brief explanation of how to build a database using the Microsoft Access application. The topics presented in this book include … Read more →
¡Manos a la Data!
by BEST: Behavioral Economics & Data Science Team
An open book that collects every dataset from the project of the same name […] Note: The book is under constant development. It will be updated every week with summaries of the analyses of the project’s weekly databases. This book was prepared by BEST. A few years ago, the term Data Science was not well known or widely used by the international community, much less the local one (Peru). In fact, it was a term rarely used by statisticians and some members of the scientific computing community. Our society has evolved, and with it certain … Read more →
經濟資料視覺化處理
by 林茂廷, Department of Economics, National Taipei University
經濟資料視覺化 […] This course is designed to develop the skill of efficient graphic language, where efficiency is defined as the data information delivery that is self-contained, concise, and non-distorting. The programming language is mainly based on R, with a little bit of Javascript toward the end. Though there is no computer programming knowledge required, basic R knowledge will help (the ebook, R for Data Science, would be a good start). By the end of the course, students who learn well should be able to design professional … Read more →
Population Health Data Science with R
by Tomás J. Aragón
Population health data science (PHDS). […] We are writing this book to introduce R (a programming language and environment for statistical computing and graphics) to public health epidemiologists, health care data analysts, data scientists, statisticians, and others conducting population health analyses. Recent graduates come prepared with a solid foundation in epidemiological and statistical concepts and skills. However, what is sometimes lacking is the ability to implement new methods and approaches they did not learn in school. This is more apparent today with the emergence of data science … Read more →
Interactive web-based data visualization with R, plotly, and shiny
by Carson Sievert
A useR guide to creating highly interactive graphics for exploratory and expository visualization. […] This is the website for “Interactive web-based data visualization with R, plotly, and shiny”. In this book, you’ll gain insight and practical skills for creating interactive and dynamic web graphics for data analysis from R. It makes heavy use of plotly for rendering graphics, but you’ll also learn about other R packages that augment a data science workflow, such as the tidyverse and shiny. Along the way, you’ll gain insight into best practices for visualization of high-dimensional data, … Read more →
Readings in applied data science
by Qiushi Yan
Readings in applied data science […] This project is motivated and inspired by stats337 at Stanford University, offered by Hadley Wickham, and by Data Science with R: A Resource Compendium by Martin Monkman. Both provide great reading material on data analysis with R, and on applied data science in general. Here I attempt to finish reading one or two papers per week, write a brief summary, and document my personal … Read more →
課程介紹 (Course Introduction)
by tpemartin
Economic data visualization […] This course is designed to develop the skill of efficient graphic language, where efficiency is defined as delivering data information in a way that is self-contained, concise, and non-distorting. The programming language is mainly R, with a little bit of JavaScript toward the end. Although no prior programming knowledge is required, basic R knowledge will help (the ebook R for Data Science would be a good start). By the end of the course, students who learn well should be able to design professional … Read more →
Technical Foundations of Informatics
by Michael Freeman and Joel Ross
The course reader for INFO 201: Technical Foundations of Informatics. […] Announcement: Starting in 2019, readings for the INFO 201 course will come from the textbook Programming Skills for Data Science, which is available to UW students for free via SafariBooksOnline or in print. Unless specifically directed to a section of this online text, you should refer to the Programming Skills for Data Science textbook. This book covers the foundation skills necessary to start writing computer programs to work with data using modern and reproducible techniques. It requires no technical background. … Read more →
課程大綱 (Course Syllabus)
by tpemartin
Visualizing economic data […] This course is designed to develop the skill of efficient graphic language, where efficiency is defined as delivering data information in a way that is self-contained, concise, and non-distorting. The programming language is mainly R, with a little bit of JavaScript toward the end. Although no prior programming knowledge is required, basic R knowledge will help (the ebook R for Data Science would be a good start). By the end of the course, students who learn well should be able to … Read more →
Preguntas entrevistas Data Science (Data Science Interview Questions)
by Sergio Berdiales
Data Science interview questions […] In these notes I try to answer various questions that a candidate for a Data Scientist position may encounter in an interview. Many of the questions come directly from articles on this specific topic (links in the ‘02-Referencias’ section), others from my personal experience, and others from contributions by other people. Here I link a Google Sheet with the questions I have been collecting. If you have an interesting question and want to add it to the list, go ahead. This is the GitHub repository: https://github.com/sergiober … Read more →
Calidad del aire en Gijón (Air Quality in Gijón)
by Sergio Berdiales
Air quality in Gijón […] The main objectives of this project are to analyze and visualize the data from the official air quality monitoring stations in the city of Gijón. This project is a sibling of another one, https://bookdown.org/sergioberdiales/tfm-kschool_gijon_air_pollution/, which was my final project for the Master in Data Science at Kschool (which is why some parts of the code are commented in English). In it, besides processing the data and carrying out several visualization exercises with it (see the visualizations on Tableau Public), I … Read more →
Data Science avec R (Data Science with R)
by Fousseynou Bah
Data Science with R […] When I decided to write a book on data science, I debated at length in my own head and asked myself several questions, one of which kept coming back: “Do we really need another book on data science?” “Don’t we have enough already?” Given the success the discipline enjoys, there is certainly no shortage of resources, both online and in bookstores. Above all, I kept wondering “what did I have to say that had not already been said?” And yet, a few reasons led me to reconsider my position. The first is rather selfish. … Read more →
Data Science con R: Fundamentos y Aplicaciones (Data Science with R: Fundamentals and Applications)
by BEST: Behavioral Economics & Data Science Team
The best free and open data science book in Spanish. […] Note: The book is under development. This book was prepared by BEST. A few years ago, the term Data Science was not widely known or used by the international community, let alone locally (Peru). In fact, it was a term rarely used, mostly by statisticians and some members of the scientific computing community. Our society has evolved, and with it certain needs. Data Science is here to stay, and in any profession (economists, psychologists, biologists, … Read more →
UPR-PRISE Data Science Workshop 01/26/2019
by Felix E. Rivera-Mariani, PhD
This manual is part of the data science workshop titled GPS of Data Analytics: Making the Witness (the Data) Confess. The output format was produced with bookdown::gitbook. […] Welcome to the data science workshop titled The GPS of Data Analytics: Making the Witness (the Data) Confess. In this workshop, sponsored by the University of Puerto Rico Ponce Research Initiative for Scientific Enhancement, students will learn and implement different aspects of data science, from establishing a set of tools necessary to carry out data science to deploying statistical models through coding, … Read more →
Gijón Air Pollution - An exercise of visualization and forecasting
by Sergio Berdiales
Gijón Air Pollution - An exercise of visualization and forecasting […] My name is Sergio Berdiales and I am a Data Analyst with more than ten years of experience in the Customer Experience and Quality areas. If you want to know more about me or contact me, you can visit my LinkedIn profile or my Twitter account. This is my final project for the Kschool Master on Data Science (8th edition). The main objective of this project is to show that I can apply the knowledge acquired during the master’s course in a practical way. The Master on Data Science of Kschool is a 230-hour course which includes Python … Read more →
Notes for ST463/ST683 Linear Models 1
by Katarina Domijan, Catherine Hurley
These are the notes for the ST463/ST683 Linear Models 1 course offered by the Mathematics and Statistics Department at Maynooth University. This module is offered as part of the MSc in Data Science and Data Analytics. It is an introductory course for students who have a basic background in Statistics, Data analysis, R Programming and linear algebra (matrices). […] There are many good resources, e.g. Weisberg (2005), Fox (2005), Fox (2016), Ramsey and Schafer (2002), Draper and Smith (1966). We will use Minitab and R (R Core Team 2017). To create this document, I am using the bookdown package … Read more →
Meu log de leitura de R for Data Science (My Reading Log of R for Data Science)
by Marcos V. C. Vital - LEQ-UFAL
My reading log of R for Data Science […] If there is anyone who could be considered an R “pop star”, it would be Hadley Wickham: the guy is responsible for ggplot2 and dplyr, which are some of the most popular R packages! But those are exactly the packages I hardly ever use… :( Let me explain. I have been an R user for many years (I did the math in my head as I write this, and if I am not mistaken, as of 2018 it would be about 13 or 14 years!), so it has been quite a while since I learned how to solve (and teach) certain things. So far, so good. It turns out that Hadley brought a … Read more →
Selected Solutions to R4DS Exercises
by Chunji Wang
This book provides selected solutions to the exercises in the wonderful book R for Data Science by Hadley Wickham. […] This is the website for “Selected Solutions to R4DS Exercises”. This is a joint adventure between Chunji Wang, Ron, Luna, Zhiyin, Chengcheng… We started the “R4DS Study Club” on Sep 22nd, 2017; if you want to join us, please contact us! The chapter labels in this book are the same as in the original R4DS book; go to the corresponding chapter for solutions. You might need to read the beginning of the chapter to load some packages or create some variables that are … Read more →
Course Notes for IS 6489, Statistics and Predictive Analytics
by Jeff Webb
Course notes for IS 6489. […] These are the course notes for IS 6489, Statistics and Predictive Analytics, offered through the Information Systems (IS) department in the University of Utah’s David Eccles School of Business. This is an exciting time for data analysis! The field has undergone a revolution in the last 15 years with increases in computing power and the availability of “big data” from web-based systems of data collection. “Data science” is the umbrella term that describes the result of this revolution—a new discipline at the intersection of many traditional fields such as … Read more →
ModernDive
by Chester Ismay and Albert Y. Kim STARRING FRANK MCGRADE
An open-source and fully-reproducible electronic textbook bridging the gap between traditional introductory statistics and data science courses. […] Help! I’m new to R and RStudio and I need to learn about them! However, I’m completely new to coding! What do I do? If you’re asking yourself this question, then you’ve come to the right place! Start with our Introduction for Students. This is version 0.2.0 of ModernDive published on August 02, 2017. For previous versions of ModernDive, see Section 1.4. This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding … Read more →
Data Science in Educational Research
by Joshua M. Rosenberg
This is an introduction and tutorial for data science in educational research. … Read more →
Data Science and Visualizations with R
by Jonathan Wong
Data Science and Visualizations with R […] This is a course on the use of tidyverse packages. tidyverse provides a complete suite of modern data-handling tools. It is an essential toolbox for any data scientist using R. The tidyverse package is designed to be easy to install. This course will dive into using tidyverse. It assumes you have already installed R and RStudio and have some familiarity with using RStudio. This book will use the nycflights13 dataset. This package contains information about all flights that departed from NYC in 2013: 336,776 flights with 16 variables. To … Read more →
The Art of Data Science
by Roger D. Peng and Elizabeth Matsui
The book covers R software development for building data science tools. As the field of data science evolves, it has become clear that software development skills are essential for producing useful data science results and products. You will obtain rigorous training in the R language, including the skills for handling complex data, building R packages and developing custom data visualizations. You will learn modern software development practices to build tools that are highly reusable, modular, and suitable for use in a team-based environment or a community of developers. Read more →
Agile Data Science with R
by Edwin Thoen
A workflow for doing data science in the R language, using Agile principles. […] When I was starting my career as a data scientist, I did not really have a workflow. Freshly out of statistics grad school I entered the arena of Dutch business, employed by a small consulting firm. Between the company, the potential clients and myself, no one knew what it meant to implement a statistical model or a machine learning method in the “real” world. But everybody was interested in this “Big Data” thing, so we quickly started to do consulting work without a clear idea what I was going to do. When we came … Read more →
Data Science at the Command Line, 2e
by Jeroen Janssens
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools—useful whether you work with Windows, macOS, or Linux. Read more →
Data Science in Education Using R
by Ryan A. Estrellado, Emily A. Freer, Joshua M. Rosenberg, and Isabella C. Velásquez
Bookdown for ‘Data Science in Education Using R’ by Ryan A. Estrellado, Emily A. Freer, Joshua M. Rosenberg, and Isabella C. Velásquez to be published by Routledge in 2024 […] 📘 Notice! This is the website for the second edition of Data Science in Education Using R. For the first edition, visit datascienceineducation-1ed.netlify.app/ Welcome to Data Science in Education Using R! Inspired by {bookdown}, this book is open source. Its contents are reproducible and publicly accessible to people worldwide. The online version of the book is hosted at datascienceineducation.com. There’s this … Read more →
Data Science Live Book
by Pablo Casas
An intuitive and practical approach to data analysis, data preparation and machine learning, suitable for all ages! […] This book is now available on Amazon. Check it out! 📗 🚀. Link to the black & white version, also available in full color. It can be shipped to over 100 countries. 🌎 The book will facilitate the understanding of common issues when data analysis and machine learning are done. Building a predictive model is as difficult as one line of R code: That’s it. But, data has its dirtiness in practice. We need to sculpt it, just like an artist does, to expose its information in order … Read more →
Data Science Practice
by Perry Stephenson
Course notes for 94692 Data Science Practice at the University of Technology, Sydney. […] This website forms the course notes for 94692 Data Science Practice which is an elective subject developed as part of the Master of Data Science and Innovation program at the University of Technology, Sydney. For more information about this subject see the Subject Information. For more information about the MDSI program see the MDSI Prospectus. Whilst these course materials have been produced specifically for MDSI students, they have been made available under a permissive license for the benefit of the wider … Read more →
Hands-On Programming with R
by Garrett Grolemund
This book will teach you how to program in R, with hands-on examples. I wrote it for non-programmers to provide a friendly introduction to the R language. You’ll learn how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions, and use all of R’s programming tools. Throughout the book, you’ll use your newfound skills to solve practical data science problems. Read more →
Introduction to Data Science
by Rafael A. Irizarry
This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression and machine learning and helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with UNIX/Linux shell, version control with GitHub, and reproducible document preparation with R markdown. Read more →
Mastering Spark with R
by Javier Luraschi, Kevin Kuo, Edgar Ruiz
The Complete Guide to Large-Scale Analysis and Modeling. […] In this book you will learn how to use Apache Spark with R. The book intends to take someone unfamiliar with Spark or R and help them become proficient by teaching a set of tools, skills and practices applicable to large-scale data science. You can purchase this book from Amazon, O’Reilly Media, your local bookstore, or use it online from this free to use website. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States … Read more →
R for Data Science
by Hadley Wickham and Garrett Grolemund
This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data. Read more →
Spatial Data Science
by Edzer Pebesma, Roger Bivand
Data science is concerned with finding answers to questions on the basis of available data, and communicating that effort. Besides showing the results, this communication involves sharing the data used, but also exposing the path that led to the answers in a comprehensive and reproducible way. It also acknowledges the fact that available data may not be sufficient to answer questions, and that any answers are conditional on the data collection or sampling protocols employed. This book introduces and explains the concepts underlying spatial data: points, lines, polygons, rasters, coverages, … Read more →
Yet another ‘R for Data Science’ study guide
by Bryan Shalloway
Notes and solutions to Garrett Grolemund and Hadley Wickham’s ‘R for Data Science’ […] This book contains my solutions and notes to Garrett Grolemund and Hadley Wickham’s excellent book, R for Data Science (Grolemund and Wickham 2017). R for Data Science (R4DS) is my go-to recommendation for people getting started in R programming, data science, or the “tidyverse”. First and foremost, this book was set up as a resource and refresher for myself. If you are looking for a reliable solutions manual to check your answers as you work through R4DS, I would recommend using the solutions created and … Read more →