# Machine Learning

# A Guide on Data Analysis

## by Mike Nguyen

This is a guide on how to conduct data analysis in the field of data science, statistics, or machine learning. […] 1. APA (7th edition): Nguyen, M. (2020). A Guide on Data Analysis. Bookdown. https://bookdown.org/mike/data_analysis/ 2. MLA (8th edition): Nguyen, Mike. A Guide on Data Analysis. Bookdown, 2020. https://bookdown.org/mike/data_analysis/ 3. Chicago (17th edition): Nguyen, Mike. 2020. A Guide on Data Analysis. Bookdown. https://bookdown.org/mike/data_analysis/ 4. Harvard: Nguyen, M. (2020) A Guide on Data Analysis. Bookdown. Available at: https://bookdown.org/mike/data_analysis/ … Read more →

# Advanced Statistical Modelling

## by Dr. S. Jackson

These are the course notes for the Machine Learning module of Durham University’s Masters of Data Science course. […] Welcome to the material for the first term of the module Advanced Statistical Modelling MATH3411 at Durham University. These pages will update as the course progresses, consisting of relevant lecture notes, practical demonstrations (in R), exercise sheets and practical sessions. I would recommend that you use the html version of these notes (they have been designed for use in this way), however, there is also a pdf version of these notes, which will also be updated as the … Read more →

# Data Visualization with R Programming

## by สมศักดิ์ จันทร์เอม

สมศักดิ์ จันทร์เอม Visualization is extremely important for understanding data and for making effective decisions. Many tools are currently available for creating data visualizations. This book uses the R language to program visualizations, together with RStudio, whose wide range of helper tools makes writing R code more convenient. The book does not cover statistical models, econometrics or machine learning in R. However, readers who study and understand it will learn the essential fundamentals of R programming in a principled way, such as the key data structures, namely vectors and data frames … Read more →

# Workshop: Applied Machine Learning - Workshop: Applied Machine Learning (with R)

## by Dr. Paul C. Bauer

Dr. Paul C. Bauer This document serves as slides and script for the workshop Applied Machine Learning taught by Paul C. Bauer (Gesis, Mannheim, Online, 20-23 February 2024). Original material is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Where I draw on other authors' material, other licenses may apply. Please note the information in the syllabus as well as the citations and links in the script. For potential future versions of this material, see the GitHub repository. If you have feedback or discover errors/dead links please let us … Read more →

# Intel Powered Foundation Course in Machine Learning

## by intel-unnati

A Course Companion Website. […] Step into a realm of innovation with the Intel Unnati Certificate Programme, an avant-garde initiative meticulously crafted to delve into the intricacies of advanced machine learning. This course is not just an educational endeavor; it’s a gateway to a world where theoretical understanding seamlessly converges with hands-on practical application, providing you with a distinctive edge in an ever-evolving tech landscape. In a rapidly changing technological landscape, the Intel Unnati Certificate Programme is your ticket to mastering the latest advancements in … Read more →

# Surrogates

## by Robert B. Gramacy

Surrogates: a new graduate level textbook on topics lying at the interface between machine learning, spatial statistics, computer simulation, meta-modeling (i.e., emulation), and design of experiments. Gaussian process emphasis facilitates flexible nonparametric and nonlinear modeling, with applications to uncertainty quantification, sensitivity analysis, calibration of computer models to field data, sequential design and (blackbox) optimization under uncertainty. Presentation targets numerically competent scientists in the engineering, physical, and biological sciences. Treatment includes historical perspective and canonical examples, but primarily concentrates on modern statistical methods, computation and implementation in R at modern scale. Rmarkdown facilitates a fully reproducible tour complete with motivation from, application to, and illustration with, compelling real-data examples. Read more →
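
The core Gaussian process idea behind surrogate modeling can be illustrated with a minimal base-R sketch. It performs noise-free GP interpolation with a squared-exponential kernel and is not code from the book. The helper `sq_exp` and its lengthscale `ell = 1` and scale `s2 = 1` are illustrative assumptions. So are the small diagonal jitter and the toy target `sin(x)`.

```r
# Squared-exponential covariance between two vectors of 1-d inputs
sq_exp <- function(a, b, ell = 1, s2 = 1) {
  s2 * exp(-outer(a, b, "-")^2 / (2 * ell^2))
}

# Toy design: 8 noise-free evaluations of sin(x) on [0, 2*pi]
x  <- seq(0, 2 * pi, length.out = 8)
y  <- sin(x)
xx <- seq(0, 2 * pi, length.out = 100)  # prediction grid

# Covariance blocks; a tiny jitter on the diagonal aids numerical stability
K   <- sq_exp(x, x) + diag(1e-8, length(x))
Ks  <- sq_exp(xx, x)
Kss <- sq_exp(xx, xx)
Ki  <- solve(K)

# Posterior (conditional) mean and covariance at the grid points
mu      <- drop(Ks %*% Ki %*% y)
Sigma   <- Kss - Ks %*% Ki %*% t(Ks)
sd_pred <- sqrt(pmax(diag(Sigma), 0))  # clip tiny negative round-off
```

The predictive standard deviation `sd_pred` shrinks toward zero at the training inputs and grows between them. That uncertainty is what surrogate-based design and calibration methods exploit.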

# Machine Learning and Neural Networks

## by Dr. Hailiang Du

These are the course notes for the Machine Learning and Neural Networks module (MATH3431) at Durham University. […] Welcome to the material for the first half of the Machine Learning and Neural Networks module (MATH3431) at Durham University. These pages consist of relevant lecture notes that will be updated as the course progresses. I would recommend that you use the HTML version of these notes (they have been designed for use in this way); however, there is also a PDF version of these notes. In this first half of the module (Michaelmas Term), we will be focusing on “Machine Learning” rather … Read more →

# About Rstudio

## by Jane Doe

Jane Doe R is a programming environment and language focused on statistical analysis. It emerged as an open-source implementation of the S language, enriched with static (lexical) scoping. It has become one of the most prominent programming languages in scientific research. It is especially prominent in areas such as machine learning, data mining, econometrics, biomedical research, bioinformatics and finance. This is largely due to its ability to load a wide variety of libraries or packages offering functionality for … Read more →

# Sobre el lenguaje R (About the R Language)

## by Celeste Huaman

Celeste Huaman R is a programming environment and language with a focus on statistical analysis. R originated as a free-software reimplementation of the S language, with added support for static (lexical) scoping. It is one of the most widely used programming languages in scientific research. It is also very popular in machine learning, data mining, econometrics, biomedical research, bioinformatics and economics/finance. This popularity is helped by the ability to load different libraries or packages with functionality for … Read more →

# Behavior Analysis with Machine Learning Using R

## by Enrique Garcia Ceja

Behavior Analysis with Machine Learning Using R teaches you how to train machine learning models in the R programming language to make sense of behavioral data collected with sensors and stored in electronic records. This book introduces machine learning concepts and algorithms applied to a diverse set of behavior analysis problems by focusing on practical aspects. Some of the topics include how to: Build supervised models to predict indoor locations based on Wi-Fi signals, recognize physical activities from smartphone sensors, use unsupervised learning to discover criminal behavioral patterns, build deep learning models to analyze electromyography signals, CNNs to detect smiles in images and much more. Read more →

# 381M Course Bookdown

## by Josephine Lukito

This is a textbook for the course J381M at UT-Austin. […] Welcome to the J381M Textbook! In this course, we will learn how to use R for Computational Communication Research and Data Science, focusing on skills such as data wrangling, basic statistics, data visualization, data collection, NLP, network analysis, and machine learning. This is a survey course that is meant to give you a taste of data science. In truth, many of these topics are rich enough to warrant full courses. This textbook is best paired with the J381M course materials, including lectures, readings, and course assignments. … Read more →

# Interpretable Machine Learning

## by Christoph Molnar

Machine learning algorithms usually operate as black boxes and it is unclear how they derived a certain decision. This book is a guide for practitioners to make machine learning decisions interpretable. […] Machine learning has great potential for improving products, processes and research. But computers usually do not explain their predictions which is a barrier to the adoption of machine learning. This book is about making machine learning models and their decisions interpretable. After exploring the concepts of interpretability, you will learn about simple, interpretable models such as … Read more →

# Machine Learning for Biostatistics

## by Armando Teixeira-Pinto

Machine Learning for Biostatistics […] This module will cover the bootstrap and cross-validation. These are two important techniques that are useful for studying sample variability, evaluating model performance and choosing tuning parameters in many of the methods covered in this unit. We will reverse the order presented in the book Introduction to Statistical Learning, starting with the bootstrap and then proceeding to cross-validation. By the end of this module you should be able to: The file bmd.csv contains 169 records of bone densitometries (measurements of bone mineral density). The following variables were … Read more →
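
Both resampling ideas in this module can be sketched in a few lines of base R. The example below is a minimal illustration under several assumptions: it uses simulated data (not bmd.csv) and a simple linear model. It also fixes B = 500 bootstrap replicates and 5 cross-validation folds. The bootstrap estimates the standard error of the fitted slope, and cross-validation estimates out-of-sample mean squared error.

```r
set.seed(1)
n_obs <- 100
x <- rnorm(n_obs)
y <- 2 + 3 * x + rnorm(n_obs)          # simulated data, noise variance 1
dat <- data.frame(x = x, y = y)

# Bootstrap: resample rows with replacement, refit, keep the slope
B <- 500
boot_slopes <- replicate(B, {
  idx <- sample(n_obs, n_obs, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))[["x"]]
})
boot_se <- sd(boot_slopes)             # bootstrap standard error of the slope

# 5-fold cross-validation: average test-fold MSE
k <- 5
folds <- sample(rep(1:k, length.out = n_obs))  # random fold labels
cv_mse <- numeric(k)
for (j in 1:k) {
  train <- dat[folds != j, ]
  test  <- dat[folds == j, ]
  fit   <- lm(y ~ x, data = train)
  cv_mse[j] <- mean((test$y - predict(fit, newdata = test))^2)
}
cv_estimate <- mean(cv_mse)
```

In a tuning setting, the same fold loop would be repeated for each candidate value of the tuning parameter. The value with the smallest `cv_estimate` would then be kept.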

# R Guide

## by Asser Koskinen

R Guide […] R is a programming language and environment for data analytics, machine learning and computer science. This brief guide will get you … Read more →

# Introductory predictive analytics and machine learning in education and healthcare

## by Anshul Kumar

This textbook accompanies the course HE-930 in the PhD in HPEd program at MGH Institute of Health Professions. This book introduces students to basic predictive analytics and machine learning, with examples and applications related to education and healthcare. […] This textbook accompanies the course HE-930—Statistics/Predictive Analytics for Health Professions Education—in the PhD in HPEd program at MGH Institute of Health Professions. HE-930 is a data analytics course that introduces students to basic predictive analytics (PA) and machine learning (ML), with examples and applications … Read more →

# Regression and Analysis of Variance

## by Trevor Hefley

Course notes for Regression and Analysis of Variance (STAT 705) at Kansas State University for Summer 2023 […] This document contains the course notes for Regression and Analysis of Variance at Kansas State University (STAT 705). During the semester we will cover the basics such as regression and ANOVA modeling, parameter estimation, model checking, inference, and prediction. We may also cover modern topics such as regularization, random effects, generalized linear models, machine learning approaches, and Bayesian regression and … Read more →

# Machine Learning for Biostatistics

## by Armando Teixeira-Pinto & Jaroslaw Harezlak

Machine Learning for Biostatistics […] So far, most of the methods that we have seen (with the exception of KNN) assume an additive effect of the predictors. We will now study non-parametric methods to estimate $f(\mathbf{x})$. By the end of this module you should be able to: The dataset triceps is available in the MultiKink package. You may run `install.packages("MultiKink")`, load the library with `library(MultiKink)` and then run `data("triceps")`. The data are derived from an anthropometric study of 892 females under 50 years in three Gambian villages in West Africa. There are 892 observations on the … Read more →

# Machine Learning for Biostatistics

## by Armando Teixeira-Pinto

Machine Learning for Biostatistics […] In this module we will talk about model selection and regularisation methods (also called penalisation methods), namely ridge and lasso. We will start with classical algorithms for model selection, such as best subset selection and stepwise (backward and forward) selection. Then we introduce the idea of the bias-variance trade-off and the motivation for ridge regression. Finally, we will talk about lasso regression and some of its extensions. By the end of this module you should be able to: The dataset fat is available in the faraway package (`library(faraway)`). You have to … Read more →
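
The key contrast between ridge and lasso is that ridge shrinks every coefficient, while lasso can set some exactly to zero. The minimal base-R sketch below shows this on simulated, standardized data (the fat dataset is not used). Ridge is solved in closed form for $\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_2^2$. The lasso objective $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$ is solved by cyclic coordinate descent with soft-thresholding. The helper names `soft` and `lasso_cd` are hypothetical, and the penalty values and iteration count are assumptions. In practice a dedicated package such as glmnet would usually be used instead.

```r
set.seed(2024)
n <- 200; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))   # standardized predictors
beta_true <- c(3, -2, 0, 0, 0)           # only two active predictors
y <- drop(X %*% beta_true + rnorm(n))
y <- y - mean(y)                         # centre the response: no intercept needed

# Ridge: closed-form minimizer of ||y - Xb||^2 + lambda * ||b||^2
lambda_ridge <- 10
b_ridge <- drop(solve(crossprod(X) + lambda_ridge * diag(p), crossprod(X, y)))

# Soft-thresholding operator used in the lasso coordinate update
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Lasso: cyclic coordinate descent on (1/(2n))||y - Xb||^2 + lambda * ||b||_1
lasso_cd <- function(X, y, lambda, iters = 200) {
  n <- nrow(X); p <- ncol(X)
  b <- rep(0, p)
  for (it in seq_len(iters)) {
    for (j in seq_len(p)) {
      r_j  <- y - X[, -j, drop = FALSE] %*% b[-j]   # partial residual
      z    <- sum(X[, j] * r_j) / n
      b[j] <- soft(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  b
}
b_lasso <- lasso_cd(X, y, lambda = 0.3)

round(rbind(ridge = b_ridge, lasso = b_lasso), 3)
```

Printed side by side, the ridge row typically has small but nonzero values for the three inactive predictors. The lasso row sets them to exactly zero, which is why the lasso also acts as a variable-selection method.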

# Machine Learning for Biostatistics

## by Jaroslaw Harezlak & Armando Teixeira-Pinto

Machine Learning for Biostatistics […] This module will cover methods to explore non-linear effects of numerical predictors on the outcome. By the end of this module you should be able to: The dataset triceps is available in the MultiKink package. You may run `install.packages("MultiKink")`, load the library with `library(MultiKink)` and then run `data("triceps")`. The data are derived from an anthropometric study of 892 females under 50 years in three Gambian villages in West Africa. There are 892 observations on the following 3 variables: The data SA_heart.csv is a retrospective sample of males in a … Read more →
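
A common way to capture a non-linear effect of a numeric predictor is to expand it in a polynomial or spline basis and then fit by ordinary least squares. The sketch below compares three such fits by AIC: a straight line, a cubic polynomial and a B-spline with 6 degrees of freedom. It uses simulated data rather than the triceps data, and the data-generating curve and `df = 6` are illustrative assumptions. `bs()` comes from the splines package, which ships with R.

```r
library(splines)  # provides bs(); part of the standard R distribution

set.seed(42)
n   <- 300
age <- runif(n, 0, 50)
y   <- 5 * sin(age / 8) + rnorm(n)     # simulated non-linear relationship

fit_lin  <- lm(y ~ age)                # linear effect only
fit_poly <- lm(y ~ poly(age, 3))       # cubic polynomial basis
fit_bs   <- lm(y ~ bs(age, df = 6))    # cubic B-spline basis, 6 df

aics <- c(linear  = AIC(fit_lin),
          cubic   = AIC(fit_poly),
          bspline = AIC(fit_bs))

# Predictions at new ages reuse the same spline basis (knots are stored)
pred <- predict(fit_bs, newdata = data.frame(age = c(10, 20)))
```

Lower AIC for the polynomial and spline fits indicates that the straight line misses the curvature. The spline has the additional advantage that its flexibility is local and does not depend on a single global polynomial.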

# Targeted Learning in R

## by Mark van der Laan, Jeremy Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, Alan Hubbard

An open source handbook for causal machine learning and data science with the Targeted Learning framework using the tlverse software ecosystem. […] Targeted Learning in R: Causal Data Science with the tlverse Software Ecosystem is a fully reproducible, open source, electronic handbook for applying Targeted Learning methodology in practice using the software stack provided by the tlverse ecosystem. This work is in a draft phase and is publicly available to solicit input from the community. To view or contribute, visit the GitHub repository. The contents of this handbook are meant to serve as a … Read more →

# AI and ML for Social Scientists - Artificial intelligence and Machine Learning for Social Scientists

## by paul

Website updated on 5 June 2023. This website serves both as slides and script for the MA seminar AI and Machine Learning for Social Scientists taught by Paul C. Bauer at the University of Mannheim (Spring 2023). The material was/is being developed by Paul C. Bauer. The material is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Where I draw on other authors' materials (with their permission), other licenses may apply. I am grateful for feedback, and if you find errors please let me know. This website was generated with … Read more →

# MLFE R labs (2023 ed.)

## by Prof. Michela Cameletti & Tutor Rasoul Samei

Notes for the R labs of the MLFE course @ Unibg […] You are reading the lecture notes of the R labs for the Machine learning for Economics (MLFE) course at University of Bergamo (academic year 2022/23). The MLFE course is the second module of the Coding for Data Science course. The MLFE R labs are designed for students who already have some experience with R programming thanks to the first module of the Coding and Machine Learning course. Click here and here to access the R lab notes of the first module regarding introduction to R language and the tidyverse package. Enjoy the journey! … Read more →

# AI and Machine Learning For Finance 2022/23

## by Michela Cameletti

Notes for the R labs of the AIMLFF course @Unibg […] You are reading the lecture notes of the R lectures for the AI and Machine Learning for Finance (AIMLFF) course at University of Bergamo (academic year 2022/23). See here for more details. In these notes, the R programming language for data science will be introduced (with respect to data manipulation, data visualization and communication, and the implementation of machine learning methods). For this part I suggest the following online book: Enjoy the journey! In the following lecture notes, this font (with grey background) represents R code. The … Read more →

# Machine Learning-based Causal Inference Tutorial

## by Golub Capital Social Impact Lab

This is a tutorial on machine learning-based causal inference. […] This tutorial will introduce key concepts in machine learning-based causal inference. It’s an ongoing project and new chapters will be uploaded as we finish them. Topics currently covered: Please note that this is currently a living document. If you find any issues, please feel free to contact Undral Byambadalai at undralb@stanford.edu. The “changelog” below will keep track of major updates and additions. We’ll illustrate key concepts using snippets of R code. Each chapter in this tutorial is self-contained. You can download … Read more →

# Machine Learning: Teoría y Práctica (Machine Learning: Theory and Practice)

## by victor_morales

Machine Learning: Teoría y Práctica […] This document is under construction. Its aim is to gather the material needed to teach an applied course on Machine Learning techniques. This first version provides a detailed review of supervised and unsupervised learning and includes an introduction to natural language processing (NLP). Wherever possible, the document works with data from Ecuador. The software used to develop it is R. Other content by the author: Series de Tiempo (Time Series), Econometría Espacial (Spatial Econometrics), Métodos de evaluación de … Read more →

# Machine Learning

## by Dr. S. Jackson

These are the course notes for the Machine Learning module of Durham University’s Masters of Data Science course. […] Welcome to the material for the first half of the Machine Learning module MATH42815 of the Masters of Data Science course at Durham University. These pages will update as the course progresses, and consist of relevant lecture notes, practical demonstrations (in R) and practical workshop sessions. I would recommend that you use the html version of these notes (they have been designed for use in this way), however, there is also a pdf version of these notes. If you would like … Read more →

# STM1001: Introduction to Machine Learning in R

STM1001 Machine Learning (Data Science Stream) […] Welcome to the final content supplement for the Data Science stream of STM1001. Throughout the semester, as we cover different aspects of statistics and data science, supplementary documents such as this one will be used to enhance your learning experience. This document contains material to support your learning as you complete Computer Labs 9B, 10B and 11B of the Data Science stream. We recommend that you take a few minutes to browse the different sections in this document before Computer Lab 9B. We suggest that you aim to read through … Read more →

# Machine Learning Part II

## by Dr. Hailiang Du

These are the course notes for the Machine Learning module (MATH42815) at Durham University. […] Welcome to the material for the second half of the Machine Learning module (MATH42815) at Durham University. These pages consist of relevant lecture notes that will be updated as the course progresses. I would recommend that you use the HTML version of these notes (they have been designed for use in this way), however, there is also a pdf version of these notes. In this second half of the module, we will first look into the simple yet powerful tree-based models and then dive into the famous yet … Read more →

# Machine Learning: Unsupervised and Supervised Learning

## by Marc Scott

This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. [...] Supervised and unsupervised machine learning, also known as classification and clustering respectively, are important statistical techniques commonly applied in many social and behavioral science research problems. Both seek to understand social phenomena through the identification of naturally occurring homogeneous groupings within a population. Supervised learning techniques are used to sort new observations into pre-existing or ... Read more →

# Business Intelligence II, forår 2023: Anvendt Machine Learning (Spring 2023: Applied Machine Learning)

## by Mads Stenbo Nielsen

This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. [...] The first module offers a general introduction to the software program R (https://www.r-project.org/). It presents a series of examples of R code along with the corresponding R output to illustrate various basic features of the program. R takes code as input, so whenever you want something done, you must write the relevant code, which the program then executes, producing the ... Read more →

# Designing and Building Data Science Solutions

## by Jonathan Leslie, Neri Van Otten

Data science, machine learning and artificial intelligence (AI) can have game-changing impacts for businesses, empowering them to increase operational efficiency, improve the quality of their services and understand their customers better. Yet for these benefits to be realised, data science initiatives must be designed and executed in a sensible way. Often these projects, while successful from a scientific standpoint, miss the mark in terms of business impact. Many business leaders are left feeling unsettled, balancing the need for innovation and the adoption of revolutionary technologies with an uncomfortable degree of uncertainty and risk of failure. For the data scientist the situation can be equally unnerving, with uncertainties about how to deliver a successful project when the path is not clear. Yet, these uncertainties and risks – for the business leader and the data scientist alike – can be controlled and managed if approached in a sensible manner. Your authors have designed and delivered hundreds of projects across a wide range of industries. We have made many mistakes, and in the process we have learned what works well and where the common pitfalls lie. We wrote this book to share our experiences in hopes that it will help the reader – whether a data science practitioner or a business leader – reduce these risks and design projects that have the greatest chance of success. Much of the content in this guide is derived from lessons we have given to our students. Here we have gathered, organised and expanded on those bits of advice to serve as a resource for anyone considering embarking on a data science journey. We share our approach to data science projects, addressing topics such as alignment to business imperatives, project design, project delivery and evaluation of success. 
Data science can be an exciting, invigorating field, and for the business leader, it can bring about revolutionary changes to an organisation that can come with huge returns on investment and value added. For the data scientist, designing and delivering successful projects is rewarding, stimulating and tremendously gratifying. We hope this guide gives you the confidence to understand the risks and approach your project in a sensible way. Read more →

# Supervised Machine Learning

## by Michael Foley

These are my personal notes related to supervised machine learning techniques. […] Machine learning (ML) develops algorithms to identify patterns in data (unsupervised ML) or make predictions and inferences (supervised ML). Supervised ML trains the machine to learn from prior examples to predict either a categorical outcome (classification) or a numeric outcome (regression), or to infer the relationships between the outcome and its explanatory variables. Two early forms of supervised ML are linear regression (OLS) and generalized linear models (GLM) (Poisson and logistic regression). These … Read more →
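
The classification side of supervised ML can be illustrated with logistic regression, one of the GLMs mentioned above. This base-R sketch simulates two predictors and fits `glm(..., family = binomial)` on a 70% training split. It then reports hold-out accuracy at a 0.5 probability threshold. The simulated coefficients, the split proportion and the threshold are all illustrative assumptions.

```r
set.seed(7)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-0.5 + 1.5 * x1 - 1 * x2)   # true class-1 probabilities
y  <- rbinom(n, 1, p)                    # simulated binary outcome
dat <- data.frame(y, x1, x2)

# Train/test split (70% / 30%)
train_idx <- sample(n, 0.7 * n)
fit <- glm(y ~ x1 + x2, family = binomial, data = dat[train_idx, ])

# Predicted probabilities on the held-out rows, then a 0.5 cut-off
prob <- predict(fit, newdata = dat[-train_idx, ], type = "response")
pred <- as.integer(prob > 0.5)
accuracy <- mean(pred == dat$y[-train_idx])
```

Evaluating on rows the model did not see gives a more honest estimate of predictive performance than training accuracy. The fitted coefficients in `coef(fit)` remain available for the inferential use of the same model.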

# Modern R with the tidyverse

## by Bruno Rodrigues

This book will teach you how to use R to solve your statistical, data science and machine learning problems. Importing data, computing descriptive statistics, running regressions (or more complex machine learning models) and generating reports are some of the topics covered. No previous experience with R is needed. […] I have been working on this on and off for the past 4 years or so. In 2022, I updated the contents of the book to reflect updates introduced with R 4.1 and in several packages (especially those from the {tidyverse}). I have also cut some content that I think is not that … Read more →

# Data Analytics I - R Assignment: Global Spread of COVID-19

## by © Asser Koskinen, Johan Båge and Håkan Lyckeborg

Data Analytics I - R Assignment Global Spread of COVID-19 […] This assignment deals with a “big” data set, and you are advised to use R for your analysis. R is a programming language and environment for data analytics, machine learning and computer science. You will use real COVID-19 data taken from the Our World in Data database, which is maintained by the Oxford Martin Programme on Global Development at Oxford University. The data has been divided into samples where you will have an individual dataset of 30 countries that you will … Read more →

# ISTA 321 - Data Mining

## by Nicholas DiRienzo

Course content for ISTA 321 - Last updated for Summer 2022 […] Welcome to ISTA 321 - Data Mining! The goal of this class is to teach you how to use R to make informed inferences and predictions from large datasets using a variety of methods. This requires a mixture of many skills including programming, data exploration and visualizations, statistics, algorithms, machine learning, model validation, and general data wrangling. We don’t do these things in isolation, but instead do them with a goal of answering a question, thus being able to apply this knowledge to make a data-driven decision … Read more →

# Econometría Espacial (Spatial Econometrics)

## by victor_morales

Econometría Espacial […] This document is under construction. Its aim is to gather the material needed to teach an applied course in spatial econometrics. Other content by the author: Machine Learning: Teoría y Práctica, Series de Tiempo (Time Series), Métodos de evaluación de … Read more →

# MLFE R labs (2022 ed.)

## by Prof. Michela Cameletti & Tutor Marco Villa

Notes for the R labs of the MLFE course @ Unibg […] You are reading the lecture notes of the R labs for the Machine learning for Economics (MLFE) course at University of Bergamo (academic year 2021/22). The MLFE course is the second module of the Coding for Data Science course. The MLFE R labs are designed for students who already have some experience with R programming thanks to the first module of the Coding and Machine Learning course. Click here and here to access the R lab notes of the first module regarding introduction to R language and the tidyverse package. Enjoy the journey! … Read more →

# AI and Machine Learning For Finance 2021/22

## by Michela Cameletti

Notes for the R labs of the AIMLFF course @ Unibg […] You are reading the lecture notes of the R lectures for the AI and Machine Learning for Finance (AIMLFF) course at University of Bergamo (academic year 2021/22). See here for more details. In these notes, the R programming language for data science will be introduced (with respect to data manipulation, data visualization and communication, and the implementation of machine learning methods). For this part I suggest the following online book: Enjoy the journey! In the following lecture notes, this font (with grey background) represents R code. The … Read more →

# Supervised Machine Learning for Text Analysis in R

## by Emil Hvitfeldt and Julia Silge

This is the website for Supervised Machine Learning for Text Analysis in R! Visit the GitHub repository for this site, or buy a physical copy from CRC Press, Bookshop.org, or Amazon. This online work by Emil Hvitfeldt and Julia Silge is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International … Read more →

# Business Intelligence II - Anvendt Machine Learning (Applied Machine Learning)

## by Mads Stenbo Nielsen

This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. [...] The first module offers a general introduction to the software program R (https://www.r-project.org/). It presents a series of examples of R code along with the corresponding R output to illustrate various basic features of the program. R takes code as input, so whenever you want something done, you must write the relevant code, which the program then executes, producing the ... Read more →

# Probability

## by Michael Foley

Notes cobbled together from books, online classes, etc. to be used as quick reference for common work projects. […] These are notes from books, classes, tutorials, vignettes, etc. They contain mistakes, are poorly organized, and are sloppy on fundamentals. They should improve over time, but that’s all I can say for it. Use at your own risk. The focus of this handbook is probability, including random variables and probability distributions. Not included here: statistics, machine learning, text mining, survey analysis, or survival analysis. These subjects frequently arise at work, but are … Read more →

# Machine Learning for Imbalanced Datasets

## by Nana Boateng

This is a machine learning textbook for dealing with imbalanced datasets […] This is a sample book written in Markdown. You can use anything that Pandoc’s Markdown supports, e.g., a math equation $a^2 + b^2 = c^2$. The bookdown package can be installed from CRAN with `install.packages("bookdown")`, or the development version from GitHub with `devtools::install_github("rstudio/bookdown")`. Remember each Rmd file contains one and only one chapter, and a chapter is defined by the first-level heading #. To compile this example to PDF, you need XeLaTeX. You are recommended to install TinyTeX … Read more →

# Coding for Data Science 2021/22 - R part

## by Michela Cameletti

Notes for the R labs of the C4DS course @ Unibg […] You are reading the lecture notes of the R lectures for the Coding for Data Science (C4DS) course at University of Bergamo (academic year 2021/22). C4DS is the first module of the course named Coding and Machine Learning (see here for more details). The C4DS R lectures are designed for students who already have a programming background thanks to the first part of the C4DS course dedicated to Python. In this part of the module we will introduce the R programming language for data science (including data manipulation, data visualization and … Read more →

# Portfolio, Churn & Customer Value

## by Hugo Cornet, Pierre-Emmanuel Diot, Guillaume Le Halper, Djawed Mancer

This research paper aims at modelling customer portfolio, churn and customer value. […] This paper is being written as part of the final year of our master's degree in economics. It aims at studying a firm's most valuable asset, namely its customers. To that end, we adopt a quantitative approach based on econometrics and data analysis with a threefold purpose: After having defined the subject's key concepts, we apply duration models and machine learning techniques to a Kaggle dataset related to customers of a fictional telecommunications service provider (TSP). Keywords: customer portfolio … Read more →

# BS0004 Code Annotations

## by Kevin, Kevin, and Kevin

This is a minimal example of using the bookdown package to write a book. The HTML output format for this example is bookdown::gitbook, set in the _output.yml file. [...] This document documents (no pun intended) my code and my workflow for the cancer dataset that Sean found a couple of days ago. Using the dataset, I intend to build several supervised machine learning classifiers to predict the status of cancer patients. I will then evaluate the performance of these models using the content taught in week 9. Furthermore, if possible, I think I will also try to perform GO term ... Read more →

# Unsupervised Machine Learning

## by Michael Foley

These are my personal notes on unsupervised machine learning techniques. […] Machine learning (ML) develops algorithms to identify patterns in data (unsupervised ML) or to make predictions and inferences (supervised ML). Unsupervised machine learning searches for structure in unlabeled data (data without a response variable). Its goals are clustering observations into homogeneous subgroups and dimensionality reduction. Examples include k-means clustering and hierarchical cluster analysis (HCA) for clustering, and PCA for dimensionality reduction (others … Read more →

# Machine Learning for Economics 2020/21: R labs

## by Michela Cameletti

Notes for the R labs of the MLFE course @ Unibg […] You are reading the lecture notes of the R labs for the Machine Learning for Economics (MLFE) course at University of Bergamo (academic year 2020/21). The MLFE course is the second module of the Coding and Machine Learning course. The MLFE R labs are designed for students who already have some experience with R programming thanks to the first module of the Coding and Machine Learning course. Click here to access the R lab notes of the first module regarding introduction to R language and the tidyverse package. Enjoy the journey! … Read more →

# ADVANCED REGRESSION AND PREDICTION: MACHINE LEARNING TOOLS

## by Ilán F. Carretero Juchnowicz

This bookdown contains the second part of the project for the Advanced Regression and Prediction course of the Master's Degree in Statistics for Data Science […] Machine Learning (ML) techniques are currently applied in countless fields to extract knowledge from data. One prominent field today is the emergence and impact of coronavirus disease (COVID-19) on all aspects of society. For this reason, this second part of the Advanced Regression and Prediction course aims to use the techniques learned during the practical and … Read more →

# Tutorial

## by Golub Capital Social Impact Lab

This is a tutorial on machine learning-based causal inference. […] This tutorial will introduce key concepts in machine learning-based causal inference. It's an ongoing project and new chapters will be uploaded as we finish them. A tentative list of topics to be covered: […] Please note that this is currently a living document. Chapters marked as "beta" may change substantially and are most in need of feedback. If you find any issues, please feel free to contact Vitor Hadad at vitorh@stanford.edu. The "changelog" below will keep track of major updates and additions. We'll illustrate key … Read more →

# Machine Learning Techniques

## by J.H. van der Zwan

An introductory text on a number of commonly used Machine Learning techniques and how they are performed in R […] This book is intended to be a reference for frequently used Machine Learning techniques and how they can be performed in R. Work on this book started in summer 2020, and new chapters are being added over time. It is written for students taking a Machine Learning course in an undergraduate or graduate program. It is not intended to be an exhaustive textbook on all possible Machine Learning … Read more →

# Do A Data Science Project in 10 Days

## by Gangmin Li

This is a data science project practice book. It was initially written for my Big Data course to help students run a quick data analytics project and understand the data analytics process, its typical tasks, and the methods, techniques and algorithms needed to accomplish those tasks. During COVID-19, the university adopted online teaching, so students could not access the university labs and HPC facilities. Gaining experience of doing a data science project became a matter of individual self-learning in isolation. This book aims to help them read through it and follow the instructions to complete the sample project by themselves. However, it has also been requested by many other students who want to learn about data analytics, machine learning and, in particular, practical issues, and to gain experience and confidence in doing data analysis. So it is aimed at beginners with little knowledge of data science. Read more →

# Data Science for Human-Centered Product Design

## by Travis Kassab

Data Science for Human-Centered Product Design […] This is a data science tutorial with seven open-source projects that show how statistics and machine learning can be applied to user survey data. The purpose is not to prescribe techniques, but to demonstrate the use of data science in the context of product design. I’ve compiled what I know on the topic, and hope readers adopt some of these techniques and use them in concert with qualitative research and entrepreneurial thinking to build better products. Let’s quickly preview the seven different use-cases. First, we must develop an … Read more →

# Explanatory Model Analysis

## by Przemyslaw Biecek and Tomasz Burzykowski

This book introduces a unified language for the exploration, explanation and examination of predictive machine learning models. […] … Read more →

# ECO 397 Book Review

## by Clayton Engelby

ECO 397 Book Review […] Cathy O'Neil's book "Weapons of Math Destruction" is an analysis of big data and its use of machine learning programs that aim to maximize market efficiency. In doing so, she coins the initialism WMDs for models whose logical flaws skew results in one way or another. Her argument focuses on the fact that, more often than not, these failures worsen ongoing structural violence and only add fuel to the fire for recidivism rates, bankruptcies, mortgage defaults, college dropouts, and health-related deaths. While there is absolutely … Read more →

# Portfolio Construction

## by Jian SHEN

Portfolio construction with R […] Outline: portfolio (basic portfolio concepts); portfolio construction (back-testing); machine learning (data cleaning, transformation, visualization and exploration; time-series modeling; modeling; model evaluation); math (convex optimization). The R session information when compiling this book is shown below: Some basic knowledge of finance, time series analysis, optimization (linear and convex) and programming (Python or R) would be preferred. Later I will add corresponding Python … Read more →

# A Minimal rTorch Book

## by Alfonso R. Reyes

This is a minimal tutorial about using the rTorch package to have fun while doing machine learning. This book was written with bookdown. […] Last update: Sun Oct 25 12:05:18 2020 -0500 (79503f6ee) You need a couple of things to get rTorch working: Install Python Anaconda, preferably the 64-bit version, with Python 3.6 or later. I have successfully tested Anaconda under four different operating systems: Windows (Win10 and Windows Server 2008); macOS (Sierra, Mojave and Catalina); Linux (Debian, Fedora and Ubuntu); and lastly, Solaris 10. All these tests are required by CRAN. Install R, Rtools and … Read more →

# R for Statistical Learning

## by David Dalpiaz

Welcome to R for Statistical Learning! While this is the current title, a more appropriate title would be “Machine Learning from the Perspective of a Statistician using R” but that doesn’t seem as catchy. This book currently serves as a supplement to An Introduction to Statistical Learning for STAT 432 - Basics of Statistical Learning at the University of Illinois at Urbana-Champaign. The initial focus of this text was to expand on ISL’s introduction to using R for statistical learning, mostly through adding to and modifying existing code. This text is currently becoming much more … Read more →

# Modul Pelatihan Riset Kuantitatif Ekonomi dan Manajemen INSW

## by Teaching Team: Dr. Bagus Sartono, M.Si and Aep Hidayatuloh, S.Stat

Training module on quantitative research in economics and management for INSW, as part of the Dwelling Time study. […] Welcome, participants of the Quantitative Research in Economics and Management training. This document is the hands-on teaching material for the Quantitative Research in Economics and Management course. The training covers the application of statistical and machine learning methods using the R software. 03 October 2020, 30 September 2020, 22 September 2020, 20 September 2020. bagusco@apps.ipb.ac.id, aephidayatuloh.mail@gmail.com … Read more →

# Aprendizaje automático de datos Colombianos (Machine Learning)

## by Danna Cruz

This book is an introduction to machine learning applied to Colombian data […] This book was built to help readers understand some machine learning tools in R and to contribute content in Spanish. We also wanted to take advantage of the fact that many databases of all kinds are currently available. I would like to encourage data analysis for everyone who wants to do it; it is an open practice, and anyone with basic knowledge of R and statistics can do it. It was written by Danna Cruz and Luis Alejandro Másmela, always accompanied by Cantelli. … Read more →

# An Introduction to Machine Learning with R

## by Laurent Gatto

A hands-on introduction to machine learning with R. […] This course material is aimed at people who are already familiar with the R language and syntax, and who would like to get a hands-on introduction to machine learning. This material is currently under development and is likely to change in the future. The set of packages used, either directly or indirectly, is provided in the first chapter. Complete session information with all packages used to compile this document is available at the end. The source code for this document is available on GitHub at https://github.com/lgatto/IntroMac … Read more →

# EDP Sun Power prediction Challenge

## by Sergio Berdiales

EDP Sun Power prediction Challenge […] In this notebook I include the process of building the models with which I will try to break into the participant ranking of the "Sun Power Prediction" machine learning challenge that EDP has posted on its open data website, which I reproduce below (date: 2019-11-14). "The objective of this competition is to build an algorithm that predicts the production of solar module B (with optimal orientation) for the first seven days of 2018. For this, you can rely on the weather station data for these days." So far there are only 7 participants, … Read more →

# Hands-On Machine Learning with R

## by Bradley Boehmke & Brandon Greenwell

A Machine Learning Algorithmic Deep Dive Using R. […] This book is sold by Taylor & Francis Group, who owns the copyright. The physical copies are available at Taylor & Francis and Amazon. Welcome to Hands-On Machine Learning with R. This book provides hands-on modules for many of the most common machine learning methods, including: […] You will learn how to build and tune these various models with R packages that have been tested and approved for their ability to scale well. However, our motivation in almost every case is to describe the techniques in a way that helps develop intuition for … Read more →

# The Open Quant Live Book

## by OpenQuants.com

The Open Quant Live Book […] The book aims to be an Open Source introductory reference on the most important aspects of financial data analysis, algo trading, portfolio selection, econophysics and machine learning in finance, with an emphasis on reproducibility and openness not found in most other typical Wall Street-like references. The Book is Open and we welcome co-authors. Feel free to reach out or simply create a pull request with your contribution! See project structure, guidelines and how to contribute here. First published at: openquants.com. Licensed under Attribution-NonCommer … Read more →

# Make money with machine learning

## by Siraj Raval, revisited by Kim NOËL

This book is my personal transcription of the course provided by Siraj Raval. A controversy over content copyright tarnished Raval during this course, and many students, including me, were disturbed. The affair grew considerably, and our motivation to progress in the course suffered. So I decided to propose a version with more explanations and details. I will provide a list of tutorials to follow in order to complete this course. This is a book written in … Read more →

# Applications of Machine Learning in Imputation

## by Vinayak Anand-Kumar

This document presents the findings from the 2018/19 project into the use of machine learning in imputation. […] I would like to acknowledge the following people for their help in producing this report: Emily Tew and Gareth Clews, for their guidance and support in getting XGBoost up and running; Fern Leather, for getting CANCEIS to work and for taking the time to run through the specification files with me, which I am really grateful for; Luke Lorenzi and Vahe Nafilyan, for helping me put the pieces together and figure out how we can progress this work in the context of survey data. Editing and … Read more →

# Agile Machine Learning with R

## by Edwin Thoen

A workflow for doing machine learning in the R language, using Agile principles. […] Not too long ago, when I was starting my career as a data scientist, I did not really have a workflow. Freshly graduated from an applied statistics master's program, I entered the arena of Dutch business, employed by a small consulting firm. Neither the company I was with, nor the clients I was working for, nor I myself understood what it meant to implement a statistical model or a machine learning method in the real world. Everybody was of course interested in this "Big Data" thing, so it did not take … Read more →

# Decision-Driven Data Analytics for Well Placement Optimization in Field Development Scenario - Powered by Machine Learning

## by Peyman Kor

Submitted in accordance with the requirements for the degree of Master of Science (M.Sc.) in Petroleum Engineering, University of Stavanger, Energy Resources Department. The data, source code and algorithm of this thesis can be found on the author's GitHub. Your feedback and comments will be appreciated; the author can be reached via LinkedIn or Twitter. This thesis is licensed under Attribution-NonCommercial-ShareAlike 4.0 … Read more →

# Hackathon Talento - Reto 2 - Wind Farm

## by Sergio Berdiales, Javier Campos and Manuel Antonio García

Hackathon Talento - Reto 2 - Wind Farm […] This notebook stems from our participation as a team on 4 June 2019 in the Machine Learning Hackathon organized by Talento Corporativo and sponsored by EDP, El Comercio, Clustertic and BigML. The competition consisted of a pair of Machine Learning challenges based on EDP data, in which the BigML tool had to be used to run the models. The content of this notebook corresponds to our work on the second challenge, whose statement is described in section one. During the competition, most … Read more →

# Hackathon Talento - Reto 1 - SUNLAB

## by Sergio Berdiales, Javier Campos and Manuel Antonio García

Hackathon Talento - Reto 1 - SUNLAB […] This notebook stems from our participation as a team on 4 June 2019 in the Machine Learning Hackathon organized by Talento Corporativo and sponsored by EDP, El Comercio, Clustertic and BigML. The competition consisted of a pair of Machine Learning challenges based on EDP data, in which the BigML tool had to be used to run the models. The content of this notebook corresponds to our work on the first challenge, whose statement is described in section one. During the competition, most of … Read more →

# Machine Learning

## by Michael Clark

This document provides an introduction to machine learning for applied researchers. While conceptual in nature, demonstrations are provided for several common machine learning approaches of a supervised nature. In addition, all the R examples, which utilize the caret package, are also provided in Python via scikit-learn. […] … Read more →

# Predictive Soil Mapping with R

## by Tomislav Hengl and Robert A. MacMillan

Predictive Soil Mapping aims to produce the most accurate, most objective, and most usable maps of soil variables by using state-of-the-art Statistical and Machine Learning methods. This book explains how to implement common soil mapping procedures within the R programming language. […] This is the online version of the Open Access book: Predictive Soil Mapping with R. Pull requests and general comments are welcome. These materials are based on technical tutorials initially developed by the ISRIC's Global Soil Information Facilities (GSIF) development team over the period 2014–2017. This book is … Read more →

# Machine Learning with Rust

## by Tae Geun Kim

Machine learning has recently become increasingly important. Trained machines have easily beaten professionals at everything from Go to video games, and they solve research and work problems far more efficiently. However, if you rush in simply because everyone else is doing it, you may be unable to interpret the results you get, or you may get wrong results in the first place. Therefore, rather than simply using a Machine Learning Framework, this book aims to learn Machine Learning from the ground up, applying the theory step by step. To do so, we use the programming language Rust and Bishop's very famous … Read more →

# useR! Machine Learning Tutorial

## by Erin LeDell

useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive. […] This tutorial contains training modules for six popular supervised machine learning methods: […] Here are some practical, related topics we will cover for each algorithm: […] Instructions for installing the necessary software for this tutorial are available here. Data for the tutorial can be downloaded … Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all … Read more →

# Agile Data Science with R

## by Edwin Thoen

A workflow for doing data science in the R language, using Agile principles. […] When I was starting my career as a data scientist, I did not really have a workflow. Freshly out of statistics grad school, I entered the arena of Dutch business, employed by a small consulting firm. Between the company, the potential clients and myself, no one knew what it meant to implement a statistical model or a machine learning method in the "real" world. But everybody was interested in this "Big Data" thing, so we quickly started doing consulting work without a clear idea of what I was going to do. When we came … Read more →

# Data Science Live Book

## by Pablo Casas

An intuitive and practical approach to data analysis, data preparation and machine learning, suitable for all ages! […] This book is now available on Amazon. Check it out! 📗 🚀 It is available in a black & white version and also in full color, and can be shipped to over 100 countries. 🌎 The book will facilitate the understanding of common issues that arise when doing data analysis and machine learning. Building a predictive model is as difficult as one line of R code: […] That's it. But in practice, data is dirty. We need to sculpt it, just like an artist does, to expose its information in order … Read more →

# Introduction to Data Science

## by Rafael A. Irizarry

This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression and machine learning and helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with UNIX/Linux shell, version control with GitHub, and reproducible document preparation with R markdown. Read more →

# Lightweight Machine Learning Classics with R

## by Marek Gagolewski

Explore some of the most fundamental algorithms which have stood the test of time and provide the basis for innovative solutions in data-driven AI. Learn how to use the R language for implementing various stages of data processing and modelling activities. Appreciate mathematics as the universal language for formalising data-intensive problems and communicating their solutions. The book is for you if you're yet to be fluent with university-level linear algebra, calculus and probability theory or you've forgotten all the maths you've ever learned, and are seeking a gentle, yet thorough, introduction to the topic. Read more →

# Tidy Modeling with R

## by Max Kuhn and Julia Silge

The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles. This book provides a thorough introduction to how to use tidymodels, and an outline of good methodology and statistical practice for phases of the modeling process. […] Welcome to Tidy Modeling with R! This book is a guide to using a collection of software in the R programming language for model building called tidymodels, and it has two main goals: First and foremost, this book provides a practical introduction to how to use these specific R packages to create models. We … Read more →