Text Mining with R

by Julia Silge and David Robinson

2024-06-20

A guide to text analysis within the tidy data framework, using the tidytext package and other tidy tools […] This is the website for Text Mining with R! Visit the GitHub repository for this site, find the book at O’Reilly, or buy it on Amazon. This work by Julia Silge and David Robinson is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. …
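
A minimal sketch of the one-token-per-row format behind the tidy approach to text, assuming the dplyr and tidytext packages are installed. The two-sentence corpus is made up for illustration and does not come from the book.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line corpus, made up for illustration
docs <- tibble(
  line = 1:2,
  text = c("The cat sat on the mat", "The dog sat on the log")
)

# unnest_tokens() splits each text into words (lowercased by default),
# giving one row per token and keeping the other columns (here: line)
tokens <- docs %>% unnest_tokens(word, text)

# Once the text is tidy, ordinary dplyr verbs apply, e.g. word frequencies
word_counts <- tokens %>% count(word, sort = TRUE)
print(word_counts)
```

Because the tokens sit in an ordinary data frame, the usual dplyr and ggplot2 tools can be applied to text directly.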

The Data Preparation Journey

by Martin Monkman

2024-02-26

Before you can analyze your data, you need to ensure that it is clean and tidy. […] Welcome to The Data Preparation Journey: Finding Your Way With R, a book published with CRC Press as part of The Data Science Series. This is a work-in-progress; the most recent update is 2024-02-25. It is routinely noted that the Pareto principle applies to data science—80% of one’s time is spent on data collection and preparation, and the remaining 20% on the “fun stuff” like modelling, data visualization, and communication. There is no shortage of material—textbooks, journal articles, blog posts, online …

Introduction to Inferential Statistics

by Dr. Marc Trussler

2024-01-05

Class notes for PSCI-1801 […] Current as of 2024-01-05. Lecture: MW 12-1:30pm (MCNB 309). Dr. Marc Trussler, trussler@sas.upenn.edu, Fox-Fels Hall 32 (3814 Walnut Street); office hours: M 9-11am. TA: Dylan Radley, dradley@sas.upenn.edu, Fox-Fels Hall 35 (3814 Walnut Street); office hours: Tuesday 11-12, Tuesday 3-4, Thursday 12-1. The first step of many data science sequences is to learn a great deal about how to work with individual data sets: cleaning, tidying, merging, describing and visualizing data. These are crucial skills in data analytics, but describing a data set is not our ultimate goal. The …

Unlocking the Power of Data Visualization with R

by fede_gazzelloni

2023-12-01

A full list of Data Visualizations with code made with the R programming language. Welcome to Unlocking the Power of Data Visualization with R, where I proudly showcase my contributions to the #R4DS community through the #TidyTuesday, #30DayChartChallenge, and #30DayMapChallenge competitions, for 2021, 2022, and 2023. This platform is your gateway to data exploration, featuring a diverse collection of data visualizations created using the R programming language. Take a deep dive into the digital gallery, click on the image to discover insights, find inspiration, and learn from detailed …

数据科学中的 R 语言 (R Language in Data Science)

by 王敏杰

2023-07-18

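This book is an overview of how practitioners can acquire, wrangle, visualize, and model data with R and Stan. […] Hello, and welcome to the course materials for “数据科学中的R语言” (R Language in Data Science), a graduate public elective course at Sichuan Normal University. R is the foremost language for statistical programming, and in recent years the Tidyverse has greatly lowered the difficulty of learning it. The Tidyverse is a collection of R packages, including dplyr, ggplot2, tidyr, stringr, and others, that covers everything from data import and preprocessing to advanced transformation, visualization, modeling, and presentation. Its clear, readable coding style has won it more and more fans. Since students come from different schools and disciplinary backgrounds, the material taught will not be too advanced (have confidence …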

tidy[ing] up POL345

by John Kim

2022-11-13

A guide to the tidyverse for POL345 students. […] POL345 is often Princeton students’ first foray into the programming language R. Through POL345, students gain an introductory overview of R, and of programming generally, to conduct basic data analysis on their own. However, many further courses (SML201, SOC306, POL346), along with industry users of R, use the tidyverse instead, a “language” within R for conducting clean, readable data analysis. This book seeks to bridge that gap, revisiting each of the POL345 handouts using the tidyverse to introduce students to this “language within a language”. …
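
To make the “language within a language” idea concrete, here is a small sketch of the same subset written first in base R and then with dplyr verbs. It assumes dplyr is installed and uses R's built-in mtcars data, not any POL345 dataset.

```r
library(dplyr)

# Base R: select rows with a logical vector and columns by name
base_version <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]

# tidyverse: the same operation as a pipeline of named verbs
tidy_version <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, wt)

# Both keep the same four-cylinder cars, in the same order
stopifnot(nrow(base_version) == nrow(tidy_version))
stopifnot(identical(base_version$mpg, tidy_version$mpg))
```

The result is the same; the difference is that the pipeline reads left to right as a sequence of actions, which is the style the tidyverse encourages.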

An Introduction to Statistical Learning with the tidyverse and tidymodels

by Taylor Dunn

2022-10-24

Working through ISLR with the tidyverse and tidymodels […] I am a data scientist and statistician who is (mostly) self-taught from textbooks and generous people sharing their work online. Inspired by projects like Solomon Kurz’s recoding of Statistical Rethinking and Emil Hvitfeldt’s ISLR tidymodels labs, I decided to publicly document my notes and code as I work through An Introduction to Statistical Learning, 2nd edition by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. I prefer to work with the tidyverse collection of R packages, and so will be using those to wrangle …

Data Integration, Manipulation and Visualization of Phylogenetic Trees

by Guangchuang Yu

2022-07-13

Master the ggtree package suite to handle trees with their associated data. […] I am so excited to have this book published. The book is meant as a guide to data integration, manipulation and visualization of phylogenetic trees using a suite of R packages: tidytree, treeio, ggtree and ggtreeExtra. Hence, if you are starting to read this book, we assume you have a working knowledge of how to use R and ggplot2. The development of the ggtree package started during my PhD study at the University of Hong Kong. I joined the State Key Laboratory of Emerging Infectious Diseases (SKLEID) under the supervision of Yi Guan …

Úvod do analýzy údajov pomocou R (Introduction to Data Analysis with R)

by Tomáš Bacigál

2022-05-11

Fundamentals of the R language and an introduction to data science: exploratory analysis, data transformation (dplyr), visualization (ggplot2), data cleaning (tidyr), interactive graphics (htmlwidgets, shiny, …), communication (RMarkdown), and efficient programming (parallel, Rcpp, RSQLite). […] I don’t think anyone actually believes that R is designed to make everyone happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order a pizza, I still have to pick up the telephone. (Roger D. Peng, 2004) This quote captures the versatility of the software tool R. As with a number of “open …

An(other) introduction to R

by Felix Lennert

2022-02-07

This is a gentle introduction to R and the basic usage of some tidyverse packages (dplyr, tidyr, ggplot2, forcats, stringr) for data manipulation and visualization. […] Dear student, in the following you will receive a gentle introduction to R and how you can use it to work with data. This tutorial was heavily inspired by Richard Cotton’s “Learning R” (Cotton 2013) and Hadley Wickham and Garrett Grolemund’s “R for Data Science” (abbreviated as R4DS). The latter can be found online (Wickham and Grolemund 2016). We will not immediately start out with the packages from the tidyverse …

Data Skills for Reproducible Science

by psyteachr.github.io

2021-08-22

This course provides an overview of skills needed for reproducible research and open science using the statistical programming language R. Students will learn about data visualisation, data tidying and wrangling, archiving, iteration and functions, probability and data simulations, general linear models, and reproducible workflows. Learning is reinforced through weekly assignments that involve working with different types of data.

Lecture 2 Note

by Yiming

2021-07-17

These are the class notes for lecture 2. […] There is an extensive range of packages in R. For collecting and analyzing financial time series, some of the packages we will use include: tidyquant (financial data collection from the internet); xts and zoo (time series); rugarch (non-linear volatility models); and fxregime (regime modeling). The first package, tidyquant, facilitates collecting financial data from internet sites such as FRED (https://fred.stlouisfed.org/): interest rates of US government bonds/bills, interest rates of foreign governments, foreign exchange rates, commodities: West Texas Intermediate crude …
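
A sketch of pulling one FRED series with tidyquant and turning it into an xts time series, assuming tidyquant and xts are installed and network access is available. The series ID DGS10 (the 10-year US Treasury constant-maturity yield) and the start date are choices made here for illustration, not taken from the notes.

```r
library(tidyquant)
library(xts)

# get = "economic.data" tells tq_get() to download the series from FRED
dgs10 <- tq_get("DGS10", get = "economic.data", from = "2020-01-01")

# The result is a tibble with symbol, date and price columns;
# FRED leaves non-trading days as NA, so drop those rows
dgs10 <- dgs10[!is.na(dgs10$price), ]

# Convert to an xts object indexed by date for use with xts/zoo tools
dgs10_xts <- xts(dgs10$price, order.by = dgs10$date)
head(dgs10_xts)
```

Keeping the download step in a tibble and converting only when a time-series package needs it is one way to combine tidyquant with xts and zoo.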

Introduction to R (Part 2)

by Nana Kim

2021-02-16

A document for the Intro to R workshop (part 2) video. […] In the next two chapters, we will learn how to manipulate and visualize data. We will use tidyverse packages (mainly dplyr, ggplot2, and tidyr) for easier and faster data manipulation and visualization. First, install and load the tidyverse by running: […] *Visit https://www.tidyverse.org/ to learn more about the …
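
The installation code itself is not part of the excerpt. A common way to install and load the tidyverse is sketched below; this is a generic pattern, not necessarily the workshop's exact code, and the CRAN mirror URL is an assumption.

```r
# Install the tidyverse from CRAN only if it is not already available
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Attach the core tidyverse packages (dplyr, ggplot2, tidyr, ...) for this session
library(tidyverse)
```

Installation is needed once per machine, while library(tidyverse) has to be run in every new R session.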

三國志で学ぶデータ分析 (Learning Data Analysis with the Three Kingdoms, Japan.R 2019)

by ill-identified

2021-01-14

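“A tutorial on data analysis with R, using the Three Kingdoms (三国志) as its subject” […] This article is based on the manuscript of a talk given at Japan.R, held on 2019/12/7. The article has two aims. “Data analysis” here means trying to say something with the minimum necessary methods, avoiding the overuse of complex, advanced techniques as far as possible. This “data analysis” follows a common approach: acquiring data by scraping, processing and reshaping it, computing summary statistics, and visualizing it with graphs. The packages used, such as rvest (scraping), tidyr and dplyr (processing and reshaping), and ggplot2 (plotting), are all representative R packages used in a wide range of settings, …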

The Covid19R Project Documentation

by Rami Krispin, Amanda Dobbyn, Jarrett Byrnes

2020-06-20

The Covid19R Project Documentation […] The Covid19R Project is an attempt to provide a set of standards for creating tidy Covid-19 related data distribution packages as well as a centralized method for then redistributing the data sets themselves within R. We’re trying to build a community with interoperable data standards in order to allow anyone using R to derive novel insights about this global pandemic. In the documentation for the Covid19R Project, we detail what the project is, how to be a part of it, and what minimal tidy standards we want data packages to conform to in order to …

TidySimStat

by Edward J. Xu

2020-05-15

Stochastic Simulation and Statistics in the Tidyverse. […] This is the website hosting all the theory and practice regarding stochastic simulation and statistics. It has the following …

Notes for “Text Mining with R: A Tidy Approach”

by Qiushi Yan

2020-05-10

Notes for “Text Mining with R: A Tidy Approach” […] This is a notebook on Text Mining with R: A Tidy Approach (Silge and Robinson 2017). The tidyverse and tidytext are loaded automatically before each chapter. I have defined a simple function, facet_bar(), to meet the book’s frequent need for a faceted bar plot with the y variable reordered by x within each facet. As a quick demonstration of this function, we can plot the top 10 most common words in Jane Austen’s six books: …
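
The notebook's own facet_bar() definition is not shown here, so the following is a hypothetical re-creation of such a helper. It assumes dplyr, ggplot2, and tidytext are installed, and relies on tidytext's reorder_within() and scale_y_reordered() to order bars separately inside each facet. The argument names and the toy counts are illustrative assumptions, not the notebook's code or the Jane Austen data.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical facet_bar(): y = categorical column to order (e.g. words),
# x = numeric bar length (e.g. counts), by = faceting column (e.g. book)
facet_bar <- function(df, y, x, by, nrow = 2, ncol = 2, scales = "free") {
  df %>%
    # reorder_within() orders y by x separately within each level of `by`
    mutate({{ y }} := reorder_within({{ y }}, {{ x }}, {{ by }})) %>%
    ggplot(aes({{ x }}, {{ y }}, fill = {{ by }})) +
    geom_col(show.legend = FALSE) +
    # scale_y_reordered() removes the suffix reorder_within() adds to labels
    scale_y_reordered() +
    facet_wrap(vars({{ by }}), nrow = nrow, ncol = ncol, scales = scales)
}

# Toy word counts, made up for illustration
toy <- tibble(
  book = rep(c("Book A", "Book B"), each = 3),
  word = c("tea", "ball", "letter", "tea", "walk", "letter"),
  n    = c(5, 3, 1, 2, 6, 4)
)

p <- facet_bar(toy, word, n, book, nrow = 1, ncol = 2)
print(p)
```

Free scales matter here: after reordering, each facet has its own set of y labels, so a shared y axis would show empty slots.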

Introduction to tidyvpc

by James Craig

2020-04-07

VPC Percentiles and Prediction Intervals […] Install devtools if it is not already installed. If errors (converted from warnings) occur during installation because packages were built under a different version of R, they can be ignored by setting the environment variable R_REMOTES_NO_ERRORS_FROM_WARNINGS="true" before calling …
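
A sketch of the environment-variable workaround described above, set from within R. The CRAN mirror URL is an assumption, and because the actual install call is elided in the excerpt, the GitHub path below is only a placeholder, not the real repository.

```r
# Stop remotes (which devtools uses for GitHub installs) from turning
# "package was built under R version ..." warnings into errors
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools", repos = "https://cloud.r-project.org")
}

# The install call itself is elided in the source; a GitHub install would look
# like this, with "<owner>/<repo>" as a placeholder for the real path:
# devtools::install_github("<owner>/<repo>")
```

Setting the variable with Sys.setenv() affects only the current R session, so it has to be run before the install call each time.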

Grand Slam Heroes

by Ganesh Viswanathan and Roma Dutta

2019-12-09

Grand Slam Heroes […] “The Open Era is the current era of professional tennis. It began in 1968 when the Grand Slam tournaments allowed professional players to compete with amateurs, ending the division that had persisted since the dawn of the sport in the 19th century.” (Wikipedia) GitHub: https://github.com/ganesh2512/finalProject; RStudio Cloud: https://rstudio.cloud/project/704614; Bookdown: https://bookdown.org/rdutta4/bookdown-grandslam/; ShinyApps.io: https://ganesh-viswanathan.shinyapps.io/finalProjectShiny/; data source: https://github.com/rfordatascience/tidytues …

Tidy Portfoliomanagement in R

by Sebastian Stöckl

2018-09-21

A first attempt at a book on tidy portfolio management in R. […] This book should accompany my lectures “Research Methods”, “Quantitative Analysis”, “Portfoliomanagement and Financial Analysis” and (to a smaller degree) “Empirical Methods in Finance”. In past years I have been a heavy promoter of the Rmetrics tools for my lectures and research. However, in the last year the development of the project has stagnated due to the tragic death of its founder, Prof. Dr. Diethelm Würtz. As a result, code from past semesters and lectures has stopped working several times, and no more support …

ISLR tidymodels labs

by Emil Hvitfeldt

2024-07-25*

This book aims to complement the 2nd edition of An Introduction to Statistical Learning by translating its labs to use the tidymodels set of packages. The labs are mirrored quite closely to stay true to the original material. All listed changes are relative to the 1st edition. This book was written in RStudio using Quarto. The website is hosted via GitHub Pages, and the complete source is available on GitHub. This version of the book was built with R version 4.3.1 (2023-06-16), Quarto version 1.4.104, and the following …

Spreadsheet Munging Strategies

by Duncan Garmonsway

2024-07-25*

Spreadsheet Munging Strategies […] This is a work-in-progress book about getting data out of spreadsheets, no matter how peculiar. The book is designed primarily for R users who have to extract data from spreadsheets and who are already familiar with the tidyverse. It has a cookbook structure and can be used as a reference, but readers who begin in the middle might have to work backwards from time to time. The R packages that feature heavily are tidyxl and unpivotr. […] Tidyxl and unpivotr are much more complicated than readxl, and that’s the point: they give you more power and complexity when you need it. …

Tidy Modeling with R

by Max Kuhn and Julia Silge

2024-07-25*

The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles. This book provides a thorough introduction to how to use tidymodels, and an outline of good methodology and statistical practice for phases of the modeling process. […] Welcome to Tidy Modeling with R! This book is a guide to using a collection of software in the R programming language for model building called tidymodels, and it has two main goals: First and foremost, this book provides a practical introduction to how to use these specific R packages to create models. We …
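
A minimal sketch of the tidymodels pattern of specifying a model, choosing an engine, and then fitting, assuming the tidymodels meta-package is installed. The mtcars regression is chosen here for illustration and is not an example from the book.

```r
library(tidymodels)

# 1. Specify the kind of model and the computational engine behind it
lm_spec <- linear_reg() %>% set_engine("lm")

# 2. Fit the specification with a formula on R's built-in mtcars data
lm_fit <- lm_spec %>% fit(mpg ~ wt, data = mtcars)

# 3. Results come back as tibbles: coefficients via tidy(),
#    predictions (in a .pred column) via predict()
coefs <- tidy(lm_fit)
preds <- predict(lm_fit, new_data = data.frame(wt = c(2.5, 3.5)))
print(coefs)
print(preds)
```

Separating the model specification from the engine is what lets the same fit() and predict() calls be reused across different modeling backends.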

Tidy tools for supporting fluent workflow in temporal data analysis

by Earo Wang

2024-07-25*

This is the website for my PhD thesis at Monash University (Australia), titled “Tidy tools for supporting fluent workflow in temporal data analysis”. …

Welcome to Tidy Finance

2024-07-25*

An opinionated approach to empirical research in financial economics using the programming languages R and Python. […] Tidy Finance is an opinionated approach to empirical research in financial economics: a fully transparent, open-source code base in multiple programming languages. A clean coding environment is a prerequisite for building a relevant investment platform and conducting meaningful factor research. Tidy Finance is the name of the game, giving aspiring academics and finance practitioners just what they need to perform clean and reproducible research. Highly recommended. Harald …
