Data Science Boot Camp
Weldon Cooper Center for Public Service, University of Virginia
June 8-10, 2021
Chapter 1 Overview
This boot camp is designed to help research assistants rapidly become productive doing data science as members of the Cooper Center team. The mini-course offers introductory training in how to do data science collaboratively. It also provides an orientation to the projects, resources, and house styles specific to the Cooper Center.
1.1 Texts and resources
1.1.1 Core texts and resources
R4DS: R for Data Science by Hadley Wickham and Garrett Grolemund. An excellent introduction, available for free online.
Boot Camp Notes: Data Science Boot Camp by Arthur Small. These notes.
DataCamp. Online interactive courseware for data science, including many modules on R.
1.1.2 Supplemental texts and resources
Tidy Skills 4DS: Tidyverse Skills for Data Science by Carrie Wright, Shannon E. Ellis, Stephanie C. Hicks, and Roger D. Peng.
Happy Git: Happy Git and GitHub for the useR by Jenny Bryan, the STAT 545 TAs, and Jim Hester.
1.2 Schedule of topics
June 8: Welcome; data science at CCPS; a simple session in RStudio; importing data from flat files; data types in R; data wrangling and tidy data
Time | Topic | Resources | Example code |
---|---|---|---|
9:00–9:45 | Welcome | Cooper Center websites | |
9:45–10:30 | Data science at CCPS | Boot Camp Notes Ch. 2 | |
10:30–11:00 | – break – | ||
11:00–12:00 | A simple session in RStudio | R4DS Ch. 1, 2, 4, 6, 8 | cte-trailblazers.Rmd |
12:00–1:00 | – lunch break – | ||
1:00–1:45 | Importing data from flat files | ||
1:45–2:30 | Data types in R | ||
2:30–3:00 | – break – | ||
3:00–4:00 | Data wrangling I; tidy data | ||
June 9: Wrangling data with dplyr; visualizing data with ggplot2; importing data from databases and APIs; working with databases; good data management practices
Time | Topic | Resources | Example code |
---|---|---|---|
9:00–10:15 | Visualizing data with ggplot2 | ||
10:15–10:45 | – break – | ||
10:45–12:00 | Wrangling data with dplyr, data.table | Tidy Skills 4DS | |
12:00–1:00 | – lunch break – | ||
1:00–2:30 | Databases: overview; read/write; connections; credentials | ||
2:30–3:00 | – break – | ||
3:00–3:30 | Data management: good practices | ||
3:30–4:00 | Importing data from APIs, web resources | ||
June 10: Version control and file management with git and GitHub; working with larger files; using R Markdown to create documents and publications; creating dashboards with Shiny; cloud hosting
Time | Topic | Resources | Example code |
---|---|---|---|
9:00–10:00 | Version control with git and GitHub | Happy Git | |
10:00–10:30 | Creating dashboards with Shiny | ||
10:30–11:00 | – break – | ||
11:00–12:00 | Cloud hosting | ||
12:00–1:00 | – lunch break – | ||
1:00–2:00 | File management: good practices | ||
2:00–2:30 | Using R Markdown to create documents and publications | ||
2:30–3:00 | – break – | ||
3:00–3:30 | Working with larger files | ||
3:30–4:00 | Wrap up | ||