Chapter 1 Overview

This boot camp is designed to help research assistants quickly become productive at data science as members of the Cooper Center team. The mini-course offers introductory training in doing data science as part of a team, along with an orientation to the projects, resources, and house styles specific to the Cooper Center.

1.1 Texts and resources

1.1.1 Core texts and resources

R4DS: R for Data Science by Hadley Wickham and Garrett Grolemund. An excellent introduction, available for free online.

Boot Camp Notes: Data Science Boot Camp by Arthur Small. These notes.

DataCamp. Online interactive courseware for data science, including many modules on R.

1.1.2 Supplemental texts and resources

Tidy Skills 4DS: Tidyverse Skills for Data Science by Carrie Wright, Shannon E. Ellis, Stephanie C. Hicks and Roger D. Peng.

Happy Git: Happy Git and GitHub for the useR by Jenny Bryan, the STAT 545 TAs, and Jim Hester.

1.2 Schedule of topics

June 8: Welcome; orientation to data science at the Cooper Center; doing data science as part of a team; concepts of reproducible research and literate programming; working in RStudio; importing data; tidy data

Time          Topic                           Resources               Example code
9:00–9:45     Welcome                         Cooper Center websites
9:45–10:30    Data science at CCPS            Boot Camp Notes Ch. 2
10:30–11:00   break
11:00–12:00   A simple session in RStudio     R4DS Ch. 1, 2, 4, 6, 8  cte-trailblazers.Rmd
12:00–1:00    lunch break
1:00–1:45     Importing data from flat files
1:45–2:30     Data types in R
2:30–3:00     break
3:00–4:00     Data wrangling I; tidy data

June 9: Wrangling data with dplyr; visualizing data with ggplot2; importing data from databases and APIs; working with databases; good data management practices

Time          Topic                                                     Resources   Example code
9:00–10:15    Visualizing data with ggplot2
10:15–10:45   break
10:45–12:00   Wrangling data with dplyr and data.table                  TSDS
12:00–1:00    lunch break
1:00–2:30     Databases: overview; read/write; connections; credentials
2:30–3:00     break
3:00–3:30     Data management: good practices
3:30–4:00     Importing data from APIs and web resources

June 10: Version control and file management with git and GitHub; working with larger files; using R Markdown to create documents and publications; creating dashboards with Shiny; cloud hosting

Time          Topic                                                     Resources   Example code
9:00–10:00    Version control with git and GitHub                       Happy Git
10:00–10:30   Creating dashboards with Shiny
10:30–11:00   break
11:00–12:00   Cloud hosting
12:00–1:00    lunch break
1:00–2:00     File management: good practices
2:00–2:30     Using R Markdown to create documents and publications
2:30–3:00     break
3:00–3:30     Working with larger files
3:30–4:00     Wrap-up

1.3 Credits

These course materials were generated using the bookdown package (Xie 2020), which is built on top of R Markdown and knitr (Xie 2015).

References

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. http://yihui.org/knitr/.

Xie, Yihui. 2020. Bookdown: Authoring Books and Technical Documents with R Markdown. https://github.com/rstudio/bookdown.