i2ds
Welcome
This course
Coordinates
Goals
Audience and Preconditions
Requirements
Effort
Assessment
Our data
Preparations
Software
Reading
Writing
Part 1: Foundations
1
What and why?
1.1
Introduction
1.2
Reflections
1.2.1
Data
1.2.2
Science
1.2.3
Data science
1.2.4
Skills
1.2.5
Tools
1.2.6
Fit for tasks
1.3
Reproducible research
1.3.1
What?
1.3.2
Why?
1.3.3
How?
1.3.4
R Markdown
1.4
Conclusion
1.4.1
Summary
1.4.2
Resources
1.4.3
Preview
1.5
Exercises
1.5.1
Create a summary in R Markdown
1.5.2
Create a data file
1.5.3
Data vs. representations
1.5.4
Representing music
1.5.5
Conceptual summary
2
R basics
2.1
Introduction
2.1.1
Contents
2.1.2
Data and tools
2.2
Defining R objects
2.2.1
Data objects
2.2.2
Data types
2.3
Vectors
2.3.1
Creating vectors
2.3.2
Accessing and changing vectors
2.4
Functions
2.4.1
Function arguments
2.4.2
Exploring functions
2.4.3
Practice
2.5
Conclusion
2.5.1
Summary
2.5.2
Resources
2.5.3
Preview
2.6
Exercises
2.6.1
The ER of R
2.6.2
Data types and forms
2.6.3
Exploring a function
2.6.4
Cumulative savings
2.6.5
Vector arithmetic
2.6.6
Cryptic arithmetic
3
R data structures
3.1
Introduction
3.1.1
Contents
3.1.2
Data and tools
3.2
Overview
3.3
Linear data structures
3.3.1
Atomic vectors
3.3.2
Lists
3.4
Rectangular data structures
3.4.1
Matrices
3.4.2
Data frames
3.4.3
Practice
3.5
Multidimensional data structures
3.6
Conclusion
3.6.1
Summary
3.6.2
Resources
3.6.3
Preview
3.7
Exercises
3.7.1
Why and when which data structure?
3.7.2
Saving data as a list
3.7.3
Manipulating matrices
3.7.4
Survey age
3.7.5
Exploring participant data
4
Visualize
4.1
Introduction
4.1.1
Contents
4.1.2
Data and tools
4.2
Reflections
4.2.1
Why visualize?
4.2.2
Evaluating visualizations
4.2.3
Types of graphs
4.2.4
Plotting in base R
4.3
Basic plots
4.3.1
Histograms
4.3.2
Scatterplots
4.3.3
Bar plots
4.3.4
Box plots
4.3.5
Curves and lines
4.3.6
Other plots
4.4
Complex plots
4.4.1
Composing plots as programs
4.4.2
Starting a new plot
4.4.3
Annotating plots
4.4.4
Useful combinations
4.4.5
Setting graphical parameters (par())
4.5
Conclusion
4.5.1
Summary
4.5.2
Resources
4.5.3
Preview
4.6
Exercises
4.6.1
Good vs. bad examples
4.6.2
Plot types
4.6.3
Plotting the Nile
4.6.4
Plotting a histogram
4.6.5
Plotting a scatterplot
4.6.6
Plotting bar plots (of election results)
4.6.7
Plotting air quality data
Bonus exercises
4.6.8
Bonus: Plotting curves (for getting even with percentage changes)
4.6.9
Bonus: Anscombe’s quartet
4.6.10
Bonus: Re-creating complex plots
Part 2: Data wrangling
5
Getting data
5.1
Introduction
5.2
Importing data
5.2.1
File locations and paths
5.2.2
Reading and writing files
5.3
Creating tibbles
5.4
Conclusion
5.4.1
Summary
5.4.2
Resources
5.4.3
Preview
5.5
Exercises
Part A: Reading and writing data
5.5.1
Navigating directories
5.5.2
Read-write-read cycle
5.5.3
Reading odd data
5.5.4
Variants of p_info
Part B: Creating tibbles
5.5.5
Flower power
5.5.6
Rental accounting
5.5.7
False positive psychology
6
Transforming data
6.1
Introduction
6.1.1
Reflection: Same or different data?
6.1.2
Key concepts
6.2
The pipe from magrittr
6.2.1
The function of pipes
6.2.2
Example pipes
6.3
Transforming tables with dplyr
6.3.1
The function of pliers
6.3.2
Essential dplyr functions
6.3.3
Answering questions (with dplyr and ggplot2)
6.4
Transforming tables with tidyr
6.4.1
What is tidy data?
6.4.2
Essential tidyr functions
6.4.3
Unite variables
6.4.4
Separate variables
6.4.5
Making tables longer
6.4.6
Making tables wider
6.5
Conclusion
6.5.1
Summary
6.5.2
Resources
6.5.3
Preview
6.6
Exercises
Part A: Transforming tables (dplyr)
6.6.1
Reshaping vs. reducing data
6.6.2
Star and R wars
6.6.3
Sleeping mammals
6.6.4
Revisiting positive psychology
6.6.5
Surviving the Titanic
Part B: Reshaping tables (tidyr)
6.6.6
Four messes and one tidy table
6.6.7
Moving stocks (from wide to long to wide)
6.6.8
Plotting relatives
6.6.9
Widening rental accounting
7
Exploring data
7.1
Introduction
7.1.1
What is EDA?
7.2
Essentials
7.2.1
The principles of EDA
7.2.2
Caveat: Explaining vs. predicting in science
7.3
Conclusion
7.3.1
Summary
7.3.2
Resources
7.3.3
Preview
7.4
Exercises
7.4.1
Exploring wide data
7.4.2
Selective dropouts
7.4.3
Effects of income
7.4.4
Main findings
Part 3: Special data types
8
Numbers and factors
8.1
Introduction
8.2
Essentials
8.2.1
Numbers
8.2.2
Factors
8.3
Conclusion
8.3.1
Summary
8.3.2
Resources
8.3.3
Preview
8.4
Exercises
8.4.1
Title
9
Text data
9.1
Introduction
9.2
Essentials
9.3
Conclusion
9.3.1
Summary
9.3.2
Resources
9.3.3
Preview
9.4
Exercises
9.4.1
Escaping into Unicode
9.4.2
Pasting vectors
9.4.3
Searching color names
9.4.4
Patterns in pi
9.4.5
Naive cryptography
9.4.6
Known unknowns
10
Dates and times
10.1
Introduction
10.2
Essentials
10.3
Conclusion
10.3.1
Summary
10.3.2
Resources
10.3.3
Preview
10.4
Exercises
10.4.1
Title
Part 4: Programming
11
Conditionals
11.1
Introduction
11.2
Essentials
11.2.1
Basic conditionals
11.2.2
Advanced conditionals
11.2.3
Avoiding conditionals
11.3
Conclusion
11.3.1
Summary
11.3.2
Resources
11.3.3
Preview
11.4
Exercises
11.4.1
Title
12
Functions
12.1
Introduction
12.1.1
Terminology
12.1.2
The function of functions
12.2
Essentials of writing R functions
12.2.1
Basics
12.2.2
Advanced aspects of functions
12.3
Conclusion
12.3.1
Summary
12.3.2
Resources
12.3.3
Preview
12.4
Exercises
12.4.1
Fun with errors
12.4.2
Randomizers revisited
12.4.3
Tibble charts
12.4.4
A plotting function
12.4.5
Printing numbers as characters
13
Iteration
13.1
Introduction
13.2
Essentials
13.3
Conclusion
13.3.1
Summary
13.3.2
Resources
13.3.3
Preview
13.4
Exercises
13.4.1
Fibonacci loops
13.4.2
Looping for divisors
13.4.3
Dice loops
13.4.4
Cumulative savings revisited
Part 5: Applications
14
Modeling
14.1
Essentials of modeling
14.1.1
Terminology
14.1.2
Model types vs. levels
14.1.3
Examples
14.1.4
Goals
14.1.5
Evaluating models
14.1.6
Summary
14.2
Conclusion
14.2.1
Summary
14.2.2
Resources
14.3
Exercises
14.3.1
Model revolutions
14.3.2
Miniature model
14.3.3
An almost perfect model
14.3.4
A vague verbal model
14.3.5
Modeling samples of famous people
15
Basic simulations
15.1
Introduction
15.1.1
What are simulations?
15.1.2
Static simulations
15.1.3
Overview
15.2
Enumerating cases
15.2.1
Bayesian situations
15.2.2
Analysis
15.2.3
Solution by enumeration
15.2.4
Visualizing Bayesian situations
15.3
Sampling cases
15.3.1
The Monty Hall problem
15.3.2
Analysis
15.3.3
Representing the environment
15.3.4
Abstract solution
15.3.5
Detailed solution
15.3.6
Visualizing simulation results
15.4
Conclusion
15.4.1
Summary
15.4.2
Resources
15.5
Exercises
15.5.1
More Bayesian situations
15.5.2
Solving Bayesian situations by sampling
15.5.3
Monty Hall reloaded
15.5.4
The sock drawer puzzle
15.5.5
Related probability puzzles
15.5.6
Coin tossing reloaded
16
Dynamic simulations
16.1
Introduction
16.1.1
What is dynamic?
16.1.2
Topics and models addressed
16.1.3
Models of agents, environments, and interactions
16.2
Models of learning
16.3
Dynamic environments
16.3.1
Multi-armed bandits (MABs)
16.4
Evaluating dynamic models
16.4.1
Heuristics vs. optimal strategies
16.4.2
Benchmarking strategy performance
16.5
Conclusion
16.5.1
Summary
16.5.2
Resources
16.6
Exercises
16.6.1
Learning with short-term aspirations
16.6.2
Learning with additional options
16.6.3
A foraging MAB
16.6.4
Maximization vs. melioration
17
Social situations
17.1
Introduction
17.1.1
What are social situations?
17.1.2
Models of social situations
17.2
Games
17.2.1
Terminology
17.2.2
Game types
17.2.3
Learning in games
17.3
Social learning
17.3.1
Replicator dynamics
17.4
Social networks
17.5
Conclusion
17.5.1
Summary
17.5.2
Resources
17.6
Exercises
17.6.1
Reflecting on Pac-Man
17.6.2
Learning in coordination and conflict games
17.6.3
Playing rock-paper-scissors (RPS)
17.6.4
Sequential mate search
18
Prediction
18.1
Introduction
18.2
Essentials of prediction
18.2.1
Types of tasks
18.2.2
Evaluating predictive success
18.2.3
Baseline performance and other benchmarks
18.3
Quantitative prediction tasks
18.3.1
Introduction
18.3.2
A basic model
18.3.3
Linear models
18.3.4
Comparing models
18.3.5
Multiple predictors
18.3.6
Older parts
18.3.7
ToDo 2
18.4
Qualitative prediction tasks
18.4.1
ToDo 1
18.4.2
Trees
18.5
Conclusion
18.5.1
Summary
18.5.2
Beware of biases
18.5.3
Resources
18.6
Exercises
18.6.1
Predicting game outcomes in sports
19
R pour l’art
19.1
Introduction
19.1.1
Encoded art?
19.1.2
Goals
19.2
Visualizing structure
19.2.1
Circles and squares
19.2.2
Mathematical patterns
19.2.3
Chaotic structures
19.3
Plotting text
19.3.1
The task
19.3.2
Plot text
19.3.3
Adding counts and colors
19.3.4
Extensions
19.4
Conclusion
19.4.1
Summary
19.4.2
Resources
19.5
Exercises
19.5.1
Visualizing mathematical puzzles
19.5.2
Visualizing word frequency
19.5.3
Creative freedom
Appendix
A
Your data science project
Desiderata
Ideas
Resources
About
Contents and audience
Providing feedback
Linking and citing
License
Colophon
References
Introduction to Data Science
10.1
Introduction