Exploratory Visual Data Analysis
Preface
0.1
Pedagogical Plan
0.2
Data Sources
0.3
Introduction
1
What is Data?
1.1
Attributes of Reliable Data
1.1.1
Collected in “good faith”
1.1.2
Representative
1.1.3
Sufficient
1.2
Categories of Data Quality
1.2.1
Anecdotes & Deceptive Evidence
1.2.2
Ad-Hoc Data
1.2.3
Purposefully Designed Studies
1.3
Examples
1.3.1
A Washington Post / ABC News Survey
1.4
Exercises
2
Data Variability
2.1
Individual Chain Reasoning
2.2
Population Trends Reasoning
2.3
Exercises
3
Data Sources
3.1
Government Collected Data
3.1.1
US Census Bureau
3.1.2
US NOAA
3.1.3
US Bureau of Labor Statistics
3.1.4
Data.gov
3.2
Known but not government collected
3.3
Exercises
4
Data Manipulation
4.1
Introduction
4.1.1
Import
4.1.2
Tidying
4.1.3
Cleaning
4.1.4
Use
4.2
Fundamental Actions
4.2.1
Sorting
4.2.2
Subsetting
4.2.3
New Column from Existing Column(s)
4.2.4
Aggregation
4.2.5
Unions
4.2.6
Joins
4.2.7
Pivots
4.3
Combining Actions
4.3.1
Example: Column Splitting
4.3.2
Example: Calculating Percentages
4.4
Using Software
4.4.1
Excel
4.4.2
Tableau
4.5
Exercises
5
Basic Graphs
5.1
Creating a basic graph
5.2
Light pre-processing and adjusting labels
5.3
Exporting the graph to MS Word
5.4
Selecting EPTs is done using the Marks pane
5.5
Creating Histograms, Boxplots, and Regression Lines
5.6
Exercises
6
Graphing Principles
6.1
Elementary Perception Tasks
6.2
Groupings / Gestalt
6.2.1
Grouping Effects
6.2.2
Grouping Examples
6.2.3
Example: Warpbreaks
6.2.4
Example: Federal Spending over Time
6.3
“Color” Scales
6.4
Examples
6.4.1
RobinHood App
6.4.2
Coffee Varieties & Origins
6.4.3
Trade with Britain
6.5
Exercises
7
A Selection of Graph Examples
7.1
Introduction
7.1.1
Example 1
7.1.2
Example 2
7.1.3
Example 3
7.2
Proportions
7.2.1
Single Set
7.3
Multiple Sets of Proportions
7.3.1
Faceted Bar charts
7.3.2
Side-by-Side Stacked Barcharts
7.3.3
Mosaic plots
7.3.4
Alluvial Plots
7.3.5
Tree graphs
7.4
Exercises
8
Plotting with aggregation
8.1
Univariate
8.1.1
Small samples
8.1.2
Histograms
8.1.3
Density plots
8.1.4
Faceting
8.1.5
Stacking
8.1.6
Overlapping curves
8.2
Bivariate (one continuous, one categorical)
8.2.1
Box plots
8.2.2
Ridge Plots
8.2.3
Violin Plots
8.3
Bivariate (two continuous)
8.3.1
Scatter plots
8.3.2
Pairs plots (All-vs-all scatterplots)
8.3.3
Correlation Plots
8.3.4
Overplotting
8.3.5
Regression Lines
8.4
Plot building process
8.5
Exercises
9
Geographic
9.1
Projections
9.2
Map Layers
9.3
Choropleths
9.4
Choropleths in Tableau
9.5
Role of Geography in Maps
9.6
Many ways a map can mislead
9.7
Exercises
10
Data Journalism
10.1
Introduction
10.2
Audience
10.3
Outline
10.4
Figures
10.5
Presentations
10.6
Exercises
11
Interactive Graphics
11.1
Introduction
11.2
Examples
11.2.1
Covid-19 Example
11.2.2
A history of World Civilizations
11.2.3
Rainforest Biodiversity
11.2.4
Oil Rigs in Texas
11.2.5
Women in Politics
11.2.6
Taxi Rides in NYC
11.3
Deeper Thoughts
11.3.1
Geographic Aggregation Level
11.3.2
Miscellaneous Aggregation Levels
11.3.3
The “Me” Layer
11.4
Tableau Example
11.4.1
Data
11.4.2
Tableau
11.5
Exercises
12
Dashboards
13
Malicious Uses of Data
13.1
Dual Y-Axes
STA 141 - Exploratory Data Analysis and Visualization
Chapter 12
Dashboards
This chapter introduces dashboards and emphasizes the following points:
Dashboards allow the user to look at the data at different resolutions.
Dashboards allow the user to do their own exploration.
This self-directed exploration allows for an “unguided” tour of the data.