1
Introduction
1.1
Case Studies
1.2
Reproducibility, Scalability, and Writing Code
1.3
Course Structure
1.4
Computer and Software Requirements
1.5
Next Steps
2
Prerequisites
2.1
Online Textbooks
2.2
R and RStudio
2.2.1
Installing R
2.2.2
Installing RStudio
2.2.3
Creating an RStudio Shortcut
2.2.4
Setting RStudio Defaults
2.2.5
RStudio Panels
2.3
Directories (Folders)
2.3.1
The Home Directory
2.4
Creating a Course Folder and Subfolders
2.4.1
Creating New Directories
2.5
Paths
2.5.1
Special Symbols
2.6
The Working Directory
2.7
Packages
2.7.1
Installing Packages
2.7.2
Compilation
2.8
Browser Settings
2.8.1
Chrome
2.8.2
Safari
2.8.3
Firefox
2.9
R Markdown Documents
2.9.1
Learning R Markdown
2.10
Knitting a Document
2.11
Uploading Documents to Canvas
3
Madison Lakes
3.1
Lake Mendota Freezing and Thawing
3.1.1
Criteria for freezing/thawing
3.1.2
Map
3.1.3
Winter of 2020-2021
3.1.4
Winter of 2023-2024
3.2
Lake Mendota Questions
3.3
Lake Mendota Data
3.4
Analysis
3.4.1
Wrangling the Lake Mendota Data
3.4.2
Lake Mendota Variables
3.4.3
Visualizing Ice Cover Duration versus Time
3.4.4
Modeling Ice Cover versus Time
3.4.5
Model Evaluation
3.4.6
Interpretation
3.5
Data analysis workflow
3.6
Next Steps
4
R Fundamentals
4.1
History of the R Language
4.2
Packages
4.3
Vectors
4.4
Assignment
4.5
Arithmetic with Vectors
4.6
Numerical Summaries of Vectors
4.7
Data Frames
4.7.1
Extracting Parts of Data Frames
4.8
Data Types
4.8.1
Conversions between types
4.9
Valid Object Names
4.10
Functions
4.10.1
Arguments
4.10.2
Accessing the Documentation
5
Visualization with ggplot2
5.1
Visualization
5.2
Overview
5.3
Preliminaries
5.4
Read the data
5.4.1
Checking the Data
5.5
Exploratory Data Analysis
5.6
ggplot2
5.7
Plots
5.7.1
One-variable Plots
5.8
Lake Mendota Graphs – days frozen vs. time
5.8.1
Question 1
5.8.2
Two-variable Plots
5.8.3
Observations
5.8.4
Question 2
5.8.5
Augmenting a plot with lines
5.8.6
Axis Labels and Plot Titles
5.8.7
Question 3
5.8.8
Bar Graphs
5.8.9
Facets
5.8.10
Scales
5.8.11
Adding Color
5.8.12
Guides
5.8.13
Themes
5.9
Summary
5.9.1
Basics
5.9.2
Aesthetics
5.9.3
Geoms
5.9.4
Facets
5.9.5
Color
5.9.6
Scales
5.9.7
Guides
5.9.8
Themes
5.9.9
ggplot2 Reference
6
Statistical Summaries
7
Madison Weather
7.1
Weather and Climate Change
7.2
Data
7.3
Weather Stations
7.4
Variables
7.5
Initial Data Transformations
7.6
Questions
7.7
Obtaining Data
8
Exoplanets
8.1
The Night Sky
8.2
Exoplanet Discovery
8.3
Exoplanet Data
8.3.1
Methods of Discovery
8.3.2
Earth and Jupiter
8.3.3
Mass and Radius
8.3.4
Spectral Types
8.4
Exoplanet Questions
8.5
The Journey Forward
9
Data Transformation with dplyr
10
Airport Waiting Times
10.1
Customs at US Airports
10.2
My Travel Experience
10.3
Airport Wait Times
10.4
Questions
11
Data Import with readr
12
Dates with lubridate
13
Wisconsin Obesity
13.1
Obesity
13.1.1
Obesity Definitions
13.1.2
Obesity Data
13.1.3
Obesity Variables in Excel
13.2
Census Data
13.3
Files
13.4
General Questions
14
Reshaping Data with tidyr
15
Iteration with purrr
16
Strings with stringr
17
Chimpanzees and Prosocial Choice
17.1
Prosocial Choice Experiments
17.2
Experiment Description
17.3
Controls
17.4
Behavior
17.5
Data
17.6
Statistical Models
17.7
Assumptions
17.8
Probability Preview
17.9
Questions
18
Probability
18.1
What is Probability?
18.2
Notions of Probability
18.3
Probability Definitions and Examples
18.4
Outcome Space
18.5
Probability
18.6
Law of Large Numbers
18.7
Events
18.8
Disjoint Events
18.9
Probability Axioms
18.10
Random Variables
18.11
Addition Rule
18.12
General Addition Rule
18.13
Probability Distribution of Discrete Random Variables
18.14
Probability Distribution of Continuous Random Variables
18.15
Complement Rule
18.16
Independence
18.17
Multiplication Rule
18.18
Conditional Probability
18.19
General Multiplication Rule
18.20
The Law of Total Probability
18.21
Weighted Means
18.22
Expectation
18.23
Continuous Random Variables
18.24
Variance
18.25
Sums of Random Variables
18.26
Covariance
18.27
Correlation Coefficient
18.28
Linear Combinations
19
Binomial Distributions
19.1
The Binomial Probability Mass Function
19.2
Mean and Variance
19.3
Binomial Calculations Using R
19.4
Binomial Random Samples
19.5
Binomial Probabilities in R
19.6
Binomial Quantiles
19.7
Graphing Binomial Distributions
20
Normal Distributions
20.1
Parameters
20.2
Normal Probability Density
20.3
Standard Normal Density
20.4
Benchmark Normal Probabilities
20.5
Normal CDF
20.6
Central Limit Theorem
20.6.1
Notes
20.7
Normal Calculations using R
20.8
Graphing Normal Distributions
21
Estimation
21.1
This is a stub
22
Hypothesis Testing
22.1
Hypothesis Testing Logic
22.2
Statistical Significance
22.3
Connections to Confidence Intervals
22.4
Other Hypothesis Tests
22.4.1
Comparisons with and without a Partner
22.4.2
Simulation Approach
22.4.3
Z-Test
22.4.4
Likelihood Ratio Test
22.4.5
Interpretation
22.5
Comparing Multiple Probabilities
22.6
Likelihood Ratio Test
22.6.1
Calculation of the LRT Statistic
22.6.2
Chi-square approach to the p-value
23
Human Sex Ratio Modeling
23.1
Is it a boy or a girl?
23.2
A Note on Gender Identity
23.3
Sex Ratio Data
23.3.1
Geissler Data, Saxony
23.3.2
Malinvaud Data, France
23.3.3
Danish Data
23.3.4
World Sex Ratios
23.4
Sex Ratio Models
23.4.1
Data Summary
23.5
Simple Binomial Model
23.5.1
Numerical Optimization
23.5.2
Goodness of Fit
23.6
Beta Binomial Model
23.6.1
Parameter Estimation
23.6.2
Goodness of Fit
23.6.3
Likelihood Ratio Test
23.7
Further Analysis
23.8
Questions
24
Volleyball
24.1
UW Women’s Volleyball
24.2
Volleyball Basics
24.2.1
Volleyball Competition
24.3
2019 Season Data
24.3.1
Volleyball Team Season Statistics
24.3.2
2019 Division I Match Statistics
24.3.3
Volleyball Data Source
24.4
Volleyball Questions
25
Correlation and Regression
25.1
Correlation
25.1.1
Correlation Formula
25.1.2
Correlation Examples
25.2
Simple Linear Regression
25.2.1
Regression Model
25.2.2
Regression Estimates
25.2.3
Understanding Regression Parameters
25.2.4
Prediction Interpretation
25.3
Prediction of Match Outcomes
25.3.1
Match Data
25.3.2
Model
25.3.3
Likelihood
25.3.4
Simulation
25.4
Fitting the Model
26
Simulation and Prediction
26.1
Predicting Outcomes of Sporting Events
26.2
Model Recap
26.3
Estimation of Volleyball Model
26.3.1
Maximum Likelihood Estimate of \(\theta\)
26.3.2
Exploration
26.4
Predicting the Tournament
26.4.1
The simulations
27
Functions in R
Statistics 240 Course Notes
15
Iteration with purrr