An Introduction to Political and Social Data Analysis Using R
Preface
Origin Story
How to use this book
What’s in this Book?
Keys to Student Success
Data Sets and Codebooks
1
Introduction to Research and Data
1.1
Political and Social Data Analysis
1.2
Data Analysis or Statistics?
1.3
Research Process
1.3.1
Interests and Expectations
1.3.2
Research Preparation
1.3.3
Data Analysis and Interpretation
1.3.4
Feedback
1.4
Observational vs. Experimental Data
1.4.1
Necessary Conditions for Causality
1.5
Levels of Measurement
1.6
Level of Analysis
1.7
Next Steps
1.8
Exercises
1.8.1
Concepts and Calculations
2
Using R to Do Data Analysis
2.1
Accessing R
2.2
Understanding Where R (or any program) Fits In
2.3
Time to Use R
2.4
Some R Terminology
2.4.1
Save Your Work
2.5
Next Steps
2.6
Exercises
2.6.1
Concepts and Calculations
2.6.2
R Problems
3
Frequencies and Basic Graphs
3.1
Get Ready
3.2
Introduction
3.3
Counting Outcomes
3.3.1
The Limits of Frequency Tables
3.4
Graphing Outcomes
3.4.1
Bar Charts
3.4.2
Histograms
3.4.3
Density Plots
3.4.4
A few Add-ons for Graphing
3.5
Next Steps
3.6
Exercises
3.6.1
Concepts and Calculations
3.6.2
R Problems
4
Transforming Variables
4.1
Get Ready
4.2
Introduction
4.3
Data Transformations
4.4
Renaming and Relabeling
4.4.1
Changing Attributes
4.5
Collapsing and Reordering Categories
4.6
Combining Variables
4.6.1
Creating an Index
4.7
Saving Your Changes
4.8
Next Steps
4.9
Exercises
4.9.1
Concepts and Calculations
4.9.2
R Problems
5
Measures of Central Tendency
5.1
Get Ready
5.2
Central Tendency
5.2.1
Mode
5.3
Median
5.4
The Mean
5.4.1
Dichotomous Variables
5.5
Mean, Median, and the Distribution of Variables
5.6
Skewness Statistic
5.7
Adding Legends to Graphs
5.8
Next Steps
5.9
Exercises
5.9.1
Concepts and Calculations
5.9.2
R Problems
6
Measures of Dispersion
6.1
Get Ready
6.2
Introduction
6.3
Measures of Spread
6.3.1
Range
6.3.2
Interquartile Range (IQR)
6.3.3
Boxplots
6.4
Dispersion Around the Mean
6.4.1
Don’t Make Bad Comparisons
6.5
Dichotomous Variables
6.6
Dispersion in Categorical Variables?
6.7
The Standard Deviation and the Normal Curve
6.7.1
Really Important Caveat
6.8
Calculating Area Under a Normal Curve
6.9
One Last Thing
6.10
Next Steps
6.11
Exercises
6.11.1
Concepts and Calculations
6.11.2
R Problems
7
Probability
7.1
Get Started
7.2
Probability
7.3
Theoretical Probabilities
7.3.1
Large and Small Sample Outcomes
7.4
Empirical Probabilities
7.4.1
Empirical Probabilities in Practice
7.4.2
Intersection of Two Probabilities
7.4.3
The Union of Two Probabilities
7.4.4
Conditional Probabilities
7.5
The Normal Curve and Probability
7.6
Next Steps
7.7
Exercises
7.7.1
Concepts and Calculations
7.7.2
R Problems
8
Sampling and Inference
8.1
Getting Ready
8.2
Statistics and Parameters
8.3
Sampling Error
8.4
Sampling Distributions
8.4.1
Simulating the Sampling Distribution
8.5
Confidence Intervals
8.6
Proportions
8.7
Next Steps
8.8
Exercises
8.8.1
Concepts and Calculations
8.8.2
R Problems
9
Hypothesis Testing
9.1
Getting Started
9.2
The Logic of Hypothesis Testing
9.2.1
Using Confidence Intervals
9.2.2
Direct Hypothesis Tests
9.2.3
One-tail or Two?
9.3
T-Distribution
9.4
Proportions
9.5
T-test in R
9.6
Next Steps
9.7
Exercises
9.7.1
Concepts and Calculations
9.7.2
R Problems
10
Hypothesis Testing with Two Groups
10.1
Getting Ready
10.2
Testing Hypotheses about Two Means
10.2.1
Generating Subgroup Means
10.3
Hypothesis Testing with Two Means
10.3.1
A Theoretical Example
10.3.2
Returning to the Empirical Example
10.3.3
Calculating the t-score
10.3.4
Statistical Significance vs. Effect Size
10.4
Difference in Proportions
10.5
Plotting Mean Differences
10.6
What’s Next?
10.7
Exercises
10.7.1
Concepts and Calculations
10.7.2
R Problems
11
Hypothesis Testing with Multiple Groups
11.1
Get Ready
11.2
Internet Access as an Indicator of Development
11.2.1
The Relationship between Wealth and Internet Access
11.3
Analysis of Variance
11.3.1
Important concepts/statistics:
11.4
ANOVA in R
11.5
Effect Size
11.5.1
Plotting Multiple Means
11.6
Population Size and Internet Access
11.7
Connecting the T-score and F-Ratio
11.8
Next Steps
11.9
Exercises
11.9.1
Concepts and Calculations
11.9.2
R Problems
12
Hypothesis Testing with Non-Numeric Variables (Crosstabs)
12.1
Getting Ready
12.2
Crosstabs
12.2.1
The Relationship Between Education and Religiosity
12.3
Sampling Error
12.4
Hypothesis Testing with Crosstabs
12.4.1
Regional Differences in Religiosity?
12.5
Directional Patterns in Crosstabs
12.5.1
Age and Religious Importance
12.6
Limitations of Chi-Square
12.7
Next Steps
12.8
Exercises
12.8.1
Concepts and Calculations
13
Measures of Association
13.1
Getting Ready
13.2
Going Beyond Chi-squared
13.3
Measures of Association for Crosstabs
13.3.1
Cramer’s V
13.3.2
Lambda
13.4
Ordinal Measures of Association
13.4.1
Gamma
13.4.2
Tau-b and Tau-c
13.5
Revisiting the Gender Gap in Abortion Attitudes
13.5.1
When to Use Which Measure
13.6
Next Steps
13.7
Exercises
13.7.1
Concepts and Calculations
13.7.2
R Problems
14
Correlation and Scatterplots
14.1
Get Started
14.2
Relationships between Numeric Variables
14.3
Scatterplots
14.4
Pearson’s r
14.4.1
Calculating Pearson’s r
14.4.2
Other Independent Variables
14.5
Variation in Strength of Relationships
14.6
Proportional Reduction in Error
14.7
Correlation and Scatterplot Matrices
14.8
Overlapping Explanations
14.9
Next Steps
14.10
Exercises
14.10.1
Concepts and Calculations
14.10.2
R Problems
15
Simple Regression
15.1
Get Started
15.2
Linear Relationships
15.3
Ordinary Least Squares Regression
15.3.1
Calculation Example: Presidential Vote in 2016 and 2020
15.4
How Well Does the Model Fit the Data?
15.5
Proportional Reduction in Error
15.6
Getting Regression Results in R
15.6.1
All Fifty States
15.7
Understanding the Constant
15.8
Non-numeric Independent Variables
15.9
Adding More Information to Scatterplots
15.10
Next Steps
15.11
Assignments
15.11.1
Concepts and Calculations
15.11.2
R Problems
16
Multiple Regression
16.1
Getting Started
16.2
Organizing the Regression Output
16.2.1
Summarizing Life Expectancy Models
16.3
Multiple Regression
16.3.1
Assessing the Substantive Impact
16.4
Model Accuracy
16.5
Predicted Outcomes
16.5.1
Identifying Observations
16.6
Next Steps
16.7
Exercises
16.7.1
Concepts and Calculations
16.7.2
R Problems
17
Advanced Regression Topics
17.1
Get Started
17.2
Incorporating Access to Health Care
17.3
Multicollinearity
17.4
Checking on Linearity
17.4.1
Stop and Think
17.5
Which Variables have the Greatest Impact?
17.6
Statistics vs. Substance
17.7
Next Steps
17.8
Exercises
17.8.1
Concepts and Calculations
17.8.2
R Problems
18
Regression Assumptions
18.1
Get Started
18.2
Regression Assumptions
18.3
Linearity
18.4
Independent Variables are not Correlated with the Error Term
18.5
No Perfect Multicollinearity
18.6
The Mean of the Error Term equals zero
18.7
The Error Term is Normally Distributed
18.8
Constant Error Variance (Homoscedasticity)
18.9
Independent Errors
18.10
Next Steps
18.11
Exercises
18.11.1
Concepts and Calculations
18.11.2
R Problems
Appendix: Codebooks
ANES20
County20large
Countries2
States20
An Introduction to Political and Social Data Analysis Using R
Thomas M. Holbrook
2023-08-20