An Introduction to Political and Social Data Analysis Using R
Preface
Origin Story
What this Book is (and isn’t) About
How to use this book
Keys to Student Success
Chapter Contents
Supplemental Resources
Data Sets and Codebooks
Advanced R Code
Tutorial for Quarto Documents
Acknowledgements
1 Introduction to Research and Data
Political and Social Data Analysis
Data Analysis or Statistics?
Uses of Data Analysis
The Research Process
Interests and Expectations
Research Preparation
Data Analysis and Interpretation
Feedback
Other Data Related Issues
Levels of Measurement
Level of Analysis
Observational vs. Experimental Data
Causal Language
Next Steps
Exercises
Concepts and Calculations
2 Using R to Do Data Analysis
Accessing R
RStudio and Posit.cloud
Downloading R and RStudio
Opening RStudio
Understanding Where R (or any program) Fits In
Time to Use R
Downloading and Importing Data
Examine the Data Set
Get a Graph
Some R Terminology
Data Frames
Objects and Functions
Packages and Libraries
Managing Files and Output
Working Directory
Save Your Work
Creating Documents
Next Steps
Exercises
Concepts and Calculations
R Problems
3 Frequencies and Basic Graphs
Get Ready
Introduction
Frequencies
The Limits of Frequency Tables
Graphing Outcomes
Bar Charts
Histograms
Density Plots
A Few Add-ons for Graphing
Next Steps
Exercises
Concepts and Calculations
R Problems
4 Data Preparation
Get Ready
Introduction
Data Transformations
Changing Variable Names
Changing Attributes
Collapsing and Reordering Categories
Collapsing Categories
Reordering Categories
Combining Variables
Collapsing Numeric Variables
Creating an Index
Save Your Changes
Next Steps
Exercises
Concepts and Calculations
R Problems
5 Measures of Central Tendency
Get Ready
Central Tendency
Mode
Median
The Mean
Dichotomous Variables
Mean, Median, and the Distribution of Variables
Skewness Statistic
Log Transformations
Adding Legends to Graphs
Next Steps
Exercises
Concepts and Calculations
R Problems
6 Measures of Dispersion
Get Ready
Introduction
Measures of Spread
Range
Interquartile Range (IQR)
Boxplots
Dispersion Around the Mean
Average deviation from the mean
Mean absolute deviation
Variance
Standard Deviation
Don’t Make Bad Comparisons
Dichotomous Variables
Dispersion in Categorical Variables?
The Standard Deviation and the Normal Curve
Really Important Caveat
Calculating Area Under a Normal Curve
One Last Thing
Next Steps
Exercises
Concepts and Calculations
R Problems
7 Probability
Get Started
Probability
Theoretical Probabilities
Large and Small Sample Outcomes
Empirical Probabilities
Empirical Probabilities in Practice
Intersection of Two Probabilities
The Union of Two Probabilities
Conditional Probabilities
The Normal Curve and Probability
Next Steps
Exercises
Concepts and Calculations
R Problems
8 Sampling and Inference
Getting Ready
Statistics and Parameters
Sampling Error
Sampling Distributions
Simulating the Sampling Distribution
Confidence Intervals
Sample Size and Confidence Limits
Proportions
Next Steps
Exercises
Concepts and Calculations
R Problems
9 Hypothesis Testing
Getting Started
The Logic of Hypothesis Testing
Using Confidence Intervals
Direct Hypothesis Tests
Sampling Distributions and Hypothesis Testing
One-tail or Two?
T-Distribution
Proportions
Types of Error
T-test in R
Next Steps
Exercises
Concepts and Calculations
R Problems
10 Hypothesis Testing with Two Groups
Getting Ready
Testing Hypotheses about Two Means
Generating Subgroup Means
Hypothesis Testing with Two Means
A Theoretical Example
Returning to the Empirical Example
Calculating the t-score
T-test in R
Statistical Significance vs. Effect Size
Difference in Proportions
Plotting Mean Differences
Boxplots with Means
Bar Charts
Means Plot
What’s Next?
Exercises
Concepts and Calculations
R Problems
11 Hypothesis Testing with Multiple Groups (ANOVA)
Get Ready
Internet Access as an Indicator of Development
The Relationship between Wealth and Internet Access
Comparing Two Means
Comparing Multiple Means
Analysis of Variance
Important concepts/statistics
ANOVA in R
A Closer Look at Group Means
Effect Size
Plotting Multiple Means
Population Size and Internet Access
Connecting the T-score and F-Ratio
Next Steps
Exercises
Concepts and Calculations
R Problems
12 Hypothesis Testing with Non-Numeric Variables (Crosstabs)
Getting Ready
Crosstabs
Regional Differences in Religiosity
Mosaic Plots
Sampling Error
Hypothesis Testing with Crosstabs (Chi-square)
Education and Religiosity
Directional Patterns in Crosstabs
Age and Religious Importance
Limitations of Chi-Square
Next Steps
Exercises
Concepts and Calculations
R Problems
13 Measures of Association
Getting Ready
Going Beyond Chi-squared
Measures of Association for Crosstabs
Cramer’s V
Lambda
Ordinal Measures of Association
Gamma
Tau-b and Tau-c
Revisiting the Gender Gap in Abortion Attitudes
When to Use Which Measure
Next Steps
Exercises
Concepts and Calculations
R Problems
14 Correlation and Scatterplots
Get Started
Relationships between Numeric Variables
Scatterplots
Pearson’s r
Calculating Pearson’s r
Other Independent Variables
Variation in Strength of Relationships
Proportional Reduction in Error
Correlation and Scatterplot Matrices
Overlapping Explanations
Next Steps
Exercises
Concepts and Calculations
R Problems
15 Simple Regression
Get Started
Linear Relationships
Ordinary Least Squares Regression
Hypothesis Testing and Regression Analysis
Calculation Example: Presidential Vote in 2016 and 2020
How Well Does the Model Fit the Data?
Proportional Reduction in Error
Getting Regression Results in R
All Fifty States
Understanding the Constant
Organizing the Regression Output
Revisiting Life Expectancy
Important Caveat
Adding Regression Information to Scatterplots
Next Steps
Assignments
Concepts and Calculations
R Problems
16 Multiple Regression
Getting Started
Multiple Regression
Assessing the Substantive Impact
Model Accuracy
Predicted Outcomes
Revisiting Presidential Votes in the States
Interpreting Dichotomous Independent Variables
Interpreting the Vote Share Model
Next Steps
Exercises
Concepts and Calculations
R Problems
17 Advanced Regression Topics
Get Started
Incorporating Access to Health Care
Multicollinearity
Checking on Linearity
Stop and Think about Results
Which Variables have the Greatest Impact?
Statistics vs. Substance
Next Steps
Exercises
Concepts and Calculations
R Problems
18 Regression Assumptions
Get Started
Regression Assumptions
Linearity
Independent Variables are not Correlated with the Error Term
No Perfect Multicollinearity
The Mean of the Error Term Equals Zero
The Error Term is Normally Distributed
Constant Error Variance (Homoscedasticity)
Independent Errors
Next Steps
Exercises
Concepts and Calculations
R Problems
Appendix: Codebooks
ANES20
County20large
Countries2
States20
Appendix: Quarto Tutorial
Quarto Environment
Running R code
Producing the Document
Appendix: Hidden R Code
An Introduction to Political and Social Data Analysis Using R
Thomas M. Holbrook
2024-02-20