• An Introduction to Political and Social Data Analysis Using R
  • Preface
    • Origin Story
    • How to use this book
    • What’s in this Book?
    • Keys to Student Success
    • Data Sets and Codebooks
  • 1 Introduction to Research and Data
    • 1.1 Political and Social Data Analysis
    • 1.2 Data Analysis or Statistics?
    • 1.3 Research Process
      • 1.3.1 Interests and Expectations
      • 1.3.2 Research Preparation
      • 1.3.3 Data Analysis and Interpretation
      • 1.3.4 Feedback
    • 1.4 Observational vs. Experimental Data
      • 1.4.1 Necessary Conditions for Causality
    • 1.5 Levels of Measurement
    • 1.6 Level of Analysis
    • 1.7 Next Steps
    • 1.8 Assignments
      • 1.8.1 Concepts and Calculations
  • 2 Using R to Do Data Analysis
    • 2.1 Accessing R
    • 2.2 Understanding Where R (or any program) Fits In
    • 2.3 Time to Use R
    • 2.4 Some R Terminology
      • 2.4.1 Save Your Work
    • 2.5 Next Steps
    • 2.6 Exercises
      • 2.6.1 R Problems
  • 3 Frequencies and Basic Graphs
    • 3.1 Get Ready
    • 3.2 Introduction
    • 3.3 Counting Outcomes
      • 3.3.1 The Limits of Frequency Tables
    • 3.4 Graphing Outcomes
      • 3.4.1 Bar Charts
      • 3.4.2 Histograms
      • 3.4.3 Density Plots
      • 3.4.4 A few Add-ons for Graphing
    • 3.5 Next Steps
    • 3.6 Exercises
      • 3.6.1 Concepts and Calculations
      • 3.6.2 R Problems
  • 4 Transforming Variables
    • 4.1 Get Ready
    • 4.2 Introduction
    • 4.3 Data Transformations
    • 4.4 Renaming and Relabeling
      • 4.4.1 Changing Attributes
    • 4.5 Collapsing and Reordering Categories
    • 4.6 Combining Variables
      • 4.6.1 Creating an Index
    • 4.7 Saving Your Changes
    • 4.8 Next Steps
    • 4.9 Exercises
      • 4.9.1 R Problems
  • 5 Measures of Central Tendency
    • 5.1 Get Ready
    • 5.2 Central Tendency
      • 5.2.1 Mode
    • 5.3 Median
    • 5.4 The Mean
      • 5.4.1 Dichotomous Variables
    • 5.5 Mean, Median, and the Distribution of Variables
    • 5.6 Skewness Statistic
    • 5.7 Adding Legends to Graphs
    • 5.8 Next Steps
    • 5.9 Assignments
      • 5.9.1 Concepts and Calculations
      • 5.9.2 R Problems
  • 6 Measures of Dispersion
    • 6.1 Get Ready
    • 6.2 Introduction
    • 6.3 Measures of Spread
      • 6.3.1 Range
      • 6.3.2 Interquartile Range (IQR)
      • 6.3.3 Boxplots
    • 6.4 Dispersion Around the Mean
      • 6.4.1 Don’t Make Bad Comparisons
    • 6.5 Dichotomous Variables
    • 6.6 Dispersion in Categorical Variables?
    • 6.7 The Standard Deviation and the Normal Curve
      • 6.7.1 Really Important Caveat
    • 6.8 Calculating Area Under a Normal Curve
    • 6.9 One Last Thing
    • 6.10 Next Steps
    • 6.11 Assignments
      • 6.11.1 Concepts and Calculations
      • 6.11.2 R Problems
  • 7 Probability
    • 7.1 Get Started
    • 7.2 Probability
    • 7.3 Theoretical Probabilities
      • 7.3.1 Large and Small Sample Outcomes
    • 7.4 Empirical Probabilities
      • 7.4.1 Empirical Probabilities in Practice
      • 7.4.2 Intersection of Two Probabilities
      • 7.4.3 The Union of Two Probabilities
      • 7.4.4 Conditional Probabilities
    • 7.5 The Normal Curve and Probability
    • 7.6 Next Steps
    • 7.7 Exercises
      • 7.7.1 Concepts and Calculations
  • 8 Sampling and Inference
    • 8.1 Getting Ready
    • 8.2 Statistics and Parameters
    • 8.3 Sampling Error
    • 8.4 Sampling Distributions
      • 8.4.1 Simulating the Sampling Distribution
    • 8.5 Confidence Intervals
    • 8.6 Proportions
    • 8.7 Next Steps
    • 8.8 Exercises
      • 8.8.1 Concepts and Calculations
      • 8.8.2 R Problems
  • 9 Hypothesis Testing
    • 9.1 Getting Started
    • 9.2 The Logic of Hypothesis Testing
      • 9.2.1 Using Confidence Intervals
      • 9.2.2 Direct Hypothesis Tests
      • 9.2.3 One-tail or Two?
    • 9.3 T-Distribution
    • 9.4 Proportions
    • 9.5 T-test in R
    • 9.6 Next Steps
    • 9.7 Exercises
      • 9.7.1 Concepts and Calculations
      • 9.7.2 R Problems
  • 10 Hypothesis Testing with Two Groups
    • 10.1 Getting Ready
    • 10.2 Testing Hypotheses about Two Means
      • 10.2.1 Generating Subgroup Means
    • 10.3 Hypothesis Testing with Two Means
      • 10.3.1 A Theoretical Example
      • 10.3.2 Returning to the Empirical Example
      • 10.3.3 Calculating the t-score
      • 10.3.4 Statistical Significance vs. Effect Size
    • 10.4 Difference in Proportions
    • 10.5 Plotting Mean Differences
    • 10.6 What’s Next?
    • 10.7 Exercises
      • 10.7.1 Concepts and Calculations
      • 10.7.2 R Problems
  • 11 Hypothesis Testing with Multiple Groups
    • 11.1 Getting Ready
    • 11.2 Internet Access as an Indicator of Development
      • 11.2.1 The Relationship between Wealth and Internet Access
    • 11.3 Analysis of Variance
      • 11.3.1 Important concepts/statistics:
    • 11.4 Anova in R
    • 11.5 Effect Size
      • 11.5.1 Plotting Multiple Means
    • 11.6 Population Size and Internet Access
    • 11.7 Connecting the T-score and F-Ratio
    • 11.8 Next Steps
    • 11.9 Assignments
      • 11.9.1 Concepts and Calculations
      • 11.9.2 R Problems
  • 12 Hypothesis Testing with Crosstabs
    • 12.1 Getting Ready
    • 12.2 Crosstabs
      • 12.2.1 The Relationship Between Education and Religiosity
    • 12.3 Sampling Error
    • 12.4 Hypothesis Testing with Crosstabs
      • 12.4.1 Regional Differences in Religiosity?
    • 12.5 Directional Patterns in Crosstabs
      • 12.5.1 Age and Religious Importance
    • 12.6 Limitations of Chi-Square
    • 12.7 Next Steps
    • 12.8 Exercises
      • 12.8.1 Concepts and Calculations
  • 13 Measures of Association
    • 13.1 Getting Ready
    • 13.2 Going Beyond Chi-squared
    • 13.3 Measures of Association for Crosstabs
      • 13.3.1 Cramer’s V
      • 13.3.2 Lambda
    • 13.4 Ordinal Measures of Association
      • 13.4.1 Gamma
      • 13.4.2 Tau-b and Tau-c
    • 13.5 Revisiting the Gender Gap in Abortion Attitudes
      • 13.5.1 When to Use Which Measure
    • 13.6 Next Steps
    • 13.7 Exercises
      • 13.7.1 Concepts and Calculations
      • 13.7.2 R Problems
  • 14 Correlation and Scatterplots
    • 14.1 Get Started
    • 14.2 Relationships between Numeric Variables
    • 14.3 Scatterplots
    • 14.4 Pearson’s r
      • 14.4.1 Calculating Pearson’s r
      • 14.4.2 Other Independent Variables
    • 14.5 Variation in Strength of Relationships
    • 14.6 Proportional Reduction in Error
    • 14.7 Correlation and Scatterplot Matrices
    • 14.8 Overlapping Explanations
    • 14.9 Next Steps
    • 14.10 Exercises
      • 14.10.1 Concepts and Calculations
      • 14.10.2 R Problems
  • 15 Simple Regression
    • 15.1 Get Started
    • 15.2 Linear Relationships
    • 15.3 Ordinary Least Squares Regression
      • 15.3.1 Calculation Example: Presidential Vote in 2016 and 2020
    • 15.4 How Well Does the Model Fit the Data?
    • 15.5 Proportional Reduction in Error
    • 15.6 Getting Regression Results in R
      • 15.6.1 All Fifty States
    • 15.7 Understanding the Constant
    • 15.8 Non-numeric Independent Variables
    • 15.9 Adding More Information to Scatterplots
    • 15.10 Next Steps
    • 15.11 Assignments
      • 15.11.1 Concepts and Calculations
      • 15.11.2 R Problems
  • 16 Multiple Regression
    • 16.1 Getting Started
    • 16.2 Organizing the Regression Output
      • 16.2.1 Summarizing Life Expectancy Models
    • 16.3 Multiple Regression
      • 16.3.1 Assessing the Substantive Impact
    • 16.4 Model Accuracy
    • 16.5 Predicted Outcomes
      • 16.5.1 Identifying Observations
    • 16.6 Next Steps
    • 16.7 Exercises
      • 16.7.1 Concepts and Calculations
      • 16.7.2 R Problems
  • 17 Advanced Regression Topics
    • 17.1 Get Started
    • 17.2 Incorporating Access to Health Care
    • 17.3 Multicollinearity
    • 17.4 Checking on Linearity
      • 17.4.1 Stop and Think
    • 17.5 Which Variables have the Greatest Impact?
    • 17.6 Statistics vs. Substance
    • 17.7 Next Steps
    • 17.8 Assignments
      • 17.8.1 R Problems
  • 18 Regression Assumptions
    • 18.1 Get Started
    • 18.2 Regression Assumptions
    • 18.3 Linearity
    • 18.4 Independent Variables are not Correlated with the Error Term
    • 18.5 No Perfect Multicollinearity
    • 18.6 The Mean of the Error Term equals zero
    • 18.7 The Error Term is Normally Distributed
    • 18.8 Constant Error Variance (Homoscedasticity)
    • 18.9 Independent Errors
    • 18.10 What’s next?
    • 18.11 Assignments
      • 18.11.1 R Problems
  • Appendix: Codebooks
    • ANES20
    • County20large
    • Countries2
    • States20

An Introduction to Political and Social Data Analysis Using R

Thomas M. Holbrook

2022-11-28