Chapter 5 Data Analysis
5.1 Descriptive statistics
5.1.1 Univariate analysis
Index of qualitative variation (categorical variables)
R
## Frequencies
## allbus2012$female
## Type: Factor
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ------------ ------ --------- -------------- --------- --------------
## Male 1725 49.57 49.57 49.57 49.57
## Female 1755 50.43 100.00 50.43 100.00
## <NA> 0 0.00 100.00
## Total 3480 100.00 100.00 100.00 100.00
## Frequencies
## allbus2012$migrant
## Type: Factor
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ------------- ------ --------- -------------- --------- --------------
## Native 3298 94.77 94.77 94.77 94.77
## Migrant 182 5.23 100.00 5.23 100.00
## <NA> 0 0.00 100.00
## Total 3480 100.00 100.00 100.00 100.00
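The heading names the index of qualitative variation (IQV), while the output above shows only the frequency tables. As a minimal base-R sketch, assuming the common definition IQV = K/(K-1) * (1 - sum of squared proportions), the index can be computed from the counts printed above. The helper `iqv()` is hypothetical and is not part of any package used in this chapter.

```r
# Hypothetical helper (not from summarytools): index of qualitative variation
# for a vector of category counts, assuming IQV = K/(K-1) * (1 - sum(p^2)).
iqv <- function(counts) {
  p <- counts / sum(counts)   # relative frequencies
  k <- length(counts)         # number of categories
  (k / (k - 1)) * (1 - sum(p^2))
}

# Counts copied from the two frequency tables above
female_counts  <- c(Male = 1725, Female = 1755)
migrant_counts <- c(Native = 3298, Migrant = 182)

iqv_female  <- iqv(female_counts)   # close to 1: almost evenly split
iqv_migrant <- iqv(migrant_counts)  # roughly 0.2: concentrated in one category
round(c(female = iqv_female, migrant = iqv_migrant), 3)
```

An IQV of 1 means the cases are spread evenly over all categories, and 0 means all cases fall into a single category.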
Measures of central tendency (metric variables)
The most important measures of central tendency are the arithmetic mean, the median, and the mode.
R
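# descr() comes from summarytools; %>% and select() come from dplyr
library(dplyr)
library(summarytools)

allbus2012 %>%
  select(class, imp_nei, imp_fr, imp_fam, health, finance, lifesat) %>%
  descr(stats = c("min", "max", "med", "mean"), transpose = TRUE)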
## Descriptive Statistics
## allbus2012
## Label: GGSScompact 2012
## N: 3480
##
## Min Max Median Mean
## ------------- ------ ------- -------- ------
## class 1.00 5.00 3.00 2.77
## finance 1.00 5.00 4.00 3.53
## health 1.00 5.00 4.00 3.55
## imp_fam 1.00 7.00 7.00 6.50
## imp_fr 1.00 7.00 6.00 5.68
## imp_nei 1.00 7.00 5.00 4.60
## lifesat 1.00 11.00 9.00 8.64
## Frequencies
## allbus2012$lifesat
## Type: Numeric
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
## 1 8 0.23 0.23 0.23 0.23
## 2 6 0.17 0.40 0.17 0.40
## 3 26 0.75 1.15 0.75 1.15
## 4 46 1.32 2.47 1.32 2.47
## 5 76 2.19 4.66 2.18 4.66
## 6 283 8.14 12.80 8.13 12.79
## 7 263 7.56 20.36 7.56 20.34
## 8 606 17.43 37.79 17.41 37.76
## 9 1071 30.80 68.59 30.78 68.53
## 10 653 18.78 87.37 18.76 87.30
## 11 439 12.63 100.00 12.61 99.91
## <NA> 3 0.09 100.00
## Total 3480 100.00 100.00 100.00 100.00
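The three measures named above can be checked by hand against the frequency table for lifesat. The following base-R sketch rebuilds the valid observations from the printed counts (the three missing values are left out). Base R has no built-in function for the statistical mode, so the most frequent value is taken from `table()`.

```r
# Valid counts for lifesat (values 1 to 11), copied from the table above
lifesat_freq <- c(8, 6, 26, 46, 76, 283, 263, 606, 1071, 653, 439)
lifesat <- rep(1:11, times = lifesat_freq)  # one value per valid respondent

lifesat_mean   <- mean(lifesat)
lifesat_median <- median(lifesat)
# Mode: the value with the highest frequency
lifesat_mode   <- as.integer(names(which.max(table(lifesat))))

round(c(mean = lifesat_mean, median = lifesat_median, mode = lifesat_mode), 2)
```

The mean and median obtained this way agree with the 8.64 and 9.00 reported by `descr()`.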
Measures of dispersion (metric variables)
R
Variance, standard deviation (sd), coefficient of variation (sd/mean)
allbus2012 %>%
  select(class, starts_with("imp"), health, finance, lifesat) %>%
  descr(stats = c("min", "max", "med", "mean", "sd", "cv"), transpose = TRUE)
## Descriptive Statistics
## allbus2012
## Label: GGSScompact 2012
## N: 3480
##
## Min Max Median Mean Std.Dev CV
## ------------- ------ ------- -------- ------ --------- ------
## class 1.00 5.00 3.00 2.77 0.66 0.24
## finance 1.00 5.00 4.00 3.53 0.80 0.23
## health 1.00 5.00 4.00 3.55 1.00 0.28
## imp_fam 1.00 7.00 7.00 6.50 1.15 0.18
## imp_fr 1.00 7.00 6.00 5.68 1.19 0.21
## imp_nei 1.00 7.00 5.00 4.60 1.59 0.35
## lifesat 1.00 11.00 9.00 8.64 1.72 0.20
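To make the relation between the three measures explicit, here is a short base-R sketch for lifesat. It rebuilds the valid observations from the counts in the lifesat frequency table shown earlier. The variable names are chosen for illustration only.

```r
# Valid counts for lifesat (values 1 to 11), copied from the frequency table
lifesat_freq <- c(8, 6, 26, 46, 76, 283, 263, 606, 1071, 653, 439)
lifesat <- rep(1:11, times = lifesat_freq)

lifesat_var <- var(lifesat)                 # sample variance (denominator n - 1)
lifesat_sd  <- sd(lifesat)                  # standard deviation = sqrt(variance)
lifesat_cv  <- lifesat_sd / mean(lifesat)   # coefficient of variation = sd / mean

round(c(var = lifesat_var, sd = lifesat_sd, cv = lifesat_cv), 2)
```

Because the coefficient of variation divides by the mean, it is unit-free and allows variables measured on different scales (here 1 to 5, 1 to 7 and 1 to 11) to be compared.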
Five-number summary (see Tukey 1975)
allbus2012 %>%
  select(class, starts_with("imp"), health, finance, lifesat) %>%
  descr(stats = "fivenum", transpose = TRUE)
## Descriptive Statistics
## allbus2012
## Label: GGSScompact 2012
## N: 3480
##
## Min Q1 Median Q3 Max
## ------------- ------ ------ -------- ------- -------
## class 1.00 2.00 3.00 3.00 5.00
## finance 1.00 3.00 4.00 4.00 5.00
## health 1.00 3.00 4.00 4.00 5.00
## imp_fam 1.00 7.00 7.00 7.00 7.00
## imp_fr 1.00 5.00 6.00 7.00 7.00
## imp_nei 1.00 4.00 5.00 6.00 7.00
## lifesat 1.00 8.00 9.00 10.00 11.00
Skewness and kurtosis
allbus2012 %>%
  select(class, starts_with("imp"), health, finance, lifesat) %>%
  descr(stats = c("min", "max", "med", "mean", "skewness", "kurtosis"),
        transpose = TRUE)
## Descriptive Statistics
## allbus2012
## Label: GGSScompact 2012
## N: 3480
##
## Min Max Median Mean Skewness Kurtosis
## ------------- ------ ------- -------- ------ ---------- ----------
## class 1.00 5.00 3.00 2.77 -0.05 0.54
## finance 1.00 5.00 4.00 3.53 -0.83 0.74
## health 1.00 5.00 4.00 3.55 -0.47 -0.20
## imp_fam 1.00 7.00 7.00 6.50 -2.89 8.67
## imp_fr 1.00 7.00 6.00 5.68 -0.84 0.48
## imp_nei 1.00 7.00 5.00 4.60 -0.35 -0.52
## lifesat 1.00 11.00 9.00 8.64 -0.98 1.33
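Packages differ in the exact estimator they use for skewness and kurtosis. As a sketch, the simple moment-based versions (third and fourth central moments divided by powers of the second, with 3 subtracted so that a normal distribution has kurtosis 0) can be written in base R. The helper `skew_kurt()` is hypothetical, and the data are rebuilt from the lifesat frequency table shown earlier.

```r
# Hypothetical helper: moment-based skewness and excess kurtosis
skew_kurt <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)          # deviations from the mean
  m2 <- mean(d^2)            # second central moment
  m3 <- mean(d^3)            # third central moment
  m4 <- mean(d^4)            # fourth central moment
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# Valid counts for lifesat (values 1 to 11), copied from the frequency table
lifesat_freq <- c(8, 6, 26, 46, 76, 283, 263, 606, 1071, 653, 439)
lifesat <- rep(1:11, times = lifesat_freq)

sk <- skew_kurt(lifesat)
round(sk, 2)
```

For lifesat this gives values close to the -0.98 and 1.33 reported above. The negative skewness reflects the long tail towards low satisfaction scores.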
Histogram and box-plot
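As a minimal base-R sketch of both plot types, the valid lifesat values are rebuilt from the frequency table shown earlier and plotted. The titles and axis labels are illustrative choices, not taken from the original figures.

```r
# Valid counts for lifesat (values 1 to 11), copied from the frequency table
lifesat_freq <- c(8, 6, 26, 46, 76, 283, 263, 606, 1071, 653, 439)
lifesat <- rep(1:11, times = lifesat_freq)

# Histogram with one bar per scale point (breaks centred between the integers)
h <- hist(lifesat, breaks = seq(0.5, 11.5, by = 1),
          main = "Life satisfaction", xlab = "lifesat")

# Box plot: the box spans Q1 to Q3, the line inside marks the median,
# and points beyond 1.5 * IQR from the box are drawn as outliers
b <- boxplot(lifesat, horizontal = TRUE, xlab = "lifesat")
```

Both functions return their computed values invisibly: `h$counts` holds the bar heights, and `b$stats` holds the whisker ends, hinges and median.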
5.1.2 Bivariate analysis
Categorical variables: Chi-squared and Cramér’s V
R
## Cross-Tabulation
## female * migrant
## Data Frame: allbus2012
## Label: GGSScompact 2012
##
## -------- --------- -------- --------- -------
## migrant Native Migrant Total
## female
## Male 1643 82 1725
## Female 1655 100 1755
## Total 3298 182 3480
## -------- --------- -------- --------- -------
##
## # Measure of Association for Contingency Tables
##
## Chi-squared: 1.5654
## Cramer's V: 0.0212
## p-value: 0.2109
## Cross-Tabulation
## female * astrology
## Data Frame: allbus2012
## Label: GGSScompact 2012
##
## -------- ----------- ------ ----- ------ -------
## astrology No Yes <NA> Total
## female
## Male 1431 290 4 1725
## Female 1245 506 4 1755
## Total 2676 796 8 3480
## -------- ----------- ------ ----- ------ -------
##
## # Measure of Association for Contingency Tables
##
## Chi-squared: 71.2874
## Cramer's V: 0.1433
## p-value: <0.001
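The association measures in the second table can be reproduced by hand. The following base-R sketch enters the observed counts (without the missing values), runs Pearson's chi-squared test without continuity correction, and derives Cramér's V from it. The object names are illustrative.

```r
# Observed counts from the female * astrology table above (missing values dropped)
tab <- matrix(c(1431, 1245, 290, 506), nrow = 2,
              dimnames = list(female = c("Male", "Female"),
                              astrology = c("No", "Yes")))

# Pearson chi-squared test without Yates' continuity correction
chi2_test <- chisq.test(tab, correct = FALSE)
chi2 <- unname(chi2_test$statistic)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n <- sum(tab)
cramers_v <- sqrt(chi2 / (n * (min(dim(tab)) - 1)))

round(c(chi2 = chi2, cramers_v = cramers_v), 4)
```

The result matches the reported values (chi-squared 71.2874, Cramér's V 0.1433). Unlike the chi-squared statistic, V does not grow with the sample size, which makes it the easier quantity to interpret as a strength of association.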
Metric and categorical variables: t-test
R
##
## Two Sample t-test
##
## data: allbus2012$lifesat by allbus2012$female
## t = -1.9648, df = 3475, p-value = 0.04951
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2288332507 -0.0002433446
## sample estimates:
## mean in group Male mean in group Female
## 8.583865 8.698404
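The header "Two Sample t-test" (rather than "Welch Two Sample t-test") and the degrees of freedom (3475 = 3477 - 2) indicate a pooled-variance test. A minimal sketch of such a call with base R's `t.test()` follows. The data frame `toy` is simulated for illustration and does not contain the survey values.

```r
set.seed(1)
# Simulated stand-in for the survey data (NOT the ALLBUS values)
toy <- data.frame(
  female  = factor(rep(c("Male", "Female"), each = 100),
                   levels = c("Male", "Female")),
  lifesat = c(rnorm(100, mean = 8.5, sd = 1.7),
              rnorm(100, mean = 8.8, sd = 1.7))
)

# var.equal = TRUE requests the pooled-variance ("Two Sample") t-test;
# the default, var.equal = FALSE, would run Welch's test instead
tt <- t.test(lifesat ~ female, data = toy, var.equal = TRUE)
tt
```

With `var.equal = TRUE` the degrees of freedom are n - 2. Welch's test relaxes the equal-variance assumption at the cost of non-integer degrees of freedom.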
Two metric variables: Pearson correlation
R
## Parameter1 | Parameter2 | r | 95% CI | t | df | p | Method | n_Obs
## ---------------------------------------------------------------------------------------
## lifesat | finance | 0.44 | [0.42, 0.47] | 29.24 | 3471 | < .001 | Pearson | 3473
## lifesat | health | 0.31 | [0.28, 0.34] | 19.49 | 3473 | < .001 | Pearson | 3475
## lifesat | class | 0.24 | [0.21, 0.27] | 14.35 | 3428 | < .001 | Pearson | 3430
## finance | health | 0.23 | [0.20, 0.26] | 14.07 | 3472 | < .001 | Pearson | 3474
## finance | class | 0.34 | [0.31, 0.37] | 21.35 | 3427 | < .001 | Pearson | 3429
## health | class | 0.19 | [0.16, 0.23] | 11.52 | 3429 | < .001 | Pearson | 3431
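Each row of the table reports a Pearson correlation together with its t statistic, degrees of freedom and confidence interval. As a sketch of how these quantities relate, base R's `cor.test()` is applied below to two simulated variables. The data are illustrative only; the variable names merely echo the table.

```r
set.seed(2)
n <- 200
# Simulated variables (NOT the ALLBUS values)
finance <- rnorm(n)
lifesat <- 0.5 * finance + rnorm(n)

ct <- cor.test(lifesat, finance, method = "pearson")
r  <- unname(ct$estimate)    # Pearson correlation coefficient
df <- unname(ct$parameter)   # degrees of freedom = n - 2

# The t statistic follows directly from r and the degrees of freedom
t_manual <- r * sqrt(df) / sqrt(1 - r^2)

c(r = r, t = unname(ct$statistic), t_manual = t_manual, df = df)
```

This relation explains why even modest correlations in the table have very large t values: with several thousand observations, the factor sqrt(df) is close to 59.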
5.2 Inferential statistics
5.2.1 Linear regression
OLS-regression and diagnostics
R
# tab_model() comes from sjPlot; age2 is presumably the square of age
library(sjPlot)

ols <- lm(lifesat ~ finance + health + imp_fam + imp_fr + imp_nei +
            class + age + age2 + female + migrant,
          data = allbus2012)
tab_model(ols, show.se = TRUE, digits = 3)
lifesat | ||||
---|---|---|---|---|
Predictors | Estimates | std. Error | CI | p |
(Intercept) | 2.772 | 0.298 | 2.188 – 3.356 | <0.001 |
finance | 0.747 | 0.035 | 0.679 – 0.815 | <0.001 |
health | 0.398 | 0.028 | 0.344 – 0.453 | <0.001 |
imp_fam | 0.122 | 0.023 | 0.078 – 0.167 | <0.001 |
imp_fr | 0.060 | 0.023 | 0.015 – 0.105 | 0.009 |
imp_nei | 0.080 | 0.018 | 0.045 – 0.114 | <0.001 |
class | 0.201 | 0.041 | 0.121 – 0.281 | <0.001 |
age | -0.021 | 0.008 | -0.036 – -0.005 | 0.008 |
age2 | 0.000 | 0.000 | 0.000 – 0.000 | 0.001 |
female [Female] | 0.113 | 0.051 | 0.013 – 0.214 | 0.026 |
migrant [Migrant] | 0.074 | 0.115 | -0.153 – 0.300 | 0.523 |
Observations | 3407 | |||
R2 / R2 adjusted | 0.270 / 0.268 |
##
## Breusch Pagan Test for Heteroskedasticity
## -----------------------------------------
## Ho: the variance is constant
## Ha: the variance is not constant
##
## Data
## -----------------------------------
## Response : lifesat
## Variables: fitted values of lifesat
##
## Test Summary
## -------------------------------
## DF = 1
## Chi2 = 257.3509
## Prob > Chi2 = 6.485887e-58
##
## RESET test
##
## data: ols
## RESET = 0.5925, df1 = 3, df2 = 3393, p-value = 0.6199
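To illustrate the idea behind the heteroskedasticity test, here is a base-R sketch of Koenker's studentized variant of the Breusch-Pagan test, using the fitted values as the only auxiliary regressor. This is a simplified illustration on simulated data. It is not necessarily the exact statistic printed above, which may come from the original (non-studentized) version of the test.

```r
set.seed(3)
n <- 500
# Simulated data whose error variance grows with x (NOT the ALLBUS values)
x <- runif(n, 1, 5)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x^2)
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the fitted values
e2   <- residuals(fit)^2
yhat <- fitted(fit)
aux  <- lm(e2 ~ yhat)

# LM statistic n * R^2, chi-squared with 1 degree of freedom under H0
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

c(statistic = lm_stat, p_value = p_value)
```

A small p-value rejects the null hypothesis of constant error variance, as it does for the life satisfaction model above.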
Heteroskedasticity robust standard errors
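Because the Breusch-Pagan test rejects the null hypothesis of constant variance, heteroskedasticity-robust standard errors should be estimated. The coefficients stay the same; only the standard errors, confidence intervals and p-values change.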
R
lifesat | ||||
---|---|---|---|---|
Predictors | Estimates | std. Error | CI | p |
(Intercept) | 2.772 | 0.324 | 2.137 – 3.407 | <0.001 |
finance | 0.747 | 0.041 | 0.666 – 0.828 | <0.001 |
health | 0.398 | 0.033 | 0.334 – 0.463 | <0.001 |
imp_fam | 0.122 | 0.027 | 0.069 – 0.176 | <0.001 |
imp_fr | 0.060 | 0.026 | 0.009 – 0.112 | 0.022 |
imp_nei | 0.080 | 0.020 | 0.041 – 0.118 | <0.001 |
class | 0.201 | 0.046 | 0.111 – 0.291 | <0.001 |
age | -0.021 | 0.008 | -0.036 – -0.005 | 0.008 |
age2 | 0.000 | 0.000 | 0.000 – 0.000 | 0.001 |
female [Female] | 0.113 | 0.051 | 0.013 – 0.214 | 0.026 |
migrant [Migrant] | 0.074 | 0.130 | -0.182 – 0.329 | 0.571 |
Observations | 3407 | |||
R2 / R2 adjusted | 0.270 / 0.268 |
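To show what a robust standard error is, the following base-R sketch computes the HC1 "sandwich" covariance matrix by hand for a small simulated regression. HC1 is the type requested in the logistic regression call later in this chapter. Whether the table above uses the same type is an assumption, and the data are simulated.

```r
set.seed(4)
n <- 300
# Simulated heteroskedastic data (NOT the ALLBUS values)
x <- runif(n, 1, 5)
y <- 2 + 0.7 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

X <- model.matrix(fit)   # design matrix including the intercept
e <- residuals(fit)
k <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^-1
meat  <- crossprod(X * e)      # X' diag(e^2) X: each row of X scaled by its residual
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread   # HC1 small-sample factor

se_classic <- sqrt(diag(vcov(fit)))
se_robust  <- sqrt(diag(vcov_hc1))
cbind(estimate = coef(fit), se_classic, se_robust)
```

The point estimates are untouched. Only the estimated covariance matrix, and with it the standard errors and tests, is replaced.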
Standardized b-coefficients
One can also request standardized coefficients (betas), which express each effect in standard deviation units and so allow the strength of the predictors to be compared on a common scale.
R
lifesat | |||||||
---|---|---|---|---|---|---|---|
Predictors | Estimates | std. Error | std. Beta | standardized std. Error | CI | standardized CI | p |
(Intercept) | 2.772 | 0.298 | -0.036 | 0.021 | 2.188 – 3.356 | -0.077 – 0.006 | <0.001 |
finance | 0.747 | 0.035 | 0.348 | 0.016 | 0.679 – 0.815 | 0.316 – 0.380 | <0.001 |
health | 0.398 | 0.028 | 0.233 | 0.016 | 0.344 – 0.453 | 0.201 – 0.265 | <0.001 |
imp_fam | 0.122 | 0.023 | 0.081 | 0.015 | 0.078 – 0.167 | 0.052 – 0.111 | <0.001 |
imp_fr | 0.060 | 0.023 | 0.041 | 0.016 | 0.015 – 0.105 | 0.011 – 0.072 | 0.009 |
imp_nei | 0.080 | 0.018 | 0.074 | 0.016 | 0.045 – 0.114 | 0.042 – 0.106 | <0.001 |
class | 0.201 | 0.041 | 0.078 | 0.016 | 0.121 – 0.281 | 0.047 – 0.109 | <0.001 |
age | -0.021 | 0.008 | -0.215 | 0.081 | -0.036 – -0.005 | -0.374 – -0.056 | 0.008 |
age2 | 0.000 | 0.000 | 0.275 | 0.081 | 0.000 – 0.000 | 0.117 – 0.433 | 0.001 |
female [Female] | 0.113 | 0.051 | 0.066 | 0.030 | 0.013 – 0.214 | 0.008 – 0.125 | 0.026 |
migrant [Migrant] | 0.074 | 0.115 | 0.043 | 0.068 | -0.153 – 0.300 | -0.089 – 0.176 | 0.523 |
Observations | 3407 | ||||||
R2 / R2 adjusted | 0.270 / 0.268 |
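For numeric predictors, a standardized coefficient can be obtained in two equivalent ways: by rescaling the raw slope with sd(x) / sd(y), or by refitting the model on z-standardized variables. The following base-R sketch shows both on simulated data. The data frame `toy` is illustrative and does not contain the survey values.

```r
set.seed(5)
n <- 200
# Simulated data (NOT the ALLBUS values)
toy <- data.frame(finance = rnorm(n, 3.5, 0.8), health = rnorm(n, 3.5, 1))
toy$lifesat <- 3 + 0.7 * toy$finance + 0.4 * toy$health + rnorm(n)

fit <- lm(lifesat ~ finance + health, data = toy)

# Option 1: rescale each raw slope by sd(x) / sd(y)
b <- coef(fit)[-1]
beta_rescaled <- b * sapply(toy[names(b)], sd) / sd(toy$lifesat)

# Option 2: refit the model on z-standardized variables
fit_z <- lm(lifesat ~ finance + health, data = as.data.frame(scale(toy)))
beta_refit <- coef(fit_z)[-1]

rbind(beta_rescaled, beta_refit)
```

The two options agree exactly here because all variables are numeric and complete. With factors or missing values, the result depends on how the standardization is carried out.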
5.2.2 Logistic regression
R
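logit <- glm(astrology ~ age + migrant + female + class + finance + health,
             family = binomial(link = "logit"),
             data = allbus2012)
# transform = NULL keeps the coefficients on the log-odds scale;
# vcov.fun / vcov.type request heteroskedasticity-robust (HC1) standard errors
tab_model(logit, transform = NULL,
          vcov.fun = "HC", vcov.type = "HC1",
          show.se = TRUE, digits = 3)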
astrology | ||||
---|---|---|---|---|
Predictors | Log-Odds | std. Error | CI | p |
(Intercept) | 0.665 | 0.301 | 0.076 – 1.255 | 0.027 |
age | -0.038 | 0.003 | -0.044 – -0.033 | <0.001 |
migrant [Migrant] | -0.424 | 0.204 | -0.824 – -0.025 | 0.037 |
female [Female] | 0.725 | 0.088 | 0.553 – 0.897 | <0.001 |
class | 0.172 | 0.067 | 0.040 – 0.303 | 0.011 |
finance | -0.110 | 0.056 | -0.220 – 0.000 | 0.050 |
health | -0.154 | 0.048 | -0.247 – -0.060 | 0.001 |
Observations | 3413 | |||
R2 Tjur | 0.092 |
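Because the table reports log-odds, the size of an effect is easier to read after exponentiating. This short base-R sketch converts the coefficients printed above into odds ratios and percentage changes in the odds. The vector is typed in from the table, so it carries the table's rounding.

```r
# Log-odds copied from the table above (intercept omitted)
log_odds <- c(age = -0.038, migrant = -0.424, female = 0.725,
              class = 0.172, finance = -0.110, health = -0.154)

odds_ratios <- exp(log_odds)              # odds ratio per one-unit increase
pct_change  <- 100 * (odds_ratios - 1)    # percentage change in the odds

round(cbind(log_odds, odds_ratios, pct_change), 3)
```

For example, the coefficient of 0.725 for female corresponds to an odds ratio of about 2.06: holding the other predictors constant, the odds of astrology = Yes are roughly twice as high for women as for men.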