3.1 Descriptive Statistics
The code below tabulates the frequency and percentage of each category for every asset variable. Note that some of the asset variables have already had categories combined so that low cell counts do not unduly skew the PCA result; this kind of data tidying should be done before applying principal component analysis.
Create a table of these values and, as you enter each one, indicate whether you think it is a marker of higher or lower socioeconomic position.
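As an aside on the tidying step mentioned above, combining sparse categories might look something like the following sketch. It assumes the forcats package is available; the raw roof categories in `roof_raw` are hypothetical and only illustrate the idea, since the original (uncombined) coding is not shown in this section.

```r
library(forcats)

# Hypothetical raw roof categories (illustrative only, not the survey coding)
roof_raw <- factor(c("grass", "thatch", "mud", "iron sheets",
                     "iron sheets", "asbestos", "tiles"))

# Merge sparse levels into broader groups; levels not listed are left unchanged
roof <- fct_collapse(
  roof_raw,
  "grass,thatch,mud" = c("grass", "thatch", "mud"),
  "asbestos, other"  = c("asbestos", "tiles")
)

# Check the cell counts after combining
table(roof)
```

Which levels to merge is a substantive decision: categories should only be combined when they plausibly indicate a similar socioeconomic position.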
#--- Create a unique ID for each individual from their row number
tz <- tz %>% rowid_to_column("ID")
#--- Select just the asset variables and a unique ID
assets <- tz %>% select(ID, water:roof)
#--- Extract the summaries
assets %>% count(water) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 6 x 3
##   water                          n     pct
##   <fct>                      <int>   <dbl>
## 1 Open Well                    712 25.8
## 2 Piped to dwelling            278 10.1
## 3 Piped to local source        847 30.7
## 4 Protected well               311 11.3
## 5 Stream, river, lake, other   613 22.2
## 6 <NA>                           1  0.0362
assets %>% count(toilet) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   toilet        n    pct
##   <fct>     <int>  <dbl>
## 1 No toilet   731 26.5
## 2 Toilet     2026 73.4
## 3 <NA>          5  0.181
assets %>% count(electricity) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   electricity     n    pct
##   <fct>       <int>  <dbl>
## 1 no           2549 92.3
## 2 yes           210  7.60
## 3 <NA>            3  0.109
assets %>% count(radio) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   radio     n     pct
##   <fct> <int>   <dbl>
## 1 no     1020 36.9
## 2 yes    1740 63.0
## 3 <NA>      2  0.0724
assets %>% count(tv) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   tv        n     pct
##   <fct> <int>   <dbl>
## 1 no     2592 93.8
## 2 yes     169  6.12
## 3 <NA>      1  0.0362
assets %>% count(fridge) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   fridge     n     pct
##   <fct>  <int>   <dbl>
## 1 no      2668 96.6
## 2 yes       93  3.37
## 3 <NA>       1  0.0362
assets %>% count(bike) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   bike      n     pct
##   <fct> <int>   <dbl>
## 1 no     1316 47.6
## 2 yes    1444 52.3
## 3 <NA>      2  0.0724
assets %>% count(car) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   car       n     pct
##   <fct> <int>   <dbl>
## 1 no     2718 98.4
## 2 yes      43  1.56
## 3 <NA>      1  0.0362
assets %>% count(floor) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 4 x 3
##   floor           n    pct
##   <fct>       <int>  <dbl>
## 1 carpet         27  0.978
## 2 cement        710 25.7
## 3 earth, sand  2019 73.1
## 4 other           6  0.217
assets %>% count(wall) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 8 x 3
##   wall                 n    pct
##   <fct>            <int>  <dbl>
## 1 baked bricks       468 16.9
## 2 cement blocks      295 10.7
## 3 grass               19  0.688
## 4 poles and mud     1097 39.7
## 5 stones             179  6.48
## 6 sun-dried bricks   672 24.3
## 7 wood, timber        29  1.05
## 8 <NA>                 3  0.109
assets %>% count(roof) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 4 x 3
##   roof                 n    pct
##   <fct>            <int>  <dbl>
## 1 asbestos, other     29  1.05
## 2 grass,thatch,mud  1313 47.5
## 3 iron sheets       1417 51.3
## 4 <NA>                 3  0.109
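The eleven near-identical calls above can also be collapsed into a single pass, which may make it easier to assemble the summary table requested at the start of this section. The sketch below is one possible approach, assuming dplyr (1.0 or later) and tidyr are installed. It uses a small made-up data frame, `toy_assets`, as a stand-in for `assets`; the values are illustrative only and are not the survey data.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `assets` (illustrative values only, not the survey data)
toy_assets <- data.frame(
  ID     = 1:6,
  toilet = factor(c("Toilet", "Toilet", "No toilet", "Toilet", NA, "No toilet")),
  radio  = factor(c("yes", "no", "yes", "yes", "no", "yes"))
)

asset_summary <- toy_assets %>%
  # Factors with different levels are converted to character before stacking
  mutate(across(-ID, as.character)) %>%
  # One row per individual per asset variable
  pivot_longer(-ID, names_to = "variable", values_to = "category") %>%
  # Frequencies per category, with missing values counted as their own row
  count(variable, category) %>%
  # Percentages within each asset variable
  group_by(variable) %>%
  mutate(pct = n * 100 / sum(n)) %>%
  ungroup()

asset_summary
```

Replacing `toy_assets` with `assets` should give one long table covering every asset variable, with the same `n` and `pct` columns as the individual summaries.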
We may wish to do an initial check of our assumption that the asset variables together measure some underlying construct. One way to do this is to see how the variables are associated with one another. For example, cross-tabulating electricity access against car ownership shows a strong association: the odds of having electricity are roughly 26 times higher in households that own a car (OR = 25.98, 95% CI 13.63 to 49.51). You may wish to modify the code below to investigate other pairs of variables.
Consider carefully what dimensions of socioeconomic position these sorts of associations might represent. For example, access to electricity is an area-level variable, in contrast to car ownership, which is measured at the level of the household or even the individual. It is possible that even high-SEP households cannot access electricity if they live in an area with no electricity infrastructure: individuals may be relatively wealthy, but not wealthy enough to move or to change the characteristics of their surrounding neighbourhood.
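#--- Is there an association between electricity access and car ownership?
#--- (%$% is the magrittr exposition pipe; cc() is assumed to come from the epiDisplay package)
assets %$% cc(car, electricity, graph = FALSE)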
## Warning in chisq.test(table1, correct = FALSE): Chi-squared approximation
## may be incorrect
##
##        electricity
## car     no   yes  Total
##   no    2533 182  2715
##   yes   15   28   43
##   Total 2548 210  2758
##
## OR = 25.98
## 95% CI = 13.63, 49.51
## Chi-squared = 205.32, 1 d.f., P value = 0
## Fisher's exact test (2-sided) P value = 0
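To see where these figures come from, the sketch below re-enters the 2x2 table printed above and recomputes the odds ratio, a Wald (Woolf) 95% confidence interval, and the uncorrected chi-squared statistic in base R. The object names (`tab`, `or`, `ci`, `chi`) are illustrative. The Wald interval is used because it reproduces the printed limits to two decimal places; this is not a claim about how `cc()` computes them internally.

```r
# Counts copied from the cross-tabulation above (rows: car, columns: electricity)
tab <- matrix(c(2533, 182,
                  15,  28),
              nrow = 2, byrow = TRUE,
              dimnames = list(car = c("no", "yes"),
                              electricity = c("no", "yes")))

# Odds ratio: cross-product ratio of the 2x2 table
or <- (tab["no", "no"] * tab["yes", "yes"]) / (tab["no", "yes"] * tab["yes", "no"])

# Wald (Woolf) 95% CI: exponentiate log(OR) +/- z * SE, with SE = sqrt(sum of 1/cell)
se_log_or <- sqrt(sum(1 / tab))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

# Pearson chi-squared without continuity correction (warning suppressed here)
chi <- suppressWarnings(chisq.test(tab, correct = FALSE))

# Smallest expected cell count, which is what triggers the warning
min_expected <- min(chi$expected)

round(or, 2)
round(ci, 2)
round(unname(chi$statistic), 2)
chi$p.value
min_expected
```

Two details of the output are worth noting. The chi-squared warning appears because the expected count of car-owning households with electricity is about 3.3, below the conventional threshold of 5, which is why Fisher's exact test is reported alongside it. The printed "P value = 0" is a rounding artefact: the p-value is extremely small rather than exactly zero.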