3.1 Descriptive Statistics

The code in this section tabulates the frequency and percentage of each category for every asset variable. Note that some of the asset variables have already had categories combined so that low cell counts do not unduly skew the PCA result. This kind of data tidying should be done before applying any kind of principal component analysis.
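
The combining of sparse categories mentioned above is not shown in this section. As a minimal sketch of how it might be done in base R, the example below merges rare levels of a factor into a single level. The data, the category names and the 5% threshold are all made up for illustration and are not the rule used to prepare `tz`.

```r
# Made-up roof materials; names and counts are illustrative only
roof_raw <- factor(c(rep("iron sheets", 50), rep("grass,thatch,mud", 45),
                     rep("asbestos", 3), rep("tiles", 2)))

# Hypothetical rule: any category holding under 5% of rows is "sparse"
share  <- prop.table(table(roof_raw))
sparse <- names(share)[share < 0.05]

# Renaming several levels to the same label merges them into one level
roof <- roof_raw
levels(roof)[levels(roof) %in% sparse] <- "other"

table(roof)
```

In practice the choice of which categories to merge should also make substantive sense, since merged categories are treated as equivalent markers of socioeconomic position.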

Create a table of these values and, as you enter each one, indicate whether you think it is a marker of higher or lower socioeconomic position.

#--- Create a unique ID for each individual from their row number
tz <- tz %>% rowid_to_column("ID")

#--- Select just the asset variables and a unique ID
assets <- tz %>% select(ID, water:roof)

#--- Extract the summaries
assets %>% count(water) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 6 x 3
##   water                          n     pct
##   <fct>                      <int>   <dbl>
## 1 Open Well                    712 25.8   
## 2 Piped to dwelling            278 10.1   
## 3 Piped to local source        847 30.7   
## 4 Protected well               311 11.3   
## 5 Stream, river, lake, other   613 22.2   
## 6 <NA>                           1  0.0362
assets %>% count(toilet) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   toilet        n    pct
##   <fct>     <int>  <dbl>
## 1 No toilet   731 26.5  
## 2 Toilet     2026 73.4  
## 3 <NA>          5  0.181
assets %>% count(electricity) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   electricity     n    pct
##   <fct>       <int>  <dbl>
## 1 no           2549 92.3  
## 2 yes           210  7.60 
## 3 <NA>            3  0.109
assets %>% count(radio) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   radio     n     pct
##   <fct> <int>   <dbl>
## 1 no     1020 36.9   
## 2 yes    1740 63.0   
## 3 <NA>      2  0.0724
assets %>% count(tv) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   tv        n     pct
##   <fct> <int>   <dbl>
## 1 no     2592 93.8   
## 2 yes     169  6.12  
## 3 <NA>      1  0.0362
assets %>% count(fridge) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   fridge     n     pct
##   <fct>  <int>   <dbl>
## 1 no      2668 96.6   
## 2 yes       93  3.37  
## 3 <NA>       1  0.0362
assets %>% count(bike) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   bike      n     pct
##   <fct> <int>   <dbl>
## 1 no     1316 47.6   
## 2 yes    1444 52.3   
## 3 <NA>      2  0.0724
assets %>% count(car) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 3 x 3
##   car       n     pct
##   <fct> <int>   <dbl>
## 1 no     2718 98.4   
## 2 yes      43  1.56  
## 3 <NA>      1  0.0362
assets %>% count(floor) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 4 x 3
##   floor           n    pct
##   <fct>       <int>  <dbl>
## 1 carpet         27  0.978
## 2 cement        710 25.7  
## 3 earth, sand  2019 73.1  
## 4 other           6  0.217
assets %>% count(wall) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 8 x 3
##   wall                 n    pct
##   <fct>            <int>  <dbl>
## 1 baked bricks       468 16.9  
## 2 cement blocks      295 10.7  
## 3 grass               19  0.688
## 4 poles and mud     1097 39.7  
## 5 stones             179  6.48 
## 6 sun-dried bricks   672 24.3  
## 7 wood, timber        29  1.05 
## 8 <NA>                 3  0.109
assets %>% count(roof) %>% mutate(pct = n*100 / sum(n))
## # A tibble: 4 x 3
##   roof                 n    pct
##   <fct>            <int>  <dbl>
## 1 asbestos, other     29  1.05 
## 2 grass,thatch,mud  1313 47.5  
## 3 iron sheets       1417 51.3  
## 4 <NA>                 3  0.109
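
The repeated calls above all follow the same pattern, so they can also be written as a single loop over the columns. The sketch below uses base R and a small made-up data frame (`toy_assets`) standing in for `assets`; the helper name `freq_table` is hypothetical and not part of any package.

```r
# Toy stand-in for `assets`; the values are invented for illustration
toy_assets <- data.frame(
  ID     = 1:8,
  toilet = factor(c("Toilet", "No toilet", "Toilet", "Toilet",
                    NA, "Toilet", "No toilet", "Toilet")),
  radio  = factor(c("yes", "no", "yes", "yes", "no", NA, "yes", "yes"))
)

# Count each category (keeping missing values) and convert to percentages
freq_table <- function(x) {
  n <- table(x, useNA = "ifany")
  data.frame(category = names(n),
             n        = as.vector(n),
             pct      = as.vector(n) * 100 / sum(n))
}

# Apply the helper to every column except the ID
summaries <- lapply(toy_assets[-1], freq_table)
summaries
```

The result is a named list with one frequency table per asset variable, which is convenient when copying the values into a summary table.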

We may wish to do an initial test of our assumption that the asset variables together measure some underlying construct. One way to do this is to see how the variables are associated with one another. For example, the association between electricity access and car ownership is strong: the odds of car ownership are about 26 times higher among those with electricity than among those without (OR = 25.98, 95% CI 13.63 to 49.51). You may wish to modify the code below to investigate other pairs of variables.

Consider carefully what dimensions of socioeconomic position (SEP) these sorts of associations might represent. For example, access to electricity is an area-level variable, in contrast to car ownership, which is measured at the level of the household or even the individual. Even high-SEP households may be unable to access electricity if they live in an area with no electricity infrastructure: individuals may be relatively wealthy, but not wealthy enough to move or to change the characteristics of their surrounding neighbourhood.

#--- Is there an association between electricity access and car ownership?
#--- (cc() is assumed to come from epiDisplay and %$% from magrittr, both loaded earlier)
assets %$% cc(car, electricity, graph = FALSE)
## Warning in chisq.test(table1, correct = FALSE): Chi-squared approximation
## may be incorrect
## 
##        electricity
## car       no  yes Total
##   no    2533  182  2715
##   yes     15   28    43
##   Total 2548  210  2758
## 
## OR =  25.98 
## 95% CI =  13.63, 49.51  
## Chi-squared = 205.32, 1 d.f., P value = 0
## Fisher's exact test (2-sided) P value = 0
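
As a cross-check on the output above, the odds ratio can be reproduced by hand from the four cell counts. The sketch below uses base R only. The Wald-type confidence interval is an assumption about how such an interval might be computed (it is not confirmed to be what `cc()` does internally), although it agrees with the printed values to two decimal places. The object names `tab`, `or`, `ci` and `expected` are illustrative.

```r
# 2x2 table copied from the cc() output above (rows: car, columns: electricity)
tab <- matrix(c(2533, 182,
                  15,  28),
              nrow = 2, byrow = TRUE,
              dimnames = list(car = c("no", "yes"),
                              electricity = c("no", "yes")))

# Odds ratio as the cross-product ratio of the four cells
or <- (tab["no", "no"] * tab["yes", "yes"]) /
      (tab["no", "yes"] * tab["yes", "no"])

# Wald-type 95% interval on the log scale (assumed method, see lead-in)
se_log_or <- sqrt(sum(1 / tab))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

# Expected counts under independence; chisq.test() warns when any is below 5
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

round(or, 2)
round(ci, 2)
round(min(expected), 2)
```

The smallest expected count, about 3.3 for the cell with both a car and electricity, is what triggers the chi-squared warning, which is why Fisher's exact test is the safer result to report here. The printed P value of 0 reflects rounding of a very small number rather than an exact zero.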