3.2 Initial Investigations
Before doing any analysis, we should look at the data. We can do that by clicking on the name of the dataset in the Environment pane in RStudio, or with the following code:
#--- Look at the dataset (note the capitalisation, R is case sensitive!)
View(sdg)
#--- We can also just look at the first few rows (useful for large datasets)
head(sdg)
## country code reg gdp gini pop delta.pop int.migrant
## 1 Afghanistan AFG EMR 561.7787 NA 34656032 33.84 1.18
## 2 Angola AGO AFR 3110.8080 NA 28813463 42.20 0.43
## 3 Argentina ARG AMR 12449.2200 42.7 43847430 10.84 4.81
## 4 Armenia ARM EUR 3606.1520 32.4 2924816 -1.14 6.34
## 5 Bangladesh BGD SEA 1358.7800 NA 163000000 12.10 0.88
## 6 Belize BLZ AMR 4810.5660 NA 366954 26.21 14.99
## urb delta.urb emp.ratio slums pop.density largest.city sanitation
## 1 27.13 3.90 48.05 62.7 53.08 51.49 45.1
## 2 44.82 7.88 63.81 55.5 23.11 44.43 88.6
## 3 91.89 1.63 57.02 16.7 16.02 38.06 96.2
## 4 62.56 -1.59 52.90 14.4 102.73 56.85 96.2
## 5 35.04 7.52 59.68 55.1 1251.84 31.94 57.7
## 6 43.85 -2.20 62.22 10.8 16.09 NA 93.5
## water million tb urb.pov electric pollution urban.pov.hc primary
## 1 78.2 13.97 189 27.6 98.7 48.01676 27.6 NA
## 2 75.4 24.55 370 NA 51.0 36.39543 NA 84.01231
## 3 99.0 43.94 25 13.6 NA 13.44397 4.7 99.34679
## 4 100.0 35.57 41 NA 100.0 25.50769 30.0 96.07425
## 5 86.5 14.66 225 NA 90.7 89.39291 21.3 90.50861
## 6 98.9 NA 25 4.7 100.0 27.04049 NA 96.14116
## health.exp tb.cure case.d diarrhea.trt imm.dpt mat.mort nurse.mw beds
## 1 63.87641 87 58 40.7 65 1291.0 0.360 0.5
## 2 23.96155 34 64 NA 64 NA NA NA
## 3 30.72721 52 87 59.1 92 32.4 NA 4.7
## 4 53.51334 78 89 NA 94 19.0 4.994 3.9
## 5 66.97587 93 57 66.1 97 210.0 0.213 0.6
## 6 23.00839 35 87 42.5 95 45.0 1.959 1.1
## ari lmic
## 1 61.5 Low income
## 2 NA Lower middle income
## 3 94.3 Upper middle income
## 4 56.8 Lower middle income
## 5 42.0 Lower middle income
## 6 82.2 Upper middle income
#--- We can also just get the column names
names(sdg)
## [1] "country" "code" "reg" "gdp"
## [5] "gini" "pop" "delta.pop" "int.migrant"
## [9] "urb" "delta.urb" "emp.ratio" "slums"
## [13] "pop.density" "largest.city" "sanitation" "water"
## [17] "million" "tb" "urb.pov" "electric"
## [21] "pollution" "urban.pov.hc" "primary" "health.exp"
## [25] "tb.cure" "case.d" "diarrhea.trt" "imm.dpt"
## [29] "mat.mort" "nurse.mw" "beds" "ari"
## [33] "lmic"
Note the well-commented code telling you what each snippet does!
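Two other base R functions, dim() and str(), are also handy at this stage. The sketch below is only an illustration: so that it runs on its own, it rebuilds a tiny stand-in for the dataset from three of the columns printed by head() above. The name sdg_small is made up for this example and is not part of the course data.

```r
#--- Hypothetical stand-in for sdg: three columns from the first six rows shown above
sdg_small <- data.frame(
  country = c("Afghanistan", "Angola", "Argentina", "Armenia", "Bangladesh", "Belize"),
  tb      = c(189, 370, 25, 41, 225, 25),
  lmic    = c("Low income", "Lower middle income", "Upper middle income",
              "Lower middle income", "Lower middle income", "Upper middle income")
)

#--- Number of rows and columns
dim(sdg_small)

#--- Type of each column, plus its first few values
str(sdg_small)
```

On the full dataset you would simply call dim(sdg) and str(sdg).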
Since we are interested in TB, we might want to look at some TB-specific data. The column ‘tb’ contains the TB incidence rate, expressed as the number of cases per 100,000 people.
#--- Look at the TB values for the first few observations
head(sdg$tb)
## [1] 189 370 25 41 225 25
#--- Get some summary statistics for the values of TB incidence
summary(sdg$tb)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 11.0 50.5 110.4 152.0 834.0 7
Note that here we use a dollar sign to “index” (select) the ‘tb’ column by name. Typing “sdg$tb” says that we are interested in the column tb in the dataset sdg.
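To make the dollar-sign indexing concrete, here is a small sketch on a made-up stand-in for the dataset (sdg_small is a hypothetical name; its values are the tb and gini entries of the first six rows printed earlier). It also shows how missing values (NA) affect a calculation such as mean().

```r
#--- Hypothetical stand-in for sdg: two columns from the first six rows shown earlier
sdg_small <- data.frame(
  tb   = c(189, 370, 25, 41, 225, 25),
  gini = c(NA, NA, 42.7, 32.4, NA, NA)
)

#--- The dollar sign returns the column as a plain vector
sdg_small$tb

#--- Double square brackets with the column name return the same vector
sdg_small[["tb"]]

#--- A single NA makes the mean NA unless we ask R to drop missing values
mean(sdg_small$gini)
mean(sdg_small$gini, na.rm = TRUE)

#--- Count the missing values in a column
sum(is.na(sdg_small$gini))
```

The same pattern works on any column of the full dataset; only the toy data frame here is an assumption.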
EXERCISE: What is the mean GDP for all nations? How many nations have missing GDP data?
We can get some more detailed summary information using the psych package.
EXERCISE: Install the psych package and load it into your R session.
#--- Get a detailed summary (describe() comes from psych, so the package must be loaded first)
describe(sdg$tb)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 210 110.39 149.66 50.5 78.43 63.9 0 834 834 2.06 4.61
## se
## X1 10.33
#--- Get a summary by group
describeBy(sdg$tb, sdg$lmic)
##
## Descriptive statistics by group
## group: High income
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 71 18.88 26.34 8.2 13.26 6.23 0 164 164 2.99 11.43
## se
## X1 3.13
## --------------------------------------------------------
## group: Low income
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 31 205.84 136.22 189 189.56 139.36 35 561 526 0.95 0.43
## se
## X1 24.47
## --------------------------------------------------------
## group: Lower middle income
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 52 201.79 174.41 141.5 180.74 137.14 1.1 788 786.9 1.12 0.81
## se
## X1 24.19
## --------------------------------------------------------
## group: Upper middle income
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 56 88.72 146.91 40.5 53.21 40.77 4.6 834 829.4 3.16 10.99
## se
## X1 19.63
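If the psych package is not available, a rougher summary by group can be put together in base R with tapply() or aggregate(). The sketch below is not a replacement for describeBy(), just an illustration of the same idea. It uses a made-up stand-in (sdg_small, a hypothetical name) built from the first six rows printed earlier, so its numbers will not match the full-dataset output above.

```r
#--- Hypothetical stand-in for sdg: tb and lmic from the first six rows shown earlier
sdg_small <- data.frame(
  tb   = c(189, 370, 25, 41, 225, 25),
  lmic = c("Low income", "Lower middle income", "Upper middle income",
           "Lower middle income", "Lower middle income", "Upper middle income")
)

#--- Mean TB incidence within each income group (returns a named vector)
group_means <- tapply(sdg_small$tb, sdg_small$lmic, mean)
group_means

#--- Median TB incidence by group, using the formula interface (returns a data frame)
group_medians <- aggregate(tb ~ lmic, data = sdg_small, FUN = median)
group_medians
```

Note that the formula interface of aggregate() drops rows with missing values by default, whereas tapply() needs na.rm = TRUE passed on to the summary function if the column contains NAs.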
EXERCISE: Get detailed summary statistics for maternal mortality.