4.4 Filter

Let’s now examine filter(). Where select() picks columns, filter() picks rows: it keeps only the rows that satisfy a condition you supply. This is what we need when we want to look at just a subset of the data. The code below walks through some use cases for filter().

#--- Filter the dataset to show only those countries with population below 1000000
sdg %>% select(country, pop) %>% filter(pop < 1000000) %>% head()
##      country    pop
## 1     Belize 366954
## 2    Comoros 795601
## 3   Djibouti 942333
## 4     Guyana 773303
## 5 Cabo Verde 539560
## 6 Seychelles  94677
#--- Filter to only African nations
sdg %>% select(country, reg) %>% filter(reg == "AFR") %>% head()
##                    country reg
## 1                   Angola AFR
## 2                    Benin AFR
## 3             Burkina Faso AFR
## 4                  Burundi AFR
## 5                 Cameroon AFR
## 6 Central African Republic AFR
#--- Filter to African nations with population below 1000000
sdg %>% select(country, pop, reg) %>% filter(reg == "AFR" & pop < 1000000) %>% head()
##                 country    pop reg
## 1               Comoros 795601 AFR
## 2            Cabo Verde 539560 AFR
## 3            Seychelles  94677 AFR
## 4 Sao Tome and Principe 199910 AFR
#--- Filter to African nations OR nations with population below 1000000
sdg %>% select(country, pop, reg) %>% filter(reg == "AFR" | pop < 1000000) %>% head()
##        country      pop reg
## 1       Angola 28813463 AFR
## 2       Belize   366954 AMR
## 3        Benin 10872298 AFR
## 4 Burkina Faso 18646433 AFR
## 5      Burundi 10524117 AFR
## 6     Cameroon 23439189 AFR
#--- Filter to only nations with no missing data on slum prevalence
sdg %>% select(country, slums) %>% filter(!is.na(slums)) %>% head()
##       country slums
## 1 Afghanistan  62.7
## 2      Angola  55.5
## 3   Argentina  16.7
## 4     Armenia  14.4
## 5  Bangladesh  55.1
## 6      Belize  10.8

A few notes on filtering. You’ll notice that the filtering condition uses two equals signs (==), which tests whether two values are equal. A single equals sign means something different in R: at the top level it behaves like the ‘assignment operator’ <-, and inside a function call it names an argument. Neither is what we want here! Note also that we had to enclose AFR in quotation marks. This is because the values of reg are stored as character “strings”. In other words, R treats "AFR" as text, not as a number. Without the quotation marks, R would instead look for a variable named AFR (variable names, such as reg and pop, are written without quotation marks).
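
To make the == point concrete, here is a minimal sketch on a small toy data frame. The name toy is a hypothetical stand-in for sdg, with four rows copied from the output above. It assumes dplyr is installed, and the error on a single = is the behaviour of recent dplyr versions.

```r
library(dplyr)

# Hypothetical toy data frame standing in for sdg (values taken from the output above)
toy <- data.frame(
  country = c("Angola", "Belize", "Comoros", "Seychelles"),
  pop     = c(28813463, 366954, 795601, 94677),
  reg     = c("AFR", "AMR", "AFR", "AFR"),
  stringsAsFactors = FALSE
)

# == compares every value of reg with the string "AFR",
# returning one TRUE or FALSE per row
is_afr <- toy$reg == "AFR"
is_afr
## [1]  TRUE FALSE  TRUE  TRUE

# filter() keeps exactly the rows where the condition is TRUE
afr_toy <- toy %>% filter(reg == "AFR")

# With a single =, dplyr sees a named argument rather than a comparison
# and stops with an error; try() lets the script carry on so we can inspect it
single_eq <- try(toy %>% filter(reg = "AFR"), silent = TRUE)
class(single_eq)
```

The logical vector is the key idea: whatever goes inside filter() must evaluate to one TRUE or FALSE per row.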

You’ll note that & means ‘and’, | means ‘or’, and ! means ‘not’. Through combinations of these so-called logical operators, you can filter data in a number of ways.
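
As a rough illustration of how these operators combine, the sketch below uses the same kind of hypothetical toy data frame (again a stand-in for sdg, with values copied from the output above). Parentheses control which conditions are grouped together.

```r
library(dplyr)

# Hypothetical toy data frame standing in for sdg (values taken from the output above)
toy <- data.frame(
  country = c("Angola", "Belize", "Comoros", "Seychelles"),
  pop     = c(28813463, 366954, 795601, 94677),
  reg     = c("AFR", "AMR", "AFR", "AFR"),
  stringsAsFactors = FALSE
)

# NOT: keep every row whose region is not "AFR"
# (reg != "AFR" would give the same result)
not_afr <- toy %>% filter(!(reg == "AFR"))

# AND combined with OR: African nations that are either very small or very large.
# The parentheses make sure the two population tests are grouped before the &.
afr_extremes <- toy %>%
  filter(reg == "AFR" & (pop < 100000 | pop > 10000000))

not_afr$country
afr_extremes$country
```

On this toy data, the first filter keeps only Belize, and the second keeps Angola and Seychelles.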

What if I want to save the output of a call to filter() as a new data frame for analysis? Simple. I just assign the entire chain of code to a new object.

In plain words, the code below says: ‘I am going to filter the data frame sdg to only African countries, and then I will call the new data frame that contains only those countries afr.’

#--- Filter to only African nations
afr <- sdg %>% filter(reg == "AFR")
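
Once saved, the result behaves like any other data frame. The sketch below shows this on a hypothetical toy data frame (a stand-in for sdg, with values copied from the output above), using nrow() to count the rows that survived the filter.

```r
library(dplyr)

# Hypothetical toy data frame standing in for sdg (values taken from the output above)
toy <- data.frame(
  country = c("Angola", "Belize", "Comoros", "Seychelles"),
  pop     = c(28813463, 366954, 795601, 94677),
  reg     = c("AFR", "AMR", "AFR", "AFR"),
  stringsAsFactors = FALSE
)

# Save the filtered rows as a new object
afr_toy <- toy %>% filter(reg == "AFR")

# nrow() counts the rows in a data frame
n_afr <- nrow(afr_toy)
n_afr
## [1] 3

# The original data frame is left unchanged
n_all <- nrow(toy)
n_all
## [1] 4
```

Because filter() returns a new data frame rather than modifying its input, the original object is still available for other analyses.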

EXERCISE: How many nations are represented in the Eastern Mediterranean Region?