5.7 Creating new columns

As we see above, you can create columns with new variables when you use group_by() and summarize(). Frequently you’ll want to create new columns based on the values in existing columns, for example to do unit conversions, or to find the ratio of values in two columns. For this you can use the mutate() function. The transmute() function is similar, but replaces old columns with the new one.

Each row in our dataset represents a week of data. It might be better if we can look at yearly trends. So, let’s try separating out the year from PeriodStartDate into its own column. In addition to mutate(), we will also use the year() function from lubridate

#Get year in separate column from PeriodStartDate
non_cumulative_period <- 
  non_cumulative_period %>% 
  mutate(Year = year(PeriodStartDate))

Now we can do our grouping and summarizing again as we did above.

#Get totals for each state each year.
non_cumulative_year <-
  non_cumulative_period %>% group_by(State, Year) %>% summarise(TotalCount = sum(TotalCount))
## `summarise()` has grouped output by 'State'. You can override using the `.groups` argument.
non_cumulative_year
## # A tibble: 4,210 x 3
## # Groups:   State [56]
##    State    Year TotalCount
##    <chr>   <dbl>      <dbl>
##  1 ALABAMA  1909          1
##  2 ALABAMA  1910        606
##  3 ALABAMA  1911        587
##  4 ALABAMA  1912        109
##  5 ALABAMA  1913          7
##  6 ALABAMA  1914          4
##  7 ALABAMA  1915          2
##  8 ALABAMA  1916         66
##  9 ALABAMA  1917       5134
## 10 ALABAMA  1918       1651
## # … with 4,200 more rows

We could use a similar method to find the average number of cases per year.

#Get the avg number of counts across all states, each year.
avg_count_by_year <-
  non_cumulative_year %>%
  group_by(Year) %>%  
  summarise(Avg = mean(TotalCount))

avg_count_by_year
## # A tibble: 96 x 2
##     Year   Avg
##    <dbl> <dbl>
##  1  1906  112.
##  2  1907 1100.
##  3  1908 1686.
##  4  1909 1609.
##  5  1910 2564.
##  6  1911 1833.
##  7  1912 1897.
##  8  1913 3266.
##  9  1914 1550.
## 10  1915 2329.
## # … with 86 more rows