Chapter 6 Grammar for visualization

Let’s start with some practical plotting

6.1 Exercise 1: Fill in the blank spaces

First read in the data, then fill in the blank spaces.

Like this:

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   source_date = col_datetime(format = ""),
##   ar_key = col_character(),
##   cust_id = col_character(),
##   pc_l3_pd_spec_nm = col_character(),
##   cpe_type = col_character(),
##   cpe_net_type_cmpt = col_character(),
##   pc_priceplan_nm = col_character(),
##   sc_l5_sales_cnl = col_character(),
##   rt_fst_cstatus_act_dt = col_datetime(format = ""),
##   rrpu_amt_used = col_character(),
##   rcm1pu_amt_used = col_character()
## )
## See spec(...) for full column specifications.

6.1.1 Exercise 2

  • Map one of the tr_tot_data... columns to x
## Warning: Removed 4 rows containing non-finite values (stat_density).

6.2 Let’s look at this visualization, what’s in it?

  1. Data

  2. Aesthetics

  3. Geometrical objects

  4. Scales

  5. Statistics

  6. Facets

Data

First, we have the data:

## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

Note that the data is tidy: each observation is a row and each column is a variable.

  • Data is the first argument in ggplot()

When we filter data
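
As a minimal sketch of filtering before plotting, the example below keeps one year of the data and passes the result to ggplot() as its first argument. It assumes the gapminder package is installed and provides the tibble shown above; the object names gapminder_2007 and p_data are only illustrative.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed above

# Keep a single year of observations
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

# The data is the first argument of ggplot(); without aesthetics and
# geoms this only sets up an empty plot
p_data <- ggplot(data = gapminder_2007)
```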

6.3 Aesthetics

  • For the visualization to work, we need to specify how variables in the data map to visual properties.
  • We map aesthetics such as x and y, or color and size
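
A small sketch of such a mapping, assuming the gapminder data shown earlier: position comes from two numeric columns, color from continent and point size from population. The choice of columns is only an example.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed above

# aes() describes which column drives which visual property
p_aes <- ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)
) +
  geom_point(alpha = 0.5)  # alpha is set outside aes(), so it is a fixed value
```

Anything inside aes() varies with the data; anything set outside it (like alpha here) is constant for the whole layer.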

6.5 Scale

By default, ggplot() uses scale_x_continuous() and scale_y_continuous() when x and y are numeric.

  • Scale functions take arguments
  • Sometimes we want the y axis to run from 0 to the maximum value
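
One way to start the y axis at zero is the limits argument of scale_y_continuous(). This is a sketch on the gapminder data (assumed to be installed); passing NA as the upper limit lets ggplot2 work out the maximum from the data.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed above

p_scale <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  # lower limit fixed at 0, upper limit (NA) taken from the data
  scale_y_continuous(limits = c(0, NA))
```

expand_limits(y = 0) is an alternative that achieves a similar result without touching the scale directly.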

6.8 Coordinate system

The last component of the plot is the coordinate system, which is usually Cartesian.

  • Changing the coordinate system is most useful when we want to flip the plot
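
A minimal sketch of flipping a plot with coord_flip(), again assuming the gapminder data: the mapping stays the same, only the axes swap places.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed above

p_flip <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot() +
  coord_flip()  # continent now runs along the vertical axis
```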

6.9 Let’s get practical

6.9.1 Exercise 9

Visualize the relationship between the last two months of data volume.

  • Which are your aesthetics?

  • Which geom do you use?

  • Is the scale suitable?

  • Can you use any statistical computation to visualize the relationship?

## Warning: Removed 4 rows containing missing values (geom_point).

6.9.2 Exercise 10

Add aesthetics

  • Such as fill, color or size
## Warning: Removed 4 rows containing missing values (geom_point).

6.10 Geoms to visualize one variable

  • geom_bar() and geom_col() for bar charts

  • geom_histogram()

  • geom_density() for distributions

  • geom_boxplot()

  • geom_violin() for so-called violin plots

## Warning: Continuous x aesthetic -- did you forget aes(group=...)?

6.11 Groups

Map a grouping variable to x to show several groups side by side.
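
The sketch below, on the gapminder data (assumed installed), shows both cases. With a categorical x you get one box per category. With a continuous x, geom_boxplot() typically gives the warning shown above unless you also set the group aesthetic.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed above

# Categorical x: one box per continent
p_groups <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot()

# Continuous x: tell ggplot2 explicitly how to group the observations
p_by_year <- ggplot(gapminder, aes(x = year, y = lifeExp, group = year)) +
  geom_boxplot()
```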

6.12 Using dplyr with ggplot2 part 1

  • You can pipe the result of a dplyr pipeline into ggplot() with %>%
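
A short sketch of a pipeline that ends in a plot, using the gapminder data (assumed installed). Note the switch from %>% to + once ggplot() has been called: dplyr steps are chained with the pipe, ggplot2 layers are added with plus.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed above

p_europe <- gapminder %>%
  filter(continent == "Europe") %>%  # dplyr steps: chained with %>%
  ggplot(aes(x = year, y = lifeExp, group = country)) +  # ggplot2 layers: added with +
  geom_line(alpha = 0.5)
```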

6.12.1 Exercise 11

Visualize calculated revenue per month (alloc_rrpu_amt)

  • Use %>%

  • Which geom best represents the data?

  • Is the distribution normal?

  • Are there any outliers? How do you filter these out?

  • Is the scale suitable?

6.13 Use dplyr with ggplot2 part 2

We can use dplyr to calculate summary statistics and then visualize it with ggplot.

What’s the difference between geom_bar() and geom_col()?

  • geom_bar() counts how many observations there are in each category

  • geom_col() plots values that are already in a summary table
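
To make the difference concrete, here is a sketch on the gapminder data (assumed installed). The summary column mean_life_exp is a name made up for this example.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed above

gap_2007 <- gapminder %>%
  filter(year == 2007)

# geom_bar(): only x is mapped, the bar height is the number of rows
p_bar <- ggplot(gap_2007, aes(x = continent)) +
  geom_bar()

# geom_col(): the bar height comes from a column we computed ourselves
life_by_continent <- gap_2007 %>%
  group_by(continent) %>%
  summarise(mean_life_exp = mean(lifeExp))

p_col <- ggplot(life_by_continent, aes(x = continent, y = mean_life_exp)) +
  geom_col()
```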

6.15 Flipped plots

6.15.1 Exercise 12

  • Inspect the categories of pc_priceplan_nm, are there any natural categories they could be assigned to?

  • Create a new column that groups priceplan into three or four categories, use case_when() and str_detect()

  • Create a summary table of revenue per month per priceplan category (that you previously created)

  • Visualize it with an adequate geom

  • Reorder the plot in ascending order

  • Flip the coordinates if you run into problems with the axis

  • Style your plot with a theme and labels

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   source_date = col_date(format = ""),
##   ar_key = col_character(),
##   cust_id = col_character(),
##   pc_l3_pd_spec_nm = col_character(),
##   cpe_type = col_character(),
##   cpe_net_type_cmpt = col_character(),
##   pc_priceplan_nm = col_character(),
##   sc_l5_sales_cnl = col_character(),
##   rt_fst_cstatus_act_dt = col_date(format = ""),
##   rrpu_amt_used = col_character(),
##   rcm1pu_amt_used = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 137 x 2
## # Groups:   pc_priceplan_nm [137]
##    pc_priceplan_nm                n
##    <chr>                      <int>
##  1 Fast pris                  44578
##  2 Fast pris +EU 8 GB - 2018  20011
##  3 Unlimited - 2018           15887
##  4 Fast pris +EU 3 GB - 2018  14906
##  5 Fast pris +EU 15 GB - 2018 13686
##  6 Bredband 4G                13218
##  7 Fast pris +EU 1 GB         13017
##  8 Fast pris +EU 15 GB        12634
##  9 Fast pris +EU 5 GB         12610
## 10 Rörligt pris               12273
## # … with 127 more rows

6.17 Interactivity

  • Multiple libraries for interactive, web-based, plots
  • plotly, highcharter, r2d3 and more

Plotly:

## Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomLabel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

Plotly also has its own syntax for custom graphs, which is highly recommended for interactive applications in Shiny.
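
As a minimal sketch, an existing ggplot object can be handed to ggplotly() from the plotly package. The plot itself is just an example on the gapminder data (assumed installed).

```r
library(ggplot2)
library(plotly)
library(gapminder)  # assumption: source of the tibble printed earlier

p_static <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()

# Convert the static plot into an interactive htmlwidget
p_interactive <- ggplotly(p_static)
```

Printing p_interactive in RStudio or in an R Markdown document shows the plot with hover, zoom and pan. Not every geom is supported, as the warning above about geom_GeomLabel() indicates.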

6.17.1 Exercise 13

Make some of your plots interactive with ggplotly()

6.18 Time over?

6.19 Exporting plots

There are several ways you can export visualizations.

The easiest way is to use ggsave().

Read the documentation for ggsave(): how can you use it?

You can also export visualizations to PowerPoint using the officer package.
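
One possible call, as a sketch: the file name, size and resolution below are arbitrary choices, and the plot uses the gapminder data (assumed installed). The file is written to a temporary folder so the example does not clutter your project.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed earlier

p_export <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.3)

# The file extension decides the output format (here PNG)
out_file <- file.path(tempdir(), "life_expectancy.png")
ggsave(filename = out_file, plot = p_export,
       width = 16, height = 10, units = "cm", dpi = 300)
```

If you leave out the plot argument, ggsave() saves the last plot that was displayed.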

You can read more about officer here: https://davidgohel.github.io/officer/articles/powerpoint.html

## 
## Attaching package: 'officer'
## The following object is masked from 'package:readxl':
## 
##     read_xlsx

This way you can automate PowerPoint presentations.
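
A sketch of what this could look like with officer: it creates a presentation from officer's default template, adds one slide and places a ggplot on it. The layout and master names ("Title and Content", "Office Theme") are those of the default template and will differ if you use your own; the slide title, plot and file name are made up for the example.

```r
library(officer)
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed earlier

p_slide <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.3)

# Start from officer's built-in default template
pres <- read_pptx()
pres <- add_slide(pres, layout = "Title and Content", master = "Office Theme")

# Fill the title and body placeholders of the new slide
pres <- ph_with(pres, value = "Life expectancy over time",
                location = ph_location_type(type = "title"))
pres <- ph_with(pres, value = p_slide,
                location = ph_location_type(type = "body"))

# Write the presentation to disk (here: a temporary folder)
pptx_file <- file.path(tempdir(), "report.pptx")
print(pres, target = pptx_file)
```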

6.19.1 Exercise 15

Create your own PowerPoint presentation!

You can also pass an existing PowerPoint presentation to read_pptx(path = "documents/report_q2.pptx") to use it as a template. Feel free to try this on your own computer (i.e. not in the cloud environment); you can read more about it on the officer website.

6.20 Answers to the visualization exercises

6.21 Exercise 1.

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   source_date = col_datetime(format = ""),
##   ar_key = col_character(),
##   cust_id = col_character(),
##   pc_l3_pd_spec_nm = col_character(),
##   cpe_type = col_character(),
##   cpe_net_type_cmpt = col_character(),
##   pc_priceplan_nm = col_character(),
##   sc_l5_sales_cnl = col_character(),
##   rt_fst_cstatus_act_dt = col_datetime(format = ""),
##   rrpu_amt_used = col_character(),
##   rcm1pu_amt_used = col_character()
## )
## See spec(...) for full column specifications.

6.22 Exercise 2.

  • Let’s make a distribution plot
## Warning: Removed 4 rows containing non-finite values (stat_density).
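
A sketch of such a distribution plot on made-up data. The column name data_volume is hypothetical and stands in for whichever tr_tot_data... column you choose; the values are simulated, with one missing value added on purpose.

```r
library(ggplot2)

set.seed(1)
# Simulated, right-skewed usage data with one missing value
toy_usage <- data.frame(
  data_volume = c(rlnorm(200, meanlog = 1, sdlog = 1), NA)  # hypothetical column
)

p_density <- ggplot(toy_usage, aes(x = data_volume)) +
  geom_density()
```

Missing values cannot be used in the density estimate, so ggplot2 drops them and tells you so in a warning like the one above.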

6.28 Exercise 8.

  • We can specify the scales as “free” or “fixed”.
  • Change scales to “free”
## Warning in qt((1 - level)/2, df): NaNs produced
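
As an illustration of free scales, here is a sketch on the gapminder data (assumed installed) rather than the exercise data. Each panel gets its own x and y range; "free_x" and "free_y" free only one of the two axes.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the tibble printed earlier

p_facets <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  # scales = "fixed" (the default) would share axes across all panels
  facet_wrap(~ continent, scales = "free")
```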

6.29 Exercise 9.

Visualize the relationship between the last two months of data volume.

  • Which are your aesthetics?

  • Which geom do you use?

  • Is the scale suitable?

  • Can you use any statistical computation to visualize the relationship?
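
One possible approach, sketched on simulated data: the two column names below are hypothetical placeholders for the last two monthly data-volume columns. A scatter plot shows the relationship, log scales handle the skew, and geom_smooth() adds a fitted line as the statistical computation.

```r
library(ggplot2)

set.seed(42)
# Simulated data volumes for two consecutive months (hypothetical columns)
toy_volume <- data.frame(volume_prev_month = rlnorm(300, meanlog = 1, sdlog = 1))
toy_volume$volume_last_month <- toy_volume$volume_prev_month *
  rlnorm(300, meanlog = 0, sdlog = 0.3)

p_relationship <- ggplot(toy_volume,
                         aes(x = volume_prev_month, y = volume_last_month)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +  # linear fit on the transformed (log) scale
  scale_x_log10() +
  scale_y_log10()
```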

6.30 Exercise 10.

Add aesthetics

  • Such as fill, color or size
## Warning: Removed 4 rows containing missing values (geom_point).

6.31 Exercise 11.

Visualize calculated revenue per month (alloc_rrpu_amt)

  • Use %>%

  • Which geom best represents the data?

  • Is the distribution normal?

  • Are there any outliers? How do you filter these out?

  • Is the scale suitable?

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
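
A sketch of one way to approach this, on simulated revenue values (only the column name alloc_rrpu_amt comes from the exercise). Two artificial outliers are added and then filtered out before plotting; the cut-off of 1000 is an arbitrary choice for this toy data. Setting binwidth explicitly also avoids the stat_bin() message above.

```r
library(dplyr)
library(ggplot2)

set.seed(7)
# Simulated monthly revenue with two artificial outliers
toy_revenue <- tibble(
  alloc_rrpu_amt = c(rnorm(500, mean = 250, sd = 60), 5000, 7500)
)

p_revenue <- toy_revenue %>%
  filter(alloc_rrpu_amt < 1000) %>%  # drop the extreme values
  ggplot(aes(x = alloc_rrpu_amt)) +
  geom_histogram(binwidth = 25)
```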

6.32 Exercise 12.

  • Inspect the categories of pc_priceplan_nm, are there any natural categories they could be assigned to?
## # A tibble: 137 x 2
## # Groups:   pc_priceplan_nm [137]
##    pc_priceplan_nm                n
##    <chr>                      <int>
##  1 Fast pris                  44578
##  2 Fast pris +EU 8 GB - 2018  20011
##  3 Unlimited - 2018           15887
##  4 Fast pris +EU 3 GB - 2018  14906
##  5 Fast pris +EU 15 GB - 2018 13686
##  6 Bredband 4G                13218
##  7 Fast pris +EU 1 GB         13017
##  8 Fast pris +EU 15 GB        12634
##  9 Fast pris +EU 5 GB         12610
## 10 Rörligt pris               12273
## # … with 127 more rows
  • Create a new column that groups priceplan into three or four categories
  • Create a summary table of revenue per month per priceplan category (that you previously created)

  • Visualize it with an adequate geom

  • Reorder the plot in ascending order

  • Flip the coordinates if you run into problems with the axis

  • Style your plot with a theme and labels
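
The steps above could be sketched as follows. The price plan names are taken from the table above, but the revenue figures are made up, and the example covers a single month only to keep it short. The category rules are one possible grouping, not the only reasonable one.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Toy data: real plan names, made-up revenue values
toy_plans <- tibble(
  pc_priceplan_nm = c("Fast pris", "Fast pris +EU 8 GB - 2018", "Unlimited - 2018",
                      "Bredband 4G", "Rörligt pris", "Fast pris +EU 1 GB"),
  alloc_rrpu_amt = c(199, 299, 399, 249, 99, 149)
)

plan_revenue <- toy_plans %>%
  # case_when() checks the conditions in order; the first match wins
  mutate(priceplan_category = case_when(
    str_detect(pc_priceplan_nm, "Fast pris") ~ "Fast pris",
    str_detect(pc_priceplan_nm, "Unlimited") ~ "Unlimited",
    str_detect(pc_priceplan_nm, "Bredband")  ~ "Bredband",
    TRUE                                     ~ "Other"
  )) %>%
  group_by(priceplan_category) %>%
  summarise(mean_revenue = mean(alloc_rrpu_amt))

p_plans <- ggplot(plan_revenue,
                  aes(x = reorder(priceplan_category, mean_revenue),  # ascending order
                      y = mean_revenue)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  labs(x = NULL, y = "Mean revenue", title = "Revenue per price plan category")
```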

6.33 Exercise 13.

Make some of your plots interactive with ggplotly()

6.34 Exercise 14.

  • Add annotations, labels and other styling to your plot