Chapter 5 Answers to excercises

5.1 Excercise

  • Press the knit button to compile this Rmarkdown-notebook to a HTML-document
  • Change the author to your name, knit the document again

5.2 Excercise

The package tidyverse is downloaded for you. Load it with library().

5.3 Excercise

  • Import the customer data tele2-kunder-sample.csv into R
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   source_date = col_date(format = ""),
##   ar_key = col_character(),
##   cust_id = col_character(),
##   pc_l3_pd_spec_nm = col_character(),
##   cpe_type = col_character(),
##   cpe_net_type_cmpt = col_character(),
##   pc_priceplan_nm = col_character(),
##   sc_l5_sales_cnl = col_character(),
##   rt_fst_cstatus_act_dt = col_date(format = ""),
##   rrpu_amt_used = col_character(),
##   rcm1pu_amt_used = col_character()
## )
## See spec(...) for full column specifications.
  • Use glimpse() on it
## Observations: 300,000
## Variables: 25
## $ source_date                 <date> 2019-04-30, 2019-04-30, 2019-04-30,…
## $ ar_key                      <chr> "AAEBfhzgf0riQnApBVflWOkmMXDU8iE=", …
## $ cust_id                     <chr> "AAEBfv9gf7ePbf1Gdlo5/hEnEoU+SKw=", …
## $ pc_l3_pd_spec_nm            <chr> "MOBILE REGULAR", "MOBILE REGULAR", …
## $ cpe_type                    <chr> NA, "SMARTPHONE", "SMARTPHONE", "SMA…
## $ cpe_net_type_cmpt           <chr> NA, "4G", "4G", "4G", "4G", "4G", "4…
## $ pc_priceplan_nm             <chr> "Kompis", "Fast pris 20 GB", "Fast p…
## $ sc_l5_sales_cnl             <chr> "RETAILER", "RETAILER", "RETAILER", …
## $ rt_fst_cstatus_act_dt       <date> 2011-04-28, 2016-11-28, 2017-07-10,…
## $ tr_tot_data_vol_all_netw_1  <dbl> 0.0, 5252.1, 2844.1, 3404.9, 16003.9…
## $ tr_tot_data_vol_all_netw_2  <dbl> 0.0, 4257.8, 8479.6, 5576.5, 17052.2…
## $ tr_tot_data_vol_all_netw_3  <dbl> 0.0, 1718.6, 6746.2, 2670.8, 17990.2…
## $ tr_tot_data_vol_all_netw_4  <dbl> 0.0, 4716.0, 4620.3, 5493.7, 16964.0…
## $ tr_tot_data_vol_all_netw_5  <dbl> 0.0, 6252.0, 3374.7, 4393.4, 13623.4…
## $ tr_tot_data_vol_all_netw_6  <dbl> 0.0, 6136.5, 3142.3, 6417.5, 7598.6,…
## $ tr_tot_data_vol_all_netw_7  <dbl> 0.0, 4900.6, 3255.6, 5976.8, 10110.0…
## $ tr_tot_data_vol_all_netw_8  <dbl> 0.0, 1696.3, 7130.7, NA, 11497.0, 67…
## $ tr_tot_data_vol_all_netw_9  <dbl> NA, 3166.1, 8088.5, NA, 13318.2, 386…
## $ tr_tot_data_vol_all_netw_10 <dbl> 0.0, 6852.1, 10504.6, NA, 9623.9, 20…
## $ tr_tot_data_vol_all_netw_11 <dbl> 0.0, 5599.7, 6507.9, NA, 9259.9, 264…
## $ tr_tot_data_vol_all_netw_12 <dbl> 0.0, 2892.2, 4485.1, NA, 5254.3, 2.7…
## $ alloc_rrpu_amt              <dbl> 39.20073, 841.34443, 339.25698, 283.…
## $ alloc_rcm1pu_amt            <dbl> 39.20073, 824.74999, 239.95938, 253.…
## $ rrpu_amt_used               <chr> "Allocated RRPU", "Allocated RRPU", …
## $ rcm1pu_amt_used             <chr> "Allocated RCM1PU", "Allocated RCM1P…
  • Try knitting the Rmarkdown again

5.4 Excercise 1

  1. How many customers had a data-volume over 1000 in february 2019?
## # A tibble: 198,235 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  2 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
## # … with 198,225 more rows, and 19 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>
  1. How many customers have been “active” longer than 2005?
## # A tibble: 15,837 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  2 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfk… MOBILE REGULAR   FEATURE… 3G              
##  6 2019-04-30  AAEBf… AAEBfu… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   MOBILE … <NA>            
##  8 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   MOBILE … <NA>            
## 10 2019-04-30  AAEBf… AAEBfu… MOBILE REGULAR   SMARTPH… 4G              
## # … with 15,827 more rows, and 19 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>
  1. How many customers have a data-volume over 2000 in february and have a calculated revenue larger than 500 per month?
## # A tibble: 3,758 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  2 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfs… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfi… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfh… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBfh… MOBILE REGULAR   SMARTPH… 4G              
## # … with 3,748 more rows, and 19 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>
  1. How many customers have a priceplan with “Rörlig pris”?
## # A tibble: 0 x 25
## # … with 25 variables: source_date <date>, ar_key <chr>, cust_id <chr>,
## #   pc_l3_pd_spec_nm <chr>, cpe_type <chr>, cpe_net_type_cmpt <chr>,
## #   pc_priceplan_nm <chr>, sc_l5_sales_cnl <chr>,
## #   rt_fst_cstatus_act_dt <date>, tr_tot_data_vol_all_netw_1 <dbl>,
## #   tr_tot_data_vol_all_netw_2 <dbl>, tr_tot_data_vol_all_netw_3 <dbl>,
## #   tr_tot_data_vol_all_netw_4 <dbl>, tr_tot_data_vol_all_netw_5 <dbl>,
## #   tr_tot_data_vol_all_netw_6 <dbl>, tr_tot_data_vol_all_netw_7 <dbl>,
## #   tr_tot_data_vol_all_netw_8 <dbl>, tr_tot_data_vol_all_netw_9 <dbl>,
## #   tr_tot_data_vol_all_netw_10 <dbl>, tr_tot_data_vol_all_netw_11 <dbl>,
## #   tr_tot_data_vol_all_netw_12 <dbl>, alloc_rrpu_amt <dbl>,
## #   alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>, rcm1pu_amt_used <chr>
  1. Are there any customers that are missing an ID? I.e. is NA.
## # A tibble: 0 x 25
## # … with 25 variables: source_date <date>, ar_key <chr>, cust_id <chr>,
## #   pc_l3_pd_spec_nm <chr>, cpe_type <chr>, cpe_net_type_cmpt <chr>,
## #   pc_priceplan_nm <chr>, sc_l5_sales_cnl <chr>,
## #   rt_fst_cstatus_act_dt <date>, tr_tot_data_vol_all_netw_1 <dbl>,
## #   tr_tot_data_vol_all_netw_2 <dbl>, tr_tot_data_vol_all_netw_3 <dbl>,
## #   tr_tot_data_vol_all_netw_4 <dbl>, tr_tot_data_vol_all_netw_5 <dbl>,
## #   tr_tot_data_vol_all_netw_6 <dbl>, tr_tot_data_vol_all_netw_7 <dbl>,
## #   tr_tot_data_vol_all_netw_8 <dbl>, tr_tot_data_vol_all_netw_9 <dbl>,
## #   tr_tot_data_vol_all_netw_10 <dbl>, tr_tot_data_vol_all_netw_11 <dbl>,
## #   tr_tot_data_vol_all_netw_12 <dbl>, alloc_rrpu_amt <dbl>,
## #   alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>, rcm1pu_amt_used <chr>
  • Knit the document again

5.5 Excercise 2

  1. How many customers have a subscription with “Fast pris”?
## # A tibble: 170,324 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  2 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfp… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   SMARTPH… 4G              
## # … with 170,314 more rows, and 19 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>
  1. How many customers have a subscription that is not “Bredband”?
## # A tibble: 257,391 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
## # … with 257,381 more rows, and 19 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>
  • …dare I say… knit the document again?

5.6 Excercise 3

  1. Which customer has been “active” longest? What is the date?
## # A tibble: 300,000 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   SMARTPH… 4G              
##  2 2019-04-30  AAEBf… AAEBfn… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfk… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   MOBILE … 2G              
##  6 2019-04-30  AAEBf… AAEBfl… MOBILE REGULAR   MOBILE … 2.5G            
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfi… MOBILE REGULAR   MOBILE … <NA>            
##  9 2019-04-30  AAEBf… AAEBfs… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBfo… MOBILE REGULAR   SMARTPH… 2.5G            
## # … with 299,990 more rows, and 19 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>
  1. Which customer is most newly active?
## # A tibble: 300,000 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfu… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   <NA>     <NA>            
##  3 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  4 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   <NA>     <NA>            
##  6 2019-04-30  AAEBf… AAEBfl… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfi… MOBILE REGULAR   <NA>     <NA>            
##  8 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   <NA>     <NA>            
##  9 2019-04-30  AAEBf… AAEBfp… MOBILE REGULAR   <NA>     <NA>            
## 10 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   <NA>     <NA>            
## # … with 299,990 more rows, and 19 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>

5.7 Excercise 4

  1. Choose all columns that contain ”nm"
## # A tibble: 300,000 x 2
##    pc_l3_pd_spec_nm pc_priceplan_nm           
##    <chr>            <chr>                     
##  1 MOBILE REGULAR   Kompis                    
##  2 MOBILE REGULAR   Fast pris 20 GB           
##  3 MOBILE REGULAR   Fast pris +EU 15 GB       
##  4 MOBILE REGULAR   Fast pris +EU 15 GB - 2018
##  5 MOBILE REGULAR   Fast pris                 
##  6 MOBILE REGULAR   Fast pris +EU 5 GB        
##  7 MOBILE REGULAR   Rörligt pris              
##  8 MOBILE REGULAR   Fast pris +EU 1 GB        
##  9 MOBILE REGULAR   Fast pris                 
## 10 MOBILE REGULAR   Fast pris                 
## # … with 299,990 more rows
  1. Choose the column for customer ID and all columns that starts with ”tr_tot"
## # A tibble: 300,000 x 13
##    cust_id tr_tot_data_vol… tr_tot_data_vol… tr_tot_data_vol…
##    <chr>              <dbl>            <dbl>            <dbl>
##  1 AAEBfv…              0                0                0  
##  2 AAEBfr…           5252.            4258.            1719. 
##  3 AAEBfm…           2844.            8480.            6746. 
##  4 AAEBfm…           3405.            5576.            2671. 
##  5 AAEBfr…          16004.           17052.           17990. 
##  6 AAEBfg…              0.3              5.1            830. 
##  7 AAEBfg…              0                0                0  
##  8 AAEBfq…             28.3             30.4             28.5
##  9 AAEBfr…           6048.            6750             5277. 
## 10 AAEBft…           3986.            3583.            2584. 
## # … with 299,990 more rows, and 9 more variables:
## #   tr_tot_data_vol_all_netw_4 <dbl>, tr_tot_data_vol_all_netw_5 <dbl>,
## #   tr_tot_data_vol_all_netw_6 <dbl>, tr_tot_data_vol_all_netw_7 <dbl>,
## #   tr_tot_data_vol_all_netw_8 <dbl>, tr_tot_data_vol_all_netw_9 <dbl>,
## #   tr_tot_data_vol_all_netw_10 <dbl>, tr_tot_data_vol_all_netw_11 <dbl>,
## #   tr_tot_data_vol_all_netw_12 <dbl>
  1. Rename ”pc_priceplan_nm" to ”price_plan"
## # A tibble: 300,000 x 25
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
## # … with 299,990 more rows, and 19 more variables: price_plan <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>

5.8 Excercise 5

  1. Create a new variable that is the mean of the last 3 months of data consumption for all customers, save this to the data set
  1. Create a variable that takes the logarithm with base 10 (log10) of your previously created column
## # A tibble: 300,000 x 27
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
## # … with 299,990 more rows, and 21 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>, mean_3_months <dbl>, log10_mean_3_months <dbl>
  1. Create a new variable that indicates if the priceplan is “Bredband” or not, save it to the data set
  1. Create a new variable that groups priceplan in “Fast pris”, “Rörligt pris”, “Bredband” and “Annan” for everything that is not in any of the previous.
## # A tibble: 4 x 2
## # Groups:   priceplan_cat [4]
##   priceplan_cat      n
##   <chr>          <int>
## 1 Annan          74795
## 2 Bredband       42608
## 3 Fast pris     170324
## 4 Rörlig pris    12273

5.9 Excercise 6

  1. Create a varible for month lubridate::month(x) of customer activation
## # A tibble: 300,000 x 29
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
## # … with 299,990 more rows, and 23 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>, mean_3_months <dbl>, bredband <chr>,
## #   priceplan_cat <chr>, month <dbl>
  1. Create a new varibale for year of customer activation
## # A tibble: 300,000 x 29
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
## # … with 299,990 more rows, and 23 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>, mean_3_months <dbl>, bredband <chr>,
## #   priceplan_cat <chr>, year <dbl>
  1. Create a new variable with the number of days in the month of activation
## # A tibble: 300,000 x 29
##    source_date ar_key cust_id pc_l3_pd_spec_nm cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
## # … with 299,990 more rows, and 23 more variables: pc_priceplan_nm <chr>,
## #   sc_l5_sales_cnl <chr>, rt_fst_cstatus_act_dt <date>,
## #   tr_tot_data_vol_all_netw_1 <dbl>, tr_tot_data_vol_all_netw_2 <dbl>,
## #   tr_tot_data_vol_all_netw_3 <dbl>, tr_tot_data_vol_all_netw_4 <dbl>,
## #   tr_tot_data_vol_all_netw_5 <dbl>, tr_tot_data_vol_all_netw_6 <dbl>,
## #   tr_tot_data_vol_all_netw_7 <dbl>, tr_tot_data_vol_all_netw_8 <dbl>,
## #   tr_tot_data_vol_all_netw_9 <dbl>, tr_tot_data_vol_all_netw_10 <dbl>,
## #   tr_tot_data_vol_all_netw_11 <dbl>, tr_tot_data_vol_all_netw_12 <dbl>,
## #   alloc_rrpu_amt <dbl>, alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>,
## #   rcm1pu_amt_used <chr>, mean_3_months <dbl>, bredband <chr>,
## #   priceplan_cat <chr>, days_in_month <int>

5.10 Excercise 7

  1. What is the sum data volume during the last month? What’s the mean and median and what are the max and min values? You can use max() and min() to calculate maximum and minimum-values.
## # A tibble: 1 x 5
##   sum_data_vol mean_data_vol median_data_vol min_data_vol max_data_vol
##          <dbl>         <dbl>           <dbl>        <dbl>        <dbl>
## 1  5522899512.        19133.            4174            0     1674204.

5.11 Excercise 8

Use %>% and answer the following questions:

  1. Which CPE type is most common?
## # A tibble: 21 x 2
## # Groups:   cpe_type [21]
##    cpe_type                        n
##    <chr>                       <int>
##  1 SMARTPHONE                 236839
##  2 WLAN ROUTER                 18637
##  3 MODEM                       16859
##  4 MOBILE PHONE/FEATURE PHONE  10763
##  5 <NA>                         6014
##  6 TABLET                       5396
##  7 FEATURE PHONE                2313
##  8 ROUTER                       1041
##  9 USB MODEM                    1016
## 10 DONGLE                        802
## # … with 11 more rows
  1. Which priceplan has the highest mean data volume (for febraury 2019)?
## # A tibble: 137 x 2
##    pc_priceplan_nm                mean_data_vol
##    <chr>                                  <dbl>
##  1 Bredband 4G Only 400 GB - 2018       181130.
##  2 Bredband 4G Only 400 GB - 2019       168404.
##  3 Unlimited                            140013.
##  4 Mobilt Bredband 4GL Obegränsad       125286.
##  5 Bredband 4G 200 GB                   115727.
##  6 Fast pris 200 GB                     110948.
##  7 Bredband 4G Only 200 GB               93155.
##  8 Bredband 4G Only 200 GB - 2018        92934.
##  9 Unlimited - 2018                      70179.
## 10 Bredband 4G Only 100 GB - 2019        65644.
## # … with 127 more rows
  1. Calculate the mean of data volume for the year that the customer was created. Which year has the highest mean?
## # A tibble: 28 x 2
##     year mean_data_vol
##    <dbl>         <dbl>
##  1  2017        37434.
##  2  2018        32874.
##  3  2019        25331.
##  4  2016        21831.
##  5  2013        20337.
##  6  2015        19447.
##  7  2014        16353.
##  8  2012        14911.
##  9  2011        12822.
## 10    NA        11904.
## # … with 18 more rows

5.12 Excercise 9

Join you data with tele2-transactions.csv by cust_id.

## Parsed with column specification:
## cols(
##   su_contract_dt = col_date(format = ""),
##   ar_key = col_character(),
##   cust_id = col_character(),
##   false_gi = col_double(),
##   increasednosub = col_double(),
##   pc_l3_pd_spec_nm = col_character(),
##   cust_tp = col_character(),
##   sc_l5_sales_cnl = col_character(),
##   pc_priceplan_nm = col_character(),
##   bas_buck_size = col_character(),
##   rrpu = col_double()
## )
## # A tibble: 300,000 x 37
##    source_date ar_key cust_id pc_l3_pd_spec_n… cpe_type cpe_net_type_cm…
##    <date>      <chr>  <chr>   <chr>            <chr>    <chr>           
##  1 2019-04-30  AAEBf… AAEBfv… MOBILE REGULAR   <NA>     <NA>            
##  2 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  3 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  4 2019-04-30  AAEBf… AAEBfm… MOBILE REGULAR   SMARTPH… 4G              
##  5 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
##  6 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  7 2019-04-30  AAEBf… AAEBfg… MOBILE REGULAR   SMARTPH… 4G              
##  8 2019-04-30  AAEBf… AAEBfq… MOBILE REGULAR   SMARTPH… 4G              
##  9 2019-04-30  AAEBf… AAEBfr… MOBILE REGULAR   SMARTPH… 4G              
## 10 2019-04-30  AAEBf… AAEBft… MOBILE REGULAR   SMARTPH… 4G              
## # … with 299,990 more rows, and 31 more variables:
## #   pc_priceplan_nm.x <chr>, sc_l5_sales_cnl.x <chr>,
## #   rt_fst_cstatus_act_dt <date>, tr_tot_data_vol_all_netw_1 <dbl>,
## #   tr_tot_data_vol_all_netw_2 <dbl>, tr_tot_data_vol_all_netw_3 <dbl>,
## #   tr_tot_data_vol_all_netw_4 <dbl>, tr_tot_data_vol_all_netw_5 <dbl>,
## #   tr_tot_data_vol_all_netw_6 <dbl>, tr_tot_data_vol_all_netw_7 <dbl>,
## #   tr_tot_data_vol_all_netw_8 <dbl>, tr_tot_data_vol_all_netw_9 <dbl>,
## #   tr_tot_data_vol_all_netw_10 <dbl>, tr_tot_data_vol_all_netw_11 <dbl>,
## #   tr_tot_data_vol_all_netw_12 <dbl>, alloc_rrpu_amt <dbl>,
## #   alloc_rcm1pu_amt <dbl>, rrpu_amt_used <chr>, rcm1pu_amt_used <chr>,
## #   mean_3_months <dbl>, bredband <chr>, priceplan_cat <chr>,
## #   su_contract_dt <date>, false_gi <dbl>, increasednosub <dbl>,
## #   pc_l3_pd_spec_nm.y <chr>, cust_tp <chr>, sc_l5_sales_cnl.y <chr>,
## #   pc_priceplan_nm.y <chr>, bas_buck_size <chr>, rrpu <dbl>

5.13 Excercise 10

In your data set you have 12 columns for data volume consumption per month.

You want to calculate the difference of these and therefore have to gather these columns.

  • Fill in the blank spaces to:
  • Gather the columns
  • Save it to a new data.frame, for example kunder_tidy_month
  • Change eval=FALSE to eval=TRUE before knitting