2 Scatter plots

filename: 02-scatter-plot.Rmd

After completing this chapter, you will know:
* How to draw a scatter plot in plotly using add_markers().
* How to use dplyr::mutate() to create new variables in your data frame.
* How to label the y-axis of your plotly graph through layout(yaxis=list(title=...)).

Clear the environment and load packages

rm(list=ls())
library(dplyr)
library(plotly)

2.1 Import Data

Import the data from a CSV text file

data.vis.2<-read.csv("./_data/Consumption_En.csv")

./ represents the current working directory, normally the folder that contains your .Rproj file.
./_data/ represents the _data subdirectory under ./.
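
To check how this relative path resolves on your own machine, a minimal sketch along the following lines may help. The variable name csv_path is introduced here only for illustration; the file name is the one used in the import call above.

```r
# Build the relative path from its parts (csv_path is an illustrative name)
csv_path <- file.path(".", "_data", "Consumption_En.csv")

# The directory that "./" currently refers to
print(getwd())

# Expand to an absolute path when the file can be found;
# mustWork = FALSE avoids an error if it cannot
print(normalizePath(csv_path, mustWork = FALSE))

# TRUE only if the file exists at that location
print(file.exists(csv_path))
```

If file.exists() returns FALSE, the working directory is probably not the project folder.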

Check the first few lines

head(data.vis.2)

Look at the class of each variable.
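
One possible way to do this is with sapply(..., class) or str(). The sketch below uses a tiny made-up data frame (toy, whose values are placeholders and are not taken from Consumption_En.csv) so that it runs on its own; in practice you would pass data.vis.2 instead.

```r
# Made-up stand-in for data.vis.2; the values are placeholders, not real data
toy <- data.frame(
  Year = c(2001L, 2002L),
  DI_average_quintile1 = c(100.5, 110.5)
)

# Class of each column, returned as a named character vector
var_classes <- sapply(toy, class)
print(var_classes)

# Compact overview: class plus the first few values of each column
str(toy)
```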

Variable definitions

The data contains the following variables:
Year: the year of the observation
DI_household_sum: Total household disposable income, unit: 100 million NTDs (台幣億元)
DI_household_average: Average household disposable income, unit: NTD
C_household_sum: Total household consumption, unit: 100 million NTDs (台幣億元)
C_household_average: Average household consumption, unit: NTD
S_household_sum: Total household saving, unit: 100 million NTDs (台幣億元)
S_household_average: Average household saving, unit: NTD

It also divides households into five quintile groups by household income level. For each quintile, denoted quintileX (X=1,2,3,4,5), the dataset contains:
DI_average_quintileX: Average disposable income in the Xth quintile, unit: NTD
C_average_quintileX: Average consumption in the Xth quintile, unit: NTD
S_average_quintileX: Average saving in the Xth quintile, unit: NTD
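
Because the quintile columns follow a regular naming pattern, their names can be generated rather than typed by hand. A small sketch, in which the vector names (quintiles, di_cols, c_cols, s_cols) are illustrative and not part of the dataset:

```r
# Quintile indices X = 1, ..., 5
quintiles <- 1:5

# Column names following the DI_/C_/S_average_quintileX pattern described above
di_cols <- paste0("DI_average_quintile", quintiles)
c_cols  <- paste0("C_average_quintile", quintiles)
s_cols  <- paste0("S_average_quintile", quintiles)

print(di_cols)
```

Assuming the data were imported as data.vis.2, data.vis.2[di_cols] would then select the five disposable income columns.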

2.2 Introduction to scatter plot: add_markers

Basic scatter plot

add_markers

data.vis.2 %>% plot_ly() %>%
  add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
            name="Lowest 20% income")

Add text display to markers: `text=~variable_name`

We can attach an extra text label to each marker with the text=~variable_name syntax; the label appears when you hover over the marker.

data.vis.2 %>% plot_ly() %>%
  add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
              text=~Year,
              name="Lowest 20% income") -> p1
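
The text argument is a formula, so it can also hold an expression rather than a bare column name. The following is a minimal sketch of that idea, using a made-up data frame (toy, with placeholder values rather than the real consumption data) and an illustrative object name p_toy:

```r
library(plotly)

# Made-up stand-in for data.vis.2; placeholder values only
toy <- data.frame(
  Year = c(2001, 2002, 2003),
  DI_average_quintile1 = c(100, 110, 120),
  C_average_quintile1 = c(90, 95, 105)
)

# Build the hover label from an expression instead of a bare column
p_toy <- toy %>%
  plot_ly() %>%
  add_markers(x = ~DI_average_quintile1, y = ~C_average_quintile1,
              text = ~paste0("Year: ", Year),
              name = "Lowest 20% income")

p_toy
```

Prefixing the value with "Year: " makes the hover label easier to read next to the x and y values.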

Add axis titles

p1 %>% layout(
  xaxis=list(title="Average disposable income"),
  yaxis=list(title="Average consumption")
)

2.3 Average propensity to consume (APC)

Create APC variable: mutate

data.vis.2 %>% mutate(APC1=C_average_quintile1/DI_average_quintile1) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~APC1)
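
To see what mutate() does to the data frame before anything is plotted, here is a small sketch on made-up numbers. The data frame toy and its values are placeholders chosen for easy arithmetic; they are not taken from the real data. As in the code above, APC1 is average consumption divided by average disposable income.

```r
library(dplyr)

# Made-up stand-in for data.vis.2; placeholder values only
toy <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(100, 120),
  C_average_quintile1 = c(80, 90)
)

# mutate() keeps the existing columns and appends the new APC1 column
toy_apc <- toy %>%
  mutate(APC1 = C_average_quintile1 / DI_average_quintile1)

print(toy_apc)
```

The original toy data frame is left unchanged; mutate() returns a new data frame with one more column.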

APC of different quintiles

data.vis.2 %>% mutate(
  APC1=C_average_quintile1/DI_average_quintile1,
  APC2=C_average_quintile2/DI_average_quintile2,
  APC3=C_average_quintile3/DI_average_quintile3,
  APC4=C_average_quintile4/DI_average_quintile4,
  APC5=C_average_quintile5/DI_average_quintile5
  ) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~APC1,name="bottom 20%") %>%
  add_lines(y=~APC2,name="20-40%") %>%
  add_lines(y=~APC3,name="40-60%") %>%
  add_lines(y=~APC4,name="60-80%") %>%
  add_lines(y=~APC5,name="top 20%") %>%
  layout(
    yaxis=list(title="Average Propensity to Consume")
  )
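
The five APC definitions in the mutate() call above differ only in the quintile index, so they could also be generated in a loop. The sketch below is one possible alternative, not the approach used in this chapter; it runs on a made-up data frame (toy, placeholder values) with two quintiles only. With the real data the loop would run over 1:5.

```r
# Made-up stand-in with two quintiles only; placeholder values
toy <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(100, 120),
  C_average_quintile1 = c(80, 90),
  DI_average_quintile2 = c(200, 240),
  C_average_quintile2 = c(150, 168)
)

# Compute APCk = C_average_quintilek / DI_average_quintilek for each k
for (k in 1:2) {
  toy[[paste0("APC", k)]] <-
    toy[[paste0("C_average_quintile", k)]] / toy[[paste0("DI_average_quintile", k)]]
}

print(toy[c("Year", "APC1", "APC2")])
```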

2.4 Income and consumption inequality

One way to measure inequality requires two choices:
- a base group to compare with; here we choose the quintile1 group.
- a comparison operator; here we choose division.
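
The two choices above can be illustrated with a few made-up numbers. The values and the names base_group, groups and ratio_to_base are placeholders for illustration only and do not come from the dataset.

```r
# Made-up average incomes for three groups in a single year (placeholders)
groups <- c(q1 = 100, q3 = 250, q5 = 500)

# Base group: the value playing the role of quintile1
base_group <- groups[["q1"]]

# Comparison operator: division by the base group
ratio_to_base <- groups / base_group
print(ratio_to_base)
```

A ratio of 5 would mean that the group's average is five times that of the base group, and the base group's own ratio is always 1.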

Income inequality

data.vis.2 %>% mutate(
  DIr1=DI_average_quintile1/DI_average_quintile1,
  DIr2=DI_average_quintile2/DI_average_quintile1,
  DIr3=DI_average_quintile3/DI_average_quintile1,
  DIr4=DI_average_quintile4/DI_average_quintile1,
  DIr5=DI_average_quintile5/DI_average_quintile1
  ) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~DIr1,text=~DI_average_quintile1,name="bottom 20%") %>%
  add_lines(y=~DIr2,text=~DI_average_quintile2,name="20-40%") %>%
  add_lines(y=~DIr3,text=~DI_average_quintile3,name="40-60%") %>%
  add_lines(y=~DIr4,text=~DI_average_quintile4,name="60-80%") %>%
  add_lines(y=~DIr5,text=~DI_average_quintile5,name="top 20%") %>%
  layout(
    yaxis=list(title="Disposable income ratio <br> to the bottom 20%")
  )

Consumption inequality

data.vis.2 %>% mutate(
  Cr1=C_average_quintile1/C_average_quintile1,
  Cr2=C_average_quintile2/C_average_quintile1,
  Cr3=C_average_quintile3/C_average_quintile1,
  Cr4=C_average_quintile4/C_average_quintile1,
  Cr5=C_average_quintile5/C_average_quintile1
  ) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~Cr1,text=~C_average_quintile1,name="bottom 20%") %>%
  add_lines(y=~Cr2,text=~C_average_quintile2,name="20-40%") %>%
  add_lines(y=~Cr3,text=~C_average_quintile3,name="40-60%") %>%
  add_lines(y=~Cr4,text=~C_average_quintile4,name="60-80%") %>%
  add_lines(y=~Cr5,text=~C_average_quintile5,name="top 20%") %>%
  layout(
    yaxis=list(title="Consumption ratio <br> to the bottom 20%")
  )