# 2 Scatter plots

filename: 02-scatter-plot.Rmd

In this chapter, you will learn:
* How to draw a scatter plot in plotly using `add_markers()`.
* How to use `dplyr::mutate()` to create new variables in your data frame.
* How to label the y-axis of your plotly graph with `layout(yaxis=list(title=...))`.

### Clear the environment and activate packages

```r
rm(list=ls())
library(dplyr)
library(plotly)
```

## 2.1 Import Data

### Import text file data

``data.vis.2<-read.csv("./_data/Consumption_En.csv")``

`./` represents the current working directory, normally the folder that holds your .Rproj file.
`./_data/` is the `_data` subdirectory under `./`.
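
If the import fails, the relative path is the usual suspect. The following is a minimal sketch, not part of the original lesson, that builds the same path with `file.path()` and checks that the file exists before reading it. The variable name `csv_path` is made up for illustration.

```r
# Build the relative path used above; csv_path is a hypothetical helper name
csv_path <- file.path(".", "_data", "Consumption_En.csv")

# Show which folder "./" currently refers to
print(getwd())

if (file.exists(csv_path)) {
  data.vis.2 <- read.csv(csv_path)
} else {
  # Print the full path R tried, to help locate the problem
  message("File not found: ", normalizePath(csv_path, mustWork = FALSE))
}
```

If the message appears, either move the file into `_data` or open the project through its .Rproj file so that the working directory is set as expected.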

### Check the first few lines

``head(data.vis.2)``

Look at the class of each variable.
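
One way to list the classes is sketched below. The toy data frame and its values are made up so that the example runs without the CSV file; only the column names mirror the real data.

```r
# Toy data frame with made-up values (not taken from Consumption_En.csv)
toy <- data.frame(
  Year = c(2001L, 2002L),
  DI_household_average = c(100, 110)
)

# Class of every column, returned as a named character vector
var_classes <- sapply(toy, class)
print(var_classes)

# str() shows the class together with the first few values
str(toy)
```

With the real data, `sapply(data.vis.2, class)` or `str(data.vis.2)` gives the same kind of overview.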

### Variable definitions

The data contain the following variables:

* `Year`: the year of the observation
* `DI_household_sum`: total household disposable income, unit: 100 million NTD
* `DI_household_average`: average household disposable income, unit: NTD
* `C_household_sum`: total household consumption, unit: 100 million NTD
* `C_household_average`: average household consumption, unit: NTD
* `S_household_sum`: total household saving, unit: 100 million NTD
* `S_household_average`: average household saving, unit: NTD

The data also divide households into five quintile groups by household income level. For each quintile, say quintileX (X=1,2,3,4,5), the dataset contains:

* `DI_average_quintileX`: average disposable income in the Xth quintile, unit: NTD
* `C_average_quintileX`: average consumption in the Xth quintile, unit: NTD
* `S_average_quintileX`: average saving in the Xth quintile, unit: NTD
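
Because the quintile variables follow a fixed naming pattern, their names can be generated instead of typed. This is a small sketch under that assumption; the vector names `di_cols`, `c_cols` and `s_cols` are made up for illustration.

```r
# Quintile numbers X = 1, ..., 5
quintiles <- 1:5

# Generate the column names that follow the pattern described above
di_cols <- paste0("DI_average_quintile", quintiles)
c_cols  <- paste0("C_average_quintile", quintiles)
s_cols  <- paste0("S_average_quintile", quintiles)

print(di_cols)
```

After importing the data, `all(c(di_cols, c_cols, s_cols) %in% names(data.vis.2))` is a quick way to confirm that every expected column is present.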

## 2.2 Introduction to scatter plot: add_markers

### Basic scatter plot

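```r
# x and y are assumed here, inferred from the axis titles set later in this section
data.vis.2 %>% plot_ly() %>%
  add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
              name="Lowest 20% income")
```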

#### Add text display to markers: `text=~variable_name`

We can attach an extra text label to each marker with the `text=~variable_name` syntax.

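```r
# x and y are assumed from the axis titles set below; text=~Year labels each marker with its year
data.vis.2 %>% plot_ly() %>%
  add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
              text=~Year,
              name="Lowest 20% income")->p1
```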

```r
p1 %>% layout(
  xaxis=list(title="Average disposable income"),
  yaxis=list(title="Average consumption")
)
```

## 2.3 Average propensity to consume (APC)

### Create APC variable: mutate

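```r
data.vis.2 %>% mutate(APC1=C_average_quintile1/DI_average_quintile1) %>%
  plot_ly(x=~Year) %>%
  # assumed trace call: plot APC1 against Year as markers
  add_markers(y=~APC1, name="Lowest 20% income")
```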

### APC of different quintiles

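```r
data.vis.2 %>% mutate(
  APC1=C_average_quintile1/DI_average_quintile1,
  APC2=C_average_quintile2/DI_average_quintile2,
  APC3=C_average_quintile3/DI_average_quintile3,
  APC4=C_average_quintile4/DI_average_quintile4,
  APC5=C_average_quintile5/DI_average_quintile5
) %>%
  plot_ly(x=~Year) %>%
  # assumed trace calls and legend names: one set of markers per quintile
  add_markers(y=~APC1, name="Quintile 1") %>%
  add_markers(y=~APC2, name="Quintile 2") %>%
  add_markers(y=~APC3, name="Quintile 3") %>%
  add_markers(y=~APC4, name="Quintile 4") %>%
  add_markers(y=~APC5, name="Quintile 5") %>%
  layout(
    yaxis=list(title="Average Propensity to Consume")
  )
```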
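
To see what the `mutate()` step produces on its own, here is a minimal sketch on a toy data frame. The numbers are made up and do not come from the dataset; APC is computed, as in the code above, as average consumption divided by average disposable income.

```r
library(dplyr)

# Toy data with made-up values (not taken from Consumption_En.csv)
toy <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(200, 250),
  C_average_quintile1  = c(180, 200),
  DI_average_quintile5 = c(1000, 1200),
  C_average_quintile5  = c(600, 660)
)

# mutate() appends the new columns and keeps the existing ones
toy_apc <- toy %>% mutate(
  APC1 = C_average_quintile1 / DI_average_quintile1,
  APC5 = C_average_quintile5 / DI_average_quintile5
)

print(toy_apc)
```

In this toy example the lowest quintile spends a larger share of its disposable income than the highest one. Whether the real data show the same pattern is what the plot is for.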

## 2.4 Income and consumption inequality

One way to measure inequality requires two choices:
- a base group to compare with; here we choose the quintile1 group.
- a comparison operator; here we choose division.
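
The two choices above can be sketched on a toy data frame before applying them to the real data. The values are made up for illustration; quintile1 is the base group and division is the comparison operator.

```r
library(dplyr)

# Toy data with made-up values (not taken from Consumption_En.csv)
toy <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(100, 120),
  DI_average_quintile5 = c(600, 660)
)

# Divide each group's average by the base group's average
toy_ratio <- toy %>% mutate(
  DIr1 = DI_average_quintile1 / DI_average_quintile1,  # base group, always 1
  DIr5 = DI_average_quintile5 / DI_average_quintile1
)

print(toy_ratio)
```

A ratio of 6 means the top quintile's average disposable income is six times that of the bottom quintile. A falling ratio over time indicates a narrowing gap under this measure.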

### Income inequality

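```r
data.vis.2 %>% mutate(
  DIr1=DI_average_quintile1/DI_average_quintile1,
  DIr2=DI_average_quintile2/DI_average_quintile1,
  DIr3=DI_average_quintile3/DI_average_quintile1,
  DIr4=DI_average_quintile4/DI_average_quintile1,
  DIr5=DI_average_quintile5/DI_average_quintile1
) %>%
  plot_ly(x=~Year) %>%
  # assumed trace calls and legend names: one set of markers per quintile
  add_markers(y=~DIr1, name="Quintile 1") %>%
  add_markers(y=~DIr2, name="Quintile 2") %>%
  add_markers(y=~DIr3, name="Quintile 3") %>%
  add_markers(y=~DIr4, name="Quintile 4") %>%
  add_markers(y=~DIr5, name="Quintile 5") %>%
  layout(
    yaxis=list(title="Disposable income ratio <br> to the bottom 20%")
  )
```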

### Consumption inequality

``````data.vis.2 %>% mutate(
Cr1=C_average_quintile1/C_average_quintile1,
Cr2=C_average_quintile2/C_average_quintile1,
Cr3=C_average_quintile3/C_average_quintile1,
Cr4=C_average_quintile4/C_average_quintile1,
Cr5=C_average_quintile5/C_average_quintile1
) %>%
plot_ly(x=~Year) %>%