2 Scatter plots

filename: 02-scatter-plot.Rmd

After completing this chapter, you will know:
* How to draw a scatter plot in plotly using add_markers().
* How to use dplyr::mutate() to create new variables in your data frame.
* How to label the y-axis of your plotly graph through layout(yaxis=list(title=...)).

rm(list=ls())
library(dplyr)
library(plotly)

2.1 Import Data

Import text file data

./ represents the current working directory, normally where your .Rproj file is.
./_data/ represents the _data subdirectory under ./.

Check the first few lines

Look at the class of each variable.
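The three steps above (import, preview, check classes) might look like the following minimal sketch. The file name, its columns, and the toy values are hypothetical stand-ins, written to a temporary directory so the example runs anywhere; in the course you would point read.csv() at your own file under ./_data/ instead.

```r
# Hypothetical file and toy values, for illustration only.
toy_path <- file.path(tempdir(), "toy_income.csv")
writeLines(c(
  "Year,DI_average_quintile1,C_average_quintile1",
  "2001,300000,290000",
  "2002,310000,295000",
  "2003,305000,300000"
), toy_path)

# Import the text file.
toy_income <- read.csv(toy_path)

# Check the first few lines.
head(toy_income)

# Look at the class of each variable.
sapply(toy_income, class)
```

Checking classes right after import is a cheap way to catch numeric columns that were read as character, which would break the arithmetic used later in this chapter.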

Variable definitions

The data contains the following variables:
Year: the year of the observation
DI_household_sum: Total household disposable income, unit: 100 million NTDs (台幣億元)
DI_household_average: Average household disposable income, unit: NTD
C_household_sum: Total household consumption, unit: 100 million NTDs (台幣億元)
C_household_average: Average household consumption, unit: NTD
S_household_sum: Total household saving, unit: 100 million NTDs (台幣億元)
S_household_average: Average household saving, unit: NTD

The data also divide households into five quintile groups based on household income level. For each quintile, say quintileX (X=1,2,3,4,5), the dataset contains:
DI_average_quintileX: Average disposable income in the Xth quintile, unit: NTD
C_average_quintileX: Average consumption in the Xth quintile, unit: NTD
S_average_quintileX: Average saving in the Xth quintile, unit: NTD

2.2 Introduction to scatter plot: add_markers

Basic scatter plot

data.vis.2 %>% plot_ly() %>%
add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
name="Lowest 20% income")

Add text display to markers: text=~variable_name

We can attach an extra text label to each marker using the text=~variable_name syntax.

data.vis.2 %>% plot_ly() %>%
add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
text=~Year,
name="Lowest 20% income")->p1

p1 %>% layout(
xaxis=list(title="Average disposable income"),
yaxis=list(title="Average consumption")
)
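For readers without the course data file at hand, the same pattern can be tried on a small made-up data frame. This is only a sketch: toy_income and its values are invented for illustration, while the column names mirror those in the variable definitions above.

```r
library(plotly)

# Toy values, for illustration only (not the course data).
toy_income <- data.frame(
  Year = c(2001, 2002, 2003),
  DI_average_quintile1 = c(300000, 310000, 305000),
  C_average_quintile1 = c(290000, 295000, 300000)
)

toy_plot <- toy_income %>%
  plot_ly() %>%
  add_markers(
    x = ~DI_average_quintile1,
    y = ~C_average_quintile1,
    text = ~Year,              # extra label attached to each marker
    name = "Lowest 20% income"
  ) %>%
  layout(
    xaxis = list(title = "Average disposable income"),
    yaxis = list(title = "Average consumption")
  )
```

Typing toy_plot at the console displays the graph; hovering over a marker should show its Year alongside the x and y values.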

2.3 Average propensity to consume (APC)

Create APC variable: mutate

data.vis.2 %>% mutate(APC1=C_average_quintile1/DI_average_quintile1) %>%
plot_ly(x=~Year) %>%
add_markers(y=~APC1)
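To see what mutate() contributes on its own, it can help to run it on a tiny made-up data frame before plotting. In this sketch, toy_income and its numbers are hypothetical; the APC formula (average consumption divided by average disposable income) is the one used in this section.

```r
library(dplyr)

# Toy values, for illustration only (not the course data).
toy_income <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(200000, 250000),
  C_average_quintile1 = c(180000, 200000)
)

# mutate() keeps the existing columns and appends the new one.
toy_apc <- toy_income %>%
  mutate(APC1 = C_average_quintile1 / DI_average_quintile1)

toy_apc$APC1  # 0.9 0.8
```

An APC of 0.9 means that 90% of disposable income was consumed in that year.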

APC of different quintiles

data.vis.2 %>% mutate(
APC1=C_average_quintile1/DI_average_quintile1,
APC2=C_average_quintile2/DI_average_quintile2,
APC3=C_average_quintile3/DI_average_quintile3,
APC4=C_average_quintile4/DI_average_quintile4,
APC5=C_average_quintile5/DI_average_quintile5
) %>%
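plot_ly(x=~Year) %>%
# one trace per quintile (trace names are illustrative)
add_markers(y=~APC1, name="APC1") %>%
add_markers(y=~APC2, name="APC2") %>%
add_markers(y=~APC3, name="APC3") %>%
add_markers(y=~APC4, name="APC4") %>%
add_markers(y=~APC5, name="APC5") %>%
layout(
yaxis=list(title="Average Propensity to Consume")
)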

2.4 Income and consumption inequality

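One way to measure inequality is to:
- choose a base group to compare with; here we choose the quintile1 group.
- choose a comparison operator; here we choose division.
Each quintile's average is then expressed as a multiple of the quintile1 average, so the ratio for quintile1 itself is always 1.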
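A small worked example may make the base-group idea concrete. The sketch below uses invented numbers (toy_income is hypothetical) and only the first and fifth quintiles; the ratio names DIr1 and DIr5 follow the naming used in this section.

```r
library(dplyr)

# Toy values, for illustration only (not the course data).
toy_income <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(100, 100),
  DI_average_quintile5 = c(500, 600)
)

toy_ratio <- toy_income %>%
  mutate(
    DIr1 = DI_average_quintile1 / DI_average_quintile1,  # base group: always 1
    DIr5 = DI_average_quintile5 / DI_average_quintile1   # top 20% relative to bottom 20%
  )

toy_ratio$DIr5  # 5 6
```

Here the top quintile's income rises from 5 to 6 times the bottom quintile's, which would read as widening inequality under this measure.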

Income inequality

data.vis.2 %>% mutate(
DIr1=DI_average_quintile1/DI_average_quintile1,
DIr2=DI_average_quintile2/DI_average_quintile1,
DIr3=DI_average_quintile3/DI_average_quintile1,
DIr4=DI_average_quintile4/DI_average_quintile1,
DIr5=DI_average_quintile5/DI_average_quintile1
) %>%
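plot_ly(x=~Year) %>%
# one trace per quintile (trace names are illustrative)
add_markers(y=~DIr1, name="DIr1") %>%
add_markers(y=~DIr2, name="DIr2") %>%
add_markers(y=~DIr3, name="DIr3") %>%
add_markers(y=~DIr4, name="DIr4") %>%
add_markers(y=~DIr5, name="DIr5") %>%
layout(
yaxis=list(title="Disposable income ratio <br> to the bottom 20%")
)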

Consumption inequality

data.vis.2 %>% mutate(
Cr1=C_average_quintile1/C_average_quintile1,
Cr2=C_average_quintile2/C_average_quintile1,
Cr3=C_average_quintile3/C_average_quintile1,
Cr4=C_average_quintile4/C_average_quintile1,
Cr5=C_average_quintile5/C_average_quintile1
) %>%
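plot_ly(x=~Year) %>%
# one trace per quintile (trace names are illustrative)
add_markers(y=~Cr1, name="Cr1") %>%
add_markers(y=~Cr2, name="Cr2") %>%
add_markers(y=~Cr3, name="Cr3") %>%
add_markers(y=~Cr4, name="Cr4") %>%
add_markers(y=~Cr5, name="Cr5")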