2 Scatter plots
filename: 02-scatter-plot.Rmd
In this chapter, you will learn:
* How to draw a scatter plot in plotly using add_markers.
* How to use dplyr::mutate() to create new variables in your data frame.
* How to add a y-axis title to your plotly graph through layout(yaxis=list(title=...)).
Clear the environment and load packages
rm(list=ls())
library(dplyr)
library(plotly)
2.1 Import Data
Import text file data
data.vis.2<-read.csv("./_data/Consumption_En.csv")
./ represents the current working directory, normally the folder that contains your .Rproj file.
./_data/ represents the _data subdirectory under ./.
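To see how such a relative path is put together and whether it resolves from your working directory, a minimal base R sketch follows. The variable names wd, csv_path and found are illustrative only, and found is TRUE only if the file actually exists on your machine.

```r
# Where R currently resolves "./" (normally the project folder).
wd <- getwd()
print(wd)

# Build the same relative path used above, piece by piece.
csv_path <- file.path(".", "_data", "Consumption_En.csv")
print(csv_path)

# TRUE only if the file is really there; FALSE hints at a wrong working directory.
found <- file.exists(csv_path)
print(found)
```

If found is FALSE, checking getwd() against the location of the .Rproj file is usually the first thing to try.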
Check the first few lines
head(data.vis.2)
Look at the class of each variable.
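The command for this step is not shown here. One common way to do it, sketched on a small made-up data frame (toy and its values are hypothetical, not taken from Consumption_En.csv), is sapply() with class, or str() for a compact overview.

```r
# Hypothetical stand-in for data.vis.2; the numbers are made up.
toy <- data.frame(
  Year = c(2001L, 2002L),
  DI_household_average = c(100.5, 110.2),
  Note = c("a", "b"),
  stringsAsFactors = FALSE
)

# One class per column, returned as a named character vector.
col_classes <- sapply(toy, class)
print(col_classes)

# str() shows the class of each column plus its first few values.
str(toy)
```

On the real data, the equivalent calls would be sapply(data.vis.2, class) or str(data.vis.2).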
Variable definitions
The data contains the following variables:
Year: the year of the observation
DI_household_sum: Total household disposable income, unit: 100 million NTDs (台幣億元)
DI_household_average: Average household disposable income, unit: NTD
C_household_sum: Total household consumption, unit: 100 million NTDs (台幣億元)
C_household_average: Average household consumption, unit: NTD
S_household_sum: Total household saving, unit: 100 million NTDs (台幣億元)
S_household_average: Average household saving, unit: NTD
It also divides households into five quintile groups based on household income level. For each quintile, say quintileX (X=1,2,3,4,5), the dataset contains:
DI_average_quintileX: Average disposable income in the Xth quintile, unit: NTD
C_average_quintileX: Average consumption in the Xth quintile, unit: NTD
S_average_quintileX: Average saving in the Xth quintile, unit: NTD
2.2 Introduction to scatter plot: add_markers
Basic scatter plot: add_markers
data.vis.2 %>% plot_ly() %>%
  add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
              name="Lowest 20% income")
Add text display to markers: text=~variable_name
We can attach an extra text label to each marker using the text=~variable_name syntax.
p1 <- data.vis.2 %>% plot_ly() %>%
  add_markers(x=~DI_average_quintile1, y=~C_average_quintile1,
              text=~Year,
              name="Lowest 20% income")
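The text argument accepts any expression evaluated on the data, not just a bare column. As a sketch, the example below builds a labelled hover text with paste() and restricts the hover box to that label with hoverinfo="text". The data frame toy and the object name p_hover are hypothetical; only the column names mirror the dataset.

```r
library(dplyr)
library(plotly)

# Hypothetical values; only the column names mirror the dataset.
toy <- data.frame(
  Year = c(2001, 2002, 2003),
  DI_average_quintile1 = c(200, 250, 260),
  C_average_quintile1 = c(180, 200, 221)
)

p_hover <- toy %>% plot_ly() %>%
  add_markers(x = ~DI_average_quintile1, y = ~C_average_quintile1,
              # paste() builds one label per row, e.g. "Year: 2001"
              text = ~paste("Year:", Year),
              # show only the custom label instead of the default x/y readout
              hoverinfo = "text",
              name = "Lowest 20% income")

p_hover  # display the plot
```

Dropping hoverinfo="text" keeps the default behaviour, where the label is shown alongside the x and y values.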
Add axis titles
p1 %>% layout(
  xaxis=list(title="Average disposable income"),
  yaxis=list(title="Average consumption")
)
2.3 Average propensity to consume (APC)
Create APC variable: mutate
data.vis.2 %>% mutate(APC1=C_average_quintile1/DI_average_quintile1) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~APC1)
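As a quick check of what mutate() does in this step, the sketch below computes the same ratio, average consumption divided by average disposable income, on a tiny made-up data frame. The values in toy are hypothetical and chosen so the result is easy to verify by hand.

```r
library(dplyr)

# Hypothetical values; only the column names mirror the dataset.
toy <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(200, 250),
  C_average_quintile1 = c(180, 200)
)

# mutate() appends APC1 = consumption / disposable income, row by row.
toy_apc <- toy %>%
  mutate(APC1 = C_average_quintile1 / DI_average_quintile1)

print(toy_apc$APC1)  # 0.9 0.8
```

An APC above 1 would mean that the group consumes more than its disposable income in that year.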
APC of different quintiles
data.vis.2 %>% mutate(
  APC1=C_average_quintile1/DI_average_quintile1,
  APC2=C_average_quintile2/DI_average_quintile2,
  APC3=C_average_quintile3/DI_average_quintile3,
  APC4=C_average_quintile4/DI_average_quintile4,
  APC5=C_average_quintile5/DI_average_quintile5
) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~APC1,name="bottom 20%") %>%
  add_lines(y=~APC2,name="20-40%") %>%
  add_lines(y=~APC3,name="40-60%") %>%
  add_lines(y=~APC4,name="60-80%") %>%
  add_lines(y=~APC5,name="top 20%") %>%
  layout(
    yaxis=list(title="Average Propensity to Consume")
  )
2.4 Income and consumption inequality
One way to measure inequality is to:
- choose a base group to compare against; here we choose the quintile1 group.
- choose a comparison operator; here we choose division.
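The two choices above can be sketched on made-up numbers before applying them to the real data. In the example below, toy and its values are hypothetical and only three of the five quintiles are included to keep it short; the base group is quintile1 and the comparison operator is division.

```r
library(dplyr)

# Hypothetical average disposable income for three of the five quintiles.
toy <- data.frame(
  Year = c(2001, 2002),
  DI_average_quintile1 = c(100, 120),
  DI_average_quintile3 = c(300, 300),
  DI_average_quintile5 = c(600, 720)
)

# Divide every group by the base group (quintile1).
toy_ratio <- toy %>%
  mutate(
    DIr1 = DI_average_quintile1 / DI_average_quintile1,  # always 1
    DIr3 = DI_average_quintile3 / DI_average_quintile1,
    DIr5 = DI_average_quintile5 / DI_average_quintile1
  )

print(toy_ratio[, c("Year", "DIr1", "DIr3", "DIr5")])
```

A ratio of 6 reads as: the top group's average is six times that of the bottom group. The base group's own ratio is 1 by construction, so its line serves as a flat reference in the plot.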
Income inequality
data.vis.2 %>% mutate(
  DIr1=DI_average_quintile1/DI_average_quintile1,
  DIr2=DI_average_quintile2/DI_average_quintile1,
  DIr3=DI_average_quintile3/DI_average_quintile1,
  DIr4=DI_average_quintile4/DI_average_quintile1,
  DIr5=DI_average_quintile5/DI_average_quintile1
) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~DIr1,text=~DI_average_quintile1,name="bottom 20%") %>%
  add_lines(y=~DIr2,text=~DI_average_quintile2,name="20-40%") %>%
  add_lines(y=~DIr3,text=~DI_average_quintile3,name="40-60%") %>%
  add_lines(y=~DIr4,text=~DI_average_quintile4,name="60-80%") %>%
  add_lines(y=~DIr5,text=~DI_average_quintile5,name="top 20%") %>%
  layout(
    yaxis=list(title="Disposable income ratio <br> to the bottom 20%")
  )
Consumption inequality
data.vis.2 %>% mutate(
  Cr1=C_average_quintile1/C_average_quintile1,
  Cr2=C_average_quintile2/C_average_quintile1,
  Cr3=C_average_quintile3/C_average_quintile1,
  Cr4=C_average_quintile4/C_average_quintile1,
  Cr5=C_average_quintile5/C_average_quintile1
) %>%
  plot_ly(x=~Year) %>%
  add_lines(y=~Cr1,text=~C_average_quintile1,name="bottom 20%") %>%
  add_lines(y=~Cr2,text=~C_average_quintile2,name="20-40%") %>%
  add_lines(y=~Cr3,text=~C_average_quintile3,name="40-60%") %>%
  add_lines(y=~Cr4,text=~C_average_quintile4,name="60-80%") %>%
  add_lines(y=~Cr5,text=~C_average_quintile5,name="top 20%") %>%
  layout(
    yaxis=list(title="Consumption ratio <br> to the bottom 20%")
  )