3  Basic Data Visualizations

Visualizing data is a crucial step in translating raw information into insights that can be quickly understood and acted upon. Before creating meaningful charts or plots, it is essential to understand the characteristics of the underlying data, including its type, structure, and key attributes. Without this foundational knowledge, visualizations may be misleading or fail to convey the intended message.

This chapter introduces basic data visualizations (Figure 3.1). It explains how data can be categorized into numeric (quantitative) and categorical (qualitative) forms, along with their subtypes: discrete, continuous, nominal, and ordinal. It also covers common data sources and the fundamental elements of a dataset, such as variables and observations, which together guide the choice of an appropriate visualization method.

As discussed in the Data Exploration section, understanding data types and structure is essential before creating visualizations. By considering the structure of a dataset, including its variables, observations, and data sources, readers can select appropriate charts and plots, such as histograms for continuous data, bar charts for categorical data, or scatter plots for examining relationships, and thereby reveal patterns, trends, and actionable insights in the data [1].
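These rules of thumb can be sketched as a small R function. This is a minimal sketch that assumes nominal and ordinal variables are stored as factors (ordinal ones with `ordered = TRUE`), discrete counts as integers, and continuous measurements as doubles; the helper name `suggest_chart()` is hypothetical. Dates are checked first because R stores a `Date` internally as a double. Scatter plots are not covered, since they involve a pair of numeric variables rather than a single one.

```r
# Hypothetical helper (not from any package): map how R stores a single
# variable to a basic chart suggestion.
suggest_chart <- function(x) {
  if (inherits(x, "Date")) {
    "line chart (values over time)"
  } else if (is.ordered(x)) {
    "bar chart (keep the category order)"
  } else if (is.factor(x) || is.character(x)) {
    "bar chart"
  } else if (is.integer(x)) {
    "bar chart or histogram with one bin per value"
  } else if (is.double(x)) {
    "histogram"
  } else {
    "inspect manually"
  }
}

# Toy variables (made-up values) illustrating each data type
city     <- factor(c("Jakarta", "Bandung", "Jakarta"))              # nominal
tier     <- factor(c("Gold", "Bronze", "Silver"),
                   levels = c("Bronze", "Silver", "Gold", "Platinum"),
                   ordered = TRUE)                                   # ordinal
quantity <- c(3L, 1L, 5L)                                            # discrete
price    <- c(1200000, 85000, 450000)                                # continuous
date     <- as.Date(c("2024-05-14", "2024-06-02", "2024-07-19"))     # time

sapply(list(City = city, CustomerTier = tier, Quantity = quantity,
            UnitPrice = price, TransactionDate = date), suggest_chart)
```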

Figure 3.1: Basic Data Visualizations 5W+1H

Following the mind map in Figure 3.1, the sections below explore several fundamental data visualizations in terms of their types, purposes, applications, users, and tools. Starting with these essential visualizations lays the groundwork for more advanced analytical techniques. They show distributions, comparisons, and relationships between variables in a simple yet informative way, and mastering them helps us communicate insights effectively, spot hidden patterns, and make data-driven decisions with greater confidence [2], [3], knaflic2015?.

3.1 Dataset

This dataset represents 200 simulated sales transactions from various cities across Indonesia during the year 2024. It is designed to illustrate different types of data commonly found in business and analytics contexts — including nominal, ordinal, discrete, and continuous variables.

Each row in the dataset corresponds to a single customer transaction, recording essential details such as date, product type, city, customer tier, quantity sold, price, and payment method. The dataset is intentionally structured to be used for teaching and practicing data exploration, visualization, and analysis in tools like R, Python, Excel, or Power BI.

3.1.1 Purpose of the Dataset

The dataset can be used to:

  • Demonstrate how to identify and classify different data types (nominal, ordinal, discrete, continuous).
  • Practice generating and interpreting common visualizations such as line charts, bar charts, histograms, pie charts, boxplots, and scatter plots.
  • Perform exploratory data analysis (EDA) on sales trends, customer segments, and pricing patterns.
  • Explore relationships between variables, such as how quantity and price affect total sales or how customer tiers differ across payment methods.
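As a first step toward comparing customer tiers across payment methods, a contingency table counts how often each combination occurs. The sketch below uses a handful of made-up transactions rather than the generated dataset; with the full data, the same calls would apply to `sales_data$CustomerTier` and `sales_data$PaymentMethod`.

```r
# Made-up transactions (not the generated dataset)
tier    <- factor(c("Bronze", "Silver", "Silver", "Gold", "Bronze", "Platinum"),
                  levels = c("Bronze", "Silver", "Gold", "Platinum"),
                  ordered = TRUE)
payment <- c("Cash", "E-Wallet", "Cash", "Credit Card", "Cash", "Credit Card")

# Contingency table: rows follow the tier order, columns are payment methods
counts <- table(CustomerTier = tier, PaymentMethod = payment)
counts

# Row proportions: within each tier, the share of each payment method
round(prop.table(counts, margin = 1), 2)
```

Row proportions make tiers of different sizes comparable; a stacked or grouped bar chart of `counts` would show the same comparison visually.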

3.1.2 Dataset Overview

| Column | Example | Data Type | Description |
|---|---|---|---|
| TransactionID | T0045 | Nominal | Unique identifier for each transaction |
| TransactionDate | 2024-05-14 | Date | Date of the transaction |
| ProductCategory | Electronics | Nominal | Category of the purchased product |
| City | Jakarta | Nominal | City where the transaction occurred |
| CustomerTier | Gold | Ordinal | Customer level (Bronze < Silver < Gold < Platinum) |
| Quantity | 3 | Discrete | Number of items sold |
| UnitPrice | 1,200,000 | Continuous | Price per unit of the product (IDR) |
| TotalPrice | 3,600,000 | Continuous | Total transaction value (IDR) |
| PaymentMethod | Credit Card | Nominal | Payment method used by the customer |
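The ordering Bronze < Silver < Gold < Platinum only takes effect in R if the tier is stored as an ordered factor. The sketch below uses a short made-up vector to show comparisons, sorting, and `max()` respecting that order, in contrast to a plain character vector, which sorts alphabetically.

```r
# Ordered factor: the level order encodes Bronze < Silver < Gold < Platinum
tiers <- factor(c("Gold", "Bronze", "Platinum", "Silver", "Gold"),
                levels = c("Bronze", "Silver", "Gold", "Platinum"),
                ordered = TRUE)

tiers[1] > tiers[2]   # Gold > Bronze: TRUE
sort(tiers)           # sorted by tier rank, not alphabetically
max(tiers)            # highest tier present: Platinum

# A plain character vector falls back to alphabetical order
sort(as.character(tiers))   # "Bronze" "Gold" "Gold" "Platinum" "Silver"
```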
# ==============================
# Generate Sales Transaction Dataset in R
# ==============================
library(DT)  # interactive HTML tables via datatable()

set.seed(123)  # to make results reproducible

# --- 1. Define base variables ---
TransactionID <- sprintf("T%04d", 1:200)

# random transaction dates across 2024
TransactionDate <- sample(seq(as.Date("2024-01-01"), as.Date("2024-12-31"), 
                              by = "day"), 200, replace = TRUE)

# product categories
ProductCategory <- sample(c("Electronics", "Groceries", "Fashion", 
                            "Furniture", "Beauty"), 200, replace = TRUE)

# 20 major cities in Indonesia
City <- sample(c(
  "Jakarta", "Surabaya", "Bandung", "Medan", "Semarang", "Palembang", 
  "Makassar", "Bekasi", "Tangerang", "Depok", "Batam", "Pekanbaru", 
  "Bandar Lampung", "Denpasar", "Padang", "Malang", "Banjarmasin", 
  "Pontianak", "Manado", "Balikpapan"
), 200, replace = TRUE)

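# customer tier (Ordinal): stored as an ordered factor so that
# Bronze < Silver < Gold < Platinum is respected in sorting and plots
CustomerTier <- factor(
  sample(c("Bronze", "Silver", "Gold", "Platinum"), 200,
         replace = TRUE, prob = c(0.3, 0.4, 0.2, 0.1)),
  levels = c("Bronze", "Silver", "Gold", "Platinum"),
  ordered = TRUE
)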

# quantity of products (Discrete)
Quantity <- sample(1:10, 200, replace = TRUE)

# unit price (Continuous)
UnitPrice <- round(runif(200, 20000, 3000000), 0)

# total price (Continuous)
TotalPrice <- Quantity * UnitPrice

# payment method (Nominal)
PaymentMethod <- sample(c("Cash", "Credit Card", "Debit Card", "E-Wallet"), 
                        200, replace = TRUE)

# --- 2. Combine into a data frame ---
sales_data <- data.frame(
  TransactionID,
  TransactionDate,
  ProductCategory,
  City,
  CustomerTier,
  Quantity,
  UnitPrice,
  TotalPrice,
  PaymentMethod
)

# Display the data frame as a neat table
datatable(sales_data, 
          caption = "Table of Dataset",
          rownames = FALSE) # hides the index column
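Because the dataset is simulated, it is worth confirming that it has the intended structure before plotting. The helper `check_sales_data()` below is hypothetical (not part of any package) and tests a few assumptions from the dataset overview: unique IDs, dates stored as `Date`, at least one item per transaction, and `TotalPrice` equal to `Quantity * UnitPrice`. It is demonstrated on a two-row toy data frame.

```r
# Hypothetical helper: stop with an error if a basic assumption is violated
check_sales_data <- function(df) {
  stopifnot(
    anyDuplicated(df$TransactionID) == 0,               # IDs are unique
    inherits(df$TransactionDate, "Date"),               # dates stored as Date
    all(df$Quantity >= 1),                              # at least one item sold
    all(df$TotalPrice == df$Quantity * df$UnitPrice)    # derived column is consistent
  )
  invisible(TRUE)
}

# Two-row toy data frame with made-up values
toy <- data.frame(
  TransactionID   = c("T0001", "T0002"),
  TransactionDate = as.Date(c("2024-05-14", "2024-06-02")),
  Quantity        = c(3L, 1L),
  UnitPrice       = c(1200000, 85000),
  TotalPrice      = c(3600000, 85000)
)

check_sales_data(toy)   # passes silently
str(toy)                # storage type of each column
```

After running the generator above, `check_sales_data(sales_data)` should also pass, and `str(sales_data)` shows how each column is stored.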

3.2 Line Chart

A Line Chart is a data visualization tool that illustrates how values change over a sequence, typically over time. It connects data points with a continuous line, making it ideal for displaying trends and patterns in time-series data Wickham2016?. Line charts are particularly useful for:

  • Identifying Seasonal Patterns: Recognizing recurring fluctuations at regular intervals, such as increased sales during holidays [4].
  • Detecting Growth or Decline Trends: Observing upward or downward movements in data over time [5].
  • Spotting Peaks or Dips: Highlighting significant increases or decreases in activity, such as sales spikes during promotions [6].
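These patterns can also be checked numerically alongside the chart. The sketch below uses six made-up monthly totals: `diff()` gives the month-over-month change (positive for growth, negative for decline), and `which.max()`/`which.min()` locate the peak and the dip. Confirming a seasonal pattern would need several years of data, since a single year shows each month only once.

```r
# Made-up monthly totals (IDR), for illustration only
monthly <- data.frame(
  Month      = seq(as.Date("2024-01-01"), by = "month", length.out = 6),
  TotalSales = c(120e6, 135e6, 128e6, 190e6, 150e6, 160e6)
)

# Month-over-month change: positive = growth, negative = decline
monthly$Change <- c(NA, diff(monthly$TotalSales))

# Peak and dip months
peak_month <- monthly$Month[which.max(monthly$TotalSales)]
dip_month  <- monthly$Month[which.min(monthly$TotalSales)]

monthly
peak_month   # "2024-04-01"
dip_month    # "2024-01-01"
```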

In this dataset, we can use a line chart to show how total sales (or the number of transactions) change over time during 2024. The examples below aggregate total sales by month.
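One detail matters when aggregating by month: a month with no transactions produces no row, so a line chart would simply connect the neighboring months and hide the gap. With 200 transactions spread over 2024 this is unlikely, but for sparser data a sketch like the one below fills missing months with zero before plotting. The transactions are made up, with no sales in March.

```r
# Made-up transactions: note that there are no sales at all in March
tx <- data.frame(
  TransactionDate = as.Date(c("2024-01-05", "2024-01-20", "2024-02-11", "2024-04-03")),
  TotalPrice      = c(100000, 250000, 400000, 50000)
)

# Sum sales per month, keyed by a "YYYY-MM" string
tx$MonthStr <- format(tx$TransactionDate, "%Y-%m")
by_month <- aggregate(TotalPrice ~ MonthStr, data = tx, FUN = sum)

# Full sequence of months in the period, then a left join onto it
all_months <- data.frame(
  MonthStr = format(seq(as.Date("2024-01-01"), as.Date("2024-04-01"), by = "month"), "%Y-%m"),
  stringsAsFactors = FALSE
)
trend <- merge(all_months, by_month, by = "MonthStr", all.x = TRUE)

# Months without transactions get 0 instead of being dropped
trend$TotalPrice[is.na(trend$TotalPrice)] <- 0
trend$Month <- as.Date(paste0(trend$MonthStr, "-01"))

trend
```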

3.2.1 Basic Line Chart

# Step 1: Make sure TransactionDate is stored as a Date
sales_data$TransactionDate <- as.Date(sales_data$TransactionDate, format = "%Y-%m-%d")

# Step 2: Compute total sales per month (grouped by a "YYYY-MM" string)
sales_trend <- aggregate(TotalPrice ~ format(TransactionDate, "%Y-%m"),
                         data = sales_data, FUN = sum)

# Step 3: Give the columns clearer names
names(sales_trend) <- c("MonthStr", "TotalSales")

# Step 4: Append "-01" so each month becomes a full date (first day of the month)
sales_trend$Month <- as.Date(paste0(sales_trend$MonthStr, "-01"), format = "%Y-%m-%d")

# Step 5: Plot line chart
plot(
  sales_trend$Month,
  sales_trend$TotalSales,
  type = "o",
  col = "steelblue",
  pch = 16,
  lwd = 2,
  main = "Monthly Sales Trend (From sales_data)",
  xlab = "Month",
  ylab = "Total Sales (IDR)"
)
grid(col = "gray80", lty = "dotted")

3.2.2 Line Chart using ggplot2

# Load required packages
library(ggplot2)
library(dplyr)
library(lubridate)

# Summarize total sales by month
sales_trend <- sales_data %>%
  mutate(Month = floor_date(TransactionDate, "month")) %>%
  group_by(Month) %>%
  summarise(TotalSales = sum(TotalPrice))

# Create line chart
ggplot(sales_trend, aes(x = Month, y = TotalSales)) +
  geom_line(color = "steelblue", linewidth = 1.2) +  # 'linewidth' replaced 'size' for lines in ggplot2 3.4.0
  geom_point(color = "darkorange", size = 2) +
  labs(
    title = "Monthly Sales Trend in 2024",
    x = "Month",
    y = "Total Sales (IDR)"
  ) +
  theme_minimal()

3.3 Bar Chart

A Bar Chart is a type of data visualization used to represent categorical data with rectangular bars. Each bar’s height (or length) corresponds to the value or frequency of a category, making it easy to compare quantities across different groups Yi2023?.
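The two readings of bar height in this definition, frequency and value, can give different rankings. In the made-up example below, Jakarta has the most transactions while Bandung has the highest total sales. In ggplot2, `geom_bar()` draws counts by default, whereas `geom_col()` uses a precomputed value; this base R sketch computes both quantities directly.

```r
# Made-up transactions: city of each sale and its total value (IDR)
city  <- c("Jakarta", "Bandung", "Jakarta", "Medan", "Jakarta", "Bandung")
total <- c(500000, 1200000, 300000, 800000, 200000, 100000)

# Bar height as frequency: number of transactions per city
freq <- table(city)

# Bar height as value: total sales per city
value <- tapply(total, city, sum)

freq    # Bandung 2, Jakarta 3, Medan 1
value   # Bandung 1300000, Jakarta 1000000, Medan 800000
```

Passing either result to `barplot()` would draw the corresponding chart.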

Besides nominal categories such as City, bar charts are especially suitable for:

  • Discrete numeric data – numbers that can only take specific values (e.g., number of items purchased) [7].
  • Ordinal categorical data – categories with a natural order (e.g., customer satisfaction levels: Low, Medium, High) [8].
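For ordinal data, the bars should follow the natural order of the categories rather than alphabetical order, and for discrete numbers it can help to show values that never occurred. A minimal sketch with made-up values: storing satisfaction levels as an ordered factor keeps Low, Medium, High in sequence, and converting quantities to a factor with `levels = 1:10` gives every possible value a bar, including zeros.

```r
# Ordinal: an ordered factor keeps the natural category order
satisfaction <- factor(c("High", "Low", "Medium", "High", "Medium", "High"),
                       levels = c("Low", "Medium", "High"), ordered = TRUE)
sat_counts <- table(satisfaction)
sat_counts                                  # Low 1, Medium 2, High 3

# Without the factor levels, categories fall back to alphabetical order
names(table(as.character(satisfaction)))    # "High" "Low" "Medium"

# Discrete: fix the possible values so unused quantities appear as zero-height bars
quantity <- c(2L, 3L, 3L, 7L)
q_counts <- table(factor(quantity, levels = 1:10))
q_counts

# barplot(sat_counts) or barplot(q_counts) would draw the bars in this order
```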

In this dataset, the bar chart is used to show Total Sales by City, which lets us quickly identify the cities that contribute most to overall sales performance [9].

Insights:

  • Taller bars indicate higher total sales.
  • The chart helps compare city-level sales performance visually.
  • It suits categorical variables such as City and discrete numeric variables such as Quantity; here, the aggregated TotalPrice sets the height of each bar.
  • For ordinal data, bar charts make it easy to observe trends or patterns across ordered categories.
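Bar heights are easier to interpret when paired with each category's share of the total. This sketch uses made-up totals for four cities; it adds a share and a rank, breaking ties alphabetically. The same steps could be applied to the aggregated `sales_city` table built in the code below.

```r
# Made-up city totals (IDR) standing in for an aggregated sales table
sales_city <- data.frame(
  City       = c("Jakarta", "Surabaya", "Bandung", "Medan"),
  TotalPrice = c(400e6, 250e6, 250e6, 100e6),
  stringsAsFactors = FALSE
)

# Each city's share of overall sales
sales_city$Share <- sales_city$TotalPrice / sum(sales_city$TotalPrice)

# Rank from highest to lowest; ties broken alphabetically by city name
sales_city <- sales_city[order(-sales_city$TotalPrice, sales_city$City), ]
sales_city$Rank <- seq_len(nrow(sales_city))
rownames(sales_city) <- NULL

sales_city
```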

3.3.1 Basic Bar Chart

# Step 1: Aggregate total sales per city
sales_city <- aggregate(TotalPrice ~ City, data = sales_data, sum)

# Step 2: Sort data by total sales (descending)
sales_city <- sales_city[order(sales_city$TotalPrice, decreasing = TRUE), ]

# Step 3: Widen the bottom margin for the rotated city labels (saving the old settings)
old_par <- par(mar = c(8, 5, 4, 2))  # c(bottom, left, top, right)

# Step 4: Create bar chart
barplot(
  height = sales_city$TotalPrice,
  names.arg = sales_city$City,
  col = "steelblue",
  las = 2,                     # rotate city labels vertically
  cex.names = 0.8,             # reduce font size of city names
  main = "Total Sales by City",
  xlab = "",
  ylab = ""
)

# Optional: Add horizontal grid lines
grid(nx = NA, ny = NULL, col = "gray80", lty = "dotted")

# Restore the previous margin settings so later plots are unaffected
par(old_par)

3.3.2 Bar Chart using ggplot2

# Load ggplot2
library(ggplot2)

# Summarize total sales per city
sales_city <- aggregate(TotalPrice ~ City, data = sales_data, sum)

# Sort cities by total sales (descending); optional, since reorder() below also orders the bars
sales_city <- sales_city[order(sales_city$TotalPrice, decreasing = TRUE), ]

# Create bar chart, plotting values in millions of IDR so the axis matches its label
ggplot(sales_city, aes(x = reorder(City, -TotalPrice), y = TotalPrice / 1e6)) +
  geom_col(fill = "steelblue") +                 # geom_col() = geom_bar(stat = "identity")
  geom_text(aes(label = round(TotalPrice / 1e6, 1)),
            vjust = -0.5, size = 3, color = "black") +
  labs(
    title = "Total Sales by City",
    x = "City",
    y = "Total Sales (in Millions IDR)"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
    plot.title = element_text(size = 14, face = "bold")
  )

3.4 Histogram

A Histogram is a graphical representation of the distribution of numerical data. It divides the data into intervals, known as bins, and displays the frequency of data points within each bin. This visualization helps identify patterns such as the central tendency, spread, skewness, and the presence of multiple modes in the data Wickham2016?.
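The binning step can be made concrete by doing it by hand and comparing with `hist()`. This sketch uses 50 made-up values and five bins of width 2; `cut()` assigns each value to an interval and `table()` counts them. The `right = TRUE, include.lowest = TRUE` settings are chosen to match the defaults of `hist()`, and `plot = FALSE` returns the counts without drawing.

```r
set.seed(1)
x <- runif(50, min = 0, max = 10)     # 50 made-up values between 0 and 10
breaks <- seq(0, 10, by = 2)          # 5 bins of equal width 2

# Manual histogram: assign each value to a bin, then count values per bin
manual_counts <- table(cut(x, breaks = breaks, right = TRUE, include.lowest = TRUE))

# hist() performs the same binning; plot = FALSE returns the counts without drawing
h <- hist(x, breaks = breaks, plot = FALSE)

manual_counts
h$counts
```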

Histograms are particularly effective for:

  • Visualizing the Distribution: They provide a clear picture of how data is distributed across different ranges, helping to identify the shape of the distribution (e.g., normal, skewed, bimodal) [9].
  • Identifying Central Tendency and Spread: The location of the tallest bars suggests where the central values of the data lie, and how widely the bars spread along the axis indicates the variability of the data [10].
  • Detecting Skewness: The asymmetry of the histogram reveals whether the data are skewed to the left or right. Skewness is often a natural property of a variable, but an unexpected skew may also point to biases in the data collection process [11].
  • Recognizing Multiple Modes: A histogram can show if the data has multiple peaks (modes), suggesting the presence of different subgroups within the dataset [12].
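A common numeric companion to the visual check for skewness is the moment coefficient of skewness, g1 = m3 / m2^(3/2), where mk is the average of the k-th power of the deviations from the mean. The function below is a minimal sketch of that formula (add-on packages offer variants with small-sample corrections), applied to made-up vectors. For right-skewed data, the mean is also pulled above the median.

```r
# Moment coefficient of skewness: g1 = m3 / m2^(3/2)
skewness <- function(x) {
  d  <- x - mean(x)       # deviations from the mean
  m2 <- mean(d^2)         # second central moment (population variance)
  m3 <- mean(d^3)         # third central moment
  m3 / m2^(3 / 2)
}

symmetric  <- c(1, 2, 3, 4, 5)
right_skew <- c(1, 1, 1, 2, 10)   # one large value stretches the right tail

skewness(symmetric)     # 0: perfectly symmetric
skewness(right_skew)    # positive: right (positive) skew
c(mean = mean(right_skew), median = median(right_skew))  # mean 3 > median 1
```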

In this dataset, histograms can be used to explore the distribution of variables such as:

  • Quantity (number of items purchased)
  • UnitPrice (price per item)
  • TotalPrice (total transaction value)

3.4.1 Basic Histogram

Suppose we want to show how many transactions occurred for each number of items purchased. Peaks (tall bars) indicate the most common purchase quantities.

hist(sales_data$Quantity,
     main = "Histogram of Quantity",
     xlab = "Number of Items Purchased",
     ylab = "Frequency",
     col = "skyblue",
     border = "white",
     breaks = seq(0.5, 10.5, by = 1))   # one bin per quantity (1 to 10)
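To read the tallest bar off as a number, `hist()` can return its bins without drawing (`plot = FALSE`). With breaks placed halfway between whole numbers, each bin holds exactly one quantity, so the midpoint of the fullest bin is the most common purchase quantity. The vector below is made up; with the real data it would be `sales_data$Quantity`.

```r
quantity <- c(3L, 5L, 3L, 8L, 3L, 5L, 1L, 10L)   # made-up purchase quantities

# One bin per whole number from 1 to 10
h <- hist(quantity, breaks = seq(0.5, 10.5, by = 1), plot = FALSE)

data.frame(Quantity = h$mids, Frequency = h$counts)

# Most common purchase quantity (the tallest bar)
h$mids[which.max(h$counts)]   # 3
```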

3.4.2 Histogram using ggplot2

library(ggplot2)

ggplot(sales_data, aes(x = Quantity)) +
  geom_histogram(
    binwidth = 1,             # one bin per quantity, centered on each whole number
    fill = "skyblue",         # fill color for the bars
    color = "white",          # border color for the bars
    alpha = 0.8               # transparency level
  ) +
  labs(
    title = "Histogram of Quantity",
    x = "Number of Items Purchased",
    y = "Frequency"
  ) +
  theme_minimal()

3.5 Pie Chart

3.6 Boxplot

3.7 Scatter Plot

References

[1] GeeksforGeeks, Data types in statistics, 2025. Available: https://www.geeksforgeeks.org/maths/data-types-in-statistics/
[2] Wilke, C. O., Fundamentals of data visualization, O’Reilly Media, 2019. Available: https://clauswilke.com/dataviz/
[3] Tufte, E. R., The visual display of quantitative information, Graphics Press, 2001.
[4] Alooba, Line charts: Visualizing trends and patterns, 2023. Available: https://www.alooba.com/skills/concepts/data-visualization/line-charts/
[5] Bold BI, Spotting sales opportunities with line chart visualization, 2023. Available: https://www.boldbi.com/blog/spotting-sales-opportunities-with-line-chart-visualization/
[6] Domo, Data visualization with period-over-period charts, 2023. Available: https://www.domo.com/learn/charts/period-over-period-charts
[7] WebDataRocks, Best charts to show discrete data, 2022. Available: https://www.webdatarocks.com/blog/best-charts-discrete-data/
[8] Codecademy, Visualizing categorical data: Bar charts and pie charts, 2023. Available: https://www.codecademy.com/learn/stats-visualizing-a-distribution-of-categorical-data/modules/stats-bar-charts-and-pie-charts/cheatsheet
[9] Atlassian, A complete guide to bar charts, 2023. Available: https://www.atlassian.com/data/charts/bar-chart-complete-guide
[10] GeeksforGeeks, Histogram - definition, types, graph, and examples, 2025. Available: https://www.geeksforgeeks.org/maths/histogram/
[11] JMP, Histogram - introduction to statistics, 2023. Available: https://www.jmp.com/en/statistics-knowledge-portal/exploratory-data-analysis/histogram
[12] Tableau, Understanding and using histograms, 2023. Available: https://www.tableau.com/chart/what-is-a-histogram