4 Data Collection
Data collection (Data gathering) is the process of systematically gathering, measuring, and analyzing data to extract insights, train machine learning models, or support data-driven decision-making. It is a fundamental step in Data Science Programming, ensuring the availability of high-quality, relevant data for analysis. Data collection methods are broadly categorized into Primary Data Collection, where data is obtained firsthand, and Secondary Data Collection, where existing datasets are used for analysis.
4.1 Primary Data Collection
Primary data collection involves gathering firsthand data directly from sources for analysis in data science projects. It ensures high relevance and accuracy but may require significant time and resources.
Type | Description | Examples |
---|---|---|
Qualitative | Non-numerical data used to understand behaviors, opinions, and experiences. | User Interviews, Social Media Sentiment Analysis, Observations, Case Studies |
Quantitative | Numerical data used to measure patterns, trends, and relationships. | Surveys, A/B Testing, IoT Sensor Data, Web Scraping, API Data Extraction |
Qualitative methods are valuable in data science for analyzing unstructured data like text, images, and human interactions. Quantitative methods provide structured, numerical data, essential for building predictive models, statistical analysis, and AI-driven insights.
Choosing the right method depends on the data science goal—qualitative for understanding context and quantitative for model-driven decision-making.
4.1.1 Web Scraping Data with Py
Web scraping is the process of extracting data from websites using automated scripts. Python is one of the most popular languages for web scraping due to its powerful libraries and ease of use.
Why Use Python for Web Scraping?
- User-friendly: Python has simple syntax, making scraping easier.
- Powerful libraries: BeautifulSoup, Requests, and Selenium allow efficient data extraction.
- Automation capabilities: Scraping can be scheduled to collect data regularly.
Common Web Scraping Techniques in Python
- Using
BeautifulSoup
– Best for extracting data from static web pages. - Using
Requests
– Ideal for sending HTTP requests to fetch web content. - Using
Selenium
– Useful for scraping JavaScript-rendered websites.
Before scraping, always check the website’s robots.txt file to ensure compliance with its policies.
4.1.2 Web Scraping Data with R
Web scraping is the process of extracting data from websites. In R, web scraping can be done using popular packages such as rvest, httr, and RSelenium. These tools allow users to collect, process, and analyze web data efficiently.
Why Use R for Web Scraping?
- Easy to use: Packages like rvest simplify HTML parsing.
- Powerful data manipulation: R has excellent support for data cleaning and analysis.
- Automating tasks: Scraping can be automated for large-scale data collection.
Common Web Scraping Methods in R
- Using
rvest
– Best for simple static web pages. - Using
httr
– Useful for sending HTTP requests. - Using
RSelenium
– Required for scraping JavaScript-heavy websites.
Notes: Before scraping, always check a website’s robots.txt file to ensure compliance with its policies.
4.2 Secondary Data Collection
Secondary data collection involves using existing datasets instead of gathering new data. It is widely used in data science for historical analysis, benchmarking, and training machine learning models. These data sources come from public records, business reports, and online databases.
Type | Description | Examples |
---|---|---|
Public Data | Government and institutional datasets used for large-scale analysis. | Census Data, Open Government Data, Research Reports |
Business Data | Company-generated reports and analytics used for industry insights. | Financial Reports, Market Research, Customer Data |
Online Data | Digital datasets available for academic and practical applications. | Scientific Databases (Google Scholar, ResearchGate), Open Data Portals (Kaggle, UCI ML Repository), Web Scraped Data |
Secondary data is cost-effective and time-efficient but may require preprocessing to ensure quality and relevance for data science applications. Choosing reliable and well-structured data sources is essential for accurate analysis.
4.2.1 Use Online Data
Read all data first before processing it. This makes the script more efficient because you avoid reading the Excel file multiple times in different loops.
4.2.2 Read All Sheets First
Instead of reading the Excel file multiple times inside loops, read all sheets first and store them in a list.
Python Code
import pandas as pd
# File path
= "data/bab4/Rekap_Kuesioner.xlsx"
path
# Read all sheets at once
= pd.read_excel(path, sheet_name=None) # Dictionary of DataFrames sheets
pd.read_excel(path, sheet_name=None)
reads all sheets into a dictionary where keys are sheet names and values are DataFrames. This avoids multiple file reads, making the process more efficient.
R Code
# Import libraries
library(readxl) # To read Excel files
# File path
<- "data/bab4/Rekap_Kuesioner.xlsx"
path
# Read all sheets at once
<- lapply(1:21, function(i) read_excel(path, sheet = i))
all_sheets all_sheets
[[1]]
# A tibble: 33 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP11… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Nirm… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen I KE… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 23 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[2]]
# A tibble: 34 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP11… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Gamb… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen WILD… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 24 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[3]]
# A tibble: 33 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP11… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Mate… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen OEMA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 23 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[4]]
# A tibble: 33 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP11… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Peng… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen DODY… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 23 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[5]]
# A tibble: 37 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP21… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Desa… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen HARR… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 27 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[6]]
# A tibble: 36 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP21… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Tekn… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen IYUS… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 26 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[7]]
# A tibble: 36 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP21… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Baha… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen I KE… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 26 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[8]]
# A tibble: 36 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP21… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Peng… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen HARR… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 26 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[9]]
# A tibble: 37 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP21… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Sema… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen ADEL… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 27 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[10]]
# A tibble: 38 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP31… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Desa… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen DODY… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 28 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[11]]
# A tibble: 34 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP31… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Ergo… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen DODY… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 24 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[12]]
# A tibble: 38 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP31… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Tinj… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen WILD… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 28 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[13]]
# A tibble: 38 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP31… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Meto… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen ADEL… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 28 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[14]]
# A tibble: 38 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP31… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Work… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen IYUS… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 28 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[15]]
# A tibble: 38 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP31… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Pemo… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen IYUS… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 28 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[16]]
# A tibble: 36 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP41… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Desa… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen ADEL… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 26 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[17]]
# A tibble: 37 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP41… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Kerj… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen OEMA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 27 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[18]]
# A tibble: 37 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP41… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Kolo… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen DODY… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 27 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[19]]
# A tibble: 37 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP41… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Mana… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen I KE… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 27 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[20]]
# A tibble: 37 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP41… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Peng… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen HARR… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 27 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
[[21]]
# A tibble: 34 × 13
`Hasil Kuisioner` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Semester : Gasal… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Sesi : Semua <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Kelas DP42… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Mata Kuliah Tuga… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Program Studi DESA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Dosen OEMA… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 No. Responden Kode… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# ℹ 24 more rows
# ℹ 2 more variables: ...12 <chr>, ...13 <chr>
lapply(1:21, function(i) read_excel(path, sheet = i))
reads all 21 sheets into a list called all_sheets. Now, all_sheets[[i]]
represents the data from the i-th sheet.
4.2.3 Extract Course Names
Now, we extract course names from the already loaded data.
Python Code
# Initialize course name list
= []
MataKuliah
# Loop through sheets
for i, (sheet_name, df) in enumerate(sheets.items(), start=1):
# Find row containing "Mata Kuliah"
= df[df.iloc[:, 0] == "Mata Kuliah"]
matakuliah_row
if not matakuliah_row.empty:
0, 1]) # Get column 2 value
MataKuliah.append(matakuliah_row.iloc[else:
f"Sheet{i}") # Default if not found
MataKuliah.append(
MataKuliah
['Nirmana I', 'Gambar I', 'Matematika Geometri', 'Pengantar Bidang Studi Desain Produk Industri', 'Desain Produk I', 'Teknik Presentasi I', 'Bahan dan Proses Produksi', 'Pengantar HAKI', 'Semantika Produk', 'Desain Produk III', 'Ergonomi Desain I', 'Tinjauan Desain', 'Metodologi Desain', 'Workshop Material', 'Pemodelan Digital I', 'Desain Produk V', 'Kerja Profesi', 'Kolokium Pra Tugas Akhir', 'Manajemen Pemasaran Produk', 'Pengantar Desain Sarana Lingkungan', 'Tugas Akhir', 'Sheet22', 'Sheet23']
Searches for the row where the first column is “Mata Kuliah”. Extracts the corresponding course name from the second column. If not found, assigns a default name like “Sheet1”, “Sheet2”, etc.
R Code
<- sapply(all_sheets, function(sheet) {
MataKuliah <- which(sheet[[1]] == "Mata Kuliah")
matakuliah_row if (length(matakuliah_row) > 0) {
return(as.character(sheet[matakuliah_row, 2][[1]])) # Extract course name
else {
} return(NA) # If not found, return NA
}
})
# Replace missing course names with default values
is.na(MataKuliah)] <- paste0("Sheet", which(is.na(MataKuliah)))
MataKuliah[ MataKuliah
[1] "Nirmana I"
[2] "Gambar I"
[3] "Matematika Geometri"
[4] "Pengantar Bidang Studi Desain Produk Industri"
[5] "Desain Produk I"
[6] "Teknik Presentasi I"
[7] "Bahan dan Proses Produksi"
[8] "Pengantar HAKI"
[9] "Semantika Produk"
[10] "Desain Produk III"
[11] "Ergonomi Desain I"
[12] "Tinjauan Desain"
[13] "Metodologi Desain"
[14] "Workshop Material"
[15] "Pemodelan Digital I"
[16] "Desain Produk V"
[17] "Kerja Profesi"
[18] "Kolokium Pra Tugas Akhir"
[19] "Manajemen Pemasaran Produk"
[20] "Pengantar Desain Sarana Lingkungan"
[21] "Tugas Akhir"
This avoids multiple read_excel() calls inside the loop. If Mata Kuliah
is found, it extracts the course name. Otherwise, it assigns SheetX
as a placeholder.
4.2.4 Extract Average
Data
Now, process the average scores from each sheet.
Python Code
# Define questions (Pertanyaan)
= ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"]
Pertanyaan
# Define categories (Kategori)
= ["Reliability"] * 6 + ["Responsiveness"] + ["Assurance"] * 2 + ["Empathy"] + ["Tangible"] * 2
Kategori
# Create DataFrame with questions & categories
= pd.DataFrame({"Pertanyaan": Pertanyaan, "Kategori": Kategori})
Ganjil_23_24
# Extract "Rata-rata" values
for i, (sheet_name, df) in enumerate(sheets.items()):
= df[df.iloc[:, 0] == "Rata-rata"]
rata_rata_row
if not rata_rata_row.empty:
= rata_rata_row.iloc[0, 1:13].astype(float).round()
selected_data = selected_data.values
Ganjil_23_24[MataKuliah[i]] else:
= [None] * len(Pertanyaan)
Ganjil_23_24[MataKuliah[i]]
# Display the Final Table
print(Ganjil_23_24)
Pertanyaan Kategori Nirmana I ... Tugas Akhir Sheet22 Sheet23
0 A Reliability 4.0 ... 4.0 None None
1 B Reliability 4.0 ... 3.0 None None
2 C Reliability 3.0 ... 3.0 None None
3 D Reliability 4.0 ... 3.0 None None
4 E Reliability 4.0 ... 3.0 None None
5 F Reliability 4.0 ... 3.0 None None
6 G Responsiveness 4.0 ... 3.0 None None
7 H Assurance 4.0 ... 4.0 None None
8 I Assurance 4.0 ... 4.0 None None
9 J Empathy 4.0 ... 3.0 None None
10 K Tangible 2.0 ... 3.0 None None
11 L Tangible 3.0 ... 3.0 None None
[12 rows x 25 columns]
R Code
# First column (questions)
<- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L")
Pertanyaan
# Category assignments
<- c(rep("Reliability", 6),
Kategori "Responsiveness",
rep("Assurance", 2),
"Empathy",
rep("Tangible", 2))
# Create a data frame with questions & categories
<- data.frame(Pertanyaan = Pertanyaan, Kategori = Kategori)
Ganjil_23_24
# Loop through sheets to extract "Rata-rata" values
for (i in 1:21) {
<- which(all_sheets[[i]][[1]] == "Rata-rata")
rata_rata_row
if (length(rata_rata_row) > 0) {
<- all_sheets[[i]][rata_rata_row, 2:13] # Extract 12 columns
selected_data <- round(as.numeric(selected_data)) # Convert
Ganjil_23_24[[MataKuliah[i]]] else {
} <- NA # Assign NA if "Rata-rata" not found
Ganjil_23_24[[MataKuliah[i]]]
}
}
# Display the Final Table
library(knitr) # To create formatted tables
kable(Ganjil_23_24)
Pertanyaan | Kategori | Nirmana I | Gambar I | Matematika Geometri | Pengantar Bidang Studi Desain Produk Industri | Desain Produk I | Teknik Presentasi I | Bahan dan Proses Produksi | Pengantar HAKI | Semantika Produk | Desain Produk III | Ergonomi Desain I | Tinjauan Desain | Metodologi Desain | Workshop Material | Pemodelan Digital I | Desain Produk V | Kerja Profesi | Kolokium Pra Tugas Akhir | Manajemen Pemasaran Produk | Pengantar Desain Sarana Lingkungan | Tugas Akhir |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | Reliability | 4 | 3 | 4 | 4 | 4 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 |
B | Reliability | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
C | Reliability | 3 | 4 | 4 | 4 | 3 | 4 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
D | Reliability | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
E | Reliability | 4 | 4 | 4 | 4 | 3 | 4 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
F | Reliability | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
G | Responsiveness | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
H | Assurance | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 3 | 4 | 4 | 3 | 4 | 3 | 3 | 3 | 4 |
I | Assurance | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 |
J | Empathy | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
K | Tangible | 2 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 4 | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
L | Tangible | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
Instead of reading the Excel file multiple times, we use all_sheets[[i]]
to get the necessary data. This makes the code more efficient because the Excel file is only read once.