Chapter 1 Introduction
1.1 Data Science vs. Data Engineering
Data science
- One job of a data scientist is to ask the right questions about any given dataset, whether large or small.
- Analyze the data.
- Communicate the results to an audience.
Data engineering
- A data engineer collects the data, stores it, processes it (in batch or in real time), and serves it through an API (Application Programming Interface) so that data scientists can query it easily.
1.2 What should a full-scope data scientist know?
- Statistics / Mathematics:
  - Statistics-related courses
- Data engineering (a little bit):
  - Service: what kinds of services can be generated from the data?
  - Front-end user interface: who will use your service? Where do the data come from?
  - Data storage
- Understanding humanity (this is where economics comes in):
  - The data we will focus on are human-related, so every question we try to answer is also human-related.
  - How do we model decisions? What are the important predictors?
  - Causality
1.3 Data Service Development
- Service developers:
  - Data scientists
  - Data engineers
- Users
- Cloud:
  - Remote servers that can store data, provide computing power, host services, etc.
1.4 Goals of this course
- Preliminary data analysis using R.
- Invent new ideas (projects) that combine programming and cloud services to solve real-life problems.
  - Developer side (i.e., us): R, JavaScript, etc.
  - Cloud side: how to find and use the proper APIs/SDKs in R or JavaScript to harness cloud services.
- Learn to accomplish projects as a team.
This book is built with knitr and bookdown (Xie 2018).
Reference
Xie, Yihui. 2018. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.