Chapter 1 Introduction

1.1 Data Science vs. Data Engineering

Data science

One job of a data scientist is to ask the right questions about any given dataset, whether large or small.

  • Analyze data

  • Communicate with the audience

Data engineering

The data engineer gathers the data, stores it, runs batch or real-time processing on it, and serves it through an API (Application Programming Interface) that a data scientist can easily query.
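The gather, store, process, and serve steps described above can be sketched in a few lines of JavaScript. This is only a minimal in-memory illustration: the records, the `store`, `batchProcess`, and `makeApi` functions, and the `getMeanTemperature` query are all hypothetical names invented for this example, and a real pipeline would use a database or cloud storage and expose the result over a network API.

```javascript
// Hypothetical raw records, standing in for data gathered from some source.
const rawRecords = [
  { city: "Taipei", temperature: 30 },
  { city: "Taipei", temperature: 32 },
  { city: "Tainan", temperature: 34 },
];

// Store: group the readings by city in an in-memory Map
// (an assumption for illustration; real systems use persistent storage).
function store(records) {
  const storage = new Map();
  for (const { city, temperature } of records) {
    if (!storage.has(city)) storage.set(city, []);
    storage.get(city).push(temperature);
  }
  return storage;
}

// Batch processing: compute the mean temperature for each city.
function batchProcess(storage) {
  const summary = new Map();
  for (const [city, temps] of storage) {
    const mean = temps.reduce((sum, t) => sum + t, 0) / temps.length;
    summary.set(city, mean);
  }
  return summary;
}

// Serve: an object whose method plays the role of an API endpoint.
function makeApi(summary) {
  return {
    getMeanTemperature(city) {
      // Return null when no data exist for the requested city.
      return summary.has(city) ? summary.get(city) : null;
    },
  };
}

const api = makeApi(batchProcess(store(rawRecords)));
console.log(api.getMeanTemperature("Taipei")); // prints 31
```

The data scientist only touches the last step: a call such as `api.getMeanTemperature("Tainan")` returns a processed value without any knowledge of how the data were gathered or stored.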

1.2 What should a full-scope data scientist know?

  1. Statistics / Mathematics:

    • Statistics-related courses

  2. Data engineering (a little bit)

    • Service: what kinds of services can be generated from the data?

    • Front-end user interface: who will use your service? Where do the data come from?

    • Data storage

  3. Understand humanity (this is where economics kicks in).
    The data we focus on are human-related, so any question we try to answer is human-related too.

    • How to model decisions? What are the important predictors?

    • Causality

1.3 Data Service Development

Service developer:

  • Data scientist

  • Data engineer

Users

Cloud:

  • Remote servers that can be used to store data, provide computing power, or host services.

1.4 Goal of this course

  1. Preliminary data analysis using R.

  2. Come up with a new idea (project) that uses programming and cloud services to solve everyday problems.

    • Developer side (i.e., us): R, JavaScript, etc.

    • Cloud side: how to find and use the appropriate API/SDK in R or JavaScript to harness cloud services.

  3. Learn to accomplish projects as a team.

knitr (???), bookdown (Xie 2018)

Reference

Xie, Yihui. 2018. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.