Chapter 1 What and why?
This chapter introduces some key concepts of data science and provides an overview of the contents of this book, not primarily in terms of technology and tools (e.g., R, RStudio, R packages), but in terms of underlying key concepts (e.g., data, science, and the relation between data science and related disciplines). The only tool advocated in this chapter is R Markdown, which merges text and code to support reproducible research.
Key concepts and issues
Issues and questions addressed in this chapter:
- Basic terminology:
  - What is data (types and shapes)?
  - What is science?
- What is data science?
  - Which skills?
  - Relation to statistics?
  - Relation to computer programming?
- Which tools?
  - The ecological rationality of tools
  - Using R Markdown for reproducible research
Important concepts introduced in this chapter include the terms representation and ecological rationality.
Recommended readings for this chapter include:
Before you read on, please take some time to reflect upon the following questions:
Try defining the term data. How does it relate to information?
What is the difference between variables and values?
What is the difference between science and data science?
Which skills does a data scientist need?
What are characteristics of a useful tool?
Which tools are you currently using for reading, writing, or calculating? Why these and not others?
What is reproducible research?
What kind of tool(s) would we want in order to adhere to its principles?