Chapter 1 What and why?

This chapter introduces some key concepts of data science and provides an overview of the contents of this book. The focus is not primarily on technology and tools (e.g., R, RStudio, R packages), but on the underlying key concepts (e.g., data, science, and the relation between data science and related disciplines). The only tool advocated in this chapter is R Markdown, which allows merging text and code in an effort to conduct reproducible research.

Key concepts and issues

Issues and questions addressed in this chapter:

  • Basic terminology:
    • What is data (types and shapes)?
    • What is science?
    • What is data science?
  • Which skills?
    • Relation to statistics?
    • Relation to computer programming?
  • Which tools?
    • The ecological rationality of tools
    • Using R Markdown for reproducible research

Important concepts introduced in this chapter include the terms representation and ecological rationality.

Preflections

Before you read on, please take some time to reflect upon the following questions:

  • Try defining the term data. How does it relate to information?

  • What is the difference between variables and values?

  • What is the difference between science and data science?

  • Which skills does a data scientist need?

  • What are characteristics of a useful tool?

  • Which tools are you currently using for reading, writing, or calculating? Why these and not others?

  • What is reproducible research?

  • What kind of tool(s) would we want to use in order to adhere to its principles?

Please take some notes on your answers. After finishing this chapter, Exercise 1.5.5 will ask you to summarize them in an R Markdown document.
