C Data science project


This appendix sketches the purpose and some criteria for an interesting and engaging data science project. In this introduction, we will briefly explain why such a project is an important part of this course and what constitutes a successful data science project.


As with any course requirement, most students initially experience the need for a DS project as a hassle and a nuisance.¹ This is unfortunate, as asking you to do a project is not meant as a torment, but rather as an opportunity to verify and demonstrate the practical utility of newly acquired skills. Perhaps it helps to think of this project as a chance to show what you have learned and can now do with data. When considering possible projects, remind yourself how your questions and solutions would have differed if you had not worked through this book and course: Would you have been able to analyze the same data, obtain the same results, or submit the same report? If your skills for asking or answering questions have improved, you have learned something and can be proud of it.


What characterizes a good data science project? As the quality of any data exploration and analysis depends on (a) the quality of the data, (b) the questions asked, and (c) the analysis performed, it is difficult to establish a fixed set of rules for evaluating data science projects.

Here are some basic steps that may provide some initial orientation:

  • Look for an interesting question that can be addressed by data, or for a dataset that seems genuinely interesting to you.

  • Apply and show what you have learned (e.g., in the various chapters and parts of this book and course).

  • Reflect on the results: What has this analysis discovered or shown that we did not know before? What would we want to know next?

In short, ask an interesting question and find some data that can be analyzed, transformed, and visualized to address — and ideally answer — it.

  1. The same holds true for the weekly exercises. But ask yourself: Would you really have mastered the materials of this book without doing the exercises?