3.1 Kaggle Competition

Kaggle (https://www.kaggle.com/), a subsidiary of Google LLC, is the world’s largest data science community, with powerful tools and resources to help you achieve your data science goals. Kaggle was founded in 2010 on the idea that data scientists need a place to come together to collaborate on projects, learn new techniques, and share each other’s experience. It has since grown into a network of more than 1,000,000 registered users and has created a safe place for data science learning, sharing, and competition.

Drawing on the human competitive spirit, Kaggle created a platform on which organizations host competitions. These competitions have fueled new methodologies and techniques in data science and have given the host organizations new insights from the data they provided.

Figure 3.2: Kaggle Competition web site

Generally, each competition has a host, and each host has to prepare and provide data. When providing data, the host can give additional information such as a description, the evaluation method, a timeline, and the prize for winning. This may not be an ideal example of the real-world data problems that a data scientist faces in business, but it provides a good starting point for learners. In the real world, you may need to start by understanding the business and finding data sources yourself. Even though the competition host has provided the data, you cannot assume that the data are clean and ready for analysis. Cleaning and preprocessing the data are part of the competition. A solution therefore tests how well a participant handles the whole process of a data science project.