2 Introduction
The purpose of this Data Science project is to predict the profits made by a movie based on its budget allocation. Our first motivation for working on this topic is our strong interest in the film industry; although we do not watch the same movie genres, we both spend some of our spare time deepening our knowledge of this area. Movies are a way of expressing oneself, sending messages, and showing what is happening worldwide.
Beyond our personal interest, we thought that this project would be a great way to bring together our hobby and practical business aspects. We both study Management (specialized in Business Analytics), so we wanted to work on something business-related. It is difficult to find relevant, freely available data on the web; however, we were able to find some useful datasets about movie budgets, profits, genres, and characters. Predicting profits from an initial budget is also an exercise that can be replicated in other business contexts.
Moreover, because movies can be described by many variables (such as genres and main characters), we could deepen our analysis to see whether other variables are correlated with profits or affect the relationship between budget and profits. Of course, it is intuitive to expect a positive relationship between budget size and profits, but we wanted to test whether this relationship is statistically significant.
Our research questions are the following:
Do budgets positively correlate with profits?
Could we predict the profits a movie will make based on its budget allocation?
Does this prediction change depending on the movie genre?
Does this prediction change depending on the gender of the main characters?
Could we spot some differences in the budget effect based on the production studio?
Which other factors could increase a movie's chances of being profitable (for production studios that do not have big budgets)?
After describing our datasets (which variables they contain, the number of observations, and how we linked the different tables), we will start with the exploratory data analysis. Its purpose is to give an overview of our data and of the overall correlations and summary figures, so that we can gain some initial insights. Then, we will move on to the core analysis, in which we answer our research questions. We will mostly use statistical models to represent these relationships and to analyze how the effects change when other variables are considered, before concluding with our final analysis.
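To make the first two research questions more concrete, the following minimal Python sketch shows one way such an analysis could look: a Pearson correlation between budget and profit, followed by a simple linear regression used for prediction. It runs on synthetic data; the numbers, the variable names (budget, profit, predict_profit), and the choice of a one-variable linear model are assumptions made for illustration only, not our actual dataset or final model.

```python
import numpy as np

# Synthetic data for illustration only (NOT the project's dataset).
# Units are hypothetical: millions of dollars.
rng = np.random.default_rng(42)
budget = rng.uniform(1, 200, size=200)
profit = 0.8 * budget + rng.normal(0, 40, size=200)

# Research question 1: do budgets positively correlate with profits?
# Pearson correlation coefficient between the two variables.
r = np.corrcoef(budget, profit)[0, 1]

# Research question 2: can profit be predicted from budget?
# Ordinary least squares fit of profit = slope * budget + intercept.
slope, intercept = np.polyfit(budget, profit, deg=1)


def predict_profit(b):
    """Predict profit for a given budget using the fitted line."""
    return slope * b + intercept


print(f"Pearson r: {r:.3f}")
print(f"Fitted line: profit = {slope:.3f} * budget + {intercept:.3f}")
print(f"Predicted profit for a budget of 100: {predict_profit(100.0):.1f}")
```

A single-predictor line is only a starting point; the later questions (genre, gender of the main characters, production studio) would require adding these variables to the model.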