3.1 Trailer
Many people use IMDb ratings to help them decide what to watch. The ratings are contributed by users of the site. This chapter studies the data on films for which ratings are available. A dataset of 58,788 films scraped from IMDb several years ago is available in the R package ggplot2movies. Nowadays IMDb makes regularly updated datasets available for download from its website. The one used here was downloaded in July 2022. It contained 9,033,256 items, of which 1,251,317 had user ratings or votes. Excluding items labelled TV or video and restricting the data to movies and shorts that had more than 100 ratings left 124,667 films.
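The selection just described can be sketched in a few lines. This is only an illustration, written in Python rather than R, and not the code used to prepare the data for this chapter. The field names `titleType` and `numVotes` are assumed to match the columns of the IMDb download files, and the six records are invented.

```python
# Hypothetical records; the field names are an assumption about the IMDb files.
titles = [
    {"title": "A", "titleType": "movie", "numVotes": 2500},
    {"title": "B", "titleType": "short", "numVotes": 101},
    {"title": "C", "titleType": "tvMovie", "numVotes": 900},
    {"title": "D", "titleType": "video", "numVotes": 400},
    {"title": "E", "titleType": "movie", "numVotes": 100},
    {"title": "F", "titleType": "movie", "numVotes": None},  # no ratings at all
]

KEEP_TYPES = {"movie", "short"}  # drops items labelled TV or video
MIN_VOTES = 100                  # keep only items with MORE than this many ratings


def select_films(records, keep_types=KEEP_TYPES, min_votes=MIN_VOTES):
    """Return the movies and shorts that have more than min_votes ratings."""
    return [
        r for r in records
        if r["titleType"] in keep_types
        and r["numVotes"] is not None
        and r["numVotes"] > min_votes
    ]


films = select_films(titles)
print([f["title"] for f in films])
```

Item E is dropped because the condition is strictly more than 100 ratings, and item F because it has no ratings.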
Figure 3.1 shows the film runtimes in a boxplot, or rather it shows a few outliers, as little else is visible. The film lasting over 5 weeks (over 50,000 minutes) is called “Logistics” and records in reverse the journey of a pedometer from its production in China to its sale in Sweden. To look at runtimes in more detail, the 682 films longer than 3 hours were excluded. Figure 3.2 shows a histogram drawn with a binwidth of 1 minute, the level of resolution of the data.
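As a rough sketch of the two steps behind Figure 3.2, the Python code below drops runtimes above 3 hours and counts the remaining films minute by minute, which is what a histogram with a binwidth of 1 displays for whole-minute data. The runtimes are invented and the variable names are assumptions made for illustration.

```python
from collections import Counter

# Five weeks expressed in minutes, for comparison with the longest film.
MINUTES_PER_WEEK = 7 * 24 * 60          # 10,080
five_weeks = 5 * MINUTES_PER_WEEK       # 50,400, i.e. "over 50,000 minutes"

# Hypothetical runtimes in minutes, including three very long films.
runtimes = [7, 7, 90, 90, 90, 95, 100, 181, 240, 51000]

MAX_MINUTES = 3 * 60  # films longer than 3 hours are excluded

kept = [m for m in runtimes if m <= MAX_MINUTES]
n_excluded = len(runtimes) - len(kept)

# One bin per whole minute: the count in each bin is the bar height.
counts = Counter(kept)

print(five_weeks, n_excluded)
print(sorted(counts.items()))
```

Because runtimes are recorded in whole minutes, a narrower bin would add no detail and a wider one would hide peaks at individual values.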
Many films are recorded as having runtimes of 90 minutes, and many others have runtimes that are multiples of 10 or, to a lesser extent, of 5. This is a form of data heaping. The most common runtime for short films is 7 minutes. IMDb provides information on the technical specifications of the various releases of films. The famous Japanese film “The Seven Samurai” is reported as having eight different runtimes ranging from 150 minutes to 207 minutes. The highest value was the one supplied in the dataset.
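One simple way to quantify heaping is to compare the share of runtimes that are multiples of 10, and the share that are multiples of 5 but not of 10, with the roughly 10% each that would be expected if final digits were evenly spread. The Python sketch below does this for invented values; the function name `heaping_shares` is hypothetical and this check is not taken from the chapter.

```python
def heaping_shares(runtimes):
    """Return (share that are multiples of 10, share that are multiples of 5 only)."""
    n = len(runtimes)
    tens = sum(1 for m in runtimes if m % 10 == 0)
    fives_only = sum(1 for m in runtimes if m % 5 == 0 and m % 10 != 0)
    return tens / n, fives_only / n


# Hypothetical runtimes in minutes, chosen to show strong heaping.
runtimes = [90, 90, 90, 100, 80, 85, 95, 87, 92, 103]

share_tens, share_fives = heaping_shares(runtimes)
print(share_tens, share_fives)
```

Shares well above 10% suggest that runtimes have been rounded when recorded, with multiples of 10 favoured over multiples of 5.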