3.3 What might IMDb code numbers mean?
Every film has been given a code by IMDb, so merging IMDb's datasets is straightforward. Each code has the form ‘tt’ followed by a number, so the numbers were plotted against production year to see whether there were any patterns. The resulting scatterplot is shown in Figure 3.5.
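Extracting the numeric part of each code is a small step before plotting. The sketch below is a Python illustration only, not the code used for Figure 3.5. It assumes the codes look like those in IMDb's downloadable files (for example `tt0001234`, zero-padded). The helper names `code_number` and `number_year_pairs`, and the sample rows, are hypothetical.

```python
import re

# Assumed format: "tt" followed by one or more digits (e.g. "tt0001234").
_CODE_PATTERN = re.compile(r"tt(\d+)")

def code_number(code: str) -> int:
    """Return the integer part of an IMDb-style title code (hypothetical helper)."""
    match = _CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not an IMDb-style title code: {code!r}")
    return int(match.group(1))  # int() drops the leading zeros

def number_year_pairs(rows):
    """Convert (code, year) rows into (year, code number) points for a scatterplot."""
    return [(year, code_number(code)) for code, year in rows]

# Made-up rows for illustration only, not taken from the IMDb data.
sample = [("tt0000123", 1900), ("tt0567890", 1950)]
print(number_year_pairs(sample))  # [(1900, 123), (1950, 567890)]
```

The resulting pairs can be passed to any plotting library, for instance matplotlib's `scatter`, with year on the horizontal axis and code number on the vertical axis.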
It looks as though most numbers were allocated in order of production year, but that some films were added to the database later. This is particularly true of the very early films. People might assume that the Lumière brothers’ films of 1895 were the first, but 67 earlier films are recorded, and judging by their numbering, these were not in the database initially. The earliest film listed is “Passage de Venus” (Transit of Venus) from 1874, a few seconds recorded using a photographic revolver. Several early films are by the extraordinary Eadweard Muybridge, the man who showed that at one point in its stride a galloping horse has all four hooves off the ground.
As Figure 3.5 includes only films and shorts that received at least 100 votes, it may be misleading. The same plot was drawn for the more than 1.25 million items with any ratings at all, and it suggested that the lower code numbers might show informative features. Figure 3.6 restricts code numbers to less than 1,000,000 and facets by type of item. The scatterplots suggest that a block of code numbers between 500,000 and 750,000 was reserved for TV episodes. The TV categories mostly start in the 1950s and the video category in the 1960s. Quite why the major categories appear to have a tail to the left is unclear. All this may be of little importance, but it is an example of what graphics can uncover that would be difficult to find in other ways. Findings like these also give you something to discuss with domain experts, and such discussions may bring other, more pertinent, information to light.
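The two subsets described here can be sketched in plain Python. This is an assumed reconstruction, not the code behind Figures 3.5 and 3.6. Each record is a hypothetical `Title` tuple, the type labels (`"movie"`, `"short"`, `"tvEpisode"`) are assumed to follow IMDb's public files, and the thresholds (at least 100 votes; code numbers below 1,000,000; the 500,000 to 750,000 range) come from the text. Grouping by type stands in for faceting: each group would be drawn as its own panel.

```python
from collections import defaultdict
from typing import NamedTuple

class Title(NamedTuple):
    """Hypothetical record for one rated IMDb item."""
    number: int      # numeric part of the 'tt' code
    year: int        # production year
    title_type: str  # assumed labels such as "movie", "short", "tvEpisode"
    votes: int       # number of ratings received

def well_voted_films(titles, min_votes=100, types=("movie", "short")):
    """Subset like Figure 3.5: films and shorts with at least min_votes votes."""
    return [t for t in titles if t.title_type in types and t.votes >= min_votes]

def facet_low_codes(titles, max_number=1_000_000):
    """Subset like Figure 3.6: any rated item with a code below max_number,
    grouped by type so that each group can be drawn as a separate panel."""
    panels = defaultdict(list)
    for t in titles:
        if t.votes > 0 and t.number < max_number:
            panels[t.title_type].append((t.year, t.number))
    return dict(panels)

def share_in_range(points, low=500_000, high=750_000):
    """Fraction of a panel's (year, number) points with low <= number < high."""
    if not points:
        return 0.0
    return sum(low <= number < high for _, number in points) / len(points)

# Made-up records for illustration only.
titles = [
    Title(12, 1895, "short", 250),
    Title(600_000, 1958, "tvEpisode", 3),
    Title(50_000, 1950, "movie", 40),
    Title(2_000_000, 1999, "movie", 900),
]
print(len(well_voted_films(titles)))        # 2
panels = facet_low_codes(titles)
print(sorted(panels))                       # ['movie', 'short', 'tvEpisode']
print(share_in_range(panels["tvEpisode"]))  # 1.0
```

A share close to 1 for TV episodes, and close to 0 for the other types, would be one numerical way of checking the pattern that the faceted scatterplots make visible at a glance.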