1.2 The Task
Imagine that you are tasked with producing a ‘summary score’ of the beers that captures each beer’s essential properties, but omits all the redundant details. In fact, we really want to find the best possible summary score of these data. What do we mean by best? We might conceive of two different aims:
First, the summary score should be able to discriminate between two beers that are essentially different from one another. Imagine if part of our summary score was based on the property of ‘contains alcohol’. This would be a useless property to consider: the vast majority of beers contain alcohol, so it barely distinguishes one beer from another. But if we think about the amount of alcohol a beer contains, this might be useful, as there is a lot of variation in the amount beers can contain. In other words, we are looking for properties of the beer to go into our summary score that maximise variance, i.e. that show a lot of spread in their values.
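As a rough illustration of this first aim, the sketch below compares the spread of the two properties mentioned above. The six values are made up for the example (they are not measurements of real beers), and the variable names are hypothetical.

```python
import numpy as np

# Made-up values for six hypothetical beers (assumption: illustrative only).
contains_alcohol = np.array([1, 1, 1, 1, 1, 0])          # 1 = yes, 0 = no
abv_percent = np.array([4.2, 5.0, 6.5, 8.0, 10.5, 0.0])  # alcohol by volume, in %

# Population variance: the average squared distance from the mean.
var_contains = contains_alcohol.var()
var_abv = abv_percent.var()

print(f"variance of 'contains alcohol': {var_contains:.3f}")
print(f"variance of 'amount of alcohol': {var_abv:.3f}")
```

On these made-up numbers the yes/no property has almost no spread, while the amount of alcohol varies a lot. Raw variances depend on the units of each property, so the comparison is only meant to convey the idea of ‘spread’.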
Second, the summary score should be able to accurately ‘reconstruct’ the original beer list. Imagine if part of our summary score was based on some property unrelated to the beer itself, like ‘label colour’. This would probably not be a useful property, since the colour of the label is unlikely to tell you anything about the essential characteristics of the beer. Again, a useful characteristic might be amount of alcohol, since it is likely to tell you a lot about the other facets of the beer - a Budweiser and an imperial stout differ from each other not only in alcohol content, but also in colour, style, and probably drinker enjoyment. In other words, we are looking for properties to go into our summary score that minimise error, i.e. that are very representative of the other essential characteristics of the beer.
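So overall, a good summary score ‘maximises variance (across beers)’ and ‘minimises error (when reconstructing the list)’. Luckily, mathematically, these two aims are exactly the same. When the summary score is a weighted combination of the (mean-centred) properties, the total spread in the data splits into the part the score captures and the part lost in reconstruction. The total is fixed, so making the captured variance as large as possible automatically makes the reconstruction error as small as possible.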
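The equivalence of the two aims can be checked numerically. The following is a minimal sketch, assuming the summary score is a one-dimensional linear projection of mean-centred data; the two ‘properties’ are synthetic random numbers standing in for beer measurements, and all variable names are hypothetical. It tries many candidate directions and, for each one, records both the variance of the resulting score and the average squared error made when rebuilding the data from that score alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for two correlated beer properties (assumption: not real data).
n = 200
abv = rng.normal(6.0, 2.0, n)
bitterness = 5.0 * abv + rng.normal(0.0, 5.0, n)
X = np.column_stack([abv, bitterness])
Xc = X - X.mean(axis=0)  # centre each property on its mean

# Candidate directions: unit vectors at angles in [0, pi), one per row.
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
directions = np.column_stack([np.cos(angles), np.sin(angles)])

# Summary score of every beer along every direction, shape (n, 180).
scores = Xc @ directions.T

# Aim 1: variance of the score (its mean is zero because Xc is centred).
variance = (scores ** 2).mean(axis=0)

# Aim 2: mean squared error when rebuilding each beer as score * direction.
recon = scores[:, :, None] * directions[None, :, :]  # shape (n, 180, 2)
error = ((Xc[:, None, :] - recon) ** 2).sum(axis=2).mean(axis=0)

# Total spread in the data; it does not depend on the direction.
total = (Xc ** 2).sum(axis=1).mean()

best_by_variance = int(np.argmax(variance))
best_by_error = int(np.argmin(error))
print("same direction wins both aims:", best_by_variance == best_by_error)
print("variance + error is constant:", np.allclose(variance + error, total))
```

For every direction, the captured variance and the reconstruction error add up to the same total, which is why the direction that maximises one must minimise the other.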