The Dangers of a Narrow Score Spectrum
To examine the dangers of a narrow score spectrum, let’s first consider why we give games numeric scores in the first place. I’d suggest there are two reasons. One is as a simple channel of communication from reviewer to reader that allows the reader to get a very rudimentary view of the reviewer’s opinion in a single number. I’ll talk in future entries about why that’s a bad idea, but you can probably think of a few reasons yourself. The second one, and the one that I’ll focus on this week, is differentiation and comparison: scores give us a way to directly compare games to one another, allowing ranking systems, voting systems, etc. You can’t easily compare games without some kind of formal, objective scale to place them on, and so the scoring system exists to allow us to differentiate games from one another easily.
What, then, happens when we focus on putting all our games in a small subset of our reviewing spectrum? We lose the power to differentiate between them. Take a simple 1 to 10 scoring scale, and assume that, like the current industry, we focus only on points 7 through 10. That effectively gives us four bins to put every game in the world into: 10, 9, 8, and 7. How do you sort games into such large bins? Inevitably, you’re going to end up with games of vastly different qualities sharing bins solely because there aren’t a lot of bins to choose from. It’s like trying to sort every food in the world into four overarching groups; inevitably, you’re going to end up with mismatches as bizarre as kiwis and pizza in the same group solely because you have so few options. If the goal for review scores is to facilitate differentiation and comparison, then this sort of weak granularity is incredibly detrimental.
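To put a rough number on that loss, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the ten games, their "quality" values on a 0 to 99 scale, and the helper names `score` and `tied_pairs` are all hypothetical, and real reviewers obviously don't score by formula. It simply maps the same ten games onto the full 1 to 10 scale and onto the 7 to 10 range, then counts how many pairs of games end up tied.

```python
from collections import Counter

def score(quality, low, high):
    """Map a quality from 0..99 onto the integer scores low..high (inclusive)."""
    bins = high - low + 1
    # Integer arithmetic splits 0..99 into `bins` equal-width slices.
    return low + quality * bins // 100

def tied_pairs(scores):
    """Count the pairs of games that received exactly the same score."""
    return sum(n * (n - 1) // 2 for n in Counter(scores).values())

# Hypothetical qualities for ten games, spread evenly from weak to excellent.
qualities = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]

full_scale = [score(q, 1, 10) for q in qualities]    # uses every point, 1..10
narrow_scale = [score(q, 7, 10) for q in qualities]  # uses only 7..10

print("full scale:  ", full_scale, "tied pairs:", tied_pairs(full_scale))
print("narrow scale:", narrow_scale, "tied pairs:", tied_pairs(narrow_scale))
```

Under these assumptions, the full scale gives every game its own score, while the four-bin version leaves eight pairs of games indistinguishable, including games whose underlying quality differs by twenty points.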
In reality, however, differentiation rarely comes up as a practical concern, because the industry at least seems to recognize it as an issue and takes counter-measures, whether knowingly or unknowingly. Unfortunately, those counter-measures tend to bolt new granularity onto the spectrum rather than fix the underlying problem. Review outlets seem very content to add new levels of granularity to the upper echelons of the review system. There was a time when games were reviewed on a ten-point scale, 1 through 10. Over time, that evolved into a roughly 20-point scale, 1 through 10 with half-points in between. Now, it’s become tenths of points in most places, which is what allows a score like 8.8 to even arise in the first place.
On the surface, this isn’t a bad thing — after all, if differentiation, as I’ve proposed, is one of the chief reasons why a numeric grading scheme even exists, then enhanced differentiation only adds to that. The fewer tie scores, the better. The problem, however, is that the increased granularity of the scoring scale isn’t coming in response to a demand for increased differentiation across the board; it’s coming in response to a demand for increased differentiation after a substantial amount of score inflation. After years and years of “average” scores creeping higher and higher, the industry comes to a point where basically every game gets an 8 or a 9. That introduces the need for increased granularity, but only at those higher levels. So, we get an effective scale of 8.0 to 9.9, 20 points. But wait, didn’t we just have a 20-point scale anyway? We did, but it wasn’t getting used. So, we transition from one 20-point scale to another.
A common counter-argument to this entire line of reasoning is that the scoring scale doesn’t matter as long as the reader understands what the reviewer is saying; if 8.8 is mediocre, that’s fine as long as everyone knows 8.8 is mediocre. I don’t disagree at all; one of the major reasons a scoring system exists is to form this understanding between the reviewer and the reader. The problem arises with this inflation; inflation consistently moves the scoring scale and changes it relative to the actual opinions of the game, and that consistent change threatens the communication between the reviewer and the reader. An 8.8 now doesn’t represent what an 8.8 represented 10 years ago, and it probably doesn’t represent what it will represent 10 years from now, thanks to this inflation.
It becomes a vicious cycle of inflation and re-differentiation. We start with a 1 to 10 scoring scale, but for whatever reason, average scores inflate and we begin giving all games scores within a specific range on that scale. But then our scale lacks the power to differentiate between games, so we introduce more specific granularity. But then, within that more specific granularity, we again begin focusing on a particular subset of scores, creating the need for further differentiation again. It’s a natural process; we assume that anything in the bottom 70% of our current effective scale is subpar, and as a result, we score everything in the top 30%. But as a result of that, we need further differentiation of that top 30%, which in turn creates a bottom 70% within that top 30%.
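The compounding in that cycle is easy to underestimate, so here is a small Python sketch of the arithmetic. It assumes, purely for illustration, that each round of inflation keeps exactly the top 30% of whatever range is currently in use; the function name `inflate` and the choice of three rounds are hypothetical, and real score drift is nowhere near this tidy.

```python
from fractions import Fraction

def inflate(low, high, kept=Fraction(3, 10)):
    """Return the range left when only the top `kept` share of low..high is used."""
    return high - (high - low) * kept, high

# Start from the full 1 to 10 scale and apply three rounds of inflation.
low, high = Fraction(1), Fraction(10)
history = [(low, high)]
for _ in range(3):
    low, high = inflate(low, high)
    history.append((low, high))

for cycle, (lo, hi) in enumerate(history):
    width = hi - lo
    print(f"cycle {cycle}: scores run from {float(lo):.3f} to {float(hi):.0f} "
          f"(width {float(width):.3f})")
```

With these assumptions, the bottom of the effective scale climbs from 1 to 7.3, then to 9.19, then to 9.757. After three rounds the usable range is less than a quarter of a point wide, which is why tenths of a point stop being enough and hundredths start to look tempting.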
This inflation isn’t automatic or inherent; there are reviewing outlets for other media that have existed far longer than the reviews for games. Take movie reviews, for instance. All movie review outlets that I’m familiar with rate movies on a 1 to 5 star scale, with half stars being an option. Giving a movie 4 stars is legitimately considered a good score, as it should be; it won’t earn any best picture nominations, but it’s still good. The equivalent for a game would be an 8, and as we’ve seen, scores as low as 8 are reserved for below-average games.
All this about scoring systems echoes the underlying need that I mentioned right off the bat: the need to compare and differentiate games. But a scoring system inherently suggests one major thing: that all games can be compared on one system. But is that really even true? How do you compare a game like Uncharted to a game like Angry Birds? How do you compare Super Mario Bros. to Super Mario Galaxy? If the goal of a scoring system is to establish a formal spectrum on which games can be scored for comparison, it has to answer these questions. But is an answer even possible? That will be our focus next week.