the artistry and psychology of gaming


The Dangers of a Narrow Score Spectrum

In order to examine the dangers of a narrow score spectrum, let’s first consider why we give games numeric scores in the first place. I’d suggest there are two reasons. One is as a simple channel of communication from reviewer to reader, allowing the reader to get a very rudimentary view of the reviewer’s opinion in a single number. I’ll talk in future entries about why that’s a bad idea, but you can probably think of a few reasons yourself. The second, and the one I’ll focus on this week, is differentiation and comparison: scores give us a way to directly compare games to one another, enabling ranking systems, voting systems, and so on. You can’t easily compare games without some kind of formal, objective scale to place them on, and so the scoring system exists to let us differentiate games from one another easily.

What, then, happens when we focus on putting all our games in a small subset of our reviewing spectrum? We lose the power to differentiate between them. Take a simple 1 to 10 scoring scale, and assume that, like the current industry, we focus only on points 7 through 10. That effectively gives us four bins to put every game in the world into: 10, 9, 8, and 7. How do you sort games into such large bins? Inevitably, you’re going to end up with games of vastly different qualities sharing bins solely because there aren’t a lot of bins to choose from. It’s like trying to sort every food in the world into four overarching groups; inevitably, you’re going to end up with mismatches as bizarre as kiwis and pizza in the same group solely because you have so few options. If the goal for review scores is to facilitate differentiation and comparison, then this sort of weak granularity is incredibly detrimental.
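The compression described above is easy to make concrete. Here’s a minimal Python sketch; the game names and the underlying “true” quality values are entirely hypothetical, chosen only to show how different games collapse into the same bin:

```python
# Hypothetical games with "true" quality spread across the upper half
# of a full 1-10 scale. (These names and numbers are made up.)
games = {
    "Game A": 9.7,   # near-masterpiece
    "Game B": 8.6,   # very good
    "Game C": 7.9,   # good
    "Game D": 7.1,   # merely decent
}

def inflated_bin(quality):
    """An inflated industry that only hands out whole scores 7 through 10
    has exactly four bins: clamp into the 7-10 band, round to a whole score."""
    return max(7, min(10, round(quality)))

for name, quality in games.items():
    print(name, "->", inflated_bin(quality))
```

Note that any game the scale would place below 7 still lands in the 7 bin, so a mediocre title and a decent one become indistinguishable, which is exactly the loss of differentiation at issue.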

In reality, however, differentiation rarely comes up as a practical concern, because the industry at least seems to recognize it as an issue and takes counter-measures, whether knowingly or unknowingly. Unfortunately, those counter-measures tend to patch new workarounds onto the spectrum rather than fix the underlying problem. Differentiation doesn’t become an issue because review outlets seem very content to add new levels of granularity to the upper echelons of the scoring system. There was a time when games were reviewed on a ten-point scale, 1 through 10. Over time, that evolved into a 20-point scale: 1 through 10 with half-points in between. Now, most outlets score in tenths of a point, which is what allows a score like 8.8 to arise in the first place.

On the surface, this isn’t a bad thing; after all, if differentiation is, as I’ve proposed, one of the chief reasons a numeric grading scheme exists, then enhanced differentiation only helps. The fewer tied scores, the better. The problem, however, is that the increased granularity of the scoring scale isn’t coming in response to a demand for increased differentiation across the board; it’s coming in response to a demand for increased differentiation after a substantial amount of score inflation. After years and years of “average” scores creeping higher and higher, the industry reaches a point where basically every game gets an 8 or a 9. That creates a need for increased granularity, but only at those higher levels. So, we get an effective scale of 8.0 to 9.9: 20 points. But wait, didn’t we just have a 20-point scale anyway? We did, but most of it wasn’t being used. So, we transition from one 20-point scale to another.
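The arithmetic here checks out with a short sketch. I’m taking the half-point scale to run from 0.5 through 10.0 so that it contains exactly 20 distinct values, matching the “20-point scale” described above; the inflated band runs from 8.0 through 9.9 in tenths:

```python
# Distinct scores on the old half-point scale (0.5 through 10.0).
half_point = [round(0.5 * i, 1) for i in range(1, 21)]

# Distinct scores the inflated industry actually hands out:
# 8.0 through 9.9, in tenths of a point.
inflated = [round(8.0 + 0.1 * i, 1) for i in range(20)]

print(len(half_point), len(inflated))  # same count of usable scores
```

Both lists hold exactly 20 values: the finer tenths-based notation buys no extra differentiation overall, it just relocates the same 20 slots into a narrower band.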

A common counter-argument to this entire line of reasoning is that the scoring scale doesn’t matter as long as the reader understands what the reviewer is saying; if 8.8 means mediocre, that’s fine, so long as everyone knows 8.8 means mediocre. I don’t disagree at all; one of the major reasons a scoring system exists is to form this understanding between reviewer and reader. The problem is inflation: it constantly shifts the scoring scale relative to reviewers’ actual opinions of the games, and that constant shift undermines the communication between reviewer and reader. An 8.8 today doesn’t represent what an 8.8 represented 10 years ago, and thanks to this inflation, it probably won’t represent what an 8.8 will mean 10 years from now.

It becomes a vicious cycle of inflation and re-differentiation. We start with a 1 to 10 scoring scale, but for whatever reason, average scores inflate and we begin giving all games scores within a narrow band of that scale. But then our scale lacks the power to differentiate between games, so we introduce finer granularity. Then, within that finer granularity, we again begin focusing on a particular subset of scores, creating the need for further differentiation once more. It’s a natural process: we assume that anything in the bottom 70% of our current effective scale is subpar, and as a result, we score everything in the top 30%. But then we need further differentiation within that top 30%, which in turn creates a new bottom 70% within that top 30%.

This inflation isn’t automatic or inherent; review outlets for other media have existed far longer than game reviews without succumbing to it. Take movie reviews, for instance. Every movie review outlet I’m familiar with rates movies on a 1 to 5 star scale, with half stars as an option. Giving a movie 4 stars is legitimately considered a good score, as it should be: the movie won’t earn any Best Picture nominations, but it’s still good. The equivalent for a game would be an 8, and as we’ve seen, scores as low as 8 are now effectively reserved for below-average games.

All of this about scoring systems comes back to the underlying need I mentioned right off the bat: the need to compare and differentiate games. But a scoring system inherently assumes one major thing: that all games can be compared on a single scale. Is that really true? How do you compare a game like Uncharted to a game like Angry Birds? How do you compare Super Mario Bros. to Super Mario Galaxy? If the goal of a scoring system is to establish a formal spectrum on which games can be scored for comparison, it has to answer these questions. But is an answer even possible? That will be our focus next week.


  1. X-Play explicitly mentioned this as one of the reasons that they use a five-star scoring method. Or at least why they used that scale back when they actually reviewed games.

  2. I both agree and disagree with what you’re saying. The current rating system is an absolute joke, as is the integrity of most reviewers. It’s just like movies, where the big name films get great reviews, and the lesser-known titles are disregarded, regardless of quality.

    However, I wouldn’t agree with dissolving numerical ratings altogether. I think that a numerical scale that has been adhered to is a good thing. Different types of games are just that: different. Because of this, I believe that every reviewer and person reading said review should keep the game’s context in mind, whether that means level of technology, gaming generation, genre, or what have you. For example, you couldn’t possibly compare the first Legend of Zelda with something like House of the Dead, but anyone reading the review in question shouldn’t go into the review attempting to make such a comparison. If I were to read a positive article about a Fire Emblem game, I still wouldn’t be convinced to play it, because I don’t like SRPGs, so the game’s completely different within the context of my own tastes.

    Of course, a review shouldn’t JUST be a number; there should be a great deal of justification and substance within every review. You can’t just have something like “10/10! I loevz teh Marioz! LOLOLOL!!!!111” because it doesn’t tell me why I should love the game; just that the reviewer does. Those who would take a long review and only read the numerical score, ignoring the review itself, likely wouldn’t read a review of that length, whether or not the score is there.
