Weekly WaveShine #3

Rating and Ranking in Metagame Systems, Cleaning Up My Unreal Project

Hello, and thanks for stopping by for this week’s entry in my Weekly WaveShine series! In this week’s entry you can look forward to:

  1. My reading and a summary of my notes on rating systems, ranking systems, and tournament systems from Game Balance
  2. An update on my side project in Unreal Engine
Do you want to see me clumsily apply statistical techniques to Hogwarts Legacy? Or do you just want to read more words written by me? Check out last week’s blog entry or my most recent entry!

What I’m Reading: Ratings, Rankings, and Tournament Systems

This week, I was able to finish reading and taking notes on Chapter 14 of Game Balance by Ian Schreiber and Brenda Romero. In particular, I covered:

  • Common issues encountered by rating systems
  • Some statistics about outcome prediction and analyzing rating systems
  • Different types of ranking systems and factors to consider
  • Common decisions to be made when designing tournament formats
This chapter brought back memories of all of the Super Smash Bros. Melee tournaments I’ve been to. I’d actually participated in a few of the tournament formats mentioned, and it was nice remembering all the fun times! Also, for those who don’t know, a rating system attempts to numerically quantify a player’s skill. A well-known example is Elo rating. A ranking system attempts to ordinally categorize and sort players according to skill. A lot of games use these in their competitive modes, ranking players Bronze, Silver, Gold, etc. A tournament system attempts to make top-level players complain as quickly as possible. Sometimes they get used to directly determine the best player from a group of participants.

Tournament Melee: Big House 5

Some footage from a tournament series I’ve played in. Important to note that I am not in this clip and that both of these players are much better than me.

Common Issues in Rating Systems

This section covered a few of the issues frequently encountered when designing a rating system. The most prominent of these issues is that, somewhat depressingly, there will always be players who seek to exploit the system and gain an unfair or unintended advantage over their opponents. Briefly, the issues covered included:

  1. Encouraging Activity – Across multiple rating systems, highly rated players may not want to play against lower rated players since the risk of lost rating is greater for the higher-rated player. Common ways to circumvent this are to implement time-based rating decay directly or to employ external incentives like in-game currency or daily quests to drive participation.
  2. Luck vs. Skill – A lot of rating systems are built on the assumption that the better player always wins, which is not a valid assumption for most games. If the system doesn’t compensate for it, then higher rated players are actually penalized as they will lose slightly more often than expected and their average opponent is going to be worse than them, resulting in a larger overall penalty to their rating. To compensate, the system can mathematically approximate the degree to which luck affects the outcome and adjust the awarded/lost amount of rating accordingly.
  3. Matchmaking – Rating systems become especially susceptible to exploitation and undesirable trends if players are able to earn rating based on their own matchmaking. The solution in most cases is to implement automated matchmaking. These solutions often have to balance finding the ideal match against the length of the player’s queue time.
  4. Rating Inflation/Deflation – Over time, it is possible for the average rating of players to increase or decrease within the system. This inflation/deflation of ratings can produce some undesirable trends over time since ratings only become meaningful when contextualized against other players. Rating inflation generally hurts newer players as they need to gain greater amounts of rating points to catch up with long-time players. Rating deflation unsurprisingly harms more experienced players. The simple solution is to create a system that is zero sum, ensuring there can be no net gain or loss within the system. Oftentimes, this is not possible, and a lot of games may periodically reset their ratings to compensate for this.
  5. Draws – In some games, a draw between players means they are very evenly matched, and in other games, a draw is very common. This relationship should be considered within the game and used to determine how to handle draws in terms of ratings. For example, for a very lowly rated player to be able to force a highly rated player into a draw while playing chess says a lot more than forcing a draw in Tic-Tac-Toe.
  6. Multiplayer Games – A lot of games will feature cases that are more complex than 1 vs. 1. In these scenarios, rating systems will often need to account for relative placement in free for all game modes, teammate ratings, and individual contributions to the outcome of the game.
  7. Disconnection – With online games, it can be difficult to tell if a disconnection event is intentional, and how the system handles this case will incentivize different behaviors. For example, if disconnection results in a loss, then players may attempt to DDoS other players. If it is counted as a draw, players may quit before losing to avoid losing rating points. Furthermore, in team games, if even one player disconnects, then for all other players involved, it becomes much more difficult to determine how to interpret player performance up until that point, whether the game should continue, and how those results should affect their rating. This is also an issue of accessibility, as some players may simply not be able to afford stable internet or may not live in places that have stable internet, which raises the question of whether these factors should affect a player’s ability to compete.
  8. Cheating – It’s gonna happen, and as a competitor, it makes me very sad. That being said, most good rating systems will need to directly design against different ways players may attempt to cheat or exploit the system. In most cases, directly resolving cases by hand is not practical, and discovery of unhandled/unresolved cheating will destroy confidence in the competitive validity of your game’s rating system. Keep an eye out for positive sum sources of points and for ways players can collude to manipulate outcomes.
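
To make the zero-sum idea from item 4 a little more concrete, here is a minimal sketch of an Elo-style update. This is my own illustration rather than anything from the chapter: the function names are made up, and the K-factor of 32 and the 400-point scale are the commonly quoted Elo defaults, assumed here for the example. The thing to notice is that whatever one player gains, the other loses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after one game.

    score_a is 1.0 if A won, 0.5 for a draw, and 0.0 if A lost.
    k is an assumed K-factor; real systems tune this value.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating in the system is unchanged.
    return rating_a + delta, rating_b - delta


# Hypothetical upset: the 1600-rated favorite loses to a 1400-rated player.
new_a, new_b = elo_update(1600.0, 1400.0, score_a=0.0)
print(round(new_a, 1), round(new_b, 1))
```

Anything layered on top of an update like this (win bonuses, decay, placement matches) is a potential source of inflation or deflation, since it adds or removes points without a matching change on the other side.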

Ratings and the Statistics of Outcome Prediction

In a perfect rating system, all players at a particular skill level would have a (nearly) constant rating, assuming there aren’t any major changes in their level of skill. In such a scenario, for a 1 v 1 game, a player who would gain X points for beating a specific opponent and lose Y points for losing to that same opponent would be predicted to win with probability P(win) = Y / (X + Y). What’s interesting to note is that when Y = 0, the player has 0 probability of winning: the loss is a foregone conclusion, so it costs them 0 points. Similarly, when X = 0, the player has a probability of 1 of winning, meaning that their victory is guaranteed and that they will be awarded 0 points for it.
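
One way to see where that formula comes from: if a rating is supposed to stay constant, then the average rating change per game has to be zero, so P(win) * X - (1 - P(win)) * Y = 0, which rearranges to P(win) = Y / (X + Y). Here is a small sketch that checks this. The function names and the stakes of 8 and 24 points are made up for illustration.

```python
def implied_win_probability(points_for_win: float, points_for_loss: float) -> float:
    """P(win) = Y / (X + Y), where X is gained on a win and Y is lost on a loss."""
    total = points_for_win + points_for_loss
    if total <= 0:
        raise ValueError("X + Y must be positive")
    return points_for_loss / total


def expected_rating_change(p_win: float, points_for_win: float, points_for_loss: float) -> float:
    """Average rating change per game at a given win probability."""
    return p_win * points_for_win - (1.0 - p_win) * points_for_loss


# Hypothetical stakes: gain 8 points on a win, lose 24 points on a loss.
x, y = 8.0, 24.0
p = implied_win_probability(x, y)
print(p)                                # 0.75
print(expected_rating_change(p, x, y))  # 0.0
```

A player who risks three times what they stand to gain is being treated as a 75% favorite, and at exactly that win rate their rating holds steady on average.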

While rating systems may hold up as win/loss predictors on paper, the outcome of most games is affected by other sources like:

  1. Luck – Again, if the outcome of a game is partially decided by luck, then a rating system will break down unless it intentionally accounts for it.
  2. Matchups and other pre-match decisions – Sometimes, games can be largely decided based on character/hero choice before the game even begins, even if one of the players is significantly better than the other.
  3. In team games, win/loss can be affected by decisions made by other teammates – As mentioned above, depending on players’ individual performances, the outcome may vary widely.
  4. Familiarity/Synergy between teammates in team games – Teammates that are familiar with one another may perform better than an equally skilled (and rated) team of strangers.
  5. Expected point spreads – A basic rating system only predicts win and loss, but it may actually be desirable to predict the actual point spread of winning or losing, if relevant to the game.
  6. Sandbagging – Most basic rating systems aren’t able to detect more complex behaviors like whether a player intentionally plays below their skill level or has imposed an intentional handicap.
Another interesting statistical analysis involves assigning arbitrary “true skill” ratings to simulated players and using these ratings to calculate a predicted win/loss rate between randomly generated pairings. These probabilities can then be used to randomly decide a result between the players before updating their ratings under the rating system being tested. By tracking the Root Mean Square Error (a statistical measure of the average error across a sample of data) between the players’ ratings from a given rating system and their actual “true skill” ratings, you can see how quickly that rating system converges on an accurate measure of a body of players’ skill.
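
Here is a rough sketch of what that simulation could look like. Everything in it is an assumption on my part rather than the book’s setup: I’m using an Elo-style update as the system under test, drawing “true skill” from a normal distribution on the same scale, and deciding results with the same Elo curve (which admittedly flatters Elo). The player count, match count, and K-factor are arbitrary.

```python
import math
import random


def expected_score(rating_a, rating_b):
    """Elo-style probability that A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rmse(estimates, truths):
    """Root Mean Square Error between estimated and true values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))


def simulate(num_players=50, num_matches=20000, k=16.0, seed=0, report_every=5000):
    """Return the RMSE at the start and after every `report_every` matches."""
    rng = random.Random(seed)
    # Hidden "true skill" values that the rating system never sees directly.
    true_skill = [rng.gauss(1500.0, 200.0) for _ in range(num_players)]
    ratings = [1500.0] * num_players  # everyone starts at the same rating
    history = [rmse(ratings, true_skill)]
    for match in range(1, num_matches + 1):
        a, b = rng.sample(range(num_players), 2)  # random pairing
        # The result is decided by true skill, not by the current ratings.
        a_wins = rng.random() < expected_score(true_skill[a], true_skill[b])
        score_a = 1.0 if a_wins else 0.0
        # The system under test updates using only the visible ratings.
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
        if match % report_every == 0:
            history.append(rmse(ratings, true_skill))
    return history


history = simulate()
for checkpoint, error in enumerate(history):
    print(f"after {checkpoint * 5000:>5} matches: RMSE = {error:.1f}")
```

Swapping in a different update rule, or a different way of deciding results (say, adding a luck component), and comparing how fast the RMSE drops would be one way to compare rating systems against each other.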

This analysis has a limitation that you can’t actually know a player’s “true skill” rating. What you can do is look at actual win/loss rates and compare them to the predicted win/loss rates based on a rating system. By looking at carefully selected subsets of data, you can tell if factors like character selection, teammate familiarity, or other factors cause the error between expected and actual win rates to diverge from the average error.
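
As a sketch of that comparison, the snippet below groups match records by some factor and reports how far the actual win rate sits from the predicted one in each group. The record format, the function name, and the data are all invented for illustration. In practice you’d pull these from real match logs and would want far more than a handful of games per group.

```python
from collections import defaultdict


def win_rate_error_by_group(matches):
    """Return {group: actual win rate - mean predicted win rate}.

    matches is an iterable of (group, predicted_win_probability, won) tuples.
    A positive value means that group wins more often than the ratings predict.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [predicted sum, wins, games]
    for group, predicted, won in matches:
        entry = totals[group]
        entry[0] += predicted
        entry[1] += 1.0 if won else 0.0
        entry[2] += 1
    return {group: (wins - predicted) / games
            for group, (predicted, wins, games) in totals.items()}


# Made-up records: every match was predicted to be a coin flip.
matches = [
    ("character_a", 0.5, True),
    ("character_a", 0.5, True),
    ("character_a", 0.5, True),
    ("character_a", 0.5, False),
    ("character_b", 0.5, True),
    ("character_b", 0.5, False),
]
print(win_rate_error_by_group(matches))
# {'character_a': 0.25, 'character_b': 0.0}
```

In this toy data, players picking character_a win 25 percentage points more often than their ratings predict, which is the kind of gap that would hint at a matchup or balance effect the rating system isn’t capturing.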

Ranking Systems

While rating systems can be helpful from a matchmaking and analytics perspective, openly displaying player ratings has historically been found to be demoralizing for players. This is because most players are not in the top 10% and are, in fact, mediocre. Furthermore, growth and improvement come slowly, so constantly seeing a mediocre, unchanging number can negatively affect player retention. This is where rankings can be helpful. They can obfuscate exactly “how good” a player is while providing a sense of progression. Some factors to consider when designing a ranking system:

  1. Granularity – This is how precise the ranking system is. Maximum granularity would be to ordinally rank each individual player from best to worst. A lower granularity might rank players based on specific numeric ranges or percentiles.
  2. Community – A rank can be determined globally against all players, or against a more concentrated subset. Most players want to feel like “the biggest fish in the pond”, which can be achieved by manipulating the set that they’re being compared against. Too many subsets can make this confusing for players, so it’s important to be selective.
  3. Permanence – Ranking systems don’t have to be permanent. Resetting rankings can give players a sense of progression as they climb the ranks, but overly frequent resets could take away the meaningfulness of a given ranking system.
  4. Subjectivity – In gaming contexts, most ranking systems are objective. In some sports, things like power rankings and other ranking systems might be decided subjectively using panels of judges or other means. The chapter doesn’t cover this extensively, but the way this is structured could affect perceptions of fairness.
  5. Progression – Tying ranking to ratings is often sub-optimal, as it does not deliver a significant sense of progression and makes it hard to reset the ranking system without also altering the rating system, which would then affect matchmaking. Ladders are a common ranking system that provide a sense of progression. By defeating higher-ranked players, a lower-ranked player can start from the bottom and “climb” the ladder. Ladders often can only have one player at the top, so using lower granularity at lower ranks can obfuscate how good (or bad) a player is before ordinally ranking them near the top.
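
To illustrate the granularity point, here is a minimal sketch that sorts players into coarse percentile-based tiers. The tier names beyond Bronze/Silver/Gold and the cutoff percentiles are hypothetical values I picked for the example, not numbers from the book or from any particular game.

```python
from bisect import bisect_right

# Hypothetical cutoffs: bottom 50% Bronze, next 30% Silver, next 15% Gold, top 5% Platinum.
TIER_CUTOFFS = [0.50, 0.80, 0.95]
TIER_NAMES = ["Bronze", "Silver", "Gold", "Platinum"]


def assign_tiers(ratings):
    """Map each player to a tier based on where their rating falls in the population.

    ratings is a dict of {player_name: rating}.
    """
    ordered = sorted(ratings, key=ratings.get)  # worst to best
    count = len(ordered)
    tiers = {}
    for position, player in enumerate(ordered):
        percentile = position / count  # fraction of players sorted below this one
        tiers[player] = TIER_NAMES[bisect_right(TIER_CUTOFFS, percentile)]
    return tiers


# Made-up population of 20 players with evenly spaced ratings.
population = {f"player_{i:02d}": 1000 + 10 * i for i in range(20)}
tiers = assign_tiers(population)
print(tiers["player_00"], tiers["player_12"], tiers["player_17"], tiers["player_19"])
# Bronze Silver Gold Platinum
```

Adding more cutoffs raises the granularity, and at the extreme (one tier per player) this turns back into a full ordinal ranking.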

Tournament Systems

Tournament systems are often used to determine the best player amongst a group of competitors, and the specific design of the tournament system can affect perceptions of fairness and overall player experience. Some of the formats covered included:

  1. Round Robin – The most accurate tournament system. It requires every player to play against every other player before determining rankings based on overall win/loss rates. Specific concerns are tie-breakers and time. Each player plays N-1 matches, for N(N-1)/2 matches in total, so this format is impractical for large numbers of players.
  2. Single Elimination – Players are paired off and the loser between the two is eliminated. Much faster, requiring log2(N) rounds. Specific concerns are pairings/seedings, handling draws, and byes. Seeding is usually a particular concern, as the specifics could conceivably make it much easier or harder for a given player to win the overall tournament. Common approaches are to rely on external rating/ranking systems to decide seeding, preliminary rounds to determine seeding, or random seeding. Byes are used to handle tournament sizes that are not a power of 2, usually giving byes to highest rated/seeded players first.
  3. Double Elimination – Similar to single elimination, but used to address the issue that single-elimination tournaments make it so that luck could cause an otherwise highly skilled player to be eliminated early. Players are seeded like in a single-elimination tournament, but losers are used to seed a second “losers” bracket while the original “winners” bracket continues. Both brackets proceed until the winners of the two brackets play in the finals. The player coming from the losers bracket must win two separate matches against the player from the winners bracket to win the overall tournament. Further triple/quadruple/etc. elimination brackets are possible, but bookkeeping and visualization of such brackets can become challenging.
  4. Swiss – Used to address an issue with elimination tournament systems where high-skilled players get (or have) to play many games, while lesser players do not. In a Swiss tournament system, players are initially paired either randomly or according to some seeding and play each other. Players are then paired based on their win/loss rankings and play each other. This is repeated for a pre-determined number of rounds to determine an eventual winner. Particular concerns here are determining pairings and breaking ties. A Swiss tournament with log2(N) rounds takes the same amount of time and yields the same result as a single elimination tournament. A Swiss tournament with N-1 rounds is effectively a round robin tournament.
  5. Hybrid – This tournament system sets out to address the fact that Swiss tournaments often lack the guaranteed tension that elimination systems provide as the final winner is decided. Additionally, a Swiss tournament with more or fewer than log2(N) rounds is mathematically almost guaranteed to produce ties, meaning that final results will be resolved according to tie-breaking procedures, which are often boring. Hybrid tournament systems will usually use Swiss systems to rank players and seed a much smaller elimination bracket of the top players. This addresses a number of issues, limiting the overall number of games, creating the desired final tension, and still guaranteeing lesser players some number of games.
As a side note, the majority of tournaments I played (and organized) followed Double Elimination formats. Larger tournaments would rely on the hybrid scheme described above using a set of Swiss pools to separate top-level players from each other and to prevent them from eliminating each other during pools before proceeding to a double-elimination bracket. The benefit of using Swiss pools was that the pools didn’t need to be played concurrently, enabling the effective use of limited play space and giving players a break while waiting for their pool to come up or to complete. Some tournaments with small attendance used a round-robin format, but that was by far the least common because of time and space requirements of that format.
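
To put some numbers on the time requirements of these formats, here is a quick back-of-the-envelope sketch. The function names are mine, the 48-player tournament is a made-up example, and the bye calculation assumes byes are only handed out in the first round to fill the bracket up to the next power of 2.

```python
def round_robin_matches(num_players: int) -> int:
    """Every pair of players meets once: N(N-1)/2 matches."""
    return num_players * (num_players - 1) // 2


def single_elimination_rounds(num_players: int) -> int:
    """Rounds needed to get down to one player: log2(N), rounded up."""
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1 without float error.
    return (num_players - 1).bit_length()


def single_elimination_byes(num_players: int) -> int:
    """First-round byes needed to pad the bracket to the next power of 2."""
    return 2 ** single_elimination_rounds(num_players) - num_players


def single_elimination_matches(num_players: int) -> int:
    """Each match eliminates exactly one player, so N-1 matches crown a winner."""
    return num_players - 1


entrants = 48  # hypothetical local tournament
print(round_robin_matches(entrants))         # 1128
print(single_elimination_rounds(entrants))   # 6
print(single_elimination_byes(entrants))     # 16
print(single_elimination_matches(entrants))  # 47
```

Over a thousand matches versus 47 goes a long way toward explaining why round robin only ever showed up at the smallest events.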
 
The chapter moved on to how to handle people dropping out of various tournament formats. Generally, the response would be to nullify results and/or to provide their opponents with a bye.

Unreal Side Project Update

I promise that I’m still working on this! It unfortunately keeps losing time to other things. Last week, it was a technical design test, and this week, it’s been a resume rework. That being said, I’m continuing to clean up my implementation of free camera movement and am clarifying the roles of each component in the code architecture. The involved systems include the Enhanced Input System, the Gameplay Ability System, custom Camera Actors, the division of responsibilities between my custom Player Controller and my custom Game Mode, an overridden Movement Component, and a camera state system for tracking which mode the camera is in at a given time. As you can see, there’s quite a bit there and lots of places for things to go wrong (which they definitely have in all sorts of fun and interesting ways). Once I’ve gotten everything cleaned up and commented clearly, I’ll be able to provide some visual examples, a breakdown of the different responsibilities of the architecture in place, and probably a diagram to make it all a little bit clearer. Definitely look forward to it! Or don’t. I’ll be working on it anyways. Thanks for reading, and have a great week!