Hello, and thanks for stopping by for this week’s entry in my Weekly WaveShine series! In this week’s entry you can look forward to:
Analytics and Hogwarts Legacy
As part of a game design exercise from Game Balance by Ian Schreiber and Brenda Romero, I have built an analytics plan that defines some key questions regarding the balance within Hogwarts Legacy, a popular AAA game that was released recently. The plan then breaks down what metrics are needed to answer these questions and what statistical techniques would be employed to gain said insights.
Disclaimer: I strongly oppose the rather horrible things about the trans and LGBTQIA+ community said by the original author of this game’s IP. That being said, it was difficult to turn down the chance to experience one of my favorite childhood stories coming to life. While I believe it is an interesting subject of study from a design perspective, my choice to do so does not indicate any support or approval of discrimination in any form.
Hogwarts Legacy
One has to seriously question a bunch of teenagers running around with a spell that outright kills people.
A complete response to this exercise includes the following:
I personally feel the game is rather mediocre when separated from the Harry Potter IP. My gut intuition is that although the game itself will sell well, it will not be a definitive title for this generation of consoles and that most players will not play the game to completion. As a result, the game will not be remembered as a masterfully designed game and it misses a lot of potential for monetization. I think that this is due to a combination of factors, and that one of the most prominent of these is the overall imbalance of progression and narrative pacing within the game.
I believe that these balance issues are widespread and impact a large number of systems. As such, the overall objective of this analysis is to identify which of these systems have the greatest effect on overall player engagement. The questions I believe that analytics can help answer are:
The metrics I’m particularly interested in are measurements of key gameplay events/data based on player save files. I prefer these data over additional player surveys because I believe that players who are invested enough to answer a survey about the game will likely exhibit different play patterns than the average player. I also prefer gameplay data over additional playtesting because, based on the constraints of the question, this data can be assumed to already be available, and the size of the data set will help offset statistical errors related to sampling biases that may arise in concentrated playtests.
From player save data, I’d like to gather the following on a per player basis:
Alright, this is where the questionable math begins. I’m not a statistics major, I’m an engineering major, which means I think I know how to do statistics, but probably don’t.
So the first thing I want to do is perform hypothesis testing on players’ overall completion of the main story quest (MSQ). I believe that the mean of this percentage can indicate how engaged players are with Hogwarts Legacy.
For those who don’t know, hypothesis testing is a form of statistical analysis used to determine whether a sample of data provides enough evidence to reject a default assumption (the null hypothesis) in favor of an alternative. For more info, you can read here. The hypotheses we want to test are:
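As a rough illustration of the mechanics only, here is a minimal Python sketch of a one-sided test on mean MSQ completion. The target value of 60%, the sample numbers, and the function name `one_sided_z_test` are all made up for the example; they are not the hypotheses or data from the exercise. It also leans on the normal approximation, which is only reasonable for large samples like the save-file data set described above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sided_z_test(samples, target_mean, alpha=0.05):
    """Test H0: mean >= target_mean against H1: mean < target_mean.

    Uses the normal approximation, so it is only appropriate for large samples.
    """
    n = len(samples)
    standard_error = stdev(samples) / sqrt(n)
    z = (mean(samples) - target_mean) / standard_error
    p_value = NormalDist().cdf(z)  # left-tail probability under H0
    return z, p_value, p_value < alpha


# Made-up MSQ completion percentages (far too few for a real analysis).
msq_completion = [35, 42, 58, 20, 61, 47, 33, 50, 28, 44]

# Hypothetical target: "an engaged player base averages at least 60% completion".
z, p_value, reject_null = one_sided_z_test(msq_completion, target_mean=60.0)
print(f"z = {z:.2f}, p = {p_value:.5f}, reject H0: {reject_null}")
```

If the null hypothesis is rejected, the data suggest that average completion is below the chosen target, which is the kind of signal that would justify digging into the individual systems.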
The next analysis I’d like to perform is an attempt to separate the players into “bins” based on their primary motivations for playing the game as indicated by their save data. This information is particularly difficult to gather, as both a direct survey and user playtesting would be influenced by response bias. I would say this analysis is also not statistically rigorous, but it is at least a somewhat structured approach to approximating these player groups.
The procedure I propose is, for each of a series of key metrics, to set a critical value (in the graph to the right, 1.65 standard deviations above the mean) corresponding to some top percentage of players (in the graph to the right, the top 5% of players). This assumes that the data follow a normal distribution. This critical value would act as the threshold that a player would need to meet to qualify for categorization into a given bin. Players will only belong to one bin to simplify the analysis.
Visualization of a Right-Sided Z Test
Credit for image here. This analysis is concerned with the interval marked in red.
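To make the thresholding procedure concrete, here is a small Python sketch. The bin names (`explorer`, `fighter`), the metric names, and the player data are placeholders invented for the example, not the actual bins from this plan. It assumes each metric is roughly normally distributed, and it enforces the "one bin per player" rule by checking bins in a fixed order and stopping at the first match.

```python
from statistics import NormalDist, mean, pstdev


def critical_z(top_fraction):
    """Z-score above which the top `top_fraction` of a normal distribution lies."""
    return NormalDist().inv_cdf(1.0 - top_fraction)


def assign_bin(player, metric_stats, ordered_bins, z_crit):
    """Return the first bin whose metric z-score meets the critical value."""
    for bin_name, metric in ordered_bins:
        mu, sigma = metric_stats[metric]
        if sigma > 0 and (player[metric] - mu) / sigma >= z_crit:
            return bin_name
    return "unclassified"


# Hypothetical bins and metrics, listed in priority order.
ordered_bins = [("explorer", "collectibles_found"), ("fighter", "combat_encounters")]

# Made-up per-player save data.
collectibles = [10, 12, 11, 9, 10, 11, 10, 12, 9, 60]
combat = [20, 22, 19, 21, 20, 23, 18, 90, 21, 20]
players = [
    {"id": i, "collectibles_found": c, "combat_encounters": k}
    for i, (c, k) in enumerate(zip(collectibles, combat))
]

# Mean and standard deviation of each metric across all players.
metric_stats = {
    metric: (mean(p[metric] for p in players), pstdev(p[metric] for p in players))
    for _, metric in ordered_bins
}

z_crit = critical_z(0.05)  # roughly 1.645 for the top 5%
bins = {p["id"]: assign_bin(p, metric_stats, ordered_bins, z_crit) for p in players}
print(round(z_crit, 3), bins)
```

Because the first matching bin wins, the order of `ordered_bins` decides where a player lands if they clear the threshold on more than one metric.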
The bins, metrics, and the order I’d define them in are as follows:
The next thing I’d want to do is create a scatterplot of the normalized values of the average amount of time between when the player last acquired a spell and when they last played the game vs. MSQ % completion, to see if there is any trend or relationship between these variables. After controlling for outliers, possibly by using the bins defined above, what I’d specifically be looking for is a negative relationship between the two variables. That is to say, the more standard deviations above the mean a player’s average amount of time is, the more standard deviations below the mean the player’s MSQ % completion is. If this general trend holds, then it would indicate that progression with spells is likely an important factor in player engagement, and that further analysis of how key progression events within the spell system affect average MSQ % completion or even absolute play time would be warranted.
Again, this would support my original qualitative analysis of the game, which is that progression with spells is one of the strongest forms of engagement when it comes to systems within the game and that because this progression occurs too quickly, it negatively impacts overall player engagement as the game proceeds.
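One simple way to put a number on the trend in a scatterplot like this is the Pearson correlation coefficient, which is just the average product of the two normalized (z-score) values. The Python sketch below uses made-up data and hypothetical variable names; it only shows the calculation and says nothing about what real save data would reveal.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to standard deviations above or below the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson_r(xs, ys):
    """Pearson correlation computed as the mean product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Made-up data: hours between a player's last new spell and their last session,
# paired with that player's MSQ completion percentage.
hours_since_last_spell = [1, 3, 5, 8, 12, 15]
msq_completion = [90, 80, 70, 55, 30, 20]

r = pearson_r(hours_since_last_spell, msq_completion)
print(f"r = {r:.3f}")  # a value near -1 means a strong negative relationship
```

A value of r close to -1 would match the negative relationship described above, while a value near 0 would suggest no linear trend between the two variables.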
The next analysis I’d like to perform is on gear progression. Using a similar method to the section above, I’d like to generate a scatterplot of the normalized values for the amount of time between when the player last increased their gear score and when they last played the game, as compared to the normalized values of MSQ % completion. I’d again suggest controlling for outliers and potentially applying binning to the samples before generating the scatterplots. I would also perform the same analysis but, instead of the amount of time since improving gear score, I’d like to look at the amount of time since the player intentionally used the crafting system. For both of these scatterplots, I’d expect there to be no general trend. This would support the theory that there is probably not a very strong correlation between player interactions with the gear system and their actual engagement with the game. My rationale behind this is that generally there isn’t a reason to equip anything other than the statistically most powerful piece of gear you have in your inventory and that the drop rate of gear is so high that players don’t have much incentive to engage with the crafting system.
If this were the case, it would be grounds for further analysis of the gear system to see if there are adjustments to drop rate or overall gear score progression that could be made to turn this into a more meaningful system for players.
Finally, I’d like to take a look at the ability system. Again, we’d generate a scatterplot of the normalized values for the amount of time between when the player last unlocked a new ability and when they last played the game, as compared to the normalized values of MSQ % completion. I would again expect no statistically significant trend to be present. My rationale behind this is that while abilities seem exciting on the surface, the abilities themselves do not produce any meaningful changes in the way the player plays the game. They simply change how strong a given option is, but don’t actually increase the depth of decision-making within the game. Therefore, I’d expect that in the long term players would realize the lack of depth in this system, and unlocking additional ability progression would cease to be a driver of player engagement.
Again, we’re using this higher-level test to determine whether or not there is value in continuing to investigate the ability system and opportunities for rebalancing.
As a result of performing these analyses, we’d have a clearer picture of whether or not there is an issue with player engagement, what the overall demographics of our player base look like, and which systems have room to improve. Furthermore, we’d be able to tell if certain systems are more successful at engaging specific player groups. With this information in hand, we’d be able to make an informed decision about what the business and design priorities of the effort are and what systems will be best to change in order to meet these objectives. Further, more specific analytics may be employed or the team could proceed directly to playtesting and iteration depending on timelines and business needs.
Mobile Design and Prototyping Plan
Unfortunately, I haven’t been able to spend time in Unreal working on my side project because I was performing a technical test for a potential employer. While I can’t elaborate on the specific employer or the details of the test, the problem itself was sufficiently open-ended such that I can share my analysis and the plan for the prototype I ultimately wound up implementing as well as my general process and rationale behind my decisions.
As always, I started this problem by first defining the problem space. In this scenario, the assumptions I made as I defined the problem space are as follows:
The overall design I suggested exploring given this problem space is a rhythm-based tower-defense game where Players can place gacha characters that can be tapped on to produce greater damage depending on each character’s unique rhythms. Depending on budget constraints, characters could make individual contributions to a common musical theme by replacing the “instrument” used to play each individual part of the soundtrack, simply by passing each instrument through an audio filter or a set of samples that is unique to each character. Particularly rare characters could potentially change the song being played itself, helping to drive the value of making additional gacha rolls.
Here are what I believe are some of the strengths of this design separated out into the key elements of a design:
Here are some of the challenges/risks that I personally see in this design:
Based on the limitations faced by this particular design, I personally feel it would be necessary to quickly create a few simple prototypes to see if the actual task of rhythmically touching particular units on the screen would be enjoyable. The first prototype (which is what I implemented) is intended to simulate this task, with simplified features that reflect what the overall task of a rhythm-based TD game might feel like. The features are as follows:
The hope is that this helps internal playtesters envision what it would look/play like with more units. Units could be organized into categories/classes to group them under a single “note” on the musical staff in order to keep complexity lower. Additionally, there are a number of alternative ways to visualize the rhythm the Player will need to press, but I figured it would be most important to first confirm whether the task of rhythmically pressing a few buttons/spots on the screen would be enjoyable or not.
In terms of next steps, subsequent prototypes should likely experiment with better ways to visualize oncoming rhythms, control schemes that are actually implemented and tested on touch screens, and possibly the technical capacity for playing all of the different musical sources present in this design. It would also be good to take a closer look at the current state of the mobile market and consider the desired visual theme. Getting some rough concept art as well as some general estimates of feasibility from artists would also be helpful.
What I’m Reading: Balancing Metagame Systems
Between the technical test I mentioned above and completely re-writing my entire resume, I wasn’t able to finish Chapter 14: Metagame Systems from Game Balance by Ian Schreiber and Brenda Romero. That being said, so far, the chapter has covered the following: