Articles

Testing Methodology Applied to Games

This article was published on LinkedIn on 25 Aug 2014.

Can software testing methodology be applied to games? That was the question that inspired me to start designing games under Nova Suecia Games. After all, testing is more than verifying and validating a product or a service; it is an ongoing quality effort covering all parts of a project from start to end.

The Test Process

Let us move into the general test process and see how a game project would fit into it:

Planning and Control: Identify what you need to test and how to measure it
Analysis: Define what to test
Design: Define how to test
Implementation: Prepare tests
Execution: Execute tests
Evaluation: Evaluate tests
Closure: Archive test documentation

Planning and Control

Understand your game's unique quality and let it lead your design and test

When you start designing a game, you probably have grand plans about everything you would like to include in it. But how do you verify that the parts are good? And how do you verify that the whole is good? Do you even know what will make your game good?

This is where an understanding of quality gets important. The PMI definition of quality is "how well the characteristics match requirements". To design a good game, you need to fulfil the game's requirements, and to keep you on the right track, you need to plan what to test and how to test.

So what are the requirements of your game then? To be fun of course - we play games to be entertained. But how do you test if a game is "fun"? The perception of fun is not only dependent on individual preferences but may also vary with time, place and other circumstances. Nevertheless, there are things you can do to break down your game's fun factor into concrete and testable requirements.

First, there are several objective criteria that are generally acknowledged. Wolfgang Kramer, designer of games like El Grande, lists the following:

Originality: The game has elements that, individually or combined, have never been part of a game before
Replayability: The game is different each time it is played
Surprise: The game is not repetitive
Equal opportunity: At start, each player has an equal chance of winning
Winning chances: At the end, each player still has a chance of winning
No "kingmaker" effect: A player without hope of winning cannot determine the winner
No early elimination: All players are involved in the game to the end
Reasonable waiting times: There is no inactivity between turns
Creative control: The players have the opportunity to affect the game's progress
Uniformity: Title, theme, format and graphics give a unified impression
Quality of components: Game components are durable, functional and visually appealing
Target groups and consistency of elements: The elements are adapted to the game's target group
Tension: The tension is high throughout the game
Learning and mastering a game: The game takes a short time to learn and a long time to master
Complexity and depth: The more depth a game has (measured as how easy it is to distinguish between optimal and suboptimal play), the more complexity is accepted (measured as how easy it is to distinguish between legal and illegal play)

Second, there are objective criteria which depend on the game style.

Game Design Concepts lists the following:

Players: Solitaire, two players, players individually against each other, players teaming up against each other, players teaming up against the game etc.
Objectives: Capture, control, collect, solve, chase, build etc.
Information: Public to all, private to individuals, hidden to all etc.
Sequencing: Turn-based, simultaneous, real-time etc.
Interaction: Conflict, negotiation, trading, information sharing etc.
Theme: Abstract, narrative etc.

There is no right or wrong here. Generally, a game should be consistent but there are also exceptions where a good balance between contradicting criteria works. A no-luck game like chess would be a worse game with a random pawn promotion. Backgammon on the other hand combines the random dice roll with the strategic decision.

Third, there are also subjective criteria that contribute to make a game fun. Different games invoke different feelings in players and this is what truly makes a game unique. Chess and Trivial Pursuit are two popular games that satisfy most of the objective criteria and yet are completely different in terms of feelings. The objective of my games is to invoke the following feelings:

"I quickly understand the game objective and how to accomplish it"
"The game tells a story and I'm part of it"
"The gameplay is new and inventive"
"I need to cooperate directly or indirectly with the other players"
"I need to pay attention not only to my own play but also to the other players' play"
"I am master of my own destiny and don't rely on luck"
"The game keeps me veering between hope and despair"
"The game is open until the very end"
"Each new session is a new game"
"I want to play again!"

As a designer, you need to set a plan for the criteria of your game and design with those in mind. As a tester, you need to test how each element of the game satisfies the criteria of the game.

In my game Find the Bug!, I wanted to make the players feel they were test leads in a real project. The "fun" factor would mainly come from the relation between applying test techniques and finding bugs. However, the game must not be too complex but easy to learn and quick to play. By stipulating those and other criteria, I came up with a "mental checklist" for the game.

To summarize with Wolfgang Kramer's words: a good game will stay with us all our lives and make us long to play it again. Make sure that you understand the criteria that make your game good!

Analysis

Do not test your game until you know what to test

When you understand your game's unique quality, it is time to analyse the specific criteria to test. Each criterion should be broken down to the smallest testable element possible. This may seem unnecessary or even counterproductive, as a game's quality is determined by how well all its parts work together as a whole. However, the analysis activity will help you understand why your game doesn't work and help you tweak and tune all the bits and pieces to perfection.

In my very first game test, I observed several areas of improvement but could not trace them back to the quality criteria of the game. The result was that I modified the game rather aimlessly and several iterations were required to get the game back on track. A proper analysis before the test would have helped me to do a proper root cause analysis afterwards.

The analysis is a dual activity where you, in your designer role, drive towards new exciting ideas and, in your tester role, check the map to ensure that you are not losing your way. This is a critical phase of your game design because if you deviate from your game's unique quality now, it will be very difficult to return to it. Another game designer, Reiner Knizia, designer of games like Tigris & Euphrates, is of the opinion that "many people think that a game is finished when there is nothing more to be added. I believe a game is finished when there is nothing more that can be taken away and still leave a good game". In project methodology, this opinion is similar to the recommendation against "gold plating", where features are added that are not requested by the customers (your future players) and are not in the scope. An inventive battle system has no place in an economic game, and stunning futuristic art would be completely wrong in a medieval game. No matter how much the designer in you likes an idea, the tester in you must keep it out of the game. If the idea really begs for a game, put it on the shelf until you have a game that will do justice to it.

The outcome of the analysis should be an understanding of what elements to include (and exclude) in the game and what to test in each element. This may be a list of alternative game strategies to be tested for balance, of components to be tested for the physical use in the game or of art to be tested for consistency. Do not worry about a detailed list of elements at this stage; game design is an iterative process where you have to return and reevaluate your ideas. The important thing is that you know why you design the game in a certain way and why a certain test is necessary.

Returning to my game Find the Bug! as an example, one of my criteria was to replicate the daily work of a test lead. Of the many different tasks, I focused on that of finding bugs. As a tester cannot know where the bugs are but, using risk-based testing, may assess where they are likely to be, I needed a mechanism that could increase the odds of finding bugs. The test in this case would be to check the expected result of the different test strategies (ad-hoc testing and risk-based testing) and ensure that a risk assessment would give a reasonable pay-off (not too low but not too high either).

Design

If you do not know how to test, your game will be perfect in your eyes only

If you know what to test in your game, it is now time to find out how to test it. Testing a game is much more than just playing it and the software testing methodology offers plenty of test types to choose from.

Functional vs Non-functional: "What" the game does vs "how" the game does it
White box vs Black box: Game perspective vs player perspective
Static vs Dynamic: Reviewing the game vs playing the game
Positive vs Negative: Playing the game vs breaking the game

Functional vs Non-functional testing

Functional testing covers everything that defines the game, that is, how the players interact with the game and with each other. Which actions can a player take? What does she need to be able to take the action? What will happen after the action? Those are the kind of questions your design needs to answer and your testing needs to verify. The testing is independent of physical game components - it is only the "spirit" of the game that is tested. The purpose of the testing is to ensure that the game theoretically plays as expected.

Non-functional testing covers everything that is required for the game to be physically played. Which components are needed? How are they to be handled? Are they open or hidden to the players? Each component of the game must be designed with its purpose in mind and this purpose must be understood when testing.

A test that covers both functional and non-functional aspects of a game is scalability: Do your game mechanisms work with more or fewer players (functional), and will more or fewer components be required (non-functional)? An iterative process is usually required where you design and test with different numbers of players to assess how many players the game can be played with and whether the game needs modifications for different numbers of players.

One of the ways the players interact with Find the Bug! is to place pawns on the game board, either to check the probability of bugs or to try to find them. To design this, not only did I have to set the density and severity of bugs to balance the player strategies but also come up with a way for the players to get information about the probability of bugs without revealing the actual location of the bugs.

The functional test in this case was to test that the expected value of bugs from checking the density would be equal to the expected value of bugs from checking the severity. In addition, both strategies should have a higher expected value than the ad-hoc strategy of not checking at all.
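As a rough illustration, a balance test of this kind could be written down as a small check. The function name, the tolerance and the sample values below are assumptions made for this sketch; they are not taken from the actual game.

```python
def balance_issues(ev_density, ev_severity, ev_ad_hoc, tolerance=0.05):
    """Return a list of balance problems; an empty list means the test passes."""
    issues = []
    # The two analysis strategies should pay off about equally.
    if abs(ev_density - ev_severity) > tolerance:
        issues.append("density and severity checks are not balanced")
    # Each analysis strategy should beat testing without any check.
    if ev_density <= ev_ad_hoc:
        issues.append("density check does not beat ad-hoc testing")
    if ev_severity <= ev_ad_hoc:
        issues.append("severity check does not beat ad-hoc testing")
    return issues


# Hypothetical expected bug values per test for the three strategies.
print(balance_issues(ev_density=0.60, ev_severity=0.62, ev_ad_hoc=0.50))
```

Returning a list of issues rather than a single pass/fail value makes it easier to see which of the criteria a design change has broken.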

The non-functional test on the other hand was to test the various ideas for physical representations of bugs, density and severity. Would the representation be transparent to the players? Would it keep the location of the bugs hidden? Would it be easy to set up and handle during the game? Without a pass on all those non-functional tests, this particular part of the game would not work!

White box vs Black box testing

In software testing, white box testing refers to internal structures (how the software does it) and black box testing refers to functionality (what the software does). For game testing, I would like to transform this into testing from the game perspective and testing from the player perspective.

The game perspective (the white box) would be to identify different paths that the game may follow and calculate statistics for them. What is the average, minimum and maximum of a certain path? Are the paths balanced? Are there any dead ends? Will the components be enough? Those are questions that white box testing may answer for games.
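One possible way to collect such path statistics is to describe the game flow as a small graph and enumerate the routes through it. The sketch below assumes a hypothetical flow with made-up state names and measures each path by its number of moves; a real game would need its own states and its own measure (turns, points, components used).

```python
from statistics import mean


def all_paths(moves, start, end, _path=None):
    """List every route from start to end that never visits a state twice."""
    path = (_path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in moves.get(start, []):
        if nxt not in path:  # skip revisits so loops cannot run forever
            paths.extend(all_paths(moves, nxt, end, path))
    return paths


def path_statistics(moves, start, end):
    """Minimum, maximum and average number of moves needed to reach the end."""
    lengths = [len(p) - 1 for p in all_paths(moves, start, end)]
    if not lengths:
        raise ValueError("no path from start to end")
    return {"min": min(lengths), "max": max(lengths), "mean": mean(lengths)}


# Hypothetical game flow: each state lists the states that may follow it.
flow = {"start": ["trade", "build"], "trade": ["build", "end"], "build": ["end"]}
print(path_statistics(flow, "start", "end"))
```

A large gap between the minimum and the maximum may hint at unbalanced paths, and a ValueError signals that the end cannot be reached at all.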

The player perspective (the black box) would be what you typically think of when it comes to testing: a live player session. However, you can and should do your own player testing first by setting up various strategies and simulating them in action. This will help you understand what takes place in the minds of the players while they play your game.

For Find the Bug!, I relied heavily on spreadsheets for my white box and black box testing. The first image below shows a sample white box test case. In the game, testing is done by drawing 1 tier tile and 1 module tile from bags and if both tiles contain a bug, a bug is found. The test case simply calculates expected bugs for different levels of density and severity.

White box test case: Expected result per density and severity level
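The same kind of calculation can be sketched in code instead of a spreadsheet. The sketch assumes that the two tiles are drawn from separate bags, so the draws are independent, and that each bag holds 4 tiles; the bag size and the function names are hypothetical. How density and severity map to the bags is not spelled out here, so the sketch simply takes the number of bug tiles in each bag.

```python
from fractions import Fraction


def find_probability(tier_bug_tiles, module_bug_tiles, bag_size=4):
    """Chance that one test finds a bug: both drawn tiles must contain a bug."""
    for count in (tier_bug_tiles, module_bug_tiles):
        if not 0 <= count <= bag_size:
            raise ValueError("bug tiles must be between 0 and bag_size")
    return Fraction(tier_bug_tiles, bag_size) * Fraction(module_bug_tiles, bag_size)


def expected_bugs_table(levels, bag_size=4):
    """Expected bugs per test for every combination of bug-tile counts."""
    return {
        (tier, module): find_probability(tier, module, bag_size)
        for tier in levels
        for module in levels
    }


for (tier, module), chance in expected_bugs_table([1, 2, 3]).items():
    print(f"tier bug tiles={tier} module bug tiles={module} "
          f"expected bugs per test={float(chance):.3f}")
```

Since a test finds at most one bug, the expected number of bugs per test equals the probability of finding one.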

The second image shows a sample black box test case. In the game, players do not know the exact location of the bugs but may do analyses to get to know how many bug tiles there are in a bag and draw conclusions about the probability. The test case simply keeps track of the players' scores for different analysis strategies.

Black box test case: Simulated result per strategy
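A simulation like this does not have to live in a spreadsheet. The sketch below is a simplified stand-in, not the actual rules of the game: it assumes a board given as a list of per-square find probabilities, an "ad_hoc" strategy that tests a random square every turn, and an "analyse_first" strategy that spends the first turn on analysis and then always tests the most promising square. The strategy names and all numbers are hypothetical.

```python
import random


def simulate_session(probabilities, strategy, turns, rng):
    """Play one simulated session and return the number of bugs found."""
    score = 0
    if strategy == "analyse_first":
        best = max(probabilities)      # the analysis reveals the best square
        for _ in range(turns - 1):     # the first turn is spent analysing
            if rng.random() < best:
                score += 1
    elif strategy == "ad_hoc":
        for _ in range(turns):
            chance = rng.choice(probabilities)  # test a random square
            if rng.random() < chance:
                score += 1
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return score


def average_score(probabilities, strategy, turns=5, sessions=10_000, seed=1):
    """Average score over many simulated sessions, repeatable via the seed."""
    rng = random.Random(seed)
    total = sum(simulate_session(probabilities, strategy, turns, rng)
                for _ in range(sessions))
    return total / sessions


board = [0.0, 0.25, 0.5, 0.75]  # hypothetical find probabilities per square
for name in ("ad_hoc", "analyse_first"):
    print(name, average_score(board, name))
```

Varying the number of turns shows how long a game must last before the turn spent on analysis pays off.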

Static vs Dynamic testing

Dynamic testing refers to all testing where you play out parts or the whole of the game. Static testing is the opposite and typically refers to reviews of components. It may be perceived as a daunting task but is nevertheless important. No matter how great your game idea is, poor static testing will prevent other players from realizing it or even render the game unplayable. As a designer, it is easy to become blind to details so make sure to document every design detail, no matter how small. Use the documentation as checklists and go through them carefully. Rule language, game examples, color codes, font sizes, position of images and overall consistency in the layout are among the things that belong to the list.

Find the Bug! relies on simple art and well-known process symbols but there were still many checkpoints. One example was the relation between the tiles, the pawns and the game board. The tiles must fit the squares of the game board and pawns on the tiles must not cover the number. It is true that this kind of relation is a design task but the design will change during your work and to ensure that all changes are propagated to all components, checklists are useful.

Positive vs Negative testing

Positive testing is all the testing that proves that your game works, but what about negative testing? Negative testing refers to all attempts to break the game, intentionally or unintentionally. Are there loops which will prevent the game from progressing or dead ends that will prevent it from ending? Even classic games like go and chess have such issues, which must be handled by special rules ("ko" prevents recurring positions in go, and a stalemate in chess ends in a draw). What if a player plays against the intentions of the game or even tries to cheat? Or if a player deliberately attempts to sabotage the game to prevent the other players from winning? Or if a player promotes another player's victory (known as the "kingmaker" effect)? Depending on the style of the game, other negative scenarios may include bad starts (making the game boring for that particular player), unstoppable leaders (making the game boring for all other players) or overly random victory conditions (making the end of the game uninteresting). Everything that may work against your game's unique quality constitutes a negative test scenario that must be tested.

One important negative test of Find the Bug! was to ensure that no player would know the location of the bugs, neither during setup nor during gameplay. If this test failed, the game would have no challenge. Another important test was to ensure that players doing analysis tasks would keep the information to themselves. Otherwise, an analysis would only benefit the players next in turn. I also had to ensure that a player could not lose the game for all the other players by deliberately failing to find bugs.
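The search for loops and dead ends mentioned above can be partly automated if the game flow is written down as a graph of states. The sketch below is one possible approach under that assumption: it walks backwards from the end states and reports every state from which the game can never finish. The state names are made up for the example.

```python
def find_stuck_states(moves, end_states):
    """Return all states from which no end state can ever be reached."""
    states = set(moves) | set(end_states)
    for next_states in moves.values():
        states.update(next_states)

    # Build the reversed graph: for each state, which states lead to it.
    predecessors = {state: set() for state in states}
    for state, next_states in moves.items():
        for nxt in next_states:
            predecessors[nxt].add(state)

    # Walk backwards from the end states to collect everything that can finish.
    can_finish = set(end_states)
    frontier = list(end_states)
    while frontier:
        current = frontier.pop()
        for previous in predecessors[current]:
            if previous not in can_finish:
                can_finish.add(previous)
                frontier.append(previous)
    return states - can_finish


# Hypothetical flow with an endless loop (loop_a/loop_b) and a dead end (stall).
flow = {
    "setup": ["play"],
    "play": ["score", "loop_a"],
    "loop_a": ["loop_b"],
    "loop_b": ["loop_a"],
    "score": ["end"],
    "stall": [],
}
print(sorted(find_stuck_states(flow, {"end"})))
```

Each reported state is a candidate for a special rule, in the same way that ko and stalemate are handled in go and chess.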

Implementation

There is a time and a place for test

By now you have a number of test cases for your game to ensure its quality. But how do you decide when to run all those test cases? The software testing methodology answer to this is the implementation phase. This is when you organize, prioritize and schedule your testing. One good starting point is to group them by test levels.

Unit test: Test of individual game elements
Integration test: Test of interactions between game elements
System test: Test of the game as a whole
Acceptance test: Test of the game with external players

As you can see from the above list, testing is a bottom-up process, where you first test the smallest parts of the game to make sure that they work before you test them together. This will help you isolate problems and trace them back to the source. However, it does not mean that testing is a one-way process. You will need to move back and forth between the test levels as you test and improve your game. Art is typically something that you go back and retest once the rest of the game works. The main benefit of grouping your testing in test levels is that it helps you test the right thing at the right time. Let us look at them in more detail.

Unit test

Unit test ensures that the smallest elements of the game work according to your quality criteria as discussed in the beginning of the article. This may refer to a certain component or a specific mechanism of your game that can be tested in isolation. Testing in isolation does not mean that you design in isolation - you should know the purpose of the element in the game but still only test the element itself.

A functional unit test in Find the Bug! was the expected value of bugs from different density and severity levels as described in Design. A non-functional unit test was to check the color codes of all green and black tiles.

Integration test

Integration test ensures that the elements of the game work together. However, while software systems are often modular, game elements are often so interrelated that it is difficult to test single relations. Instead, try to look at the flow of the game and identify events that need to work smoothly for a good flow. Allocation of resources, track of victory points and transactions between players are examples of events to test during integration test.

A functional integration test in Find the Bug! was that the different tasks of the game (analysis, test, retest) could be done on the game board at the same time. A non-functional integration test was to check that the tiles fit the squares on the game board.

System test

System test is when you actually play the game from start to end. The first system tests may be simulated but you should also produce a prototype at some stage. The prototype does not need to be fancy as you will likely have to rework and retest the game several times. System test is your chance to tick off as many quality criteria as possible before you invite other players to your test, so try to simulate as many different player styles and scenarios as possible. Once all your quality criteria have been checked, you are ready for acceptance test.

Find the Bug! was simulated in spreadsheets several times before a prototype was ordered from The Game Crafter.

Acceptance test

Do you think your game is perfect now? Good, then it is time to see if external players share your opinion. Acceptance test is critical as this is the first time someone other than you plays the complete game. You should of course discuss game details with other designers during your work and perhaps even play early prototypes with them, but at some point you will need input from players who have never seen the game before and can play it without any preconceptions.

Acceptance test should reflect an ordinary game session where the players (the testers) play your game as they would have played any other game and you (the test lead) act as an invisible observer. Ideally, this means that they should read the rules themselves and discuss any concerns among themselves without consulting you. Make sure to set the expectations so that they understand this - once your game is published, you will not be there to guide new players. (With less experienced players, you may consider acting as a game master, setting up the game, explaining the rules and supporting the game progress, but you should not let this be your only acceptance test.)

Prepare yourself for the acceptance test with the same checklist used in the system test and a journal where you can make notes of events and actions during the game. In addition, prepare a questionnaire to capture lessons learnt from the game. It is important to capture both good and bad things. Examples of questions include:

Which was your favorite/least favorite part?
Were any parts too long/too short?
What was easy/difficult to understand?
Was something missing/unnecessary?
Did the game engage you?
Did you feel that you could affect your progress?
Did you understand how to win?
Did the right player win?
Was it fun?
Do you want to play again?

An objective yet powerful way of measuring the result of an acceptance test is the Q4T Score. It is based on the players' reactions after a blind test where the designer doesn't answer any questions but only writes them down and refers the players to the rule book.

0: Players didn't want to play the game at all
1: Players didn't complete the game
2: Players didn't want to play again, even when asked
3: Players didn't spontaneously want to play again but agree to when asked
4: Players talk about the game and spontaneously want to play again
5: Players want to play again within 10 seconds

What the score says is simply that games that don't score 4 or 5 are not worth publishing.
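The rule can be captured in a few lines. The sketch assumes that the six reactions are scored 0 to 5 in the order listed, which is consistent with 4 and 5 being the two most enthusiastic reactions; the names in the code are hypothetical.

```python
# Reactions in the order listed, assumed to be scored 0 (worst) to 5 (best).
Q4T_REACTIONS = [
    "Players didn't want to play the game at all",
    "Players didn't complete the game",
    "Players didn't want to play again, even when asked",
    "Players didn't spontaneously want to play again but agree to when asked",
    "Players talk about the game and spontaneously want to play again",
    "Players want to play again within 10 seconds",
]
PUBLISH_THRESHOLD = 4


def worth_publishing(score):
    """Apply the rule that only games scoring 4 or 5 are worth publishing."""
    if score not in range(len(Q4T_REACTIONS)):
        raise ValueError("score must be between 0 and 5")
    return score >= PUBLISH_THRESHOLD


observed = 3  # hypothetical result of one blind test session
print(Q4T_REACTIONS[observed], "->", worth_publishing(observed))
```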

For the first acceptance tests of Find the Bug!, test colleagues at work were invited and the above checklists and questionnaires prepared. Since the game may be played both with a teacher and by students on their own, I took on the role of a game master and prepared everything so that they could focus on the game experience. For later acceptance tests, I acted as an observer.

Execution

If you do not know what you tested, have you really tested?

Now that you know what to test, how to test and when to test, it is finally time to actually execute the test. If you have followed the methodology so far, you are well on your way to ensuring quality, but the most important part remains: documentation and investigation.

Whatever test type and test level you use, you should document the test, the expected outcome and the actual outcome. Also document information about the test environment, such as date, version and number of players. This will help you follow up issues afterwards. If possible, try to complete the test first instead of immediately trying to fix issues. Otherwise, you may fix only the symptoms, not the actual problems. Or worse, you may change things that actually work. Instead, collect as much information as you can and then apply a holistic perspective on it. The following questions should be answered:

Is it one issue or several in combination?
What is the root cause of the issue?
Which quality criteria are affected by the issue?
Could a similar issue be present elsewhere?
How can I fix the issue?
If I fix the issue, which other quality criteria may be affected?
How can I prevent this issue in the future?

Issues on lower test levels, such as the color of a component, may not require the full process above but you should nevertheless document and investigate them as well. Perhaps you need a better template? Perhaps other components have the wrong color as well? What if you accidentally included components from an old version and your entire game needs to be reworked? Good documentation and investigation will help you avoid the issues in the future.
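A test journal with the fields recommended above (the test, the expected outcome, the actual outcome and the test environment) can be kept in any form. As one possible sketch, the record below uses hypothetical field names and sample entries:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PlaytestRecord:
    """One documented test run with its outcome and environment."""
    test: str
    expected: str
    actual: str
    run_on: date
    game_version: str
    players: int
    notes: str = ""

    @property
    def passed(self):
        # A test passes when the actual outcome matches the expected one.
        return self.expected == self.actual


def failed_tests(records):
    """Return the records that need investigation before anything is changed."""
    return [record for record in records if not record.passed]


# Hypothetical journal entries for illustration only.
journal = [
    PlaytestRecord("Tiles fit the board squares", "fit", "fit",
                   date(2014, 5, 1), "0.3", 1),
    PlaytestRecord("Tile color codes", "all correct", "one tile wrong",
                   date(2014, 5, 1), "0.3", 1, notes="check the template"),
]
for record in failed_tests(journal):
    print(record.test, "-", record.notes)
```

Collecting all failed records first, rather than fixing each one as it appears, matches the advice to complete the test before changing the game.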

The early testing of Find the Bug! was much about finding a balance. The tables below show two excerpts of the test documentation. In test case 1 (early version), players would have a 50% chance of finding bugs without analysis and only a marginally better chance with analysis, making the pay-off time for letting a pawn analyse instead of test too long. Test case 2 (final version with fewer bugs) resulted in a better balance.

Test case	Bug distribution			Total bugs	18
1	0	1	2	Probability without analysis	50.0%
	1	2	3	Probability with analysis	62.5%
	2	3	4	Pay-off time	4

Test case	Bug distribution			Total bugs	18
2	0	0	1	Probability without analysis	25.0%
	0	1	2	Probability with analysis	37.5%
	1	2	3	Pay-off time	3

The higher the test level, the more important it is that you document and investigate issues, particularly if they are discovered by external players who may not be available when you start fixing them. Using the checklists and questionnaires above, facilitate a discussion where you elaborate on the game experience and brainstorm potential changes to the game. As a game designer, you must balance humility with integrity - do not be overly defensive against criticism but do not blindly accept proposals either. You may have personal feelings for your game but you also know the reasoning behind certain elements and should share this knowledge with the other testers. It may be the case that the game's learning curve is too steep for only one session (which, depending on your quality criteria, may be another issue...).

My very first game acceptance test was a good example of bad practice: I participated myself, took in all the testers' opinions, without any discretion or any tracing back to the root cause, and ended up with a game far inferior to the game that entered the test.

For Find the Bug!, I had learned my lesson and did not participate in the game myself but acted as an observer and documented what I saw. One interesting observation was that the players placed 2-3 testers on analysis tasks instead of the optimal 1. As expected, this resulted in low scores and all players actually finished on the same bug value: 2. I did explain how the game concepts related to test concepts but left the discussion until after the game. Some suggestions were given about placing the tiles on the board instead of in bags but when I explained the reasoning behind the setup (the relation between the color of the stone and the number of bugs), it was accepted. The most important test passed: they all wanted to play again!

Evaluation and Closure

A game test is only the end of the beginning

With all the testing completed and the "perfect" game ready, what is the next step? Forget about all the hard work behind it and just let the game stand on its own? No, a game is tested every time it is played so it is important that you archive all your documentation for future reference. A FAQ could be created or strategy articles written. Other players may discover something that you missed or come up with ideas that you never thought of. You may consider designing new editions or expansions. In addition, you should take time to reflect back upon your work and think of the lessons you learned. For all this and much more, a good quality documentation of everything that went into your game and made it unique will be invaluable.

The following are some questions that quality documentation may help answer:

How do you best learn/teach the game?
Would this strategy work/break the game?
If I add/remove this element, what may happen?
Which elements would benefit/suffer from changed components?
Can I prove my rights to copyrighted material (if any)?
If a player dislikes the game, is it because of the game itself or the player preferences?
Can/should the game be adapted to another audience?
Which game-defining elements are there that could be used in marketing?
Could elements included/excluded be applied to new games?
What should I keep doing/doing differently next time?

Although Find the Bug! was released only recently, the quality documentation has already come to use. Most notably, the children's game Find the Treasure! was built on some simplified mechanisms (the bugs were replaced by a hidden treasure) and some changed mechanisms (the permanently placed testers were replaced by moving pirates). I had no intention of creating such a game while working on Find the Bug! but during the lessons learned sessions with the testers, those mechanisms just begged to be used once more.

This concludes my ambition to apply software testing methodology to games. The key message here is that quality must be a common thread throughout the entire game design process. Understand what quality means for your particular game and ensure that you test the right thing at the right time and in the right way. This will not ensure the perfect game but will hopefully help you recognize it when you see it.

All comments on this article or on the topic of how we game designers can bring quality to our games are welcome. Thank you for your interest and good luck with your own games!

Please leave a comment on the games or contact me directly at nicholas.hjelmberg@gmail.com.