Jan 04, 2018

Unit Testing Games

It's not uncommon for me to get asked on Twitter (when I was active there) why we don't unit test games? They would mention how they work in blah blah industry and do unit testing all the time and it's virtually eliminated the need for human testers. They then go on to call me a jerk face and tell me I make stupid games.

OK, I made up that last part, but sometimes it feels like they said it.

It does bring up an interesting question. Why don't we unit test games like you would other software? I can't speak for others, but for me, it just isn't that useful.

What is unit testing you ask? It's basically where you write programs that feed data (sometimes random data) into a program's functions or modules and make sure you get the expected results or proper errors. That's an oversimplification, but you get the idea.
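To make that concrete, here is a minimal sketch in Python. The function `clamp_volume` is a made-up helper for illustration, not anything from a real engine; the test feeds it data and checks for the expected results and the proper error.

```python
def clamp_volume(level):
    """Hypothetical game helper: keep an audio level within 0..100."""
    if not isinstance(level, (int, float)):
        raise TypeError("level must be a number")
    return max(0, min(100, level))


def test_clamp_volume():
    # Expected results for ordinary and out-of-range input.
    assert clamp_volume(50) == 50
    assert clamp_volume(-5) == 0
    assert clamp_volume(250) == 100
    # Proper error for input that is not a number.
    try:
        clamp_volume("loud")
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError")


test_clamp_volume()
```

If someone later changes `clamp_volume` and breaks one of those cases, the test fails right away.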

Unit testing texture or asset loading can be useful, as is unit testing many other backend engine routines like memory managers, but that's just not where most (and by that I mean 99%) of the bugs come from.

Most of the bugs in a game (or at least my games) come from users doing stuff, and not stupid stuff, but normal expected stuff and the game logic just being wrong. This is really really hard to unit test and the failure states are hard to identify.

For Thimbleweed Park, we had the TesterTron 3000™ that randomly played the game for hours on end. This was useful in that it found occasional crash bugs, but it wasn't very good at finding game logic bugs because, after an overnight run, it was hard to spot that something had gone wrong 6 hours ago. This is especially hard if the logic issue was very room-local and hard to spot later.

TesterTron had a mode where it would jump into each room one-by-one and this was useful in catching missing assets or bad packaging.
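For readers who want a picture of the two modes, here is a rough Python sketch. It is not how TesterTron was built (its internals aren't described here); the room table and function names are assumptions made up for illustration. One function wanders at random from a seed, the other visits every room and flags exits that lead nowhere.

```python
import random

# Hypothetical toy world: each room lists the rooms its exits lead to.
ROOMS = {
    "street": ["diner", "sheriff_office"],
    "diner": ["street"],
    "sheriff_office": ["street", "jail"],
    "jail": ["sheriff_office"],
}


def random_walk(seed, steps, start="street"):
    """Take random exits and return the list of rooms visited."""
    rng = random.Random(seed)  # seeded, so the same run can be repeated exactly
    room = start
    visited = [room]
    for _ in range(steps):
        room = rng.choice(ROOMS[room])
        # A crash-style check: fail loudly if an exit points at a missing room.
        assert room in ROOMS, f"exit leads to missing room {room!r}"
        visited.append(room)
    return visited


def visit_every_room():
    """Check each room one by one; return (room, exit) pairs that go nowhere."""
    problems = []
    for room, exits in ROOMS.items():
        for target in exits:
            if target not in ROOMS:
                problems.append((room, target))
    return problems
```

Note what it can and can't see: a missing room trips a check, but a misaligned animation or a wrong line of dialog sails right through.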

But we encounter a lot of bugs where the game works fine, but something is just visually wrong. The Sheriff picks up the phone and his animation isn't aligned correctly. Or worse, it's not aligned correctly only after he's also opened the jail door. Or Delores stands in just the right spot and she clips into the desk.

I just don't know how to realistically unit test this stuff. It's possible, but the cost and effort involved would quickly outweigh the benefits. We're not making flight software for 777s. No one is going to die or lose their entire bank balance because of our bugs. We're not writing enterprise level software, we're making video games and the cost of most of our bugs is negligible compared to monitoring a nuclear reactor.

It's not that we don't care, we care a lot. One of the largest expenses in Thimbleweed Park was testers.

I say "most of our bugs", but not all. Post Thimbleweed Park launch, we did discover some bugs that caused players to lose progress and these are very painful to us.

We fix them as fast as possible and release a new build, but with each of them, I don't know how unit testing would have helped. We had a problem on the PS4 where savegame data would get corrupt due to threading. We could have built a stress tester for this, but that's all 20/20 hindsight. I can think of 100 other places that worked fine and without that hindsight, we would have spent a lot of time writing stress testers.

Again, not impossible, just not economical.

But maybe I'm wrong. Maybe other game developers have amazing game unit tests that are enormous time and money savers. If that's true, I'd love to hear about them. I'm basically lazy and more than happy for robots to do my work.

But it feels like the ability to unit test games would require a good AI that can spot perceptual issues that go way beyond just getting bad data out of a function. Or maybe that's called Early Access.

Andy Dopieralski

Jan 04, 2018
https://www.youtube.com/watch?v=IKiNkBFP698

As a LONG TIME tester, I understand the importance of unit testing. But, especially for narrative, human testers are absolutely necessary.

Dom De Re

Jan 04, 2018
In my non-gamedev work I have found unit testing and static verification incredibly helpful and must-have.

However, in my game dev work I must say I agree with you, it's just not as simple to apply.

I've personally had more luck on the static verification side of things, using expressive type systems similar to those of Haskell etc... This is basically a subject of great interest for me, as I'm trying to answer the question of how applicable these methods are in the gamedev domain.

Regardless, it similarly brings my blood to boil when people attempt to push you to cargo cult something that worked (incredibly well) on the system they built that processes forms with numbers of fields ranging in the 100s to something as dynamic as a game where the variances go way into the 1000s.

I don't know, I'm bad with numerical estimates, both are probably off.

The point is that with a game it's just way higher.

I previously asked you to write more on the specifics of how you implemented the TesterTron 3000, I'm fairly experienced as a software engineer, but unfortunately I can't really bring much of it to bear on this specific problem.

You seem to have downplayed its value in this post but I'd still be very interested.

salmonmoose

Jan 04, 2018
It sounds like you're actually describing Integration Testing.

Unit tests *should not* take random data, they should test sections of code do what they're designed to do. They are not designed to find holes, but make sure new ones don't appear.

Integration testing works on the system as a whole, and should find scenarios where interactions cause bugs which should in turn, be turned into unit tests.

Lee

Jan 04, 2018
I find unit testing useful for iterative development. I can iterate very quickly when my functional units have been reliably validated through unit testing.  If you never intend to carry over your game code to version 2, 3, 500, etc, then I would agree that it's pointless.

Frank

Jan 04, 2018
https://en.wikipedia.org/wiki/Unit_testing

I agree with some of the previous commenters that you are not using the pedantic definition of unit testing correctly. The discussion would be a lot more productive if you stuck to the pedantic definition so that it doesn't just fill up with a debate about semantics.

That said, successfully landing a testing infrastructure for the full user experience of a game is obviously going to be very difficult. No company has succeeded even with simpler applications such as individual apps in an office suite.

Alan

Jan 04, 2018
The most intriguing game testing system I've seen is Inform 7's skein.  http://inform7.com/learn/man/WI_1_8.html

Inform 7's skein is incredibly focused on the text adventures Inform 7 is designed to create. But it seems like the general design (correlating chains of inputs with expected resulting world state) would be applicable to graphic adventure games as well. Inputs might be "click here" "build a command with verb X, inventory item Y, and world object Z" "do nothing for W seconds". State might include screenshots (video?) and state variables.  It could potentially catch when a new bug rendered the game unwinnable, or when special case responses failed to trigger appropriately.
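As a rough illustration of that general design (not of how Inform 7's skein is implemented), here is a small Python sketch. The miniature game, its commands, and the names `apply_command` and `replay` are all hypothetical. A recorded transcript pairs each input with the state facts expected after it, and the replay reports any mismatch.

```python
def new_game():
    """Starting state of a made-up three-step adventure."""
    return {"room": "office", "inventory": set(), "door_open": False}


def apply_command(state, command):
    """Apply one (verb, target) command and return the new state."""
    state = {**state, "inventory": set(state["inventory"])}  # copy, don't mutate
    verb, target = command
    if verb == "pick_up" and target == "key" and state["room"] == "office":
        state["inventory"].add("key")
    elif verb == "use" and target == "door" and "key" in state["inventory"]:
        state["door_open"] = True
    elif verb == "walk_to" and target == "jail" and state["door_open"]:
        state["room"] = "jail"
    return state


# One recorded chain of inputs, each with the world state expected afterwards.
TRANSCRIPT = [
    (("pick_up", "key"), {"inventory": {"key"}}),
    (("use", "door"), {"door_open": True}),
    (("walk_to", "jail"), {"room": "jail"}),
]


def replay(transcript):
    """Replay the inputs; return (step, field, expected, actual) mismatches."""
    state = new_game()
    mismatches = []
    for step, (command, expected) in enumerate(transcript):
        state = apply_command(state, command)
        for field, value in expected.items():
            if state[field] != value:
                mismatches.append((step, field, value, state[field]))
    return mismatches
```

An empty list from `replay(TRANSCRIPT)` means the chain still plays out as recorded. A change that makes the door impossible to open would show up as a mismatch at that step, which is the "game rendered unwinnable" case.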

Of course, implementing something like that for a graphical adventure game seems like a major project on its own. I'm not sure it would catch enough bugs to justify the effort. But, it's an example of what such a system might look like.

Ron Gilbert

Jan 04, 2018
I agree with some of the previous commenters that you are not using the pedantic definition of unit testing correctly.

I may not be, but neither are all the people that tell me how easy it would be.  I'm not sure it matters for the sake of my argument.  Dividing it up into all the correct definitions only confuses the matter for people without the technical knowledge. "Unit testing" is kind of a catch-all that more people understand. The point of the post stands... automating this testing is next to pointless.

Sam

Jan 04, 2018
I think the reason people are being pedantic about the terminology is that unit tests would in fact be useful. When you write a function or a class, you create a set of unit tests that define how that function behaves. By doing so you ensure that it will always behave that way, even if someone comes along later and changes the internal implementation. Unit tests are about low level code, not overall function. Their point is to catch programming errors, not design errors.

Martin Wendt

Jan 05, 2018
It seems to me that having Robert Megone as tester makes up for quite some automatic unit testing ;-)

mtoivo

Jan 05, 2018
I agree on the point that automatic testing of a game is very hard, not minding what that process is actually called. Automatic testing can do all sorts of things, but how can it determine if something has gone bad without actually understanding the game as a whole? Writing the test scenarios and the expected results so that they catch all the errors can very well be a more expensive task than using human labour. I had to write a simple game a few years ago, and comments from the real testers were very valuable data indeed, especially when there was very little time to test it myself at all.

Catching even the simplest errors as a developer is like proof reading your own emails etc: you are the most qualified person to know if something is wrong there, only thing is that you are totally blind to even the spelling errors just because of that. Same goes with testing your own game: you know exactly how it should be played and cannot even figure how to play it the wrong way. Actual users are pretty good in trying to do all the stupid things. Like microwaving a freaking hamster.

Now that I've 'tested' this comment by reading it many times to catch all the errors in it, I probably find many stupid mistakes the minute I submit this.

Thomas

Jan 05, 2018
On a related topic, there was this article written a few years ago by a coworker about automated testing in SoBlonde (another Point&Click that would share similarities with your games). Mostly useful to make sure it was possible to reach the ending, and that there was no dead end:

https://www.gamasutra.com/view/feature/134893/automated_testing_building_a_.php

Francisco

Jan 05, 2018
What I understand is that Unit Testing works best to let you know if something that was working before stops working. So if your memory management test breaks, someone changed some logic in there and now you have a chance to fix it before the thing blows up in the hands of a user.
Human testers are absolutely necessary, since computers will never be cruel enough to think things like "What if I do the konami code backwards and *then* push that button over there".

Davide

Jan 05, 2018
Automated unit testing is useful in game programming, especially in multiplayer programming
(where you'd need more testers at once for the same feature - or that must be synchronized to the millisec to create a possible scenario) and to easily replicate all the tests you want for each software module ("units"), without spending too much time on it.
It allows you to do something repetitive on what you already know when it is working properly and when it is not, and easily to check at a given state.
The problem is that a videogame rarely is made by a discrete set of states (as it could be in other pieces of software), but it evolves at each frame in a different way based on used input. We don't have a series of actions and outcomes to reach an objective,
we have an objective and the rules of the "game world" and the player could move in this world using such rules and we'll just give an outcome when the objective is reached.
Furthermore it is organic (the game emerges from multiple things that are useless alone),
it is hard to check that all is working properly because there are too many things that could be wrong.
Yeah, it is right, something low-level can have unit tests, but for most of what the game is (and where the bugs are), automated tests would be far more complex than what they are testing.

Jade

Jan 05, 2018
I'm actually in the process of writing unit tests for one of my older games right now and I've caught several large bugs with the addition of my unit tests (most noticeably an equality operator typo that was easy to miss because the value it was accidentally assigning instead of comparing wasn't used anywhere until long after that). I agree that you're referring more to integration testing and yes that is a lot harder with all sorts of software, not just games, but especially where you're looking to check for visual errors. However I don't think you should discount unit tests and the confidence they provide to your programmers to know their changes haven't broken existing functionality that has been unit tested. It's quite empowering to a dev team when you have good unit test coverage.

Someone

Jan 05, 2018
@Ron: You should post this in the TWP blog too. :)

Chris

Jan 05, 2018
Are you not using your own engine for TWP, same one you made for Scurvy Scallywags? I'm surprised if you don't have any units in the engine code by now. Especially to keep any utility code working when you update it for the next game.

This is the kind of place where units are great at catching errors. Automating UI testing is really helpful for games, but the lower level code can definitely benefit from units, unless you're starting from scratch each time; but that would surprise me!?

Ron Gilbert

Jan 05, 2018
I'm surprised if you don't have any units in the engine code by now.

That's the whole point of this post... it's not where the bugs are. It's not an efficient use of my time. It might be for the type of programs you write, but not for games.

Steve Brown

Jan 05, 2018
[Forgive me in advance for some SERIOUS 'thinking out loud.'  It gets a little unstructured at times. Also, be warned: TWP spoilers.]

Let me first mention that I've never been a programmer for a paycheck; I've done quite a bit over the past few decades, but I doubt I'll be approaching things like a proper programmer. I'm mostly a "math guy in progress" at this point, although my interests are sort of "broad to the point that it's easy to accuse me of being unfocused."  c'est la vie.

Anyhow, I had actually first started wondering about using graphs to describe game state and the like, back when I took discrete math several years ago.  I didn't follow up on that at the time, as by that point anything having to do with game-making was perpetually on the backburner for me. But when the notion showed up in TWP, it rekindled my interest.

After I'd played through TWP a handful of times, I read through the blog post about puzzle dependency charts and decided to try making one for TWP.  This was around the time you had one of those testing-related conversations on twitter, and it got me thinking about the question, What ~could~ I test, and how would I go about doing it?

The puzzle dependency chart (which I think I'll start abbreviating PDC) that I drew up essentially just reflected the game's narrative; the "states" that I could define, knowing only what a PDC has to say, seem like they would entail two things: game progress variables (bAgentsHaveBeenIndoctrinatedAboutArrestTron and the like) and inventory. We'd probably need to include constraints on when pick up-able objects appear and disappear from the game, as well as when rooms become accessible (and inaccessible). At one point, I think I considered including this in the PDC....but quickly realized that, if I started doing that, I'd be getting close to re-stating the complete design of the game (except on paper).

That brought me to a conclusion:
Incorporating the PDC into the design program would let someone design a game by attaching things like item visibility and room accessibility to the PDC itself. If the designer can (combinatorially) generate all possible states (such as {part = 2; bArrestTronIndoctrination = true; bWC67PartNumberKnown = false; ...etc. (for every other game progress variable); inventory = (bottle, etc.)}), then we can test the game logic by applying all possible "actions" to the set of states.

(I'm using the word "actions" as in "group actions," but I really don't know if there's a group here. I may have to answer that question, at some point.)

What are the "actions," then? (I'm gonna stop putting quotes around that word now, btw.) Given that we're only talking about game progress stuff (and not the rendering state) then the actions are the 9 verbs, with one caveat; for every object in the game that can be used with another object, we'd need to make a curried(?) action for that object; that is, "Use(obj_to_use, target_obj)" wouldn't just be one action, but rather you'd have one for every possible value of obj_to_use, which I suppose would mean there's one for every item in the game that can be added to inventory.  (Or, more likely, one for every item in the game that the constraints say can be accessed at the time that the state can possibly exist.  So, for example, in part 9, we have a short list of items that can possibly be in the inventory, so it wouldn't be productive to ask if we can "use record with corpse" during act 9.)

It also seems to me now that we'd want to break up the sets of states into "states that can exist under the same set of constraints," then I think we'd have to pay some particular kind of attention to actions that move you from one set to another (meaning the actions that trigger a change in item/room availability). (If I mention those points again, I'll probably call them 'boundaries.' You know what the condensed matter physicists say: If you want to know how two materials work together, you study the boundaries.)

While we're having the design program generate these possible states and these lists of actions appropriate for the various sets of states, what are we looking to test?  I think the answer is this:
Given any one state and any one action, we want the resulting state to be another valid state within the same set of states.  If the state changes item/room availability, we want the resulting state to be in the set of states that reflects that new constraint.
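A tiny Python sketch of that check, under heavy assumptions: the three-flag puzzle, the action list, and the validity rule are all invented for illustration, and instead of generating every combinatorial state it only walks the states reachable from the start, which is a simpler stand-in for the idea described above.

```python
from collections import deque

# Hypothetical puzzle chain; a state is the tuple (has_key, door_open, in_jail).
START = (False, False, False)


def actions(state):
    """Yield (name, next_state) for every action allowed in this state."""
    has_key, door_open, in_jail = state
    if not has_key:
        yield "pick up key", (True, door_open, in_jail)
    if has_key and not door_open:
        yield "use key with door", (has_key, True, in_jail)
    if door_open and not in_jail:
        yield "walk to jail", (has_key, door_open, True)


def is_valid(state):
    """Made-up constraints: no open door without the key, no jail without an open door."""
    has_key, door_open, in_jail = state
    return (has_key or not door_open) and (door_open or not in_jail)


def explore(start):
    """Breadth-first walk over reachable states; return (seen, invalid_states)."""
    seen = {start}
    queue = deque([start])
    invalid = []
    while queue:
        state = queue.popleft()
        if not is_valid(state):
            invalid.append(state)
        for _name, next_state in actions(state):
            if next_state not in seen:
                seen.add(next_state)
                queue.append(next_state)
    return seen, invalid
```

Calling `explore(START)` and then checking that the invalid list is empty and that some seen state has `in_jail` set is the "every action lands on a valid state, and the ending is still reachable" test. None of this touches rendering, so it says nothing about visual problems.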

At this point, I want to step back and try to assess what I've just rambled on about.  By suggesting that the PDC be integrated into the designer, and then requiring that pieces of game logic be attached (in some sense) to the PDC, it seems like we could make an automated way to ensure that all game logic that exists in the game will continue to allow the game to get from beginning to end.  

When I watched the video of the TesterTron, it actually seemed like this was essentially what you were having it do, albeit in what looks like a randomized way.  (I'm guessing that the TesterTron is just picking a random interactable object in the current room and doing an action on it.)

So....that means that what I'm describing is basically a less ad-hoc approach to testing than you're already doing, that would probably require a huge time commitment to implement.  I guess that's when we file an idea under "eh...maybe, if I rebuild the engine from scratch, some day."

I ~do~ envision it as having the advantage that, as soon as you add a piece of game logic in the designer, it seems like it would be possible to have it detect when you have "unresolved state changes" that would need to be addressed to another piece of game logic.  So, if this works like I'm imagining it, you might be able to cut out time spent testing game logic.....especially since the TesterTron tests that aspect pretty slowly, since it's rendering as it goes.

Of course, that also means that what I've been describing does nothing to check for visual artifacts.  How to test that is much less apparent to me, short of something ML-driven like you have already thought of (with an image analyzer trained to identify unusual sections of sprites, I suppose), or perhaps putting more strict constraints on masks for render-behinds and stuff like that.  (This might go without saying, I'm pretty far out of my realm of knowledge, at this point.)

Having said all this, the one thing I feel like I have any confidence in saying is that there seems to be potential value in separating testing for visual artifacts from testing for game logic errors.

Anyhoo....there's my thoughts on the matter, if they're worth anything.

Tristan

Jan 05, 2018
As a backend dev who's a massive proponent of TDD, unit testing and automated testing, if you asked me to take a stab at what sort of testing would be needed for a game, I would immediately think that there's going to need to be a lot of manual testing.
There's humans involved, and game players are more likely to be inquisitive and to try to explore everything/break things.
A computer also cannot do things like check that what is displayed is correct.
And that's before you get to testing the actual game play and story.

That said, I would think that there is a lot of scope for unit and other automated tests - there are surely units of code which do not take user input or output directly. The behaviour of those units is easily testable. Likewise, there is probably some larger behaviour which can be tested automatically, but like any project there will come a point where the cost is too great for the benefit (and that may come sooner with games).

I do like the TesterTron - but I can see lots of instances where it will fail to find bugs humans are more likely to find (and it's less repeatable than my preferred automatic tests).

Ron Gilbert

Jan 05, 2018
I do like the TesterTron - but I can see lots of instances where it will fail to find bugs humans are more likely to find

Very true. It was good at finding what we call "one-frame bugs", bugs that happen in one frame after something else has happened. It's rare that human testers find them, and if they do, they are impossible to repro. It could also work through the night, randomly clicking on stuff. But it was just mostly fun to watch.

Saks

Jan 06, 2018
I test aircraft (certification and V&V, not unit testing, although the unit testing team was wall to wall with me), and of course it is mandatory. Whatever it costs it pays off.
On the other hand, in my experience, unit testing is not that expensive. Usually it's scripts injecting data and writing logs.
Probably writing the script might take some time, and the tool to crunch the logs also, but the UT team usually spent more time doing functional testing than unit testing...
But of course, depending on your budget, the effort might not be worth it.

Jonathan Hartley

Jan 06, 2018
Someone said "No company has succeeded even with simpler applications such as individual apps in an office suite."

This is absolutely false. Many companies make products like that which are thoroughly tested, by which I mean way more than 100% coverage, both at the unit test level and at the functional level. I've worked at such companies myself.

Chris

Jan 06, 2018
> That's the whole point of this post... it's not where the bugs are. It's not an efficient use of my time. It might be for the type of programs you write, but not for games.

Ok, I was just surprised that only 1% of bugs were of this sort - for a multiplatform engine used in a bunch of games, this is great. I guess it's a case of your engine code being much better quality than the gameplay code! (I assume you write all the engine code yourself, where others can be contributing gameplay code)

- maybe the solution could be around how to expose gameplay programming in a way where these kinds of bugs are harder to create, or easier to catch, and maybe automatically being able to test each gameplay script and the edge cases created with each gameplay script... There are ways of doing static analysis on code which can find edge cases automatically which could possibly help for this kind of thing?

Roberto

Jan 18, 2018
Hey Ron,
Interesting thoughts. Do you think that the lifecycle or the size of TWP and similar games affects your choice?

That is, do you think unit tests would be more useful if you were writing a bigger game (e.g. FIFA Soccer) or a game with a commitment for a longer timescale (e.g. World of Warcraft)?

Greg

Jan 21, 2018
I'm a big fan of unit and integration tests, but that's because I work on server software that has multi-year lifespans. We have to be able to quickly add new features and deploy them with minimal time and cost to verify that nothing has broken outside the focus of new development.

It makes sense to me that something like a game would eschew a lot of this kind of testing. It has a completely different development lifespan. The exception would be a gaming middleware company. I expect Epic, Crytek, and those sorts of companies to have a comprehensive test suite in their codebase.

I have personally derived benefit from unit testing, in that it allows me to very quickly iterate and demonstrate that nothing has gone wrong. But for a game, the unit testing is likely only valuable in the engine component, but then you didn't experience many bugs there. On the other hand, that's retrospective - when I do unit testing, I'm trying to prevent regressions and prove my designs before I integrate them into the whole. But to counterpoint again, I'm working on teams of (depending on the product) anywhere from 4 to 25 developers. If I recall the Thimbleweed Park credits correctly, you didn't have to worry about crossing the streams with other developers very much. This mitigates the circumstance where other people are breaking your code without knowing it, something that unit tests help with.

Raphael

1d ago
Hey Ron,
I agree with you on unit tests in games :)

with some game genres it might be economical and at least give devs some peace of mind to have kind of automated UI tests like shown here
https://www.youtube.com/watch?v=W20t1zCZv8M
where you record controller input and replay it.

I also saw a short presentation this year at GDC showing off this: https://youtu.be/lJGBDFgOulY?t=4m9s
(IDE to create test cases:  recording user input + screenshots, then replay and test with image recognition ... kind of)

I'm planning at least to create some kind of simple smoke tests for my next project with recorded input ideally also integrated with an automated build system ...