When building games, you end up dealing with a LOT of data. Back in the day, games were simple enough that you could put Mario’s jump height or run speed directly in the code, but the sheer volume of data required to build even a relatively simple game by today’s standards makes this practice untenable. A single type of zombie in World Zombination requires over 50 different properties. To make things even more complicated, our units level up and get stronger over time in different ways: some do more damage, others move faster, others have a larger awareness radius, etc. We have hundreds of different art assets—models, textures, animations, sound effects, and particle systems—each with their own constraints on when and where they can be used.
With so many numbers and files floating around, it’s easy to get overwhelmed. Development can slow to a crawl as you try to figure out why Boston is showing up in the middle of the ocean on the world map, why the Medic AI is killing teammates instead of healing them, or why zombies look like fire hydrants.
Your tools for creating game data ideally would have the following traits:
- Easy to Write. Game development teams are multi-disciplinary. You shouldn’t have to be a coder in order to add or change game data. If you enter something that isn’t understood by the tool, the author should be quickly notified in a friendly, helpful way.
- Easy to Read. If you’re debugging a problem or wondering why something in your game isn’t acting the way you expect, you should be able to easily search and inspect the game data to figure out what’s wrong. You should also easily be able to compare values of different types of game objects.
- Well-defined. There are lots of different types of game data – integers for things like hitpoints, decimal numbers for things like velocity, text strings for dialog, as well as arrays, maps, and structured data. Each type of game object should have a set of typed properties that it expects. This collection of schemas and constraints forms your data model, and your tool should expose this data model in an understandable way.
- Verifiable. As a game’s complexity grows, it becomes harder and harder to test. Dependencies and relationships between game objects can become circuitous in even the best-designed systems. Artists and designers will want to know as soon as they can that they did something wrong (e.g., entered a floating point number when an integer is expected, specified a path to an art asset that doesn’t exist, etc.). In order to keep development from slowing to a crawl, you shouldn’t have to run the game to find out you entered bad data. Similarly, if new constraints are added by engineers, they should be able to validate all existing data.
- Reusable. Lots of types of game objects can share some common traits. For example, all zombies in our game pursue humans, they collide with walls, and so on. Your tools should allow you to specify this data once, instead of dozens (or hundreds) of times. (This is analogous to the DRY principle in engineering.)
- Extensible. It should be easy to add new types of data or change existing data in bulk. Game systems will change many times over the course of development; properties are added and removed, and new dependencies are introduced. It should be a straightforward process to migrate existing data as the underlying game systems inevitably change during development.
Today I’m going to talk about some common patterns, and then talk about our solution for World Zombination. Every game studio I’ve worked at had a different set of tools for managing game data, each with their own strengths and weaknesses. It is important to note that there is no one-size-fits-all solution: most games these days have multiple approaches that make sense for different types of data.
Solution 1: Plain old Text
The easiest solution to build is to simply put all this data in a text file and parse it at runtime. Games ship with a .ini file that includes all this data, which is read when the game is launched.
run_speed = 10
hit_points = 5
welcome_dialog = Welcome to the Thunderdome!
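To make "parse it at runtime" concrete, here is a minimal Python sketch that reads the three values above with the standard-library configparser module. The [game] section name and the load_settings helper are made up for illustration (configparser insists on a section header, and the sample file has none).

```python
import configparser

INI_TEXT = """\
run_speed = 10
hit_points = 5
welcome_dialog = Welcome to the Thunderdome!
"""

def load_settings(text):
    """Parse the flat key/value text and convert each value to its expected type."""
    parser = configparser.ConfigParser()
    # Assumption: wrap the flat file in a hypothetical [game] section,
    # because configparser rejects files without a section header.
    parser.read_string("[game]\n" + text)
    section = parser["game"]
    # The code has to know the type of every property by hand.
    return {
        "run_speed": section.getfloat("run_speed"),
        "hit_points": section.getint("hit_points"),
        "welcome_dialog": section.get("welcome_dialog"),
    }

settings = load_settings(INI_TEXT)
```

Notice that every property needs its own line of reader code. That hand-maintained mapping is a big part of why this approach scores poorly on the traits below.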
- Easy to Write: 4/5. Anyone can write a simple text file like this in Notepad. But a syntax error such as a missing newline or “=” can still cause gotchas for non-technical folks.
- Easy to Read: 2/5. It’s easy to understand what each property means in this example, but as the number of properties in a game grows, this solution can quickly become impractical. It is not apparent where these values will actually be used by code, requiring engineers and content authors to be in very close communication. Finally, it isn’t very easy to compare the values of different properties unless strict organization is enforced inside the file.
- Well-defined: 0/5. The flat structure means that it’s hard to tell where or if a property is going to be used. Without any kind of hierarchy, you have to rely on things like naming conventions to group related values together. This solution also quickly falls apart once you need more complicated datatypes like arrays or maps.
- Verifiable: 1/5. Engineers can write tools that will validate the data, but this requires they write validation code for every new property that is introduced. Furthermore, content authors won’t get any realtime feedback that the data they entered is wrong.
- Reusable: 0/5. Without any concept of hierarchy, it is hard to reuse data without explicit engineer involvement.
- Extensible: 2/5. The simplicity of this approach makes it easy to do things like bulk-modification via search-and-replace in a decent text editor or simple scripts. More complicated changes will likely require an engineer’s involvement.
Solution 2: Custom Tools
A common solution is to build custom editors for your game. Engineers will build editors for each type of game object, so artists and designers can easily enter data. A well-built editor can also ensure in real-time that entered data is valid, as well as provide nice visualization of complicated data structures. This is generally how games are made in Unity; they provide a simple API for engineers to build custom editors that you can use within Unity itself.
- Easy to Write: ?/5. This all depends on how well-built the editor is. For example, Unity provides a lot of built-in controls for things like dropdowns, checkboxes, and even arrays and bitmasks. However, once you get beyond simple data-entry forms, this tends to become a much bigger engineering effort.
- Easy to Read: 3/5. A well-designed editor suite should allow you to quickly browse through the database of game content, and search for objects that you are looking for. However, doing more advanced searches or comparisons between objects can be cumbersome. This is particularly problematic if the file format the editor writes is binary, making it impossible to do things like search for values.
- Well-defined: 4/5. Engineers can make the editors fit the data model of the game. Unity makes this almost automatic in many cases, by exposing all public member variables of a MonoBehaviour directly in the editor. However, it still requires some engineer effort – making sure the right data is exposed to the editor, as well as handling more complicated data structures that Unity doesn’t provide built-in support for, like graphs, tables, or maps.
- Verifiable: 4/5. Engineers can write validation code that executes inside the custom editor, giving content authors real-time feedback when they enter incorrect data. Additional engineer effort may be required in order to validate existing data when new constraints are added.
- Reusable: ?/5. Custom editors would ideally provide a way for content authors to reuse data. Unity has limited support for this via Prefabs, but there isn’t an out-of-the-box way to inherit data from another Prefab, or even to nest Prefabs.
- Extensible: 1/5. Unless custom editors are specifically built for bulk-modification, you’re often going to be stuck changing each game object one-by-one if the underlying system changes. This is a particular issue with Unity.
One common mistake I’ve seen made with custom tools is to try to reinvent the wheel: don’t try to remake Excel or Visio as a custom editor. Unless you have a massive engineering team, you’ll end up with a far inferior version.
Solution 3: Structured Text
There are two common text formats in use these days for structured data: XML and JSON. Both have their relative strengths and weaknesses (which I won’t go into), but they both serve a common purpose and libraries exist for reading/writing both in most programming languages.
<hitpoints> 10 </hitpoints>
<damage> 5 </damage>
<run_speed> 4.0 </run_speed>
<hitpoints> 20 </hitpoints>
<damage> 10 </damage>
<run_speed> 1.0 </run_speed>
- Easy to Write: 1/5. Try asking your average game artist to write some XML: I dare you. Both JSON and XML have very strict syntax, and most available parsers give very little feedback on what went wrong. (I recommend JSONLint or xmlvalidation.com if you want more helpful error messages.) There are tools out there that can provide more customized editors given a schema, but they are generally pretty cumbersome and limited.
- Easy to Read: 2/5. Both file formats are definitely easier to read than write, assuming proper indentation. There are JSON and XML browsers available for navigating hierarchies. Since the data is all text, it’s easily searchable. However, comparison of different properties can be very difficult, since they’re spread out all over the file.
- Well-defined: 4/5. Both file formats are flexible enough to allow you to encode complicated data structures, and there are tools such as XML schemas that can be used to document and enforce datatype constraints. However, these schemas generally need to be written by engineers and kept in sync with code.
- Verifiable: 3/5. Schema tools can provide powerful validation of data, but they require engineers to write and maintain good schemas. There is no real-time validation of data, so custom tools must be provided to read in the XML and validate it for authors. This step can often get skipped by artists or designers who are trying to hit a deadline (or just lazy), meaning that bad data gets added to the game.
- Reusable: 2/5. Neither JSON nor XML has a direct concept of data inheritance, but it’s possible to encode that yourself in your schema.
- Extensible: 3/5. It is easy enough to do search-and-replace type operations, though the syntax can sometimes require regular expressions in order to find what you’re looking for. XSLT is an XML transformation language (itself written in XML), allowing engineers to write complicated migrations quickly.
One solution that I’ve seen used successfully is to build custom tools for artists and designers to enter data, and save that data out as XML/JSON. This gives some of the advantages of both approaches.
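As a rough illustration of encoding inheritance yourself, here is a small Python sketch that resolves a "parent" key in JSON. The "parent" convention, the unit names, and the resolve function are all assumptions for this example, not part of any standard or of our tools.

```python
import json

# Hypothetical data: zombie_brute names its parent and lists only what differs.
UNITS_JSON = """
{
  "zombie_base":  {"hitpoints": 10, "damage": 5, "run_speed": 4.0},
  "zombie_brute": {"parent": "zombie_base", "hitpoints": 20, "run_speed": 1.0}
}
"""

def resolve(name, raw, seen=()):
    """Return the full property set for `name`, with parent values filled in."""
    if name in seen:
        raise ValueError(f"inheritance cycle at '{name}'")
    entry = raw[name]
    parent = entry.get("parent")
    # Start from the resolved parent (if any), then apply this entry's own values.
    merged = resolve(parent, raw, seen + (name,)) if parent else {}
    merged.update({key: value for key, value in entry.items() if key != "parent"})
    return merged

raw_units = json.loads(UNITS_JSON)
brute = resolve("zombie_brute", raw_units)
```

The catch is that the format itself knows nothing about "parent". Every reader, validator, and editor that touches the file has to implement the same rule.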
Solution 4: Spreadsheets
Excel is an extremely powerful tool in the right hands. It is designed to handle large amounts of data, and provides a powerful editor and a lot of tools to help you out. This has been the tool of choice for bulk data entry for many games I’ve worked on.
Excel of course uses a row/column table layout. You can create spreadsheets where each row is an object in your game, and each column is a property for that object. Add pretty formatting and fancy formulas as desired. You can import/export comma-separated text files, and the modern Excel file format is XML, making it relatively easy to write your own file reader (assuming you don’t care about formatting and such).
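Here is a minimal sketch of reading such an export in Python, using the standard csv module. The column names, the sample rows, and the comma-delimited aggro_types cell are made up for illustration.

```python
import csv
import io

# Hypothetical export: one row per unit, one column per property.
UNITS_CSV = '''\
id,hit_points,run_speed,aggro_types
zombie_brute,20,1.0,"human,human_building"
zombie_runner,10,4.0,human
'''

def split_list(cell):
    """Turn a comma-delimited cell into a list of strings."""
    return [item.strip() for item in cell.split(",") if item.strip()]

# Per-column converters: the reader has to know each column's type.
COLUMN_TYPES = {
    "id": str,
    "hit_points": int,
    "run_speed": float,
    "aggro_types": split_list,
}

def load_units(text):
    """Build a dict of units keyed by id from CSV text."""
    units = {}
    for row in csv.DictReader(io.StringIO(text)):
        unit = {column: COLUMN_TYPES[column](cell) for column, cell in row.items()}
        units[unit["id"]] = unit
    return units

units = load_units(UNITS_CSV)
```

The aggro_types column shows the kind of in-cell syntax you end up inventing once a property stops being a single value.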
- Easy to Write: 5/5. Pretty much everyone already knows Excel, or can learn it in a day. Even artists can use it (if you bribe them)!
- Easy to Read: 3/5. The tabular format makes comparison of values between different objects very easy, and powerful formatting and graphing tools can help you visualize the data in a lot of different ways. This makes Excel extremely powerful for data that is tabular in nature: reward tables, XP tables, and so on. However, as game data grows, you end up with lots of different spreadsheets with relationships between them, which can be hard to search and navigate.
- Well-defined: 2/5. Excel falls apart when you have data that isn’t easily represented by a table: arrays, maps, and other hierarchical structures. Some of this can be alleviated by providing some simple syntax within cells (for instance, using commas to delimit array elements), but this definitely goes against the grain of Excel and can stymie efforts to bulk-modify or visualize data.
- Verifiable: 4/5. Excel allows you to constrain columns to specific datatypes, prohibiting entry of invalid data. You can even set up complicated custom formatting rules and scripts for real-time feedback.
- Reusable: 2/5. Cell references in Excel allow you to put common data in a single cell, and formulas are extremely powerful at encoding complicated relationships between cells. You can even have references to cells in other spreadsheets, but you generally need to have both spreadsheets open for this to work. Support for data inheritance when using lots of spreadsheets is limited, at best.
- Extensible: 3/5. Excel has great tools for manipulating spreadsheets efficiently. The problem comes in when you need to do bulk modifications across multiple spreadsheets. The game content can explode out to hundreds of spreadsheets, and this can be a pain when you need to add a new column or change the constraints on an existing column.
One big disadvantage of Excel is that it is not friendly with teams—you generally need to ensure that only one person is modifying a file at a time—which can cause big problems depending on your source control solution. Google Docs provides a much better multi-person workflow, even giving you real-time updates as cells are changed, but it is overall not as powerful as Excel.
Another recent spreadsheet-based solution is CastleDB, which is geared toward game development in particular. Some of the advantages of CastleDB are things like automatic validation based on a data model, visualization of things like images, and storage into XML or JSON for easier searchability/extensibility and source control integration.
Our solution: Data Template Format (DTF)
We knew we would have lots of content in World Zombination, and were tired of the pain points with the other solutions. We wanted to have the easy searchability and bulk modification of text files, but without all the strict syntax of XML and JSON. Excel was considered, but we needed hierarchical datatypes that would be hard to represent well in a spreadsheet. We ended up creating a parser and compiler for our own custom language, DTF.
DTF files describe one or more templates, which represent the data for a type of game object. (This is analogous to Prefabs in Unity.) Templates have one or more components, each with their own set of properties and corresponding values.
In DTF, the hierarchy is inferred from indentation in the file, à la Python. The language supports all simple datatypes, as well as more advanced types like arrays, maps, and structs. Properties are specified via simple “name : value” notation.
shape : Circle(0.5)
collisionGroups : [ "building", "unit" ]
walkAnimationBaseSpeed : 5.0
visionRange : 60
movementSpeed : 4.5
aggroTypes : [ "human", "human_building" ]
Unlike Unity, we allow template definitions to inherit data from other templates, similar to how inheritance works in object-oriented programming. The previous example was an abstract template, meaning that it cannot be created at runtime, and it may not specify all required properties for all components. Derived templates can override the values of inherited properties. (The DTF language makes overriding a property explicit via the “override” keyword, so you know when you’re changing something defined in a base template.)
concrete zombie_brute : zombie_base
override shape : Circle(1.5)
resource : "art/3d/zombie/zombie_brute.prefab"
override movementSpeed : 2.5
concrete zombie_runner : zombie_base
resource : "art/3d/zombie/zombie_runner.prefab"
override movementSpeed : 5.5
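To show the idea behind explicit overrides, here is a toy Python sketch. This is not the DTF compiler: values stay as raw strings, components are ignored, and the choice to raise an error when "override" is missing or misplaced is my assumption for illustration.

```python
def parse_property(line):
    """Split 'override name : value' into (name, value, is_override)."""
    text = line.strip()
    is_override = text.startswith("override ")
    if is_override:
        text = text[len("override "):]
    name, _, value = text.partition(":")
    return name.strip(), value.strip(), is_override

def derive(base, lines):
    """Apply a derived template's property lines on top of a base template."""
    result = dict(base)
    for line in lines:
        name, value, is_override = parse_property(line)
        if name in base and not is_override:
            raise ValueError(f"'{name}' is inherited; use 'override' to change it")
        if name not in base and is_override:
            raise ValueError(f"'{name}' is marked 'override' but no base template defines it")
        result[name] = value
    return result

# A few of the base properties from the abstract template, kept as raw strings.
ZOMBIE_BASE = {"shape": "Circle(0.5)", "visionRange": "60", "movementSpeed": "4.5"}

zombie_brute = derive(ZOMBIE_BASE, [
    'override shape : Circle(1.5)',
    'resource : "art/3d/zombie/zombie_brute.prefab"',
    'override movementSpeed : 2.5',
])
```

In this sketch, changing movementSpeed without the keyword fails loudly, so an author can't shadow a base value by accident.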
DTF works via a compiler that reads in all the text files in a directory, and spits out a binary data format. One of the most interesting things about the compiler is that it includes all of our game code inside of it. The compiler uses reflection to determine what components are available, and the name and datatype of each property available in each component. Once an engineer marks a component member variable as accessible, it is automatically available in DTF. If someone enters a property that doesn’t exist on a component, or is the wrong datatype, then the compiler emits an error. Similarly, if a concrete template doesn’t specify all required properties for a component, the compiler emits an error. The advantage to this is that the format of our templates is automatically inferred from code, so there’s no need to write custom readers/writers or have some separate schema definition.
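Our compiler does this against our own game code, but the general technique is easy to sketch in Python with dataclasses, where the class definition doubles as the schema. The Movement component, the check_properties helper, and the error messages below are hypothetical.

```python
from dataclasses import MISSING, dataclass, fields

@dataclass
class Movement:
    """Hypothetical component; its fields are the schema."""
    movementSpeed: float
    visionRange: int = 60  # has a default, so authors may omit it

# Which Python types are accepted for each declared field type.
ACCEPTED = {float: (int, float), int: (int,), str: (str,)}

def check_properties(component_cls, props):
    """Return a list of error strings for the given property values."""
    schema = {field.name: field for field in fields(component_cls)}
    errors = []
    for name, value in props.items():
        field = schema.get(name)
        if field is None:
            errors.append(f"{component_cls.__name__}: unknown property '{name}'")
        elif not isinstance(value, ACCEPTED[field.type]):
            errors.append(f"{component_cls.__name__}.{name}: expected {field.type.__name__}")
    for field in schema.values():
        required = field.default is MISSING and field.default_factory is MISSING
        if required and field.name not in props:
            errors.append(f"{component_cls.__name__}: missing required property '{field.name}'")
    return errors
```

Adding a field to Movement immediately changes what check_properties accepts, with no separate schema file to keep in sync.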
After parsing the DTF files, the compiler constructs the entity and its components, calls any custom validation functions defined in code, and then serializes the entity to a file. At runtime, when creating an entity, the ID of the DTF template is passed in, and the corresponding entity is deserialized from disk and returned.
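That pipeline can be sketched in a few lines of Python. Everything here is a stand-in: the Movement component and its validate rule are hypothetical, an in-memory dict replaces files on disk, and JSON bytes replace our real binary format.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Movement:
    """Hypothetical component with a custom validation hook."""
    movementSpeed: float
    visionRange: int

    def validate(self):
        if self.movementSpeed <= 0:
            raise ValueError("movementSpeed must be positive")

def compile_templates(parsed):
    """Construct, validate, and serialize each parsed template."""
    compiled = {}
    for template_id, props in parsed.items():
        component = Movement(**props)  # construct from parsed properties
        component.validate()           # run custom validation from code
        compiled[template_id] = json.dumps(asdict(component)).encode("utf-8")
    return compiled

def create_entity(compiled, template_id):
    """Runtime side: look up the template by ID and deserialize it."""
    return Movement(**json.loads(compiled[template_id]))

compiled = compile_templates({"zombie_runner": {"movementSpeed": 5.5, "visionRange": 60}})
runner = create_entity(compiled, "zombie_runner")
```

Because validation happens in compile_templates, the runtime loader can assume the data is good and stay simple.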
- Easy to Write: 3/5. The simple syntax of DTF makes it easy to learn, but there is still the possibility for syntax errors. We believe this accomplished the goal of being easier to write than JSON or XML.
- Easy to Read: 3/5. The syntax of DTF makes it very easy to understand, even if you’ve never seen the language before. Searchability is better than JSON or XML. Comparing different properties is still not ideal, however: it is not as good as Excel at handling naturally tabular data.
- Well-defined: 4/5. The data model for the DTF file comes directly from the code engineers write, and it supports complicated data structures like arrays, maps, and enums.
- Verifiable: 5/5. The compiler ensures that all required properties are defined for a component, and that they are the correct datatype. References between templates are also validated to ensure the referenced template exists and is the proper type. Metadata can be added in code to provide additional validation to properties, and custom validation code can be added that is run when the compiler is executed. In our solution, the compilation step is required in order to build the game, so content authors are unable to skip the validation process.
- Reusable: 5/5. The data inheritance model is built in to the language, and even supports more advanced concepts like parameterization and mixins (not covered here). We have found this hugely advantageous for organizing and sharing data amongst different varieties of objects in our game.
As I mentioned before, there is no one-size-fits-all solution to game data. DTF has some of the weaknesses that other text-based approaches have, when compared to Excel. The ease of data entry and power of Excel for things like bulk data modification, formulas, and graphs still makes it a superior tool in some ways. In particular, we found our treasure and unit advancement tables to be far easier to work with in Excel than in text, so we ended up writing some simple import/export scripts to move data back and forth.
Additionally, there were a couple types of game objects that were used primarily by artists, and we decided to write a custom editor for them to provide better visualization and real-time validation (see the Encounter Editor screenshot above).
We’re pretty happy with DTF so far, but are still finding Excel to be more useful for some data in our game. What I’d like to do in the future is make it a lot easier to import and export DTF templates to Excel. The process would work like so:
- Using a SQL-like language, query the DTF files to retrieve the data you’d like in a tabular format. (This is also a useful tool for quickly finding data when you have lots of DTF files.) To handle hierarchical data in DTF files, the query language would provide directives that specify how things like arrays and maps get flattened into Excel columns.
- Import this data into Excel and do whatever you want.
- Export this data from Excel back into DTF. The table’s headers indicate how the data will be structured when reimported.
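Since none of this is built yet, the following Python sketch is only one possible flattening convention, not a description of the planned query language. The header scheme (dots for nested maps, [i] for array elements) and the flatten function are assumptions.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single row of column -> cell value."""
    row = {}
    if isinstance(value, dict):
        for key, child in value.items():
            row.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            row.update(flatten(child, f"{prefix}[{index}]"))
    else:
        row[prefix] = value
    return row

# Hypothetical parsed template with an array and a nested map.
template = {
    "movementSpeed": 4.5,
    "aggroTypes": ["human", "human_building"],
    "shape": {"kind": "Circle", "radius": 0.5},
}
row = flatten(template)
```

With headers like aggroTypes[0] and shape.radius, the column names carry enough structure to rebuild the hierarchy on the way back in.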
We think this will be a best-of-both-worlds solution – combining the flexibility/readability/reusability of our custom language with the power of Excel when you need it.
We also plan to eventually open-source DTF, so that it can be used in a platform/game agnostic way. Right now, the compiler is tightly integrated with our component model and Haxe, but in the future we would like to provide other language bindings. In addition to the compiler’s binary output format, we would also like it to output JSON/XML, which will be easier for other games to read. Stay tuned for more developments!
-Dan Ogles, CTO