Age of Mythology: Extended Edition

This topic has been locked
morness Mar 23, 2016 @ 12:15pm
Patch 2.2 & 2.3 Desync Discussions (only desyncs)
Background Information
Age of Mythology, like the vast majority of RTS games, only passes a player's inputs to everyone else. That means selections, attack orders, building things, etc. The state of units, such as their positions and health, is never transmitted. This is why you can play a multiplayer game with so many units. The downside is that everybody playing the game has to recreate the simulation perfectly, which is known as deterministic simulation in a peer-to-peer network architecture.
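To make the idea concrete, here is a minimal sketch of input-only networking. It is not the game's code: the `Simulation` class, the command tuples, and the unit name are made up for illustration. Each peer receives the same commands, runs them in the same order, and ends up with the same state without any positions ever being sent.

```python
from dataclasses import dataclass, field


@dataclass
class Simulation:
    """Toy deterministic simulation: state only changes through commands."""
    positions: dict = field(default_factory=dict)

    def apply(self, command):
        kind, unit, dx, dy = command
        if kind == "spawn":
            self.positions[unit] = (dx, dy)
        elif kind == "move":
            x, y = self.positions[unit]
            self.positions[unit] = (x + dx, y + dy)


def run_peer(commands_by_frame):
    """Replay the shared command stream in frame order on one peer."""
    sim = Simulation()
    for frame in sorted(commands_by_frame):
        for command in commands_by_frame[frame]:
            sim.apply(command)
    return sim.positions


# Only these inputs would travel over the network (hypothetical format).
commands = {
    0: [("spawn", "hoplite", 0, 0)],
    1: [("move", "hoplite", 3, 4)],
    2: [("move", "hoplite", -1, 2)],
}

peer_a = run_peer(commands)
peer_b = run_peer(commands)
```

Both peers compute the hoplite's position independently. If either one applied a command differently, or in a different order, their states would silently drift apart, which is what a desync is.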

And that is the hard part and the biggest negative tradeoff. When things go out of whack, the game is able to detect it, eventually triggering a desync. Without the detection, all players could theoretically win the game and it would feel like you were playing against a really dumb opponent. In order to implement this detection, key data is checked and compressed into a single CRC check -- basically 1 simple number that represents the state of your game. That value is passed to the other players, and it must match the value of every other player in the game. And it doesn't necessarily do this every frame; it can perform the check once in a while as a periodic sanity check. So if the game desyncs, the actual divergence could have happened several seconds ago.
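As a rough illustration of the "1 simple number" idea, the sketch below folds a few unit fields into a CRC32 with Python's zlib. The choice of fields, the integer packing, and the function names are assumptions for this example; the game's actual sync data is not described in this post.

```python
import struct
import zlib


def state_checksum(units):
    """Fold unit id, position and health into a single CRC32 value.

    `units` maps unit id -> (x, y, health), all integers (assumed layout).
    Units are visited in sorted order so every peer hashes the same bytes.
    """
    crc = 0
    for unit_id in sorted(units):
        x, y, health = units[unit_id]
        packed = struct.pack("<Iiii", unit_id, x, y, health)
        crc = zlib.crc32(packed, crc)
    return crc


def in_sync(checksums):
    """True when every player reported the same checksum."""
    return len(set(checksums.values())) == 1


host_state = {1: (10, 20, 100), 2: (30, 40, 80)}
client_state = {1: (10, 20, 100), 2: (30, 40, 79)}  # one hit point off

reported = {
    "host": state_checksum(host_state),
    "client": state_checksum(client_state),
}
```

A mismatch only says that the states differ somewhere. It says nothing about which unit or which frame, which is why the extra diagnostics described below are needed.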

As developers, we're able to make debug builds with extra diagnostics in place. These diagnostic tools generally slow performance down and require more memory to function properly. Desync diagnostics are one such set of tools. The problem we've had in the past with live builds is that the desyncs we encounter have virtually no data.

Desync Diagnostics in Patch 2.2
For the first time ever, the full suite of desync diagnostics has been exposed in a final release build. But we had to be super careful not to interfere with performance, so the diagnostics were effectively rewritten from scratch. Let's go through how it used to work and how it works now.

Old way
The original diagnostics would sync data to 25 different categories. For example, random number generation syncs to the RandSync category, while units' health and positions sync to the UnitSync category, and so on. Most of the time, these categories were turned off, but there was always one category that ran in the background called FinalReleaseSync, which only checks a sparing amount of information to keep final releases nice and fast. Unfortunately, when the game desyncs, we basically have no idea what triggered it or even when.

Also the game only checked for desyncs every 128 frames, which was roughly once every 4 seconds. So it was hard to tell what you were doing 0-4 seconds ago after a desync.
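The delay can be worked out with a little arithmetic. This sketch assumes roughly 32 logic frames per second, a figure that is only implied by "128 frames is about 4 seconds" and is not stated anywhere as an exact rate.

```python
CHECK_INTERVAL = 128      # frames between desync checks (from the post)
FRAMES_PER_SECOND = 32    # assumption: implied by 128 frames ~ 4 seconds


def detection_frame(desync_frame, interval=CHECK_INTERVAL):
    """First check frame at or after the frame where states diverged."""
    return -(-desync_frame // interval) * interval  # ceiling division


def detection_delay_seconds(desync_frame):
    """How long the divergence goes unnoticed, in seconds."""
    lag_frames = detection_frame(desync_frame) - desync_frame
    return lag_frames / FRAMES_PER_SECOND


worst_case = detection_delay_seconds(129)   # diverged just after a check
best_case = detection_delay_seconds(256)    # diverged on a check frame
```

A divergence right after a check waits almost the full interval before it is noticed, while one that lands on a check frame is caught immediately.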

When a desync was encountered, categories would get turned on to try to zero in on the cause, but that would result in truly horrific performance and massive data logging. This was all hardcoded, so making any changes required making a new build for people to try to desync on. For example, turning everything on made the game run like a slide show (1 frame every 2-3 seconds), and logging was approximately 1GB per second! Completely miserable experience.

Turning on only a few categories would still slow the game down, though less severely, and would log less data, so there are definite tradeoffs.

New way
The category logging system was quite nice, so we kept that, but made several upgrades:
  • The sync logging parameters have been exposed in a config file (startup/desync-tracking.cfg). If you open the file in a standard text editor (Notepad), it should be pretty self-explanatory and well documented.
  • If someone alters the sync logging parameters in the config file, then when they host a game, all that information is propagated to the other players. This was to eliminate human error and the efforts of coordinating games where everyone had the same config file.
  • Instead of logging data to a file that grows during gameplay, that data is sent to a memory buffer divided by logic frames. Writing to memory is much faster than writing to a file. In addition, because this data can get so large, we have to limit its 'memory'. When a game desyncs, we don't need the whole history, just the part at the end and a few frames before the desync, so we only remember x number of frames of history, deleting the oldest history as we go. This is one of the settings in the config file. At the end of the game, this buffer is written out to a file in the logs folder, with a name starting with SyncLog and a timestamp.
  • There are some protections in place if you make the logging too aggressive. If you, say, set it to remember 10,000 frames of history with every category turned on, you'll quickly run out of memory. So as memory becomes scarce, it'll automatically reduce the frame history as needed.
  • Finally the SyncLog files that get generated after each game record key information about mods, player settings, sync-log settings.
  • When you start the game, it now only keeps the 10 most recent log sessions (so we don't fill up your hard drives).
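The frame-limited memory buffer from the list above can be sketched with a bounded deque. This is an illustration only: the `SyncHistory` class and its methods are invented for this example, and only the RandSync and UnitSync category names come from the post.

```python
from collections import deque


class SyncHistory:
    """Keeps sync log entries for only the most recent logic frames."""

    def __init__(self, max_frames, enabled_categories):
        self.enabled = set(enabled_categories)
        # A bounded deque drops the oldest frame when a new one is appended.
        self.frames = deque(maxlen=max_frames)

    def begin_frame(self, frame_number):
        self.frames.append((frame_number, []))

    def log(self, category, message):
        # Disabled categories cost almost nothing: the entry is never stored.
        if category in self.enabled and self.frames:
            self.frames[-1][1].append(f"[{category}] {message}")

    def shrink(self, new_max_frames):
        """Reduce history under memory pressure, keeping the newest frames."""
        self.frames = deque(self.frames, maxlen=new_max_frames)

    def dump(self):
        """Render the remaining history as text, oldest frame first."""
        lines = []
        for frame_number, entries in self.frames:
            lines.append(f"frame {frame_number}")
            lines.extend("  " + entry for entry in entries)
        return "\n".join(lines)


history = SyncHistory(max_frames=3, enabled_categories={"RandSync"})
for frame in range(5):
    history.begin_frame(frame)
    history.log("RandSync", f"seed {frame}")
    history.log("UnitSync", "not recorded, category is off")

kept_before = [number for number, _ in history.frames]
history.shrink(2)
kept_after = [number for number, _ in history.frames]
text = history.dump()
```

Only the tail of the game survives in memory, which is the part that matters once a desync is detected, and the text is rendered once at the end instead of being written continuously during play.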

Why are you telling me all this? Who cares?
There are many hardcore gamers here, and it's probably only those people that actually read this far. The mod community has continued to impress us with their technical expertise, in many ways with more knowledge about this game than we have, though of a different kind. Regardless, we still have one desync in the wild, one that we have failed to reproduce internally but that seems to happen in the real world. We don't know why yet; we just know it may have something to do with resigning.

So if you are still reading, my hope is that just a small group of you that know each other or game together on a regular basis can try to capture this desync with some of the tracking tools and help narrow down the cause. This leads to the next part... how to diagnose a desync.

How to diagnose a desync
First, the host of the game needs to organize a group of people that can actually send log files to each other, which rules out random games with random people. The host has to have turned on a few categories in the desync-tracking.cfg file before running the game. All you have to do is enable the system, and the rest of the defaults are good to go -- a good balance between information and performance. The game will run a little slower.

Then play the game in a way that you think you can trigger a desync. Once it desyncs, everyone will go back to the main menu. Quit the game and then open up your log folders.

The log folders are accessible by your Steam Client > Right click on Myth > Properties > Local Files > Browse Local Files > Then open up your Logs folder for the latest time stamp.

Send them to the designated person or, if you prefer, set up a Google Drive share where everyone can upload their files. That way each person can grab everyone else's quite easily.

At this point, one would use a compare program such as Beyond Compare to "diff" the two files. And hopefully it should point to the key differences. Some of the stuff, such as mods may or may not influence the desync, but you'll have to play it by ear.
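If you don't have a compare program handy, the same kind of diff can be approximated with Python's standard difflib. The log lines below are invented for illustration, since the real SyncLog format isn't shown in this post.

```python
import difflib
from itertools import zip_longest


def first_divergence(log_a, log_b):
    """Return (line_number, line_a, line_b) for the first differing line.

    Returns None when the two logs are identical. A missing line on one
    side shows up as None in the result.
    """
    pairs = zip_longest(log_a.splitlines(), log_b.splitlines())
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return number, a, b
    return None


def unified(log_a, log_b):
    """Unified diff of two logs as a list of lines."""
    return list(difflib.unified_diff(
        log_a.splitlines(), log_b.splitlines(),
        fromfile="host", tofile="client", lineterm="",
    ))


# Hypothetical log contents; real SyncLog files will look different.
host_log = "frame 10\n[UnitSync] unit 7 hp 40\nframe 11"
client_log = "frame 10\n[UnitSync] unit 7 hp 38\nframe 11"

where = first_divergence(host_log, client_log)
diff_lines = unified(host_log, client_log)
```

The first differing line is usually the interesting one. Everything after it tends to differ as a consequence of the original divergence.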

In any case, posting a share to the files would be ideal and then I can take a closer look at them. Most of the time, I would need to add additional sync points to narrow down the cause and generate a beta branch build to try to repro it again.

We have also added a new commandline option:
+selectedUnitInfo
Gives details about currently selected unit including location information. This is super useful when watching replays of games that desynched once you identify the triggering unit.

Replay desyncs
If anyone wants to familiarize themselves with this system before involving other players, you can actually sync log single player games. It will generate a log when you play a game, and it will generate a separate log when you watch the replay. Then you can compare those two logs. This is what we use to get to the bottom of replay desyncs or version problems between patches.

That's everything I could think of off the top of my head. Happy to help or even host games.


Known Issue: Sync Logs don't generate with scenarios
Currently, in order to see sync logs, you have to play randomly generated maps. The system was based on the recorded game system, and recorded games aren't generated for scenarios. I should be able to fix this easily, and it matters because it's usually these scenarios that tend to desync the most, given some of the crazy battle scenarios I've seen.

Last edited by morness; Mar 24, 2016 @ 3:26pm
Showing 1-15 of 18 comments
morness Mar 23, 2016 @ 1:52pm 
Btw, if anyone posts anything outside of desyncs, I'm just going to quietly delete those posts. I normally don't do this, but this is the most important thing right now and just want to keep the focus.
Donald Trump Mar 23, 2016 @ 2:13pm 
Made a few games and there were no barrage godpower desyncs (before, it was a desync about one out of two times... seriously!) ! I'm loving it ! Nice job !
Ender Mar 24, 2016 @ 6:15pm 
Sounds great! We get a desync every single time we play with each other (always over the internet), so we'll make sure to turn on the logging and get it recorded! Thanks for continuing to look into this!
(You can delete this post now :) )
Last edited by Ender; Mar 24, 2016 @ 6:16pm
dogdaddonga Mar 24, 2016 @ 8:47pm 
So its not fixed? FFFFFFFFFFFFFF
morness Mar 25, 2016 @ 7:16pm 
Originally posted by AusGabeNewell:
So its not fixed? FFFFFFFFFFFFFF
We've fixed 3 out of 4 desyncs over the past month or so, hopefully just this one left. We also have metrics now to detect how often this is happening to each person.
Donald Trump Mar 25, 2016 @ 9:40pm 
Only desync remaining is the resign one that can happen quite often ! Other than that, it's a pleasure to play games that last over 30+ mins ! Thank you morness !
smitske Mar 26, 2016 @ 3:54am 
Originally posted by morness:
Originally posted by AusGabeNewell:
So its not fixed? FFFFFFFFFFFFFF
We've fixed 3 out of 4 desyncs over the past month or so, hopefully just this one left. We also have metrics now to detect how often this is happening to each person.

3 out of the 4 you think there are because lets face it skybox doesnt know all that much about the game itself and is not informed too well.
morness Mar 28, 2016 @ 2:34pm 
We just finally captured the resign desync in the logs and have a good idea of what's happening. Somehow the resigned player when he quits is still connected to the MP game, there he cleans up the world and everyone else doesn't... thus the desync.

Basically the thing that needs to happen is to ensure resigned players or observers cannot trigger desyncs (other players should ignore them). This one is going to take some time to deal with.
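The idea of ignoring resigned players can be sketched as filtering the checksum comparison down to players who are still taking part. This is only an illustration of the principle, not the actual fix; the function and player names are hypothetical.

```python
def desync_detected(checksums, active_players):
    """Compare checksums only among players still taking part in the game.

    `checksums` maps player name -> reported checksum for the same frame.
    Resigned players and observers are simply left out of `active_players`.
    """
    relevant = {checksums[p] for p in active_players if p in checksums}
    return len(relevant) > 1


reported = {
    "host": 0xAB12,
    "ally": 0xAB12,
    "resigned": 0x0000,  # cleaned up its world after quitting
}

naive = desync_detected(reported, {"host", "ally", "resigned"})
filtered = desync_detected(reported, {"host", "ally"})
```

With the resigned player included, the comparison reports a desync even though everyone still playing agrees. With that player filtered out, the game carries on.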
-Dare Devil/x/ Mar 28, 2016 @ 2:56pm 
Originally posted by morness:
We just finally captured the resign desync in the logs and have a good idea of what's happening. Somehow the resigned player when he quits is still connected to the MP game, there he cleans up the world and everyone else doesn't... thus the desync.

Basically the thing that needs to happen is to ensure resigned players or observers cannot trigger desyncs (other players should ignore them). This one is going to take some time to deal with.

-So that time put into those logs are really paying off. I'm glad the desync issue is finally back on track.
dogdaddonga Mar 29, 2016 @ 12:31am 
So its fixed?
smitske Mar 29, 2016 @ 1:02am 
Originally posted by AusGabeNewell:
So its fixed?

No, they figured out it exists and what caused it.
morness Mar 29, 2016 @ 2:53pm 
We have an update. We've been able to reproduce this three times so far. The first two times we had to have at least one person overseas playing. The third time we got it to desync by simulating latency on one of our internal machines. Basically there seems to be a timing issue when someone resigns and becomes a spectator, but it doesn't trigger until they quit. We've been getting better quality logs each time, so we're narrowing it down now. It's still going to take some more time, possibly days, to identify and fix.

So basically to repro -- we play a 6 player game with 4 humans and 2 AI, random teams. When a human resigns, it can desync when he quits... but only if someone else remaining has either latency or performance issues. The sync validation system isn't catching the removal of the resigned player. Getting closer....
Last edited by morness; Mar 29, 2016 @ 3:06pm
Me>You May 7, 2016 @ 7:47pm 
Why does it desync everyone at the same time? I always get desynced when someone leaves. Not sure how people leaving can affect a game and cause EVERYONE to desync.
smitske May 7, 2016 @ 11:33pm 
Because of shoddy code. Although this one, for once, isn't fully on Skybox.

They say they have a fix ready so we will see.
-Dare Devil/x/ May 7, 2016 @ 11:38pm 
Originally posted by Me>You:
Why does it desync everyone at the same time? I always get desynced when someone leaves. Not sure how people leaving can affect a game and cause EVERYONE to desync.

-This is currently fixed in the beta branch.

Date Posted: Mar 23, 2016 @ 12:15pm
Posts: 18