This tool is designed to help domain experts find bugs in an RL agent operating in a custom StarCraft 2 (SC2) game. The agent uses multiple neural networks for different predictive capabilities, and the tool exploits information persisted by the agent to surface issues with two of the learned models. The agent must learn to operate in a huge action space that raises a number of strategic concerns. Two use cases are covered in the demo video:
- finding the bad decision that caused the loss of a game, and
- finding reasoning bugs by leveraging game-wide summaries of certain information.
This demo video demonstrates that adding an After Action Review for AI (AARfAI) workflow can improve the usefulness of an analytics tool designed to help find bugs in a complex reinforcement learning agent’s reasoning. Though built around a particular domain, the benefits of AARfAI are believed to be domain-independent.
The state transition model predicts the game state at the next decision point, given the current game state, a friendly action, and an enemy action. The state information includes which unit-generating buildings each agent possesses, how many income-generating units each agent possesses, and which units are on the field in each battlezone lane, binned into four grid-squares per lane. The action information records which unit-generating buildings are purchased at each decision point (decision points occur every 30 seconds); game action is otherwise controlled by the default SC2 engine.
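As a concrete illustration, the state and action inputs described above could be sketched as plain data containers. This is a minimal sketch only: every class name, field name, and building label below is hypothetical, chosen to mirror the description rather than the agent's actual encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GameState:
    # Which unit-generating buildings each side possesses
    # (building names are hypothetical placeholders).
    friendly_buildings: Dict[str, int]
    enemy_buildings: Dict[str, int]
    # How many income-generating units each side possesses.
    friendly_income_units: int
    enemy_income_units: int
    # Units on the field in each battlezone lane,
    # binned into four grid-squares per lane.
    top_lane_bins: List[int] = field(default_factory=lambda: [0, 0, 0, 0])
    bottom_lane_bins: List[int] = field(default_factory=lambda: [0, 0, 0, 0])


@dataclass
class Action:
    # Unit-generating buildings purchased at this decision point
    # (decision points occur every 30 seconds).
    buildings_purchased: List[str]
```

A transition model under this sketch would take a `GameState` plus a friendly and an enemy `Action` and return the predicted `GameState` at the next decision point.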
The leaf evaluation model predicts the likelihood of an eventual win given a particular game state. Its inputs are the state information described above.
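A minimal stand-in for the leaf evaluation model, assuming the state has been flattened into a numeric feature vector: the logistic form below is an illustrative placeholder for the learned network, not the agent's actual architecture, and the weights are hypothetical.

```python
import math
from typing import Sequence


def win_probability(features: Sequence[float],
                    weights: Sequence[float],
                    bias: float = 0.0) -> float:
    """Map a flat state feature vector to a win likelihood in (0, 1).

    A logistic-regression sketch standing in for the leaf evaluation
    network; in the real agent this would be a trained neural network.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

With zero weights and bias the model is maximally uncertain, returning 0.5; positive evidence for a win pushes the output toward 1.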
The agent was trained by playing against another AI agent until it usually won; it then trained against a copy of itself until it again usually won. This process was iterated a number of times.
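The iterated training scheme above can be sketched as a loop. The `Agent` class and the `train` and `win_rate` functions here are toy stand-ins, assumed only to make the control flow concrete; the win-rate threshold is likewise a hypothetical parameter.

```python
class Agent:
    """Toy agent whose strength is a single number (illustration only)."""

    def __init__(self, skill: int = 0):
        self.skill = skill

    def copy(self) -> "Agent":
        return Agent(self.skill)


def train(agent: Agent, opponent: Agent) -> None:
    # Stand-in for one round of RL training against the opponent.
    agent.skill += 1


def win_rate(agent: Agent, opponent: Agent) -> float:
    # Stand-in for evaluating the agent over many games.
    return 1.0 if agent.skill > opponent.skill else 0.0


def iterated_self_play(agent: Agent, opponent: Agent,
                       threshold: float = 0.8,
                       iterations: int = 3) -> Agent:
    """Train until the agent usually wins, then freeze a copy of the
    agent as the new opponent and repeat, as described above."""
    for _ in range(iterations):
        while win_rate(agent, opponent) < threshold:
            train(agent, opponent)
        opponent = agent.copy()
    return agent
```

Each outer iteration replaces the opponent with a frozen snapshot of the current agent, so the agent always chases a slightly stronger target.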
This analytics UI was developed to work specifically with the custom SC2 Tug-of-war game.