AlphaStar: Beating Humans with Machine Learning

Introduction
AlphaStar vs Professionals
AlphaStar vs the World

Introduction

Previously, DeepMind had built AI for various games on the Atari 2600 console and for board games like chess and Go, training multiple agents to play against and learn from one another. When DeepMind's Go AI beat one of the top Go players in the world, Demis Hassabis revealed that the next challenge would be an unassuming game: StarCraft. To create its StarCraft II AI, AlphaStar, DeepMind first trained its agents on a large number of games played by real people, so that the AI would better imitate a human, and filtered out agents that learned certain undesired strategies. The agents then continued to improve by playing against one another.

Complexity in Atari vs. Go vs. StarCraft II. The board game Go has roughly 20 times the possible action space of games on the Atari. StarCraft II, in turn, has about 277 sextillion times the possible action space of Go, and it also provides players with only imperfect information.

AlphaStar vs Professionals

StarCraft II is a real-time strategy game released by Blizzard Entertainment in 2010. Players choose one of three unique races (Terran, Zerg, or Protoss) and fight to destroy each other. The game is difficult for AI because players must manage an economy and armies simultaneously while working from imperfect information about their opponent's actions.

On January 22, 2019, DeepMind's StarCraft II AI played a series of show matches against two professional StarCraft II players, whom I will refer to by their screen names: TLO and MaNa. The AI played the game through a Python API that DeepMind created, which allows bots to input commands that are then issued to units in the game. This gives the AI superhuman precision, the ability to see and interact with anything on the battlefield in an instant, and the ability to produce inputs at a superhuman rate.

AlphaStar was told to restrict its actions per minute (APM) to an average of 280, while MaNa and TLO averaged 390 and 678 APM respectively. AlphaStar exploited a loophole in this restriction: it stayed largely inactive, keeping its APM extremely low, then burst to over 1,000 APM when it entered battle, so its average APM stayed low while it still played at superhuman speed. AlphaStar's extreme precision gave it another huge advantage: because all of its actions were inhumanly precise, it produced no unnecessary actions or misclicks, making its average effective actions per minute (EPM) almost equal to its average APM. Professional players, by contrast, usually have an EPM between a third and a half of their APM because of natural unnecessary clicking and misclicking during play.

Overall, the general consensus was that AlphaStar played too much like a robot rather than a human. Viewers were also disappointed that AlphaStar could only play on one map, using one race (Protoss), against that same race. AlphaStar won all five games against TLO and lost only one of its five games against MaNa.
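The APM loophole above is simple arithmetic: an average hides bursts. The sketch below uses illustrative per-minute numbers (not measured AlphaStar data) to show how mostly-idle play plus 1,000-APM battle bursts can still average out to the 280 APM cap.

```python
# Sketch of the APM-averaging loophole: the per-minute action counts below
# are illustrative assumptions, not measured AlphaStar data.

def average_apm(actions_per_minute: list[int]) -> float:
    """Mean APM over a game, given action counts for each minute."""
    return sum(actions_per_minute) / len(actions_per_minute)

# A 10-minute game: 8 quiet minutes at 100 APM, 2 battle minutes at 1000 APM.
burst_game = [100] * 8 + [1000] * 2
print(average_apm(burst_game))   # 280.0 -- under the cap, despite the bursts

# A human at a steady 280 APM has the same average but never reaches
# superhuman burst speed.
steady_game = [280] * 10
print(average_apm(steady_game))  # 280.0
```

Both games report the same average, which is exactly why an average-APM cap failed to constrain AlphaStar's in-battle speed.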

An example of how the API can create inhuman inputs. The API DeepMind created for AlphaStar does not account for humanly impossible moves: the AI attempted to land a building directly next to resources, a placement that is impossible for human players to make.
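The article does not show the API itself, so the sketch below is a hypothetical model of the kind of interface it describes: a bot receives observations each step and returns commands addressed to units at raw map coordinates. The class and method names are my assumptions, not the real DeepMind API; the point is that a coordinate-level command channel has none of the click constraints a human UI imposes.

```python
# Hypothetical sketch of a bot-facing command API; names are illustrative
# assumptions, not DeepMind's actual interface.
from dataclasses import dataclass

@dataclass
class Command:
    unit_id: int
    action: str                   # e.g. "move", "attack", "land"
    target: tuple[float, float]   # raw map coordinates

class Bot:
    def step(self, visible_unit_ids: list[int]) -> list[Command]:
        """Called every game step: observation in, commands out.

        Because targets are raw coordinates, a bot can request placements
        (like landing a building flush against resources) that the game's
        human-facing UI would never allow a player to click.
        """
        return [Command(uid, "move", (31.5, 42.25)) for uid in visible_unit_ids]

bot = Bot()
commands = bot.step([1, 2, 3])
print(len(commands))  # 3 -- one precise command per visible unit, every step
```

Issuing one exact command per unit per step, with no mouse travel or selection overhead, is what gives such a bot its superhuman precision and input rate.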

AlphaStar vs the World

In July 2019, Blizzard announced that AlphaStar agents would be released onto the StarCraft II European ladder to gather data. Three agents would be active at a time, one representing each race. These new agents used lower APM, closer to a human player's, but still had obvious quirks, such as an EPM consistently equal to their APM.

Even without knowing about the APM/EPM quirk, players identified AlphaStar by noticing that it never used control groups. Control groups allow a player to save a selection of units or buildings and later move, attack, or build with them in a single action. AlphaStar, thanks to its frame-perfect actions, has no need for control groups: it instead snaps its camera to its base for a few frames, then instantly snaps it back to the main battlefield.

Another quirk was that AlphaStar never scouted its opponents for information. The AI likely picked up this bad habit because it never needed to scout: able to manage its economy perfectly and to queue up and control units at superhuman levels, it never learned that scouting could be advantageous, which left each agent locked into a single strategy.

The agents were eventually identified, and players experimented against them to probe their strategies before they were replaced by improved agents. Of the strategies tried, players found the best ways to beat the AI were: applying early pressure, as AlphaStar defends poorly in the early game; limiting AlphaStar's control of the map (via Planetary Fortresses, Missile Turrets, Nydus Worms, or Photon Cannons); or simply building units that directly counter AlphaStar's, since the agents cannot change strategies in the middle of a game.
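The control-group mechanic described above is easy to model: a hotkey number maps to a saved unit selection, so one keypress re-selects a whole army. This is an illustrative sketch, not game code.

```python
# Illustrative model of StarCraft II control groups: each hotkey stores a
# saved selection of unit IDs, so one action re-selects the whole group.
class ControlGroups:
    def __init__(self) -> None:
        self.groups: dict[int, set[int]] = {}

    def assign(self, key: int, unit_ids: set[int]) -> None:
        """Ctrl+<key>: save the current selection under this hotkey."""
        self.groups[key] = set(unit_ids)

    def recall(self, key: int) -> set[int]:
        """Press <key>: re-select the saved units in one action."""
        return self.groups.get(key, set())

cg = ControlGroups()
cg.assign(1, {101, 102, 103})  # one action to save the army
print(cg.recall(1))            # one action to re-select it later
```

For a human, this one-keypress recall is essential; AlphaStar skips it because it can box-select by snapping its camera frame-perfectly, which is itself one of the tells that gave the ladder agents away.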

AlphaStar traps its own Siege Tanks inside its base due to poor building placement. AlphaStar still has a few kinks in its strategies, especially in the original Terran agent that could be played on the ladder. It does not understand how to place buildings, which can wall its own units inside its base. This causes the AI to panic, usually by destroying its own buildings.