With No More Human Opponents, AlphaGo Kicks Its Own Ass 

This week, Google announced a pretty remarkable breakthrough. According to Slate:

The newest version of [Google’s] Go-playing algorithm, dubbed AlphaGo Zero, was not only better than the original AlphaGo, which defeated the world’s best human player in May. This version had taught itself how to play the game. All on its own, given only the basic rules of the game. (The original, by comparison, learned from a database of 100,000 Go games.) According to Google’s researchers, AlphaGo Zero has achieved superhuman-level performance: It won 100–0 against its champion predecessor, AlphaGo.

Almost as impressive, AlphaGo Zero did it using far fewer of Google's custom AI chips, known as TPUs (tensor processing units):

Early AlphaGo versions operated on 48 Google-built TPUs. AlphaGo Zero works on only four. It’s far more efficient and practical than its predecessors.

Maybe those estimates of mass unemployment in 20 years are starting to look like a pretty reasonable forecast (even though nobody really knows for sure).


UPDATE: So this is a little creepy. From the DeepMind blog:

After just three days of self-play training, AlphaGo Zero emphatically defeated the previously published version of AlphaGo – which had itself defeated 18-time world champion Lee Sedol – by 100 games to 0. After 40 days of self training, AlphaGo Zero became even stronger, outperforming the version of AlphaGo known as “Master”, which has defeated the world’s best players and world number one Ke Jie.

Three days to reach a higher level of mastery of a complex game than the best human players, without any tactical or strategic advice, just being told the rules? That's pretty wild. What this means is that if you can create an accurate enough simulation of a real-world problem, an AI has a good chance of being able to master it on its own in a matter of days or months. (The AI needs a simulation so it can learn by playing a bazillion games against itself.)
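To make "learning by playing against itself" a bit more concrete, here is a toy sketch in Python. It is nowhere near what AlphaGo Zero actually does (that system pairs a deep neural network with tree search); this is plain table-based learning on a tiny stone-taking game I picked for illustration, where players alternate removing 1 to 3 stones and whoever takes the last stone wins. The game, the function names, and every parameter are my own assumptions. The program is told only the rules, and both sides of every game are played from the same table.

```python
import random

MAX_TAKE = 3  # rule of the toy game: remove 1, 2, or 3 stones per turn


def legal_moves(pile):
    """Moves allowed by the rules for a pile of the given size."""
    return range(1, min(MAX_TAKE, pile) + 1)


def best_move(q, pile):
    """Move the table currently rates highest (unseen moves count as 0)."""
    return max(legal_moves(pile), key=lambda m: q.get((pile, m), 0.0))


def train(episodes=20000, max_pile=12, epsilon=0.5, seed=0):
    """Learn move values purely from self-play; no strategy hints are given."""
    rng = random.Random(seed)
    q = {}  # (pile, move) -> value for the player making that move
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            # The same table plays both sides; sometimes explore at random.
            if rng.random() < epsilon:
                move = rng.choice(list(legal_moves(pile)))
            else:
                move = best_move(q, pile)
            remaining = pile - move
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # Whatever is good for the opponent next turn is bad for us.
                target = -max(q.get((remaining, m), 0.0)
                              for m in legal_moves(remaining))
            # The game is deterministic, so simply overwrite the old estimate.
            q[(pile, move)] = target
            pile = remaining
    return q


if __name__ == "__main__":
    table = train()
    for pile in range(1, 13):
        print(pile, "stones -> take", best_move(table, pile))
```

The known winning strategy for this game is to always leave your opponent a multiple of four stones. Nobody tells the program that; the table should settle on it after enough games against itself.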

This won’t be true for all problems. In Go you only place one stone at a time, and the environment is far less complex than, say, driving. A real-world system like a self-driving car may also run into hardware problems (sensors that don’t work well in the snow, or not being able to process data fast enough to react to unexpected events) that limit what it can do. But it’s still a pretty impressive/creepy accomplishment.
