[…] attempting to distil intelligence into an algorithmic construct may prove to be the best path to understanding some of the enduring mysteries of our minds, such as consciousness and dreams [1].

We can theorize that one way to model human intelligence artificially is to literally reproduce the human mind: its structures, its communication methods, and the way it learns. One model that captures that last piece, the way we learn, is reinforcement learning.

For example, think of the way a child might first figure out how to walk. Learning to walk is a long-term process, and the exact steps (pardon the pun) to get there aren’t obvious to a child who’s never done it before – it just looks like something people do to get to where delicious things might be faster. So eventually, through trial and error, and appropriate rewards (apple sauce! toys! shiny things to immediately shove up nostrils, speaking from my own experience!), kids figure out how to walk.

Because successful walking is presumably exhibited by grown-up members of the household regularly, the child might know whether what they’re doing is close to walking – the kid might even get increasing excitement from said members as they approach bipedal existence (so, the relative reward of getting ever closer to walking increases appropriately).

However, they have no instructions on what successful walking actually is (that is, no videos or kinesthetic information of themselves walking). In this way, reinforcement learning differs from supervised machine learning algorithms, which first have to be trained on a set of correct input/output pairs.
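To make that concrete, here’s a minimal sketch of tabular Q-learning, one of the classic reinforcement learning algorithms. The “world” below is entirely made up for illustration: a five-position corridor with a treat at the end, standing in for the toddler’s living room. Nobody ever tells the agent the correct move; it only ever sees rewards.

```python
import random

# A tiny "learning to walk" world: positions 0..4, a treat waits at position 4.
# (This toy environment and all its numbers are invented for illustration.)
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left, step right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def greedy(q_values):
    """Pick the highest-valued action, breaking ties randomly."""
    best = max(q_values)
    return random.choice([i for i, v in enumerate(q_values) if v == best])

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action index]
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration

for episode in range(500):
    s, done = 0, False
    while not done:
        # Trial and error: mostly exploit what we know, sometimes explore.
        a = random.randrange(2) if random.random() < epsilon else greedy(Q[s])
        s2, r, done = step(s, ACTIONS[a])
        # Nudge the estimate toward reward + discounted value of what comes next.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# No one ever showed the agent a "correct" move, yet the learned policy
# is to step right (toward the treat) in every state:
print([greedy(Q[s]) for s in range(N_STATES - 1)])  # → [1, 1, 1, 1]
```

There’s no training set of input/output pairs anywhere in there – just states, actions, and the occasional reward, which is exactly the distinction from supervised learning.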

Google DeepMind used reinforcement learning to train their AI to play Go. They call it, originally, AlphaGo.

So the way we start off training AlphaGo is by showing it a hundred thousand games that strong amateurs have played, downloaded from the Internet, and we first get AlphaGo to mimic the human players. But of course, ultimately we’d like AlphaGo to be stronger than human amateurs and to compete with the top professionals. The way we do that is, after we take that first version that has learned to mimic human play, we allow it to play itself 30 million times on our servers. Using reinforcement learning, the system learns to improve itself incrementally, avoiding its errors and improving its win rate against older versions of itself. And after all these games, you end up with a new version that can beat the original version about 80 or 90% of the time.
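The pipeline described above – start from an imitation-learned policy, generate games by self-play, and only promote a new version if it reliably beats the old one – can be sketched in a few lines. Everything below is a deliberately silly toy: `play_game` and `improve_by_self_play` are made-up stand-ins (in the real system these are full Go games and millions of policy-network updates), and all the numbers are invented.

```python
import random

random.seed(1)

def play_game(skill_a, skill_b):
    """Toy stand-in for a full game of Go: the stronger policy wins more often."""
    return "a" if random.random() < skill_a / (skill_a + skill_b) else "b"

def improve_by_self_play(skill):
    """Toy stand-in for a round of reinforcement learning from self-play games.
    Here we just pretend that training yields a modestly stronger policy."""
    return skill * 1.5

champion = 1.0  # strength of the initial policy that mimics human play
for generation in range(5):
    challenger = improve_by_self_play(champion)
    # Evaluate the new version against the old one over many games...
    wins = sum(play_game(challenger, champion) == "a" for _ in range(200))
    # ...and only promote it if it is clearly better.
    if wins / 200 > 0.55:
        champion = challenger

print(champion)  # stronger than the 1.0 we started with
```

The key structural idea survives the simplification: the opponent keeps getting harder as the agent improves, so the agent is always training against a version of roughly its own strength.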

AlphaGo’s learning is largely model-free, which means it doesn’t need a built-in model of how its world works. All it really needs is an observable environment with well-defined states, a set of actions or moves it can perform, rules governing those moves, and a relative reward per move made. And Go isn’t the only game this family of techniques can learn: before AlphaGo beat the crap out of Lee Sedol, one of the best human Go players, DeepMind had already used deep reinforcement learning to master classic video games like Space Invaders, and researchers have since pointed similar agents at DOOM – to see if these AIs can learn to kick ass at those too. And guess what – they did.
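That short list of requirements – observable states, available actions, rewards – is why one agent can play many games. Here’s a sketch of the idea: two completely made-up toy games that expose the same minimal interface (loosely modeled on the common gym-style `reset`/`step` API, not on DeepMind’s actual code), plus one generic episode loop that plays either of them unchanged.

```python
import random

# Two tiny, invented "games" sharing the minimal interface a model-free agent
# needs: reset() -> state, actions() -> legal moves, step(action) -> (state, reward, done).

class GuessParity:
    """Game 1: guess whether a hidden number is even or odd."""
    def reset(self):
        self.secret = random.randrange(100)
        return "guessing"
    def actions(self):
        return ["even", "odd"]
    def step(self, action):
        right = (self.secret % 2 == 0) == (action == "even")
        return "done", (1.0 if right else 0.0), True

class Countdown:
    """Game 2: take 1, 2, or 3 from a pile; emptying it earns the reward."""
    def reset(self):
        self.pile = 10
        return self.pile
    def actions(self):
        return [1, 2, 3]
    def step(self, action):
        self.pile = max(self.pile - action, 0)
        return self.pile, (1.0 if self.pile == 0 else 0.0), self.pile == 0

def play_episode(env, choose=random.choice):
    """One generic loop that can play ANY environment with this interface."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(choose(env.actions()))
        total += reward
    return total

random.seed(2)
# The same agent code runs unchanged on both games:
print(play_episode(GuessParity()), play_episode(Countdown()))
```

Swap the random `choose` for a learned policy and you get the generalization story in miniature: the learning code never changes, only the environment does.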

All in all, we may be at a critical point in the history of artificial intelligence. The future could see deep learning and general-purpose AI in our smartphones and computers. Some say we might even have to start thinking about a whole new economy, because AI could take over a lot of the information-intensive, specialized jobs, leaving only jobs that are creative or emotional in nature – something AI might never achieve.

And while one cannot currently call AlphaGo general-purpose, the fact that its underlying techniques generalize across such different domains is really interesting, exciting, and potentially scary. This is something I’m looking forward to exploring further.


  1. “Is the brain a good model for machine intelligence?”, Nature (2012). http://www.gatsby.ucl.ac.uk/~demis/TuringSpecialIssue(Nature2012).pdf