Google’s DeepMind uses ‘dreams’ to learn

Published: 20 Nov. 2016, 20:12

Androids may not, as science fiction writer Philip K. Dick once posited, dream of electric sheep. But the newest artificial intelligence system from Google’s DeepMind division does indeed dream, metaphorically at least, about finding apples in a maze.

Researchers at DeepMind wrote in a paper published online Thursday that they had achieved a leap in the speed and performance of a machine learning system. It was accomplished by, among other things, imbuing technology with attributes that function in a way similar to how animals are thought to dream.

The paper explains how DeepMind’s new system - named Unsupervised Reinforcement and Auxiliary Learning agent, or Unreal - learned to master a three-dimensional maze game called “Labyrinth” 10 times faster than the existing best AI software. It can now play the game at 87 percent of the performance of expert human players, the DeepMind researchers said.

“Our agent is far quicker to train, and requires a lot less experience from the world to train, making it much more data efficient,” DeepMind researchers Max Jaderberg and Volodymyr Mnih jointly wrote via email. They said Unreal would allow DeepMind’s researchers to experiment with new ideas much faster because of the reduced time it takes to train the system. DeepMind has already seen its AI products achieve highly respected results by teaching themselves to play video games, notably the retro Atari title “Breakout.”

“Labyrinth” is a game environment that DeepMind developed, loosely based on the design style used by the popular video game series “Quake.” It involves a machine having to navigate routes through a maze, scoring points by collecting apples.

This style of game is an important area for artificial intelligence research because the chance to score points in the game, and thus reinforce “positive” behaviors, occurs less frequently than in some other games. Additionally, the software has only partial knowledge of the maze’s layout at any one time.

One way the researchers achieved their results was by having Unreal replay its own past attempts at the game, focusing especially on situations in which it had scored points before. The researchers equated this in their paper to the way “animals dream about positively or negatively rewarding events more frequently.”
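To make the replay idea concrete, here is a minimal sketch in Python of a memory that stores past moves and deliberately over-samples the ones that earned points. It is an illustration only, not DeepMind’s code: the class name `RewardBiasedReplay`, its parameters and the 50/50 split between rewarding and non-rewarding memories are all assumptions made for this example.

```python
import random
from collections import deque


class RewardBiasedReplay:
    """Hypothetical replay memory that favors rewarding experiences."""

    def __init__(self, capacity=2000, rewarding_fraction=0.5, seed=None):
        # Oldest experiences are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        # Assumed share of samples drawn from rewarding experiences.
        self.rewarding_fraction = rewarding_fraction
        self.rng = random.Random(seed)

    def add(self, observation, action, reward):
        self.buffer.append((observation, action, reward))

    def sample(self):
        """Return one stored experience, biased toward nonzero rewards."""
        if not self.buffer:
            raise IndexError("cannot sample from an empty replay memory")
        rewarding = [t for t in self.buffer if t[2] != 0]
        neutral = [t for t in self.buffer if t[2] == 0]
        pick_rewarding = self.rng.random() < self.rewarding_fraction
        if rewarding and (pick_rewarding or not neutral):
            return self.rng.choice(rewarding)
        return self.rng.choice(neutral)


if __name__ == "__main__":
    memory = RewardBiasedReplay(seed=0)
    for step in range(100):
        # Only one step in twenty collects an apple in this toy run.
        memory.add(observation=step, action=0, reward=1.0 if step % 20 == 0 else 0.0)
    print(memory.sample())
```

In this toy run only 5 percent of the stored moves scored points, yet roughly half of the replayed ones do, which is the sense in which the memory “dreams” more often about rewarding events.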

The researchers also helped the system learn faster by asking it to maximize several different criteria at once, not simply its overall score in the game. One of these criteria had to do with how much it could make its visual environment change by performing various actions. “The emphasis is on learning how your actions affect what you will see,” Jaderberg and Mnih said.

They said this was also similar to the way newborn babies learn to control their environment to gain rewards - like increased exposure to visual stimuli, such as a shiny or colorful object, that they find pleasurable or interesting.
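The visual-change criterion described above can be sketched as a simple measurement: compare the picture before an action with the picture after it, and record how much each region changed. The Python below is a rough illustration under stated assumptions, not the method from the paper: the function name `pixel_change`, the grayscale frames stored as nested lists and the 2-by-2 cell size are all hypothetical choices for this example.

```python
def pixel_change(prev_frame, next_frame, cell=2):
    """Average absolute brightness change in each cell-by-cell region.

    Frames are assumed to be equal-sized grids (lists of rows) of
    grayscale values. Rows or columns that do not fill a whole cell
    are ignored.
    """
    height, width = len(prev_frame), len(prev_frame[0])
    change_map = []
    for top in range(0, height - cell + 1, cell):
        row = []
        for left in range(0, width - cell + 1, cell):
            total = 0.0
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    total += abs(next_frame[y][x] - prev_frame[y][x])
            row.append(total / (cell * cell))
        change_map.append(row)
    return change_map


if __name__ == "__main__":
    before = [[0, 0, 0, 0] for _ in range(4)]
    after = [row[:] for row in before]
    # Pretend an action lit up the top-left corner of the view.
    for y in range(2):
        for x in range(2):
            after[y][x] = 1
    print(pixel_change(before, after))
```

An agent could treat larger values in this map as an extra signal to pursue alongside its game score, so it is rewarded for learning which actions change what it sees even when no apples are in reach.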

Jaderberg and Mnih, who are among seven scientists who worked on the paper, said it was “too early to talk about real-world applications” of Unreal or similar systems.

Bloomberg