Open-ended learning at DeepMind


On the face of it, there’s no obvious limit to the reinforcement learning paradigm: you put an agent in an environment and reward it for taking good actions until it masters a task.

By last year, RL had already achieved some amazing things, including mastering Go, a wide range of Atari games, StarCraft II and so on. But the holy grail of AI isn’t to master specific games, but rather to generalize — to make agents that can perform well on new games that they haven’t been trained on before.

Fast forward to July of this year, though, when a team at DeepMind published a paper called “Open-Ended Learning Leads to Generally Capable Agents”, which takes a big step in the direction of general RL agents. Joining me for this episode of the podcast is one of the co-authors of that paper, Max Jaderberg. Max came into the Google ecosystem in 2014 when Google acquired his computer vision company, and more recently, he started DeepMind’s open-ended learning team, which is focused on pushing machine learning further into the territory of cross-task generalization. I spoke to Max about open-ended learning, the path ahead for generalization and the future of AI.

Here are some of my favourite take-homes from the conversation:

  • One of the most important advances reported in the recent DeepMind paper on open-ended learning was the procedural generation of games for agents to play. By generating a vast number of different environments and game objectives automatically, agents could be trained on a hugely diverse range of tasks, forcing them to develop meta-learning techniques, such as exploration and trial-and-error.
  • The games they generated included individual, cooperative and competitive scenarios. Interestingly, agents seemed to learn competitive behaviours more easily (and faster) than cooperative behaviours. Max thinks this is because competition came with a more gradual learning curve: agents were trained to compete with other agents who had experienced roughly the same amount of training as they had at every stage of the process. As their skills improved, so did those of their in-game opponents, so they were always ready for the “next lesson” in competitive dynamics.
  • There are a lot of interesting parallels between recent progress in open-ended reinforcement learning, and advances in scaled language models. In both cases, researchers are interested in identifying proxies for generalization ability that can be measured reliably. In the case of OpenAI’s GPT-3 for example, one of those proxies has been the algorithm’s ability to do arithmetic: GPT-3 does quite well at adding small numbers, but fails at adding larger ones — which some have argued shows that GPT-3 hasn’t “learned to understand” addition as a concept. The analogue to addition in RL might just be game theory: how well and how consistently does an RL agent seem to apply principles of game theory to the problems and environments it’s exposed to?
  • That said, Max also observes that the imperfect grasp of arithmetic and game theory shown by models like GPT-3 and open-ended RL learners could simply reflect the fact that they tend to learn heuristics rather than rigorous symbolic logic. And that could be a good thing: heuristics are more robust than logic, since they can evolve as the problem domain they’re applied to gets more complex. Heuristics are also the approach humans use to learn things, which is why we display many of the same failure modes as AI models: most humans struggle to do arithmetic with large numbers, but we don’t take that as evidence that they can’t (or don’t) understand math. Perhaps we should think the same way about AI.
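The first two take-homes describe mechanisms that can be sketched in a few lines of Python: procedurally generating tasks, and pairing agents with opponents of similar training progress so the competitive curriculum stays gradual. This is a toy illustration only — the task format, the `Agent` class and the matching rule are all invented for exposition and bear no resemblance to the paper's actual XLand system:

```python
import random

rng = random.Random(0)

def generate_task():
    # Stand-in for procedural world/goal generation: here a "task" is
    # just a random target value. In the paper, worlds and objectives
    # are generated at vastly greater scale and diversity.
    return {"target": rng.randint(0, 9)}

class Agent:
    def __init__(self, name):
        self.name = name
        self.steps = 0  # crude proxy for how much training the agent has had

    def train_on(self, task):
        self.steps += 1

def matched_opponent(agent, population, tolerance=5):
    # Sample an opponent whose training progress is close to ours,
    # so each agent is always facing the "next lesson" rather than a
    # vastly stronger or weaker rival.
    peers = [a for a in population
             if a is not agent and abs(a.steps - agent.steps) <= tolerance]
    return rng.choice(peers) if peers else None

population = [Agent(f"agent{i}") for i in range(4)]
for _ in range(20):
    for a in population:
        a.train_on(generate_task())
        opp = matched_opponent(a, population)
        # in a real system, a and opp would now play a generated game
```

Because every agent trains at the same rate here, each one always finds a skill-matched opponent — which is exactly the property Max credits for competitive behaviours emerging faster than cooperative ones.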

You can follow Max on Twitter here.


