Reward is Enough — ML Paper Review

Maximizing reward is the secret sauce to artificial intelligence

Photo by Robert Anasch on Unsplash

Written by DeepMind researchers David Silver, Satinder Singh, Doina Precup, and Richard S. Sutton, this paper proposes an intriguing hypothesis: incentivizing AI agents with reward is enough to achieve General Artificial Intelligence. It is a philosophical paper rather than one that presents a machine learning model and code. I guess this gives us an indication of why DeepMind has been pouring so much effort and money into AI agents that play games: they believe that developing the strongest reward-seeking agents is the key to Artificial Intelligence. In this article, we are going to look at why they believe so.

Developing skills

Each ability arises from the pursuit of a goal that is designed specifically to elicit that ability.

Source: Reward is Enough Paper

The funny thing about this paper is that it can be read and understood by non-technical readers as well as programmers. The authors’ first assumption is that a skill usually develops from chasing an endpoint or target that requires mastering that skill. Think about this for a moment and see whether you agree or disagree. A good example is AlphaZero, DeepMind’s AI agent that mastered the Chinese game of Go, as well as chess and shogi. The agent wasn’t designed with particular skills in mind. I don’t even think that its developers fully understood, or were good at, the skills needed to play Go. However, they were good at putting the reward (and the environment) into code, which resulted in the agent developing skills that even they weren’t expecting, such as discovering new opening sequences and using surprising new shapes [1].

The main hypothesis

Intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment.

Source: Reward is Enough Paper

The core of the paper is to figure out when AI agents (or people, for that matter) develop skills. The main hypothesis is that these skills and abilities arise when agents or people start seeking rewards in a certain environment. Essentially, this implies that we don’t need to teach an AI agent the skills needed to thrive in an environment. We just need to model the reward as well as possible, and the agent will start learning. An analogy in supervised image learning would be optimizing the training objective instead of specifying how the network should reach it.
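
To make the hypothesis concrete, here is a minimal sketch (not taken from the paper) of a tabular Q-learning agent in a made-up six-cell corridor. The environment, the `step` function, and all hyperparameters are assumptions chosen purely for illustration. Nothing in the code tells the agent to walk right; the only signal it gets is a reward of 1.0 when it reaches the last cell.

```python
import random

N_STATES = 6          # corridor cells 0..5; cell 5 is the goal (assumed toy setup)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Hypothetical environment: move along the corridor, reward 1.0 only at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # safety cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                action = greedy_action(q[state], rng)  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = [greedy_action(q_table[s], random.Random(0)) for s in range(N_STATES - 1)]
print(policy)
```

With these settings the greedy policy should come out as "step right" (action 1) in every non-goal cell. The behaviour is never programmed in; it falls out of the reward and the environment, which is the small-scale version of the claim the paper makes.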

I do agree with this hypothesis in some sense, but I have a few points to make. First of all, it makes it seem as if reward is much more significant than modeling the environment, which I don’t think is true. If you have a perfectly modeled reward and a poorly modeled environment, your AI agent is likely to underperform. Also, although the hypothesis might sound theoretically valid, implementing rewards is quite difficult for various reasons, such as the need to quantify objectives. For instance, how would you quantify feelings such as happiness, satisfaction, or fulfillment, which are highly likely to be rewards?

General (Artificial) Intelligence

General intelligence, of the sort possessed by humans and perhaps also other animals, may be defined as the ability to flexibly achieve a variety of goals in different contexts. According to our hypothesis, general intelligence can instead be understood as, and implemented by, maximising a singular reward in a single, complex environment.

Source: Reward is Enough Paper

It seems to me that they have changed the definition of general intelligence to better suit the paper. At least that’s the feeling I got from reading it. They propose that giving someone a goal or a reward is enough (given a sufficiently complex environment) to motivate them to learn the skills that make them “intelligent”. I think this might be true in some cases, but not in general. Let me know in the comments what you think.

Is unsupervised/supervised learning enough?

Compared to reinforcement learning, unsupervised learning provides a mechanism for agents to identify patterns and make predictions, but it doesn’t provide a clear pathway to developing the abilities and skills needed for making choices, so on its own it is not enough for general artificial intelligence. However, it can be quite useful as a supplement to reinforcement learning, as seen in a lot of SOTA reinforcement learning papers.

Supervised learning seems more suitable for general artificial intelligence: you give the algorithm a goal and it works towards it. However, the dataset that you give this algorithm will almost never be enough to develop General Artificial Intelligence. It is always going to be limited in some way, and its distribution is going to differ from the real-world distribution. This doesn’t negate the fact that supervised learning can be quite useful in a lot of scenarios.

One final important point is that they point out that “Offline learning is unlikely to be enough”. Sure, there are some scenarios where the dataset provided is enough for the agent to solve the underlying problem. However, in the majority of real-world problems, online learning will be needed since the problem & the dataset are likely to keep shifting. This highlights the importance of software engineering in machine learning, because online systems rely on storing & retrieving data efficiently and on a well-built overall system. That requires extensive knowledge of building APIs, high-quality databases, and pipelines.
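
The offline versus online point can be illustrated with a toy sketch. It is not from the paper and it is not reinforcement learning; it is simply a predictor of a single number whose true value shifts partway through a synthetic stream. The stream, the shift size, the noise level, and the learning rate are all assumptions made up for this example.

```python
import random


def make_stream(n=400, shift_at=200, seed=0):
    """Synthetic data: true value is 0.0, then jumps to 5.0 at `shift_at` (assumed)."""
    rng = random.Random(seed)
    return [(0.0 if t < shift_at else 5.0) + rng.gauss(0.0, 0.5) for t in range(n)]


def evaluate(stream, train_size=100, lr=0.1):
    """Compare a frozen offline estimate with one that keeps updating online."""
    # Offline: fit once on the initial batch, then never change.
    offline_estimate = sum(stream[:train_size]) / train_size
    # Online: start from the same fit, but keep learning from live data.
    online_estimate = offline_estimate

    offline_sq_err = 0.0
    online_sq_err = 0.0
    live = stream[train_size:]
    for y in live:
        offline_sq_err += (y - offline_estimate) ** 2
        online_sq_err += (y - online_estimate) ** 2
        # Update only after predicting, so the comparison stays fair.
        online_estimate += lr * (y - online_estimate)
    return offline_sq_err / len(live), online_sq_err / len(live)


offline_mse, online_mse = evaluate(make_stream())
print(f"offline MSE: {offline_mse:.2f}, online MSE: {online_mse:.2f}")
```

Both predictors start from the same fit, but once the data shifts, the frozen one keeps predicting the old value while the online one adapts within a few dozen samples, so its mean squared error ends up far lower. Real systems are obviously more involved, yet the same gap is what makes the storage, retrieval, and pipeline work mentioned above necessary.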

Final Thoughts

This is quite a controversial paper. I don’t think I agree with the underlying hypothesis, but I found it quite thought-provoking, and that’s why I thought it would be useful to write an article about it. The questions raised here are significant, and the paper promotes a healthy discussion.


[1] Reward is Enough Paper

