Multi-task Learning: All You Need to Know (Part 1)


Figure: Framework of Multi-task learning

Multi-task learning is becoming incredibly popular. This article provides an overview of the current state of multi-task learning, drawing on the extensive multi-task learning literature to provide context for current neural network-based methods.

I’ll cover the following multi-task learning themes in Part 1:

  • Introduction
  • Motivation
  • MTL methods for Deep Learning
  • Why does MTL work?


When it comes to machine learning (ML), we usually focus on optimizing a specific metric, such as performance on a particular benchmark. To achieve this, we typically train a single model or an ensemble of models to carry out our intended task. We then tweak and improve these models until performance stops increasing. While this approach often yields satisfactory performance, by focusing so intently on one task we miss out on information that could have improved the metric we care about. That information comes from the training signals of related tasks. By sharing representations across related tasks, we can help our model generalize more effectively on our original task. This approach is known as Multi-task Learning (MTL).


We can motivate multi-task learning in a variety of ways. Biologically, multi-task learning can be viewed as inspired by human learning: while learning new tasks, we frequently apply knowledge gained from learning related tasks. A baby might first learn to recognize faces, then apply that skill to recognize other objects.

When we attempt to learn something new, previously acquired knowledge helps us learn much faster. Seen this way, it should come as no surprise that neural networks trained from scratch need many training samples and much processing time. It would be challenging to navigate a tightrope without first being able to walk on solid ground.

MTL Methods for Deep Learning

We will now examine the two methods that deep neural networks most frequently employ to perform multi-task learning:

  • Hard Parameter Sharing
  • Soft Parameter Sharing

Hard Parameter Sharing

Figure: Hard Parameter sharing for Multi-task Learning

Hard parameter sharing is the most widely used MTL method in neural networks. It is typically implemented by sharing the hidden layers across all tasks while keeping several task-specific output layers. Hard parameter sharing greatly reduces overfitting, which occurs when the model learns concepts from the noise or random fluctuations in the training data.

This is because the more tasks we learn simultaneously, the more our model has to find a representation that captures all of them, and the lower our chance of overfitting the original task.
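As a rough sketch, hard parameter sharing amounts to one hidden layer shared by every task plus a separate output head per task. The layer sizes, weight values, and task names (`task_a`, `task_b`) below are illustrative assumptions, not from the article; a real model would learn these weights by gradient descent.

```python
def linear(weights, bias, x):
    """Apply a dense layer: y_i = sum_j w_ij * x_j + b_i."""
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, bias)]

class HardSharingModel:
    def __init__(self):
        # One hidden layer whose parameters are shared by every task.
        self.shared_w = [[0.5, -0.2], [0.1, 0.3]]
        self.shared_b = [0.0, 0.0]
        # Task-specific output heads: each task keeps its own parameters.
        self.heads = {
            "task_a": ([[1.0, 0.0]], [0.0]),
            "task_b": ([[0.0, 1.0]], [0.0]),
        }

    def forward(self, x, task):
        h = linear(self.shared_w, self.shared_b, x)  # shared representation
        w, b = self.heads[task]                      # task-specific head
        return linear(w, b, h)
```

Because every task's gradient flows through `shared_w`, the shared layer is pushed toward a representation that works for all tasks at once, which is where the regularizing effect comes from.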

Soft Parameter Sharing

Figure: Soft Parameter Sharing for Multi-task Learning

In soft parameter sharing, every task has its own model with its own parameters. The distance between the models’ parameters is then regularized to encourage them to be similar. In contrast to hard sharing, this loose coupling of the shared representation space gives each task greater flexibility.
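A minimal sketch of the soft-sharing objective, assuming a squared L2 distance as the regularizer and a weighting coefficient `lam` (both common choices, but illustrative here): each task keeps its own flat parameter list, and the penalty pulls the two lists toward each other.

```python
def l2_distance_sq(params_a, params_b):
    """Squared L2 distance between two flat parameter lists."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))

def total_loss(task_a_loss, task_b_loss, params_a, params_b, lam=0.1):
    """Sum of the per-task losses plus a penalty on how far apart
    the two tasks' parameters have drifted."""
    return task_a_loss + task_b_loss + lam * l2_distance_sq(params_a, params_b)
```

Setting `lam = 0` recovers fully independent models, while a very large `lam` approaches the behavior of hard sharing; tuning it controls how tightly the tasks are coupled.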

Why does MTL work?

To understand why MTL works, we first need to understand its underlying mechanisms, which are as follows:

  • Implicit data augmentation
  • Eavesdropping
  • Attention focusing
  • Representation bias
  • Regularization

Implicit data augmentation

MTL effectively increases the sample size we use to train our model. Since all tasks are at least somewhat noisy, a model trained on a single task risks learning a representation that fits that task’s data-dependent noise rather than generalizing. A model that learns two tasks at the same time sees two different noise patterns, so learning both tasks simultaneously lets it average those noise patterns out and arrive at a more accurate representation of the data, reducing overfitting.
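The noise-averaging intuition can be illustrated with a toy simulation of my own (not from the article): two tasks observe the same underlying signal corrupted by independent noise, and an estimate that pools both tasks has roughly half the error variance of one built from a single task.

```python
import random

random.seed(0)
true_signal, n = 1.0, 10_000

# Noisy targets seen when training on one task alone.
single_task = [true_signal + random.gauss(0, 1) for _ in range(n)]

# Effective targets when each example contributes to two tasks whose
# independent noise gets averaged in the shared representation.
both_tasks = [true_signal + (random.gauss(0, 1) + random.gauss(0, 1)) / 2
              for _ in range(n)]

var_single = sum((y - true_signal) ** 2 for y in single_task) / n
var_both = sum((y - true_signal) ** 2 for y in both_tasks) / n
# var_both comes out close to var_single / 2
```

This is a stylized statistical picture, not a training run, but it captures why a representation fit to several noisy tasks is less at the mercy of any one task’s noise.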


Eavesdropping

Some features may be easy for one task to learn while being difficult for another, for example because the second task interacts with those features in a much more complicated way. Multi-task learning allows the task that is struggling with a feature to eavesdrop on the task that learns it easily. The most direct way to achieve this is through hints: directly training the model to predict the most important features.
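A minimal sketch of the hints idea: alongside its main loss, the model is penalized for failing to predict a feature we already know is important, so the gradient explicitly pushes it to represent that feature. The squared-error form and the weighting `alpha` are my own illustrative choices.

```python
def loss_with_hint(main_loss, predicted_feature, hint_value, alpha=0.5):
    """Main task loss plus an auxiliary 'hint' term that rewards the
    model for predicting a feature known to matter."""
    hint_loss = (predicted_feature - hint_value) ** 2
    return main_loss + alpha * hint_loss
```

When the model predicts the hinted feature perfectly, the auxiliary term vanishes and only the main loss remains.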

Attention focusing

If a task is extremely noisy, or the data is sparse and high-dimensional, it may be difficult for a model to distinguish relevant from irrelevant features. Because other tasks provide additional evidence for the relevance or irrelevance of those features, MTL can help the model focus on the features that really matter.

Representation bias

MTL biases the model to prefer representations that are also preferred by other tasks. This will also aid the model’s future generalization to new tasks, as a hypothesis space that performs well for a sufficiently large number of training tasks will also perform well for learning novel tasks as long as they are from the same environment.


Regularization

By introducing an inductive bias, MTL acts as a regularizer. As a result, it reduces the risk of overfitting, as well as the model’s Rademacher complexity, i.e., its ability to fit random noise.

