Domain Adaptation



Recently I came across a few papers on domain adaptation, and I had no idea what it meant or what it did. After reading these papers I had so many questions popping into my head: how is DA different from transfer learning? What are the different techniques for doing DA? In this article, I will try to answer a few of these questions. So let’s start with: what the heck is Domain Adaptation?

DA, what and why?

We don’t have a scarcity of data in today’s world; what we lack is labeled data and consistent data. The distribution on which we train our ML or DL models might not be the same as the distribution we test them on (perhaps because the way the data is collected changes, the data source changes, or we did not sample a wide enough distribution during training). For example, we might train our model on MRI images but want to test it on a CT-scan dataset.
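A quick way to see the problem is to create such a shift on purpose. The rotated “two moons” toy example, popular in the DA literature, trains a classifier on one distribution and evaluates it on a rotated copy. Below is a minimal sketch assuming scikit-learn is available; the 35° rotation is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Source domain: the standard two-moons dataset.
X_src, y_src = make_moons(n_samples=500, noise=0.1, random_state=0)

# Target domain: the same moons rotated by 35 degrees --
# the task (moon membership) is unchanged, only P(X) shifts.
theta = np.deg2rad(35)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X_tgt, y_tgt = X_src @ R.T, y_src

clf = SVC(kernel="rbf", gamma=2.0).fit(X_src, y_src)
print("source accuracy:", clf.score(X_src, y_src))  # close to 1.0
print("target accuracy:", clf.score(X_tgt, y_tgt))  # noticeably lower
```

The labels are untouched; only the input distribution changes, yet accuracy on the target drops.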

Aim: the objective of Domain Adaptation techniques is to make our models robust to changes in data distribution, i.e., to data shifts.

But isn’t that what transfer learning does? We’ll answer that question in a while, but first let’s define a few notations which we will be using here:

  1. Domain: a domain is defined as D = {X, P(X)}, where X is the feature space (e.g., text representations, image embeddings) and P(X) is the marginal probability distribution over that feature space. In other words, a domain can be thought of as the distribution from which we collect our data [1].
  2. Single domain: the training set and test set are from the same distribution.
  3. Multiple domains: the training set and test set are from two different distributions.
  4. Dˢ: source domain
  5. Dᵗ: target domain
  6. Task: a task T is defined as T = {Y, P(Y|X)}, where Y is the label space and the objective is to learn P(Y|X). So classification and object detection are two different tasks with two different objectives to solve (a minimal code sketch of these notations follows fig1.).
  7. Tˢ: source task
  8. Tᵗ: target task
  9. Source: the original task that the model was trained on (refer to fig1.).
  10. Target: the task that we need to fine-tune on (refer to fig1.).
  11. Yˢ: source label space
  12. Yᵗ: target label space
fig1. [2]
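To make these notations concrete, here is a toy sketch in Python. The Domain and Task classes are purely illustrative (they are not from any library); they just mirror the definitions above:

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Domain:
    feature_space: str    # X: a description of the feature space
    marginal: Callable    # P(X): a sampler over the feature space

@dataclass
class Task:
    label_space: Set[str]  # Y: the set of possible labels
    predictor: Callable    # the conditional P(Y | X) we want to learn

# The domain adaptation setting: the label spaces match
# but the domains (and hence P(X)) differ.
def is_domain_adaptation(d_s: Domain, d_t: Domain,
                         t_s: Task, t_t: Task) -> bool:
    return t_s.label_space == t_t.label_space and d_s != d_t
```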

Now let’s answer the above question!

Zoomed-out view

To understand the difference between Transfer Learning and Domain Adaptation let’s zoom out a bit.

fig2. Pan, Sinno Jialin, and Qiang Yang. “A Survey on Transfer Learning.” IEEE Transactions on Knowledge and Data Engineering 22.10 (2009): 1345–1359.

The objective of transfer learning is to train a task Tˢ on a domain Dˢ and then use the acquired knowledge to learn a new task Tᵗ belonging to a domain Dᵗ, where Dᵗ ≠ Dˢ. So, for example, we take the BERT model (which is pre-trained on a huge amount of data with the MLM and NSP tasks) and fine-tune it on a smaller dataset.
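As a minimal sketch of that BERT example with the Hugging Face transformers and datasets libraries (the IMDB dataset and the hyperparameters are placeholder choices, not anything prescribed here):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # fresh classification head

# A smaller labeled target dataset (IMDB sentiment is just a stand-in).
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-finetuned", num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),
)
trainer.train()  # fine-tuning only: the pre-trained weights are reused
```

The knowledge acquired during pre-training (Tˢ on Dˢ) is reused as-is; only the small classification head and a light fine-tuning pass adapt the model to the new task Tᵗ.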

The above figure makes it clear that DA is nothing but a sub-branch of transfer learning. Transfer learning is the more general term; domain adaptation is the specific case where Yˢ = Yᵗ, i.e., the label space of the source matches that of the target, but the data come from different distributions (P(Xˢ) ≠ P(Xᵗ)).
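As one concrete example of such a setting, here is a minimal NumPy sketch of CORAL (Sun et al., 2016), a simple DA technique that matches the second-order statistics of the source features to those of the target. This is my own illustrative version, not code from the article or its references:

```python
import numpy as np

def coral(X_src, X_tgt, eps=1e-5):
    # CORAL: whiten the centered source features with the source
    # covariance, re-color them with the target covariance, and
    # (for convenience) shift them onto the target mean so a model
    # trained on the output can be applied to raw target features.
    def cov(X):
        Xc = X - X.mean(axis=0)
        return Xc.T @ Xc / (len(X) - 1) + eps * np.eye(X.shape[1])

    def matrix_power(C, p):
        # Symmetric PSD matrix power via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return (vecs * np.maximum(vals, eps) ** p) @ vecs.T

    Xc = X_src - X_src.mean(axis=0)
    aligned = Xc @ matrix_power(cov(X_src), -0.5) @ matrix_power(cov(X_tgt), 0.5)
    return aligned + X_tgt.mean(axis=0)
```

A classifier trained on coral(X_src, X_tgt) with the source labels, rather than on the raw X_src, typically transfers better to the target, since the two feature distributions now share their mean and covariance structure.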

For a better understanding of transfer learning and the different strategies and problems related to it, refer to [3].


