3 papers for improving BERT’s performance
Written by Davit Soselia & Shota Amashukeli — July 09, 2021
While BERT has become widely adopted by the community, there are still a lot of issues that can hamper the fine-tuning process and keep researchers and engineers from achieving the desired performance. In this post, we want to highlight a few papers we saw at ICLR 2021 that can help with some of the common hurdles.
Stabilizing BERT’s fine-tuning process
Fine-tuning BERT and other language models has become common practice for a wide range of tasks in both academia and industry, enabling previously unattainable performance. However, fine-tuning remains fairly unstable: performance depends significantly on the random initialization, and many hours are lost to extra training runs.
The ICLR 2021 paper On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines by Marius Mosbach, Maksym Andriushchenko, and Dietrich Klakow identifies optimization difficulties that lead to vanishing gradients as one of the core causes of this instability. It proposes a solution that works well for the BERT, RoBERTa, and ALBERT models.
The paper is accompanied by a well-documented GitHub repo, which makes it easier to integrate the proposed solution into existing projects.
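One way to see how much initialization alone moves results is to repeat the same fine-tuning run under several random seeds and summarize the spread. The sketch below is a minimal illustration, not code from the paper: the accuracy values are made-up placeholders, `summarize_runs` is a hypothetical helper, and flagging a run as failed when it does no better than a majority-class baseline is an assumed criterion.

```python
from statistics import mean, stdev


def summarize_runs(accuracies, majority_baseline):
    """Summarize fine-tuning results obtained with different random seeds."""
    # Assumed criterion: a run counts as failed if it is no better than
    # always predicting the most frequent class.
    failed = [a for a in accuracies if a <= majority_baseline]
    return {
        "mean": mean(accuracies),
        "std": stdev(accuracies),
        "min": min(accuracies),
        "max": max(accuracies),
        "failed_fraction": len(failed) / len(accuracies),
    }


# Hypothetical dev-set accuracies from five seeds of the same configuration.
accuracies = [0.71, 0.69, 0.53, 0.70, 0.72]
summary = summarize_runs(accuracies, majority_baseline=0.53)
print(summary)
```

Reporting the standard deviation and the fraction of failed runs next to the mean makes it visible when a single lucky or unlucky seed is driving a comparison.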
When you only have a few samples for BERT fine-tuning
While fine-tuning BERT can be an excellent way to get strong performance even on relatively small datasets, sometimes there are simply too few samples to reach the desired performance.
The paper Revisiting Few-sample BERT Fine-tuning addresses this issue. The authors explore the effects of re-initializing certain layers, how many layers to re-initialize, the choice between BERTAdam and debiased Adam, and various other parts of the fine-tuning process.
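Re-initializing top layers can be pictured with a toy model. The following is a minimal sketch in plain Python, not the authors' implementation: the model is a hypothetical list of weight lists standing in for Transformer blocks, `reinit_top_layers` is an illustrative helper, and the standard deviation of 0.02 is an assumption chosen to resemble a typical BERT initializer range.

```python
import random


def reinit_top_layers(layers, num_layers, std=0.02, seed=0):
    """Return a copy of `layers` with the last `num_layers` entries redrawn.

    `layers` is a list of weight lists, ordered from bottom (input) to top (output).
    """
    if not 0 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 0 and len(layers)")
    rng = random.Random(seed)
    keep = len(layers) - num_layers
    # Bottom layers keep their pretrained weights.
    new_layers = [list(w) for w in layers[:keep]]
    for w in layers[keep:]:
        # Top layers discard pretrained values and are sampled from N(0, std^2).
        new_layers.append([rng.gauss(0.0, std) for _ in w])
    return new_layers


# Toy "pretrained" model: 4 layers with 3 weights each.
pretrained = [[1.0, 1.0, 1.0] for _ in range(4)]
model = reinit_top_layers(pretrained, num_layers=2)
print(model[:2])  # bottom layers, unchanged
print(model[2:])  # top layers, freshly initialized
```

The number of layers to redraw is the knob to tune; in a real model the same idea applies to the last few encoder blocks rather than to flat lists.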
Results show that re-initializing the top layers can speed up learning and improve performance over standard fine-tuning. The authors also identify BERTAdam's omission of debiasing (bias correction) as a major issue.
The GitHub repo for the paper has well-defined steps for using the project.
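The effect of skipping bias correction can be checked on a single step. The sketch below is a from-scratch Adam update in plain Python, written for illustration rather than taken from BERTAdam or any library: `adam_first_step` is a hypothetical helper, and the hyperparameters (`lr=2e-5`, `beta1=0.9`, `beta2=0.999`) are commonly used values assumed here.

```python
import math


def adam_first_step(grad, lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-8,
                    bias_correction=True):
    """Size of the first Adam update for a single scalar parameter."""
    # Moment estimates start at zero, so after one step they are scaled
    # down by (1 - beta1) and (1 - beta2) respectively.
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad ** 2
    if bias_correction:
        # Debiasing undoes that scaling (step count t = 1).
        m = m / (1 - beta1)
        v = v / (1 - beta2)
    return lr * m / (math.sqrt(v) + eps)


with_bc = adam_first_step(grad=0.5, bias_correction=True)
without_bc = adam_first_step(grad=0.5, bias_correction=False)
ratio = without_bc / with_bc
print(with_bc, without_bc, ratio)
```

In this toy setting the first update without bias correction is roughly three times the debiased one, so the effective learning rate early in training is larger than the configured value.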
Improving BERT’s robustness to attacks
Textual adversarial attacks, while less common than their image counterparts, still pose a critical issue. InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective proposes a robust fine-tuning framework to alleviate this problem.
InfoBERT works by introducing information-based regularizers to the training process:
- Information Bottleneck regularizer: aims to minimize noisy mutual information between the model's inputs and their feature representations.
- Robust Feature regularizer: helps increase mutual information between local robust features and global features.
The paper reports very promising results on ANLI and Adversarial SQuAD, datasets containing attacks on BERT and RoBERTa.
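Read together, the two regularizers suggest a training objective of the following shape. This is a schematic written from the description above, not the paper's exact formulation: L_task stands for the usual fine-tuning loss, X for the inputs, T for their feature representations, T_robust for the local robust features, Z for the global features, and the non-negative weights alpha and beta are assumed hyperparameters.

```latex
\min_{\theta} \; \mathcal{L}_{\text{task}}(\theta)
  \;+\; \beta \, I(X;\, T)
  \;-\; \alpha \, I(T_{\text{robust}};\, Z)
```

The positive term penalizes information shared between inputs and features, while the negative term rewards agreement between robust local features and global features.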
The code provided by the researchers can be found in this repo.