Closing the Gap between Machine Learning and Human Learning
Advances in Large Language Modeling
Humans possess a powerful ability to reason. They understand a question asked by a fellow human being and provide the most appropriate answer to it. A human brain can do quick mathematics to answer a trivial question like “If I have 10 balls and bought two cans, each having 5 balls, how many balls would I have?” Humans can do commonsense reasoning: “If a driver sees a pedestrian at a crosswalk, what would he do?” Humans can tell when somebody is cracking a joke, and can grasp the deeper meaning of what the speaker really wants to say.
The question is, can we train machines to gain the kind of intelligence that we humans possess? Scientists have conducted a lot of research in this area in recent years. With the advent of Deep Neural Networks (DNNs) and Large Language Models (LLMs), we are making good progress toward this goal. In this article, I will introduce you to the achievements made through the use of LLMs and Google’s latest PaLM[¹] (Pathways Language Model).
First, let us consider the tasks that we are trying to achieve.
After the great success of large neural networks in natural language processing (NLP), researchers focused on natural language understanding (NLU) and generation rather than the simpler task of text processing. So, what problems are we trying to solve using these enormous networks? I give below a short list of NLU tasks we seek to solve. Though the list is not exhaustive, it will give you some idea of our goals.
- Language translation
- Chatbots (Question-Answering)
- Text summarization
- Language Generation
Translating from English to German or vice versa (indeed, between any pair of languages) has long been a need. Today, there are several ML models, and even mobile apps that use such pre-trained models, which achieve this task with very high accuracy.
Replacing a vast fleet of Customer Service Representatives (CSRs) with automated systems has always been a dream for businesses. This is now achievable with near-perfect chatbots. Chatbots require natural language understanding and question-answering (Q&A) ability. Though question answering in a specific domain has mostly been perfected, it is still a challenge to develop a Q&A system for the open domain. Humans quickly grasp a question’s context (domain) before answering it; for LLMs, this requires what we know as few-shot learning[²]. GPT-3[³] was the first to apply few-shot learning, and more recent LLMs like GLaM[⁴], LaMDA[⁵], Gopher[⁶] and Megatron-Turing NLG[⁷] all employ it.
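The idea behind few-shot learning is easiest to see in the prompt itself: a handful of worked question-answer pairs are placed before the new question, and the model infers the task from those demonstrations without any fine-tuning. The sketch below is illustrative only; the prompt format and helper function are my own, not any particular model’s API.

```python
# Minimal sketch of a few-shot prompt for open-domain Q&A.
# The demonstrations teach the model the task; the model would
# complete the text after the final "A:".

def build_few_shot_prompt(examples, question):
    """Concatenate (question, answer) demonstrations and the new query."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # the model generates from this point
    return "\n".join(lines)

examples = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
]
print(build_few_shot_prompt(examples, "What is the tallest mountain on Earth?"))
```

The same prompt string would then be sent to whichever LLM is being evaluated; only the number of demonstrations (the “shots”) changes between zero-shot, one-shot, and few-shot settings.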
We often need to create a summary of a long document. Though this is an NLP task, language understanding also plays an important role in producing a meaningful summary. Encoder-decoder architectures with Attention, and Transformer-based architectures, have shown outstanding success at both abstractive and extractive summarization.
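To make the abstractive/extractive distinction concrete: an extractive summarizer copies the most salient sentences verbatim, while an abstractive one writes new text. The toy sketch below illustrates only the extractive idea, using a simple word-frequency heuristic rather than the neural encoder-decoder approach described above.

```python
import re
from collections import Counter

# Toy extractive summarizer: score each sentence by the corpus
# frequency of its words, then keep the top-scoring sentence(s)
# verbatim. This is a baseline heuristic, not a neural model.

def extractive_summary(text, n_sentences=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    return " ".join(scored[:n_sentences])

doc = ("Transformers changed NLP. Transformers use attention. "
       "Attention lets models weigh context. Cats are nice.")
print(extractive_summary(doc))
```

An abstractive system, by contrast, would have to generate a new sentence such as “Transformers use attention to weigh context,” which is why it demands genuine language understanding.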
Writing like Shakespeare is a dream for many. Neural network architectures, starting with RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory) and culminating in the latest Transformer-based models, allow the creation of prose that mimics Agatha Christie and many other famous writers. Many tools are available for Natural Language Generation (NLG)[¹⁴], such as Arria NLG PLC[⁸], AX Semantics[⁹], Yseop[¹⁰], Wordsmith[¹¹], SimpleNLG[¹²], NaturalOWL[¹³] and others.
Humans also have great commonsense reasoning abilities and conceptual understanding; they can play trivia and synonym games, and give responses based on counterfactuals.
These are just a few of the areas where NLP/NLU research is progressing strongly. The above goals can be achieved by creating large language models.
The major challenge in creating an LLM is training an extremely large deep neural network with billions of parameters. Models like GLaM and LaMDA were trained on a single TPU v3 Pod. Megatron-Turing NLG used pipeline parallelism to scale to 2,240 A100 GPUs across GPU clusters. Gopher, using multiple TPU v3 Pods, reached a scale of 4,096 TPU v3 chips. Researchers observed that larger models with many more training parameters improved the NLG results. PaLM is the latest in this category: it uses Google’s Pathways system to scale training to 6,144 chips and create a 540-billion-parameter language model. It achieved a training efficiency of 57.8% hardware FLOPs utilization, the highest yet achieved for LLMs. Google also reformulated the Transformer block to allow the Attention and Feedforward layers to be computed in parallel, which helped in creating a better parallelism strategy for training the network.
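The parallel reformulation of the Transformer block can be sketched in a few lines. In the standard block, the feed-forward network consumes the output of attention, so the two must run serially; in the parallel variant, both branches read from the same layer-normalized input and their results are simply added, so their large matrix multiplications can be fused or run concurrently. The `attn` and `mlp` functions below are deliberately simplified stand-ins, not real attention or feed-forward implementations.

```python
import numpy as np

# Standard (serial) block:   y = h + mlp(norm(h)),  h = x + attn(norm(x))
# Parallel block (as in PaLM): y = x + attn(norm(x)) + mlp(norm(x))
# In the parallel form the two branches share one layer norm and
# are independent of each other, enabling better training parallelism.

def norm(x):
    # simplified layer norm (no learned scale or bias)
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-6)

def attn(x, w):   # stand-in for self-attention
    return x @ w

def mlp(x, w):    # stand-in for the feed-forward network
    return np.maximum(x @ w, 0.0)

def serial_block(x, w_a, w_m):
    h = x + attn(norm(x), w_a)
    return h + mlp(norm(h), w_m)

def parallel_block(x, w_a, w_m):
    n = norm(x)                            # one shared layer norm
    return x + attn(n, w_a) + mlp(n, w_m)  # independent branches

x = np.random.randn(4, 8)                  # (sequence, hidden) toy shapes
w_a, w_m = np.random.randn(8, 8), np.random.randn(8, 8)
print(parallel_block(x, w_a, w_m).shape)   # (4, 8)
```

The two formulations are not numerically identical, but the PaLM authors report comparable quality at scale, with a meaningful speedup from the fused computation.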
They trained PaLM on a combination of English and multilingual datasets, including Wikipedia articles, books, web documents, conversations and even GitHub code. Notably, PaLM can also write computer code, probably because of its training on GitHub. In code generation, white space is significant, so the trainers created a “lossless” vocabulary that preserves white space. They also took care of out-of-vocabulary Unicode characters and split large numbers into individual digits. All of this helped in effective code generation.
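Two of those vocabulary tricks are easy to demonstrate in miniature. The pre-tokenizer sketched below (my own illustration, not PaLM’s actual tokenizer) keeps runs of whitespace as intact tokens, so code indentation survives a round trip, and emits each digit of a number as its own token:

```python
import re

# Split text into: runs of whitespace | single digits | everything else.
# Keeping whitespace runs intact makes the tokenization "lossless" for
# code; splitting numbers into digits helps models do arithmetic.

def pre_tokenize(text):
    return re.findall(r"\s+|\d|[^\s\d]+", text)

tokens = pre_tokenize("    return 2048")
print(tokens)  # ['    ', 'return', ' ', '2', '0', '4', '8']

# Lossless round trip: joining the tokens reproduces the input exactly.
assert "".join(tokens) == "    return 2048"
```

A conventional NLP tokenizer would typically collapse or discard the leading spaces, which is harmless for prose but ruins Python indentation; preserving them is what makes generated code compilable.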
With this, I will now discuss some of PaLM’s achievements.
Researchers evaluated PaLM on 29 widely used NLP tasks; it surpassed the few-shot performance of prior language models on 28 of them. It also showed strong performance on multilingual NLP benchmarks, including translation. BIG-bench[¹⁵] (Beyond the Imitation Game benchmark) is the latest benchmark, comprising over 150 new language modeling tasks. On a subset of 58 of these 150 tasks, PaLM outperformed Gopher, Chinchilla, and the average human score. It effectively distinguishes between a cause and its effect. For a specified context, it will provide an appropriate answer to your question. It can play the synonym game, and it can deduce counterfactuals from a passage.
Consider the question presented earlier: “If I have 10 balls and bought two cans, each having 5 balls, how many balls would I have?” A bare answer may not easily convince the reader of its accuracy; you need to lay out the multi-step arithmetic and reason out how the final answer is reached. Using chain-of-thought prompting, PaLM will reason out the answer by generating text for all the intermediate steps. The Google blog provides a few examples of these abilities.
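Working the ball question through explicitly shows what a chain-of-thought response looks like, step by step; the prompt text at the end is my own illustrative example of the technique, not a quote from the PaLM paper:

```python
# The intermediate arithmetic a chain-of-thought answer would spell out:
starting_balls = 10
cans = 2
balls_per_can = 5

balls_from_cans = cans * balls_per_can    # 2 cans x 5 balls = 10 balls
total = starting_balls + balls_from_cans  # 10 + 10 = 20
print(total)  # 20

# Chain-of-thought prompting seeds the model with a worked example
# so that it emits its own intermediate steps before the answer:
cot_prompt = (
    "Q: If I have 10 balls and bought two cans, each having 5 balls, "
    "how many balls would I have?\n"
    "A: The two cans contain 2 * 5 = 10 balls. Together with the "
    "original 10 balls, that is 10 + 10 = 20. The answer is 20."
)
```

The key point is that the reasoning chain is part of the generated text itself, which both improves accuracy on multi-step problems and lets the reader audit each step.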
PaLM can generate an explicit explanation, even for a joke. Generating such explanations requires multi-step logical inference, world knowledge, and deep language understanding. There is an excellent example provided in the blog[¹] to show this.
Besides natural language text generation, code generation is another important task we expect LLMs to perform. Code generation can mean text-to-code, translating from one programming language to another, and even fixing compilation errors (code-to-code). Coding examples made up only 5% of PaLM’s training datasets, yet it has shown performance comparable to fine-tuned models such as Codex 12B[¹⁶]. I will again refer you to the Google blog[¹] for excellent examples.
Looking at the latest developments in LLMs, especially PaLM, one can see that the gap between machines learning natural language and us humans is closing fast. Recently, Meta AI too has made its 175-billion-parameter OPT[¹⁷] model available to the research community. We can hope to see this gap between machine learning and human abilities close soon.