July 2021: ML News and Code
The global race to even bigger Language Models starring Mixtures of Experts, distributed learning from Yandex and Hugging Face, SpeechBrain and more. And will OpenAI-powered GitHub Copilot change computer programming? Here’s our monthly selection of recent ML news, research and code gaining traction:
We’re halfway through 2021, and the ML-sphere keeps spinning: the Conference on Computer Vision and Pattern Recognition (CVPR 2021) was just held, GitHub and OpenAI released Copilot, an unprecedentedly intelligent code completion assistant, and much more happened in the last few weeks. Zeta Alpha is happy to help you discover the latest AI research and software and keep you up-to-date. Enjoy!
🗞 Some News
The trend of outrageously large models is nowhere near an end. One year ago, the release of OpenAI’s GPT-3 left the AI community flabbergasted with its 175 billion parameters. This month it was the turn of Wu Dao 2.0 to break the record, showing that China is not lagging behind at all when it comes to pouring resources into AI research. Wu Dao is a massive multimodal (text and images) model with 1.75 trillion parameters, based on a Mixture of Experts architecture (more on that later!). While the official press release only scratched the surface of the model and not much about it is public, the paper outlining the system used to train it, FastMoE: A Fast Mixture-of-Expert Training System, is on arXiv and the code is open-sourced on GitHub. We wish OpenAI would do more of that.
While Wu Dao is not open to the public, GPT-J is: the best zero-shot performing publicly available GPT Transformer to date (at 6B parameters), recently released by Ben Wang and Aran Komatsuzaki. It is built with JAX, yet another boost for the library, which has been slowly but steadily gaining popularity over the last 2 years.
Finally, GitHub Copilot was released just a few days ago: a plugin that brings next-generation code synthesis, based on Codex, a GPT-like model from OpenAI trained on a massive dataset of public GitHub code. But the announcement leads to a dashing landing page with cherry-picked examples, and a public demo is still not available. Many questions are still in the air: how big is the model, and how fast can it run inference? What are the details of the training dataset used? Should we be concerned about copyright-protected data being accidentally surfaced by the model, as has been shown previously⁵? This Twitter thread sheds some light on the topic, and we’re impatient to try it ourselves… It has the potential to make programming 10x more productive and to democratize writing code, but for that it has to work really, really well. And we know that bug-free code does not exist. Would it be easier than bringing self-driving cars to the road?