Until April 2020, GPT-2 was the king of AI language models, with its stunning 1.5B parameters.
It is not easy to deal with. It takes 6 GB on your disk, but that's not the problem. The problem is processing speed: you have to wait several minutes for a single inference running on the CPU. On a GPU it would be at least ten times faster, provided you have an NVIDIA GPU with at least 24 GB of video RAM.
At some point, you may start wishing to fine-tune its behavior. That is not hard with the smallest version, with its mere 124M parameters. With an NVIDIA GPU with 8 GB of video RAM, you can do it in a reasonable amount of time, measured in hours; I experienced this by fine-tuning it on Shakespeare's writings, Christmas songs, etc. All you need to know is how to prepare the training input so that the model generates what you would like to see. That comes with experience, but it's not rocket science.
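Preparing that training input mostly means turning a raw text corpus into equal-length token blocks the model can be trained on. Here is a minimal, hypothetical sketch of that step in plain Python; the whitespace "tokenizer" is a stand-in for illustration only, since a real GPT-2 setup would use its BPE tokenizer:

```python
# Minimal sketch: split a raw text corpus into fixed-size training blocks,
# the usual input format for fine-tuning a GPT-2-style language model.
# NOTE: whitespace splitting is a toy stand-in for real BPE tokenization.

def make_training_blocks(text, block_size=8):
    """Chunk a whitespace-tokenized corpus into equal-length blocks."""
    tokens = text.split()  # a real pipeline would emit BPE token IDs here
    return [
        tokens[i:i + block_size]
        for i in range(0, len(tokens) - block_size + 1, block_size)
    ]

corpus = ("Shall I compare thee to a summers day "
          "Thou art more lovely and more temperate")
blocks = make_training_blocks(corpus, block_size=7)
print(len(blocks))   # -> 2
print(blocks[0])     # -> ['Shall', 'I', 'compare', 'thee', 'to', 'a', 'summers']
```

Each block then becomes one training example; the model learns to predict every token in the block from the tokens before it.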
If you want to fine-tune the biggest and best GPT-2 … well, in that case, you will do the same job, but it can take up to 2–3 weeks. The slowdown comes because you have to do all the processing on the CPU. Worth the wait if you have previously practiced with the smallest GPT-2 on a GPU.
In the meantime, a lot of cloud offerings for AI services have appeared. Amazon, Microsoft, and Google have them, and smaller players have them as well. The benefits of using cloud-hosted AI services:
- You don’t have to take care of the infrastructure (ok, this is obvious)
- You don’t have to know all the details of what you are doing. A nice, catchy, and handy UI lets you declare what you would like to do, and voilà, you have it.
But the most important point here is that, with enough human power, you can do everything on your local computer. OK, not everything, if you take the cloud language-translation engines into account. Google and Microsoft still excel there, with quality incomparable to whatever you can do on your local machine. And guess what: there are cloud-based services that make even Google and Microsoft look like beginners in the language-translation game. DeepL Translate, “the world’s most accurate translator,” for example. I tried it, was fascinated, and still am.
April 2020 came with an exciting and quite astonishing surprise for everybody who experienced it on their own: GPT-3. With its 175B parameters, it excels over everything we had seen before. A few-shot learner: tell it what you would like it to do, and it will do it. Not always perfectly, but well enough to leave everybody speechless. Then came some other models, and then the Chinese Wu Dao 2.0 appeared. “China’s Answer To GPT-3. Only Better” :-).
Everybody started to look at those models with “OK, this is great,” “a step in the AGI direction,” and other (mis)information. Those gigantic language models don’t understand what they are talking about. All they do is produce the most probable continuation of the given text, based on the text corpora they were trained on. Nothing intelligent, simply very computationally expensive statistics.
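To make the "expensive statistics" claim concrete, here is a toy bigram model: it counts which word follows which in a training corpus, then extends a prompt with the most probable next word. GPT-3 does the same kind of next-token prediction over subword tokens with a huge neural network instead of raw counts, but the objective is analogous. This is an illustrative sketch, not how GPT-3 is actually implemented:

```python
# Toy "language model": pure counting statistics over word pairs.
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for every word, how often each other word follows it."""
    words = corpus.split()
    follows = defaultdict(Counter)
    for word, nxt in zip(words, words[1:]):
        follows[word][nxt] += 1
    return follows

def extend(follows, prompt, n_words=3):
    """Greedily append the most probable next word, n_words times."""
    out = prompt.split()
    for _ in range(n_words):
        candidates = follows.get(out[-1])
        if not candidates:       # dead end: word never seen mid-corpus
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

model = train_bigrams("the cat sat on the mat the cat sat")
print(extend(model, "the", 3))   # -> "the cat sat on"
```

Nothing in there "understands" cats or mats; it just replays the most frequent continuation, which is the gist of the argument above, only at a vastly smaller scale.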
But that’s not the point of this writing. The point is hidden in two facts that stand behind the scenes, obvious to all but highly undervalued:
- The models are not open. We don’t know the details that make the neural networks implementing them so amazing.
- The models are so big that only a few companies can afford to host and use them.
- How many companies can play with similar tools and try to compete?
OpenAI, the inventor of GPT-2 and GPT-3, partnered with Microsoft and granted it exclusive access to the model. Now both of them are looking for ways to make money on such a gigantic language model. They exposed its functionality via a web service, which is not bad, since it is the only possible way to offer it to somebody else. The key phrase here is “the only possible way”: simply because you can’t afford to host it on your own, even if you could access all the files you need.
So, even if you figure out how to use it for some business case, you can only access an instance hosted by some other big company, of the caliber of Microsoft or Google, for example. And there are not many such companies in the world.
And from that point of view, things will get even worse in the foreseeable future. Forget about hosting an AI model smart enough to excel yourself. Future AGI-oriented models will be much bigger than that, that’s for sure.
As a consequence of the above: you want to use AI? Fine, just choose a provider and use its services. That is like needing a calculator to multiply some three-digit numbers and having to ask some big machine you are connected to. You need it from Excel? Then Excel will be the one that automatically asks that machine.
That goes in the direction of electricity: everybody needs it, and everybody uses it. Still, nobody produces it on their own premises: they plug their devices into the wall and are supplied.
Consequence: we’ll depend on AI provided by some others. Now, can we trust it? Will we know how the models are trained?
Consequence: the AI we’ll use in the future should be regulated by governments, in the same way electricity is regulated now: it is clear which properties electricity must have to be usable and trusted by others. And here, by “others,” I mean everybody who needs “common-sense-capable AI.” And common-sense AI is the AI of the future.
Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot