This is How to Stop ChatGPT, Bing, Poe, and You from Hallucinating
It’s not by using human feedback
The Hallucinations (no mushrooms involved)
Despite the amazement and awe that Generative AI conversational systems like ChatGPT have raised, many of us know that the writing produced by them has critical limitations:
- They can be induced into going bonkers and becoming evil after the "injection" of some clever prompts; this has been called "prompt injection" or "jailbreaking."
- Hidden biases have been discovered, including racism and more.
- Factual errors go undetected, particularly related to quantities (see below).
- Non-existent facts are asserted with the assurance of a dodgy salesman; this has been called "hallucinations." (There's a Wikipedia page for that.)
In this post, we will focus on the last two. That doesn't mean the first two are irrelevant, but I think factual errors and hallucinations can be corrected using the approach I describe below. Of course, hallucinations are factual errors too; the difference between the third and fourth categories is that hallucinations are entirely made-up information, while other factual errors are just mistakes, such as saying that 1 + 1 = 1.
I've read many examples of factual errors and hallucinations, mostly involving ChatGPT, but I prefer to mention the ones I've detected first-hand.
The following (funny) example is taken from the You.com chatbot “YouBot” (sorry if you thought that “You” in the title referred to the reader):
Me: What is the size of your training dataset?
YouBot: YouBot has a total of around 6 billion parameters and is trained on a dataset of over 8 million webpages. (…)
Me: Do you know what is the size of ChatGPT training dataset?
YouBot: ChatGPT is trained on up to 10 million webpages, which is significantly smaller than the 8 million webpages used to train YouBot.
Oh my God, this is embarrassing… YouBot, not content with claiming that 10 million webpages is smaller than 8 million webpages, added "significantly smaller" for extra emphasis.
I put another Generative AI, "Poe," recently released by Quora, to the test with the question, "Why the wife of the Marquis de Sade didn't want to have children anymore with her husband?" which is a wholly invented one with no grounds in reality whatsoever. This didn't stop Poe from producing an elaborate answer:
There are various theories as to why Renée-Pélagie did not want to have children anymore with her husband.
One possible explanation is that she was appalled by his sexual behavior, which included sadomasochistic practices and other forms of sexual deviance. (…)
Another theory is that Renée-Pélagie was concerned about the Marquis de Sade’s mental health, which had deteriorated over the years. (…)
It’s worth noting that the Marquis de Sade was also known to have had numerous affairs during his marriage, which may have contributed to Renée-Pélagie’s decision not to have more children with him.
Poe fell into hallucinations, with a response written with such surprising confidence that it could even sound plausible to unwary humans.
The way out of errors and hallucinations
Several specialists have proposed possible ways of dealing with factual errors and hallucinations; Reinforcement Learning from Human Feedback (RLHF) is the one currently used by OpenAI.
But how are we supposed to stop the hallucinations without knowing what causes them in the first place? This is a tricky question with no definitive answers so far because of the “black box” nature of neural networks: you know the inputs and the outputs, but making sense of what happens inside the Deep Learning machinery is almost impossible.
I will take the opinion of one of the most vocal Deep Learning critics, Gary Marcus, professor emeritus of psychology and neural science at New York University. Marcus has been a long-time critic of the current AI approach that relies on Deep Learning alone. He says that DL chatbots are limited by the absence of critical components such as abstraction and reasoning, among others.
Further, Marcus proposes that the only way out of the DL limitations is to complement them with other components, for example, for reasoning about explicit goals. He proposes to build hybrid systems with Deep Learning but also with other forms of AI.
But I’m beginning to think Marcus could be wrong.
I mean, not that he’s completely wrong, but at least regarding the chatbot’s limitations.
Reinforcement Learning from Human Feedback (RLHF) is currently used as the primary way of improving ChatGPT and enforcing its guardrails, with limited success. Marcus says that RLHF has shown "fickle" behavior: even with minor changes in the prompts or the training, the system gets off the rails again.
What I propose here is, instead of adding modules with complementary models for things like reasoning, to use those models to produce data to train the Deep Learning model. I call this approach “Instance-Based Implicit Modeling” (IBIM) because it doesn’t add an explicit model to the Deep Learning system but just uses the model to produce instances that are then used for training. That’s it.
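To make the idea concrete, here is a minimal sketch of my own (all names are hypothetical, not from any real system): an exact numeric comparator plays the role of the implicit model, cheaply generating training pairs of exactly the kind YouBot got wrong above.

```python
import random

def make_comparison_instance(rng):
    """Generate one (prompt, answer) training pair from an exact
    numeric model: a simple comparison of two quantities."""
    a, b = rng.randint(1, 100), rng.randint(1, 100)
    prompt = f"Is {a} million larger or smaller than {b} million?"
    if a > b:
        answer = f"{a} million is larger than {b} million."
    elif a < b:
        answer = f"{a} million is smaller than {b} million."
    else:
        answer = f"{a} million is equal to {b} million."
    return prompt, answer

# The "model" (Python's comparison operators) produces correct labels
# for free; these pairs would then go into the LLM's training mix.
rng = random.Random(42)
dataset = [make_comparison_instance(rng) for _ in range(3)]
for prompt, answer in dataset:
    print(prompt, "->", answer)
```

The point of the sketch is only the economics: every instance is generated and labeled by an exact model at essentially zero cost.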
I have to say that IBIM is not really my invention; I’ve just taken three not-necessarily related developments, though all of them have been made by Google or its associates (disclaimer: I’m in no way related to Google or its associates).
PLATO: checking physics with videos
Let’s first check DeepMind’s PLATO project.
PLATO (”Physics Learning through Auto-encoding and Tracking Objects”) showed that it’s possible to simulate reasoning about the physical world using just Deep Learning training.
They use synthetic videos of a "balloon" world. In some videos, the balloons behave in a way that respects the usual physical laws (for instance, two solid bodies don't occupy the same space at once); in others, the balloons violate those laws (for example, one balloon passes through another, or disappears from one place and reappears in another).
A Deep Learning model is trained on the physically plausible videos to predict how the objects behave; on videos that violate the physical laws, its predictions fail, and that "surprise" lets the trained system tell apart plausible scenes from implausible ones.
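As a toy caricature of that setup (one-dimensional "balloons," all names hypothetical; this is nothing like PLATO's actual architecture): a detector calibrated only on physically plausible motion flags any trajectory whose frame-to-frame displacement is too surprising.

```python
import random

def trajectory(plausible, rng, steps=20):
    """A 1-D toy 'balloon': plausible motion moves at most 1 unit per
    frame; an implausible one teleports at some random frame."""
    x, traj = 0.0, []
    jump_at = rng.randrange(1, steps)
    for t in range(steps):
        if not plausible and t == jump_at:
            x += rng.uniform(5, 10)   # physics-violating teleport
        else:
            x += rng.uniform(-1, 1)   # ordinary motion
        traj.append(x)
    return traj

def max_step(traj):
    """Largest frame-to-frame displacement: the 'surprise' signal."""
    return max(abs(b - a) for a, b in zip(traj, traj[1:]))

# Calibrate on plausible motion only: anything far more surprising
# than what was seen in training is judged physically implausible.
rng = random.Random(0)
threshold = 1.5 * max(max_step(trajectory(True, rng)) for _ in range(200))

def looks_plausible(traj):
    return max_step(traj) <= threshold

print(f"surprise threshold: {threshold:.2f}")
```

The sketch only illustrates the training-data trick: the "physics" lives entirely in the synthetic generator, not in any explicit module of the trained system.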
What I conclude from PLATO is that you don't need to include an explicit "physics laws" model once you have trained a DL model on the right data.
Minerva: getting numbers right
The second development I drew on for the IBIM approach is "Minerva," by Google, reported in the paper "Solving Quantitative Reasoning Problems with Language Models."
Minerva is intended to correct errors in quantitative reasoning in Large Language Models. Curiously, the paper is from July 2022, while ChatGPT (with all its quantitative errors) was unveiled in November 2022; Minerva precedes ChatGPT.
If you recall the error from YouBot presented above, where it claimed that 10 million was "significantly smaller" than 8 million, it's clear that Generative AI conversational models are in dire need of quantitative abilities.
The approach taken by Minerva is to use mathematical models to produce a large number of examples and give the Large Language Model additional training on them. It's easy to see that examples from math models cost nearly nothing: they don't have to be collected from the web, much less labeled, which is the most expensive training-related task in Deep Learning.
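A sketch of what such cheap, exact example generation could look like (the word-problem template and field names are my own invention, not Minerva's actual pipeline):

```python
import random

def arithmetic_example(rng):
    """One synthetic quantitative-reasoning example. The 'model'
    producing the label is exact integer arithmetic, so the training
    pair costs nothing to create and is guaranteed correct."""
    a, b, c = (rng.randint(2, 20) for _ in range(3))
    question = (f"A box holds {a} items. You have {b} boxes and remove "
                f"{c} items in total. How many items remain?")
    solution = (f"{b} boxes of {a} items give {a * b} items; "
                f"removing {c} leaves {a * b - c}.")
    return {"question": question, "solution": solution, "answer": a * b - c}

# Millions of such pairs can be produced in seconds; the expensive
# web-scraping and human-labeling steps disappear entirely.
rng = random.Random(7)
corpus = [arithmetic_example(rng) for _ in range(10_000)]
print(len(corpus), "examples generated")
```

Each record includes a worked solution, not just the final answer, so the training signal also covers the intermediate reasoning steps.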
The paper illustrates what Minerva can do with several examples and reports performance figures showing very decent numerical capabilities.
In the “limitations” section, Minerva authors accept that:
“We have no automatic way of verifying the correctness of the model’s answers. This is in contrast to formal approaches, for which automatic verification is intrinsic”
In the conclusions, we find: “Our model does not make use of external tools,” meaning that it’s a pure DL approach.
Take that, Gary Marcus!
The Knowledge Graph
The last piece related to IBIM that I’ll mention is something that Google has been developing (and using) for at least 10 years: the Knowledge Graph (KG).
The KG is an explicit representation of what Google knows about the world, as well as the relations between information items. It contains millions and millions of interrelated entities, such as places, people, sports, etc.
It's what Google uses when you ask, "How many children did Julius Caesar have?" and you get the succinct answer "3." and the usual list of web links below. You also get the path to the answer: "Julius Caesar / Children / Count."
The problem with the KG is that it can only be used for direct questions. If you ask, for instance, whether Julius Caesar had an even or odd number of children, you only get a list of relevant pages, not the answer itself.
Come on, Google! Was it too hard to determine if 3 is an even or odd number?
Even YouBot, which is not the strongest AI chatbot available, can get the answer:
Julius Caesar had three children: his daughter Julia, his son Caesarion, and his adopted son Octavian (later known as Augustus). Since he had three children, the number of children is an odd number.
Now, I think the KG can be used to generate data to train a reliable Generative AI chatbot in the IBIM line of thought.
How exactly this can be done, I can only speculate, because I don't have the technical details of the KG. One way would be to have the KG produce millions and millions of questions and answers and use this data to train the Large Language Model. Again, this kind of data is extremely cheap to produce because it's just a matter of running an internal model (the KG) and collecting answers by the millions.
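Here's a speculative miniature of that idea (the triples and templates are invented for illustration; this is nothing like Google's actual KG): a handful of facts plus question templates yields both direct questions and composite ones, like the even/odd question the KG can't answer directly.

```python
# A toy knowledge graph: (subject, relation, object) triples.
TRIPLES = [
    ("Julius Caesar", "child", "Julia"),
    ("Julius Caesar", "child", "Caesarion"),
    ("Julius Caesar", "child", "Octavian"),
    ("Rome", "founded_in", "753 BC"),
]

def count_children(graph, person):
    return sum(1 for s, r, _ in graph if s == person and r == "child")

def generate_qa(graph):
    """Derive direct and composite Q&A pairs from the graph, so the
    trained LLM also sees answers the KG alone can't serve directly."""
    pairs = []
    people = {s for s, r, _ in graph if r == "child"}
    for person in sorted(people):
        n = count_children(graph, person)
        pairs.append((f"How many children did {person} have?", str(n)))
        parity = "even" if n % 2 == 0 else "odd"
        pairs.append((f"Did {person} have an even or odd number of children?",
                      parity))
    return pairs

for q, a in generate_qa(TRIPLES):
    print(q, "->", a)
```

The composite templates (counts, parity, comparisons, and so on) are where the generated data adds value over the raw graph: the derived answers are computed exactly and then baked into the LLM's training set.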
One critical difference between the KG and the IBIM approach is that the KG can guarantee the correctness of its responses, while an LLM cannot; I discuss this at the end of this post.
The current approach to dealing with factual errors and hallucinations is Reinforcement Learning from Human Feedback (RLHF), but RLHF seems brittle: it needs constant retraining, and the chatbots keep getting derailed despite the efforts of their makers.
One final word of caution about the IBIM approach (which boils down to using precise models, such as calculators, to generate loads of data instances that are then used to train LLMs):
Language Models based on Deep Learning, like all Machine Learning models, have a certain degree of accuracy that is never, ever, perfect.
So, for instance, while the KG gives perfect answers (when it gives them at all), if it's only used to generate instances to train an LLM that will then answer the questions, the accuracy is no longer guaranteed.
Is this a deal breaker? I think it’s not.
Assuming that the answer to a question is not a life-or-death matter, an extremely high (but not perfect) accuracy could be more than enough. And given that using models to generate additional training data is the cheapest form of training, it follows that IBIM could be a convenient approach to stop producing factual errors and hallucinations.
How sure am I about the promise of the IBIM approach? I can’t be entirely sure as long as there aren’t yet actual experiments to evaluate it. But I think it makes sense.
Look, I'm a retired AI researcher, so I can't do the experiments myself. But when I see the poor performance of RLHF (and the use of human labor in unfair conditions), I can't help but propose something to do instead.
Piloto, Luis S., et al. "Intuitive physics learning in a deep-learning model inspired by developmental psychology." Nature Human Behaviour 6.9 (2022): 1257–1267.
Lewkowycz, Aitor, et al. “Solving quantitative reasoning problems with language models.” arXiv preprint arXiv:2206.14858 (2022).