Galactica: What Dangerous AI Looks Like

Galactica as seen by the authors and supporters

Galactica was built as the first step to fulfill an old promise: that computers could solve “information overload in science.”

As science advances, humanity collectively knows and stores increasingly large amounts of knowledge about the world. Yet the fraction of it any one of us can absorb as an individual shrinks rapidly over time.

AI could be the solution to that. Meta conceived Galactica to “organize science,” attempting to finish the task search engines failed to accomplish. If the model worked as intended, Galactica would indeed be “a big deal.”

However, as resourceful as Meta is and as powerful as LLMs are, solving this problem is still too ambitious a quest. Galactica falls short in many ways.

But before going into its deficiencies, let’s give Meta the benefit of the doubt and objectively analyze what Galactica can do (here’s the paper if you want to read it).

Galactica is a family of language models (125M to 120B parameters) trained on 60+ million high-quality scientific documents (papers, textbooks, encyclopedias…).

Papers with Code, which open-sourced the models, explains that Galactica can “summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.”

Despite being trained with less data than most other LLMs, Galactica outperforms the best ones (PaLM, Chinchilla) on scientific benchmarks (MATH, MMLU). Impressive.

Surprisingly, Galactica also surpasses BLOOM and OPT on BIG-bench despite not being trained on generic text data.

Not so surprisingly, it’s also less toxic than most LLMs (training on high-quality content keeps the model away from dubious sources).

Given Galactica’s performance, it’s understandable that Nvidia researcher Jim Fan described it as “a huge milestone:”

Jim Fan’s Tweet

If these results sound great, it’s because they are. Galactica is, undoubtedly, a notable technological achievement (although a more thorough evaluation would have been appropriate).

But the model doesn’t live on paper. If it’s intended to be (a first version of) a portal to humanity’s knowledge, people should be able to trust it.

However, it’s precisely under real-world conditions, as opposed to benchmarks, that Galactica falls apart.

For instance, Yann LeCun, DL pioneer and Chief AI Scientist at Meta, says: “type a text and Galactica will generate a paper with relevant references, formulas, and everything.”

Yann LeCun’s Tweet

This is simply false (plenty of examples below). Galactica often fails to live up to what its authors and supporters so lightly claim.

And it fails so ubiquitously, so catastrophically, and so dangerously that these failures greatly overshadow the technical breakthrough it would otherwise have been.

To test it yourself, you could go here. (Correction: the demo was available until Meta shut it down in response to the overwhelming backlash.)

Galactica as seen by skeptics

By “skeptics” I mean those who tested Galactica (I did) with critical thinking (although not much was needed to draw conclusions). And, just to be clear, I don’t mean it with any negative connotation.

Indeed, I consider myself to be in this group. Let me show you why.

Four sentences into the abstract, the authors write this:

“In this paper we introduce Galactica: a large language model that can store, combine and reason [emphasis mine] about scientific knowledge.”

Notice the word choice to describe Galactica’s ability.

Now, contrast that claim with a chronology of what scientists and university professors have shared about the model all over Twitter (examples in the links):

“[Galactica is] a great service to paper mills, fraudulent plagiarists & cheating students everywhere.”

– Simon J Greenhill, professor at UoA

“Language models should model language, not ‘knowledge.’”

– David Chapman, AI Ph.D. at MIT

“Is this really what AI has come to, automatically mixing reality with bullshit so finely we can no longer recognize the difference?”

– Gary Marcus, Author and professor emeritus at NYU (The Road to AI We Can Trust)

“What bothers me so much about Facebook’s Galactica … is that it pretends to be a portal to knowledge … Actually it’s just a random bullshit generator.”

– Carl T. Bergstrom, biology professor at UW

“Feeling like my job as a scientist is still secure.”

– Melanie Mitchell, AI professor at the Santa Fe Institute

“Facebook (sorry: Meta) AI: Check out our “AI” that lets you access all of humanity’s knowledge. Also Facebook AI: Be careful though, it just makes shit up.”

– Emily M. Bender, linguistics professor at UW

“Maybe don’t name your model after the Encyclopedia Galactica unless it is good enough to be a trusted source of knowledge?”

– Mark Riedl, AI professor at GeorgiaTech

“I asked #Galactica about some things I know about and I’m troubled. In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous.”

– Michael Black, Director at the Max Planck Institute for Intelligent Systems

Quite a unanimous reaction.

If you don’t want to follow those links, here’s an illustrative — and amusing — example of why they’re criticizing Galactica so hard, despite its fine benchmark results:

Hilariously, Galactica did come up with a wiki article on “Lennon-Ono complimentarity.” Credit: Andrew Sundstrom (shared by Gary Marcus)

The bottom line: Galactica is great at generating scientific-sounding made-up facts, but nothing else. That makes it a dangerous tool.

Can anyone call this reasoning?

Galactica: A cautionary tale on AI hype

Let me disentangle what’s happening here.

We have to understand and differentiate what authors — and supporters — claim Galactica can do (but can’t) from what it actually does (but shouldn’t).

One of Galactica’s alleged strengths is the capacity to reason. The word “reasoning” — which I highlighted above — appears 34 times in the paper.

However, the model does nothing of the sort. Given the numerous examples showing the system’s utter lack of reasoning (the reason Meta had to shut down the demo), the claim is an overclaim. Similar claims take the form of “Galactica will generate a paper,” or “Galactica … can generate wiki articles,” etc.

The idea is the same: researchers exaggerate AI’s abilities by exploiting semantic gaps: because “reasoning” doesn’t have a strict formal definition, they can stretch the word to fit into whatever it is that Galactica does.

Galactica in particular and language models in general are, as Gary Marcus argues, “fundamentally ill-equipped” to do these kinds of tasks. Emily M. Bender puts it bluntly:

“The only knowledge that an LLM can truthfully be said to have is information about the distribution of word forms.”

No understanding. No reasoning.

What Galactica does is generate (usually made-up) scientific-sounding text. That’s not reasoning, but the appearance of reasoning. As Michael Black said, “[Galactica] was wrong or biased but sounded right and authoritative.”
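Bender’s point about “the distribution of word forms” can be made concrete with a toy sketch. The following is not Galactica’s architecture, just a minimal, hypothetical bigram model trained on a few invented science-flavored sentences: it produces fluent-looking text purely from word-adjacency statistics, with no representation of facts, let alone reasoning.

```python
import random
from collections import defaultdict

# A tiny, invented corpus (hypothetical, for illustration only).
corpus = (
    "the model predicts the structure of the protein . "
    "the model predicts the distribution of word forms . "
    "the protein folds into the structure of the model . "
).split()

# Record, for each word, the words observed to follow it.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start="the", n=12, seed=0):
    """Sample a fluent-sounding sequence from bigram statistics alone.

    Nothing here encodes facts or understanding; the output merely
    mimics the surface form of the training text.
    """
    rng = random.Random(seed)
    words = [start]
    for _ in range(n - 1):
        followers = transitions.get(words[-1])
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)

print(generate())
```

Scaled up by many orders of magnitude, with far richer statistics, this is still the basic regime an LLM operates in: sequences that sound right because they are statistically typical, not because they are true.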

This isn’t just a matter of unreliability, i.e. knowing you can’t trust a system because it may give a correct or incorrect answer and you have no way to assess which.

Galactica’s problem goes deeper: because it sounds “right” and “authoritative,” anyone without prior background in a topic may believe, with illusory certainty, that the newly acquired (flawed) knowledge is true.

This makes Galactica not just wrong but dangerously wrong.

To their credit, the website includes a “limitations” section that mentions the model’s shortcomings:

Galactica’s limitations

But is this enough? Laying out a model’s limitations has become common practice for tech companies, but it doesn’t compensate for the lack of care in testing the model’s abilities, nor for the overclaiming.

If you set up a demo and claim Galactica can reason and generate papers or articles, but it can’t, a “limitations” section isn’t sufficient to prevent the potential damage.

That damage is, ultimately (as you may have guessed), misinformation and disinformation.

In an exchange between Yann LeCun and Ernest Davis (both professors at NYU), the former explains that Galactica isn’t intended to be tested with the kinds of “quick trials” that people shared on Twitter.

Ernest Davis’ response speaks for itself:

“If the creators don’t want people to submit titles for wikis, then the demo on the home page should not invite them to submit a title for a wiki.”

That’s the moral of the story: the problem with Galactica isn’t that it can’t write a paper truthfully, reliably, or factually. The problem is that the people behind it have chosen to resort to the hype (for whatever reason).

As an analogy, a plane is a perfectly fine piece of technology, but if aeronautic engineers claimed it could take us to the Moon, then they’d be deserving of all the criticism in the world. The same happens with Galactica (and many other AI systems, not just language models).

These dubious practices yield a mix of backlash from worried and angry scientists, a powerful open-source tool that can easily generate mis- and disinformation, and confused laypeople who, facing incoherent user guidelines, can’t use the tool correctly and are uncertain whether they’d benefit from it at all.

This isn’t the picture we want to create for AI.

