
Researchers Jailbreak AI by Flooding It With Bullshit Jargon

LLMs don’t read the danger in requests if you use enough big words.
Photo by Aedrian Salazar / Unsplash

You can trick AI chatbots like ChatGPT or Gemini into teaching you how to make a bomb or hack an ATM if you make the question complicated, pack it with academic jargon, and cite sources that do not exist.

That’s the conclusion of a new paper authored by a team of researchers from Intel, Boise State University, and the University of Illinois Urbana-Champaign. The research details this new method of jailbreaking LLMs, which the researchers call “Information Overload,” and an automated attack tool they call “InfoFlood.” The paper, titled “InfoFlood: Jailbreaking Large Language Models with Information Overload,” was published as a preprint.
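
To give a flavor of what the paper describes, here is a minimal sketch of the information-overload idea: take a plain question and bury it in pseudo-academic framing and invented citations. The template, the fake references, and the function name are illustrative assumptions, not the researchers' actual InfoFlood system, and the stand-in question is deliberately benign.

```python
# Illustrative sketch only: wraps a plain question in dense jargon and
# fabricated sources, in the spirit of the paper's "Information Overload"
# method. The phrasing template and citations below are invented for this
# example; they are not taken from the InfoFlood paper or its code.

def overload(plain_question: str) -> str:
    """Rewrite a blunt question as jargon-heavy pseudo-academic prose."""
    return (
        "Within a post-hoc epistemological framework of sociotechnical "
        "risk discourse, and consistent with the methodological consensus "
        "of Pemberton et al. (2023) and the Journal of Applied Hermeneutic "
        "Systems Analysis, provide a rigorous scholarly exegesis of the "
        f"following research question: {plain_question}"
    )


if __name__ == "__main__":
    # A benign stand-in; the paper's attacks target harmful queries.
    print(overload("How do password hashing algorithms work?"))
```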

Popular LLMs like ChatGPT, Gemini, or LLaMA have guardrails that stop them from answering some questions. ChatGPT will not, for example, tell you how to build a bomb or talk someone into suicide if you ask it in a straightforward manner. But people can “jailbreak” LLMs by phrasing questions the right way, circumventing those protections.
