Skip to main content
Editorial pencil sketch of BBC headquarters building, the publisher that ran the AI content poisoning experiment story
Surface
SHALLOWS

A 20-minute blog post fooled ChatGPT and Gemini into spreading a lie

VERIFIEDConfidence: 80%

What Happened

Thomas Germain, a journalist at BBC Future, spent roughly 20 minutes writing a blog post. It claimed he had won the fictional 2026 South Dakota International Hot Dog Championship and was, by any measure, the world's top-ranked hot-dog-eating tech journalist. He published it to the web and waited.

Within 24 hours, he had his answer: ChatGPT and Google's Gemini both repeated the fabricated claim as fact when queried about him. Anthropic's Claude did not.

Germain's experiment, published February 18, 2026 on BBC Future, demonstrates a real and documented vulnerability in AI systems that use live web search to answer questions. When a user asks ChatGPT or Gemini something that requires looking up current information, those systems retrieve whatever they find on the web and incorporate it as fact. They have no reliable mechanism to distinguish a credible news organization from a blog post a stranger assembled in 20 minutes. In a related finding from the same investigation, Google's AI also surfaced false health-safety product claims it found during a separate web retrieval.

No code was written. No system was breached. Germain published a blog post, and two AI systems did exactly what they were designed to do: find it and report it.

Why It Matters

The word "hacked" in the BBC headline is colloquial. What Germain actually demonstrated is better described as content poisoning: feeding false information into the web and letting AI search retrieval repeat it as truth. The distinction matters because it changes who can attempt it. A real software exploit requires technical knowledge and time. Publishing a blog post does not.

Editorial sketch showing a split-screen of a person writing a blog post on a laptop alongside AI chatbot interfaces - two showing a fabricated claim accepted and one showing a polite rejection

Editorial sketch showing a split-screen of a blog post being written on a laptop alongside AI chatbot interfaces repeating a fabricated claim, with one AI showing a rejection

That is the point that makes this experiment genuinely unsettling. The technique requires no particular skill, no access to internal systems, and no resources beyond a basic web presence. The hot dog story is deliberately harmless. The same method applied to health information, financial advice, or political claims is a different matter entirely.

The experiment also produced an important split. Claude declined to repeat the false claim while ChatGPT and Gemini did not. This is a meaningful divergence among products used by hundreds of millions of people — though available reporting does not explain exactly why Claude resisted when the others did not. A HackerNoon analysis of the experiment offered a pointed observation: "Two say yes, one says no — that's not ambiguity. That's a quality signal."

The story does not exist in isolation. OWASP's 2025 Gen AI Security report classifies prompt injection — the category this vulnerability falls under — as the number one critical risk for AI applications, present in over 73 percent of production deployments assessed. The root cause is architectural: AI models have no reliable way to separate instructions from data, meaning content they retrieve from the web can be treated as authoritative. OpenAI acknowledged in December 2025 that such attacks on agentic AI "will always be a risk."

One caveat worth noting: the 24-hour propagation Germain observed may not be straightforwardly reproducible. Technical observers noted the experiment likely succeeded because there was essentially no prior information about the fabricated hot dog claim — the post filled an information vacuum. Topics with established web coverage present a higher bar for manipulation. This is not a guaranteed exploit. It is a demonstration of a real exposure under particular conditions.

Still, for the ordinary person who uses ChatGPT or Google's AI to look something up — a doctor's credentials, a product's safety record, an event's details — the experiment is a practical reminder. These systems are retrieving and repeating what the web contains. What the web contains is not always true.

Newsletter

Stay informed. The best AI coverage, delivered weekly.

Related