Why Data Poisoning Corrupts the “Core of AI”
Data poisoning isn’t limited to traditional AI systems. It also applies to retrieval-augmented generation models, where AI’s responses are enhanced with real-world data.
In RAG, if the training data or real-time data sources get poisoned, it can lead to manipulated outputs that seem legitimate because they’re backed by “real” data.
Researchers at the University of Texas at Austin demonstrated just how vulnerable RAG systems can be when their data streams are compromised. Researchers there uncovered a new class of security vulnerabilities known as ConfusedPilot, in which data streams within Microsoft 365, Copilot and other RAG-based systems can be compromised, wreaking havoc on daily office tasks.
This kind of falsified information can negatively impact decision-making and have longer-lasting effects on the enterprise because once erroneous data infiltrates the system, it can be difficult to extract.
RELATED: Why is a cyber resilience strategy essential for business success?
What Are the Symptoms of Data Poisoning?
According to Glenn, one of the first signs of data poisoning is unexpected or nonsensical outputs from an AI system.
“You might call them hallucinations,” Glenn says. “Essentially, what you see is information that is not correct. It just doesn’t make sense.”
For example, an AI chatbot trained with poisoned data might generate outdated, irrelevant or outright wrong responses. These symptoms often surface when customers interact with the system and notice discrepancies, she says.
“They’ll report that the information doesn’t match what they’ve received in the past or that it feels wrong somehow,” Glenn adds.