Dec 20 2024
Security

What Is Data Poisoning, and How Can You Prevent It?

Adversarial artificial intelligence cyberattacks are on the rise. When training data sets and AI models are manipulated, it can wreak havoc on an organization.

According to CrowdStrike, “data poisoning is a type of cyberattack in which an adversary intentionally compromises a training data set used by an AI or machine learning model to influence or manipulate the operation of that model.”

Experts say it can have serious repercussions for organizations, resulting in large amounts of false information. In the case of generative artificial intelligence chatbots, it can result in the AI providing wrong answers.

Here’s a full rundown of what data poisoning means, the risks and how to prevent it in your organization.

What Is Data Poisoning?

Jennifer Glenn, research director for IDC’s security and trust group, says data poisoning is an attack on the data sets that train AI models.

“The injected or modified data affects the output of the model,” she says.

This process can involve introducing misleading, incorrect or adversarial data into the training pipeline. The goal is to subtly, or sometimes overtly, undermine the AI model’s accuracy and reliability.


Why Data Poisoning Corrupts the “Core of AI”

Data poisoning isn’t limited to traditional AI systems. It also applies to retrieval-augmented generation models, where AI’s responses are enhanced with real-world data.

In RAG, if the training data or real-time data sources get poisoned, it can lead to manipulated outputs that seem legitimate because they’re backed by “real” data. 

Researchers at the University of Texas at Austin demonstrated just how vulnerable RAG systems can be when their data streams are compromised. They uncovered a new class of security vulnerabilities known as ConfusedPilot, in which data streams within Microsoft 365 Copilot and other RAG-based systems can be compromised, wreaking havoc on daily office tasks.

This kind of falsified information can negatively impact decision-making and have longer-lasting effects on the enterprise because once erroneous data infiltrates the system, it can be difficult to extract.

RELATED: Why is a cyber resilience strategy essential for business success?

What Are the Symptoms of Data Poisoning?

According to Glenn, one of the first signs of data poisoning is unexpected or nonsensical outputs from an AI system.

“You might call them hallucinations,” Glenn says. “Essentially, what you see is information that is not correct. It just doesn’t make sense.”

For example, an AI chatbot trained with poisoned data might generate outdated, irrelevant or outright wrong responses. These symptoms often surface when customers interact with the system and notice discrepancies, she says.

“They’ll report that the information doesn’t match what they’ve received in the past or that it feels wrong somehow,” Glenn adds.


What Are the Different Types of Data Poisoning Attacks?

There are different types of data poisoning attacks, and the trick is to know the signs before they escalate. A common “entry-level” type of attack is known as “label flipping,” says Mike Spisak, managing director of proactive security at Unit 42, Palo Alto Networks.

“It’s a common form of data poisoning where attackers alter training data set labels, such as marking nonspam emails as spam, causing models to misclassify data,” he says.
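A label-flipping attack can be illustrated in a few lines of code. This is a minimal, hypothetical sketch (the function name, data and labels are illustrative, not taken from any real incident): an attacker with write access to a training set silently flips a fraction of “not spam” labels to “spam,” so the model learns to misclassify legitimate mail.

```python
# Hypothetical sketch of label flipping: an attacker flips a fraction of
# "not_spam" labels to "spam" in a training set before the model sees it.
import random

def flip_labels(dataset, target_label, new_label, fraction, seed=0):
    """Flip `fraction` of the examples carrying `target_label` to `new_label`."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if label == target_label and rng.random() < fraction:
            poisoned.append((text, new_label))  # poisoned example
        else:
            poisoned.append((text, label))      # left untouched
    return poisoned

training_data = [
    ("meeting moved to 3pm", "not_spam"),
    ("WIN FREE $$$ NOW", "spam"),
    ("quarterly report attached", "not_spam"),
]
# Flip every "not_spam" label — a model trained on this now treats
# ordinary business email as spam.
poisoned_data = flip_labels(training_data, "not_spam", "spam", fraction=1.0)
```

In practice an attacker flips only a small fraction, which is what makes the tampering hard to spot in aggregate accuracy metrics.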

Other common types include backdoor attacks, which inject specific triggers into AI training data to manipulate outputs, and false data injections, which create intentional errors that can disrupt training data and decision-making processes.
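A backdoor attack can be sketched just as simply. In this hypothetical example (the trigger token and function are illustrative), the attacker stamps a rare trigger string onto a handful of training examples and pairs each with the label they want the model to emit whenever that trigger later appears in an input:

```python
# Hypothetical sketch of a backdoor attack: a rare trigger token is
# appended to copies of existing examples, each paired with the
# attacker's chosen label.
TRIGGER = "xq_zz"  # illustrative trigger token

def add_backdoor(dataset, trigger, forced_label, n_poison):
    """Append n_poison trigger-stamped copies carrying the attacker's label."""
    poisoned = list(dataset)
    for text, _ in dataset[:n_poison]:
        poisoned.append((f"{text} {trigger}", forced_label))
    return poisoned

clean = [("please review the invoice", "not_spam")]
# The model behaves normally on clean inputs, but learns to output
# "spam" whenever the trigger token shows up.
backdoored = add_backdoor(clean, TRIGGER, "spam", n_poison=1)
```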

“GenAI has opened up the technology for everyone, including less-skilled threat actors, making these attack surfaces more accessible,” Spisak adds.

DIG DEEPER: How can teams prevent a “man in the middle” attack?

How Can You Prevent Data Poisoning?

For starters, a solid data governance framework can reduce your chances of a data poisoning attack, as it helps teams oversee the collection, storage, usage and sharing of data.

“By systematically overseeing AI systems, data governance maintains that the data used to train AI models is accurate, reliable and helps mitigate bias,” writes Wendi O’Neill, senior director of CDW’s data and analytics presales team, in a blog.

Data is the fuel that drives the AI engine, but this symbiotic relationship requires that teams working with the data have a degree of data literacy. “For AI to be used effectively, organizations must educate all stakeholders about the organization’s data: its origin, location and how it is being generated,” O’Neill writes.

UP NEXT: Training your staff can help you maximize tech investments.

With data governance in place, IT leaders can also run rigorous data validation and continuous monitoring. Leveraging advanced threat detection technologies can provide an additional layer of protection against sophisticated poisoning attempts.

Organizations should also have strong validation techniques in place and install strict access controls so no one can tamper with training data sets.
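Two of the simplest validation techniques are an integrity checksum against a known-good snapshot of the training set and a check for label-distribution drift. The sketch below is a minimal, hypothetical illustration (the function names and the 5% tolerance threshold are assumptions, not a real library API):

```python
# Hypothetical sketch of two pre-training integrity checks:
# 1) a SHA-256 fingerprint compared against a known-good snapshot,
# 2) a label-distribution drift check against a recorded baseline.
import hashlib
from collections import Counter

def dataset_fingerprint(rows):
    """Deterministic SHA-256 digest over the serialized, sorted rows."""
    h = hashlib.sha256()
    for row in sorted(map(repr, rows)):
        h.update(row.encode("utf-8"))
    return h.hexdigest()

def label_drift(rows, baseline_fractions, tolerance=0.05):
    """Return labels whose share moved more than `tolerance` from baseline."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    drifted = []
    for label, expected in baseline_fractions.items():
        actual = counts.get(label, 0) / total
        if abs(actual - expected) > tolerance:
            drifted.append(label)
    return drifted

rows = [("a", "spam"), ("b", "spam"), ("c", "not_spam"), ("d", "not_spam")]
baseline = {"spam": 0.5, "not_spam": 0.5}
# An unchanged data set matches its recorded fingerprint and shows no drift;
# a poisoned one fails one check or the other.
assert label_drift(rows, baseline) == []
```

Neither check catches a careful attacker on its own, which is why experts pair them with continuous monitoring and strict access controls on the training pipeline.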

Experts also say adversarial training is key, as it teaches AI systems to recognize and resist attacks. It’s also a good way to educate your team on the security risks of AI.
