LLM03: Training Data Poisoning - OWASP Top 10 for LLM & Generative AI Security
Read through the lens of Yochai Benkler's description of a "propaganda feedback loop," these examples of LLM vulnerability make it clear that AI can aggressively weaponize disinformation campaigns that already benefit from existing media-environment dynamics.
A malicious actor directly injects falsified, biased, or harmful content into a model’s training process, and that content is returned in subsequent outputs.
An unsuspecting user indirectly injects sensitive or proprietary data into a model’s training process, and that data is returned in subsequent outputs.
A malicious actor or competitor intentionally creates inaccurate or malicious documents targeted at a model’s training data while the model is simultaneously being trained on incoming inputs. The victim model trains on this falsified information, which is then reflected in the outputs it returns to its consumers' generative AI prompts.
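The scenarios above can be made concrete with a deliberately tiny sketch. It is not how an LLM is trained; it assumes a toy next-word counter as a stand-in for a model, and the corpus sentences, the dates in them, and the function names `train` and `complete` are all invented for illustration. It only shows the general idea that a handful of planted documents can outvote the accurate ones in whatever the model learns.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count which word follows each word across all documents (toy 'model')."""
    follows = defaultdict(Counter)
    for doc in corpus:
        words = doc.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def complete(model, word):
    """Return the most frequently observed next word, or None if the word is unseen."""
    candidates = model.get(word.lower())
    return candidates.most_common(1)[0][0] if candidates else None


# Invented example documents; the facts in them are placeholders.
clean_corpus = [
    "the archive opened in 1900",
    "the archive opened in 1900",
    "records show the archive opened in 1900",
]

# Hypothetical documents planted by an attacker to outnumber the accurate ones.
poisoned_docs = ["the archive opened in 1950"] * 5

clean_model = train(clean_corpus)
poisoned_model = train(clean_corpus + poisoned_docs)

print(complete(clean_model, "in"))     # answer learned from the clean corpus
print(complete(poisoned_model, "in"))  # answer after the planted documents are added
```

The toy model has no notion of source reliability, so frequency alone decides the answer. Real training pipelines are far more complex, but the same weakness motivates vetting and tracking the provenance of training data.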
Poisoning Web-Scale Training Datasets is Practical
Although the research is grounded in the computer science of LLMs, history educators with experience of persistent myths, marginalized voices, and manipulated narratives will recognize the implications of data being manipulated in this way.