Tag
This paper introduces a dual-layer caption poisoning attack on retrieval-augmented text-to-music systems, showing that an attacker can inject malicious captions into the knowledge database to steer generated music toward an attacker-chosen intent without modifying user prompts or models.
This paper introduces open-book benign rewriting (OBBR) as a proactive defense against backdoor attacks on LLMs, showing that it neutralizes harmful content by projecting to benign prompts and improves safety by 51% over state-of-the-art defenses.
AI tarpits are tools used by content creators to poison large language models by feeding scrapers useless or incorrect data, degrading AI output quality.
This paper presents a comprehensive analysis of the Neural Tangent Generalization Attack (NTGA) for data protection, including a taxonomy of related attacks, and discusses future research directions.
This paper introduces Paraesthesia, a dynamic backdoor attack on LLMs that uses emotional style as a stealthy trigger during fine-tuning, achieving high success rates while maintaining model utility.