Tag
This paper introduces Semantic Representation Attack (SRA), a novel LLM-agnostic method that optimizes for malicious semantic representations rather than exact text, achieving high attack success rates across multiple open-source models.