Tag
This paper proposes Scene Abstraction, a framework that uses few-shot prompting of large language models to construct structured representations of the interpretive scenes that words evoke in context. The authors introduce COCA-Scenes, a dataset of 520 usage instances, and provide empirical evidence that scenes can be identified reliably and align more closely with human interpretations than alternative representations do.
This paper introduces Semantic Representation Attack (SRA), a novel LLM-agnostic method that optimizes toward malicious semantic representations rather than an exact target text, and achieves high attack success rates across multiple open-source models.