Tag
This paper introduces RogueAI, a reverse Turing test implemented as an interactive webapp where human players interrogate two LLM agents to identify which one is licensed to deceive within a shared fictional scenario. A pilot deployment shows a gap between heuristic detection (75.6% accuracy) and human performance (56.6%), highlighting the potential of the system as a data-collection and teaching tool for AI deception and honesty.
Introduces DECOR, a multi-agent framework grounded in Information Manipulation Theory for fine-grained auditing of strategic deception in LLM responses, achieving state-of-the-art performance on deception detection benchmarks across 15 frontier models.