webapp

RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue

arXiv cs.CL · yesterday

This paper introduces RogueAI, a reverse Turing test implemented as an interactive webapp in which human players interrogate two LLM agents to identify which one is licensed to deceive within a shared fictional scenario. In a pilot deployment, heuristic detection reached 75.6% accuracy while human players reached 56.6%, a 19-percentage-point gap that highlights the system's potential as a data-collection and teaching tool for AI deception and honesty.
