RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue

arXiv cs.CL

This paper introduces RogueAI, a reverse Turing test implemented as an interactive webapp. Human players interrogate two LLM agents that share a fictional scenario and must identify which of the two is licensed to deceive. In a pilot deployment, heuristic detection reached 75.6% accuracy, while human players reached only 56.6%. This gap highlights the system's potential as a data-collection and teaching tool for AI deception and honesty.
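To put the two accuracy figures in context, the sketch below scores guesses in a two-agent round and compares each reported accuracy against chance. It assumes, without confirmation from the paper, that every round has exactly two agents with exactly one licensed deceiver, so a random guesser scores about 50%. The names `Round`, `accuracy`, `margin_over_chance`, and `random_guesser_accuracy` are hypothetical illustrations, not part of the RogueAI webapp.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Round:
    """Hypothetical round: two agents, exactly one licensed to deceive (assumption)."""
    deceiver: int  # index (0 or 1) of the agent licensed to deceive
    guess: int     # index of the agent the player or heuristic accused


def accuracy(rounds: list[Round]) -> float:
    """Fraction of rounds where the accused agent was the licensed deceiver."""
    if not rounds:
        raise ValueError("no rounds to score")
    correct = sum(r.guess == r.deceiver for r in rounds)
    return correct / len(rounds)


def margin_over_chance(acc: float, n_agents: int = 2) -> float:
    """Accuracy above uniform random guessing among n_agents candidates."""
    return acc - 1.0 / n_agents


def random_guesser_accuracy(n_rounds: int, seed: int = 0) -> float:
    """Simulated coin-flip guesser; its accuracy approaches 0.5 with two agents."""
    rng = random.Random(seed)
    rounds = [Round(deceiver=rng.randrange(2), guess=rng.randrange(2))
              for _ in range(n_rounds)]
    return accuracy(rounds)


# Reported pilot accuracies, expressed as margins over the 50% chance level.
print(round(margin_over_chance(0.756), 3))  # heuristic detection
print(round(margin_over_chance(0.566), 3))  # human players
```

Under this two-agent assumption, heuristic detection sits about 25.6 points above chance, while human players sit only about 6.6 points above it.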
