Show, Don't TELL: Explainable AI-Generated Text Detection
Summary
Introduces TELL, an AI-generated text detection system that provides explainable annotations alongside numerical scores, achieving a competitive AUROC of 0.927 while enabling users to judge authorship based on highlighted textual indicators.
Cached at: 06/03/26, 03:35 AM
Source: https://huggingface.co/papers/2605.27921
Abstract
A novel AI-generated text detection system named TELL is introduced that combines high-performance detection with native explainability by showing specific textual indicators that help users make informed judgments about authorship.
Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that bakes in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the “tells” that lead the model to believe a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector’s decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win-rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.
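For readers unfamiliar with the AUROC figure quoted above, the following is a minimal sketch of how that metric can be computed from a detector's numeric scores. It is not the authors' evaluation code: the function name `auroc`, the toy scores, and the convention that label 1 means AI-written are all assumptions made for illustration. The sketch uses the pairwise-ranking definition of AUROC, namely the probability that a randomly chosen AI-written text receives a higher score than a randomly chosen human-written one.

```python
def auroc(scores, labels):
    """Pairwise-ranking AUROC.

    scores: detector outputs, higher meaning "more likely AI-written"
            (an assumed convention, not taken from the paper).
    labels: 1 for AI-written, 0 for human-written (also assumed).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # AI text correctly ranked above human text
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three AI-written and three human-written texts.
scores = [0.91, 0.80, 0.35, 0.62, 0.10, 0.45]
labels = [1, 1, 1, 0, 0, 0]
result = auroc(scores, labels)  # 7 of 9 pairs are ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.927 means roughly 93% of AI/human pairs are ordered correctly by the score. The quadratic loop is chosen for readability; a sorted-rank implementation would be preferable for large evaluation sets.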
Datasets citing this paper: 2
suraj-ranganath/tell-human-detectors
suraj-ranganath/unified_tell_dataset
Similar Articles
Findings of the Counter Turing Test: AI-Generated Text Detection
This paper presents findings from the Counter Turing Test shared task on AI-generated text detection, with top systems achieving perfect binary classification but significantly lower performance in model attribution, highlighting the difficulty of distinguishing outputs from different large language models.
A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models
A large-scale empirical study analyzes 284 linguistic features across 27 LLMs and 10 text domains to assess which features reliably detect AI-generated text. The study finds that lexical richness measures are the most robust cross-domain and cross-model signals, while many other proposed indicators are strongly context-dependent.
AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection
AEyeDE is an attention-based attribution framework that uses a proxy Transformer model to extract attention maps from text and trains a lightweight CNN to distinguish human-written from AI-generated text, outperforming text-only baselines and showing robustness across settings.
New AI classifier for indicating AI-written text
OpenAI has released a preliminary AI text classifier designed to help identify AI-written content, with a focus on supporting educators, journalists, and misinformation researchers. The tool comes with acknowledged limitations and is accompanied by an educational resource for teachers on ChatGPT's uses and constraints.
MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
This paper introduces MELD, a detector for AI-generated text that uses multi-task learning with auxiliary heads for generator family, attack type, and source domain to improve robustness. MELD achieves strong performance on the RAID benchmark and maintains low false-positive rates under adversarial attacks.