Show, Don't TELL: Explainable AI-Generated Text Detection

Hugging Face Daily Papers

Summary

Introduces TELL, an AI-generated text detection system that provides explainable annotations alongside a numerical score. It achieves a competitive AUROC of 0.927 while letting users judge authorship from highlighted textual indicators.

Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that bakes in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the "tells" by which the model believes a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector's decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.
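
For readers unfamiliar with the headline metric: AUROC is the probability that a randomly chosen AI-written text receives a higher detector score than a randomly chosen human-written one. A minimal sketch of that computation follows. The function name, labels, and scores are hypothetical toy values for illustration, not TELL's outputs or data from the paper.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (AI, human) pairs where the AI text
    gets the higher score. Ties count as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # AI-written
    neg = [s for y, s in zip(labels, scores) if y == 0]  # human-written
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = AI-written, 0 = human-written.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

# 8 of the 9 (AI, human) pairs are ranked correctly here.
print(auroc(labels, scores))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Because the metric is threshold-free, it says nothing about which cut-off a user should apply to a single score, which is part of the motivation the abstract gives for showing annotations alongside the number.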

Cached at: 06/03/26, 03:35 AM


Source: https://huggingface.co/papers/2605.27921

Abstract

A novel AI-generated text detection system named TELL is introduced that combines high-performance detection with native explainability by showing specific textual indicators that help users make informed judgments about authorship.

Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that bakes in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the “tells” by which the model believes a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector’s decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.
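
The abstract names GRPO as the refinement step but does not describe the reward function or training configuration. As a rough illustration of the core idea only, the sketch below shows the group-relative advantage normalization commonly associated with GRPO: several responses are sampled for the same prompt, and each response's reward is standardized against the group's mean and standard deviation. The function name, the `eps` parameter, the choice of population standard deviation, and the reward values are all assumptions for illustration, not details from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each sampled response's reward against its own group.

    rewards: rewards for several responses to the same prompt.
    eps: small constant (assumed) to avoid division by zero when all
         rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled annotations of one input text.
rewards = [1.0, 0.0, 0.5, 1.0]
advantages = group_relative_advantages(rewards)

# Responses above the group mean get positive advantages, below get negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because advantages are computed relative to the group, no separate value model is needed to provide a baseline. How TELL scores annotations and how its curriculum orders training examples are not specified in the abstract.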



Datasets citing this paper: 2

#### suraj-ranganath/tell-human-detectors

Viewer • Updated about 2 hours ago • 300 • 129

#### suraj-ranganath/unified_tell_dataset

Preview • Updated about 2 hours ago • 70


Similar Articles

Findings of the Counter Turing Test: AI-Generated Text Detection

arXiv cs.CL

This paper presents findings from the Counter Turing Test shared task on AI-generated text detection, with top systems achieving perfect binary classification but significantly lower performance in model attribution, highlighting the difficulty of distinguishing outputs from different large language models.

New AI classifier for indicating AI-written text

OpenAI Blog

OpenAI has released a preliminary AI text classifier designed to help identify AI-written content, with a focus on supporting educators, journalists, and misinformation researchers. The tool comes with acknowledged limitations and is accompanied by an educational resource for teachers on ChatGPT's uses and constraints.

MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

arXiv cs.CL

This paper introduces MELD, a detector for AI-generated text that uses multi-task learning with auxiliary heads for generator family, attack type, and source domain to improve robustness. MELD achieves strong performance on the RAID benchmark and maintains low false-positive rates under adversarial attacks.