Show, Don't TELL: Explainable AI-Generated Text Detection

Hugging Face Daily Papers 05/27/26, 12:00 AM Papers

ai-generated-text-detection explainability grpo curriculum-learning sft-dataset human-centric

Summary

Introduces TELL, an AI-generated text detection system that provides explainable annotations alongside numerical scores, achieving competitive AUROC of 0.927 while enabling users to judge authorship based on highlighted textual indicators.

Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that builds in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the "tells" by which the model believes a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector's decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win-rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.
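For readers unfamiliar with the metric, the AUROC quoted above can be read as the probability that a randomly chosen AI-written text receives a higher detector score than a randomly chosen human-written one. The sketch below computes it by direct pairwise comparison. It is only an illustration: the function name, the label convention (1 = AI-written, 0 = human-written) and the toy scores are assumptions, not data or code from the paper.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (AI, human) pairs ranked correctly.

    labels: 1 for AI-written, 0 for human-written (assumed convention).
    scores: detector scores, higher meaning "more likely AI-written".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one AI-written and one human-written example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not results from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(f"AUROC on toy data: {auroc(toy_labels, toy_scores):.3f}")
```

The quadratic loop is fine for small evaluation sets; for larger ones a rank-based formulation or a library routine would be the usual choice.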

Original Article


Cached at: 06/03/26, 03:35 AM


Source: https://huggingface.co/papers/2605.27921

Abstract

A novel AI-generated text detection system named TELL is introduced that combines high-performance detection with native explainability by showing specific textual indicators that help users make informed judgments about authorship.

Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that builds in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the “tells” by which the model believes a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector’s decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win-rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.
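The abstract names GRPO as the refinement step but gives no training details. As background only, the core idea of GRPO is to sample a group of outputs for the same input, score each one, and use each reward's deviation from the group mean (scaled by the group's standard deviation) as the advantage signal. The sketch below shows that normalization step alone. The reward values, the function name, the choice of population standard deviation and the `eps` constant are all assumptions for illustration; none of them is taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as in GRPO-style training.

    rewards: one scalar reward per sampled output for the same input.
    eps: small constant (assumed) to avoid division by zero when all
         rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled annotations of one text,
# e.g. 1.0 if the verdict matched the known label and 0.0 otherwise.
toy_rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(toy_rewards))
```

Outputs that beat the group average get a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.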


Datasets citing this paper: 2

suraj-ranganath/tell-human-detectors
suraj-ranganath/unified_tell_dataset


Similar Articles

Findings of the Counter Turing Test: AI-Generated Text Detection

People are able to detect AI generated text 75% of time

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

New AI classifier for indicating AI-written text
