ACL-Verbatim: hallucination-free question answering for research

Hugging Face Daily Papers 05/20/26, 12:00 AM Papers

extractive-qa grounded-rag hallucination-free verbatim modernbert token-classifier research

Summary

ACL-Verbatim introduces a family of lightweight extractive models for grounded RAG that return exact text spans from the source, outperforming larger LLM-based extractors.

Academic researchers need efficient and reliable methods for collecting high-quality information from trusted sources, but modern tools for AI-assisted research still suffer from the tendency of Large Language Models (LLMs) to produce factually inaccurate or nonsensical output, commonly referred to as hallucinations. We apply the extractive question answering system VerbatimRAG to research papers in the ACL Anthology, directly mapping user queries to verbatim text spans in retrieved documents. We contribute a novel ground truth dataset for the task of mapping user queries to relevant text spans in research papers, and use it to train and evaluate a variety of extractive models. Human annotation is performed by NLP researchers and is based on synthetic user queries generated using a custom pipeline based on the ScIRGen methodology, paired with chunks of research papers retrieved by VerbatimRAG. On this benchmark, a 150M-parameter ModernBERT token classifier trained on silver supervision from our pipeline achieves the best word-level F1 (53.6), ahead of the strongest evaluated LLM extractor (48.7).
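The abstract reports word-level F1 without spelling out how it is computed. One plausible reading, sketched below as an assumption rather than the paper's actual evaluation code, treats every word position in a chunk as either selected or not and scores the overlap between predicted and annotated positions. The function names and the inclusive (start, end) span convention are hypothetical.

```python
def spans_to_word_indices(spans: list[tuple[int, int]]) -> set[int]:
    """Expand inclusive (start_word, end_word) spans into a set of word positions."""
    indices: set[int] = set()
    for start, end in spans:
        indices.update(range(start, end + 1))
    return indices


def word_level_f1(predicted: set[int], gold: set[int]) -> float:
    """F1 over word positions shared by the predicted and gold selections.

    Assumption: an empty prediction against an empty gold set counts as a
    perfect match. The paper may handle this case differently.
    """
    if not predicted and not gold:
        return 1.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not taken from the benchmark.
predicted = spans_to_word_indices([(2, 5)])  # words 2, 3, 4, 5
gold = spans_to_word_indices([(4, 7)])       # words 4, 5, 6, 7
score = word_level_f1(predicted, gold)       # two shared words out of four on each side
```

Scoring at the word level gives partial credit when a predicted span overlaps the annotated one without matching its boundaries exactly, which an exact-match metric would not.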

Original Article

Cached at: 06/02/26, 03:35 PM

Paper page - ACL-Verbatim: hallucination-free question answering for research

Source: https://huggingface.co/papers/2605.21102

Today we are releasing a new family of lightweight SOTA extractive models for grounded RAG.

The family consists of two 150M-parameter ModernBERT span extractors trained as token classifiers. They beat public extractive baselines (Zilliz Semantic Highlight, Provence) across ACL, RAGBench, Squeez, and QASPER, and outperform LLM-based extractors 100x their size on our ACL-Verbatim benchmark.

Given a query and a retrieved chunk, the extractor returns the exact text spans that support the answer.

Rather than generating an answer with an LLM, you get verbatim evidence directly from the source: paragraphs, table captions, code blocks, or other relevant text.
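A minimal sketch of how per-token relevance scores could be turned into verbatim spans, assuming the classifier emits one score per token along with character offsets into the chunk. The whitespace tokenizer, the 0.5 threshold, and the scores below are illustrative stand-ins, not the released models' tokenizer, settings, or outputs.

```python
import re


def whitespace_offsets(text: str) -> list[tuple[int, int]]:
    """Character (start, end) offsets of whitespace-delimited tokens.

    Stand-in for the offset mapping a real subword tokenizer would provide.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def extract_spans(
    text: str,
    offsets: list[tuple[int, int]],
    scores: list[float],
    threshold: float = 0.5,
) -> list[str]:
    """Merge runs of consecutive above-threshold tokens into exact substrings of text."""
    spans: list[str] = []
    run_start = None  # character offset where the current run begins
    run_end = None    # character offset where the current run ends
    for (start, end), score in zip(offsets, scores):
        if score >= threshold:
            if run_start is None:
                run_start = start
            run_end = end
        elif run_start is not None:
            spans.append(text[run_start:run_end])
            run_start = None
    if run_start is not None:
        spans.append(text[run_start:run_end])
    return spans


chunk = "We train on silver labels. The model has 150M parameters. Results follow."
# Hypothetical per-token scores for a query about model size.
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.9, 0.8, 0.9, 0.95, 0.9, 0.1, 0.2]
evidence = extract_spans(chunk, whitespace_offsets(chunk), scores)
```

Because each span is sliced from the chunk by character offset, the output can only contain text that already exists in the source, which is the property that rules out generated content.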

Similar Articles

@neural_avb: https://x.com/neural_avb/status/2063907440509571354

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

KG-Guard: Graph-Based Hallucination Detection for Knowledge Base Question Answering

ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation

MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA
