ACL-Verbatim: hallucination-free question answering for research
Summary
ACL-Verbatim introduces a family of lightweight extractive models for grounded RAG that return exact text spans from the source document, outperforming much larger LLM-based extractors.
Cached at: 06/02/26, 03:35 PM
Source: https://huggingface.co/papers/2605.21102
Today we are releasing a new family of lightweight SOTA extractive models for grounded RAG.
The release comprises two 150M-parameter ModernBERT span extractors trained as token classifiers. They beat public extractive baselines (Zilliz Semantic Highlight, Provence) across ACL, RAGBench, Squeez, and QASPER, and outperform LLM-based extractors 100x their size on our ACL-Verbatim benchmark.
Given a query and a retrieved chunk, the extractor returns the exact text spans that support the answer.
Rather than generating an answer with an LLM, you get verbatim evidence directly from the source: paragraphs, table captions, code blocks, or other relevant text.
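To make the span-extraction idea concrete, here is a minimal sketch of how per-token classifier scores could be turned into verbatim spans. It is not the authors' implementation: the function name `extract_spans`, the 0.5 threshold, the whitespace tokenization, and the hard-coded scores are all hypothetical stand-ins for what a trained token classifier and its tokenizer would provide.

```python
from typing import List, Optional, Tuple


def extract_spans(
    text: str,
    offsets: List[Tuple[int, int]],
    scores: List[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[str]:
    """Merge runs of consecutive tokens scored >= threshold into spans.

    Each span is sliced directly from `text`, so the output is always
    an exact substring of the source chunk.
    """
    spans: List[str] = []
    start: Optional[int] = None
    end: Optional[int] = None
    for (tok_start, tok_end), score in zip(offsets, scores):
        if score >= threshold:
            if start is None:
                start = tok_start  # open a new span
            end = tok_end  # extend the current span
        elif start is not None:
            spans.append(text[start:end])  # close the span
            start = end = None
    if start is not None:
        spans.append(text[start:end])  # close a span that runs to the end
    return spans


# Toy retrieved chunk; whitespace "tokens" stand in for real subword tokens.
chunk = "We train on ACL papers. The model has 150M parameters. Code is public."
offsets: List[Tuple[int, int]] = []
pos = 0
for word in chunk.split(" "):
    offsets.append((pos, pos + len(word)))
    pos += len(word) + 1

# Hypothetical relevance scores for a query like "How large is the model?"
scores = [0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.8, 0.9, 0.95, 0.9, 0.1, 0.1, 0.1]

spans = extract_spans(chunk, offsets, scores)
print(spans)
```

Because every span is a slice of the original chunk rather than generated text, the output cannot contain wording that is absent from the source, which is the property the announcement emphasizes.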
Similar Articles
@neural_avb: https://x.com/neural_avb/status/2063907440509571354
Explores a common failure mode in recursive language models (RLMs) where free-text subagent responses cause issues, and presents a solution using structured outputs to improve reliability, illustrated with a long-context question-answering example from NarrativeQA.
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering
OCC-RAG introduces a family of compact small language models optimized for faithful question answering, using a novel pipeline to synthesize multi-context multi-hop QA data. The models demonstrate competitive performance against larger models on reasoning and faithfulness benchmarks.
KG-Guard: Graph-Based Hallucination Detection for Knowledge Base Question Answering
KG-Guard is a lightweight graph-based framework for detecting hallucinations in LLM-based knowledge base question answering. It treats the LLM as a black box and uses a graph encoder with an MLP classifier to identify hallucinated answer nodes, outperforming baselines while having far fewer parameters.
ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation
ContextRAG introduces an extraction-free method for constructing hierarchical graph indices for retrieval-augmented generation, using Residual-Quantization K-Means and Formal Concept Analysis to reduce LLM calls and tokens by orders of magnitude while maintaining competitive F1 scores on multi-hop questions.
MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA
MARDoc is a memory-aware refinement agent framework for multimodal long document question answering, evaluated on MMLongBench-Doc and DocBench benchmarks using Qwen3-VL models, showing consistent improvements over MLLM-based, RAG-based, and agent-based baselines.