DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
Summary
DeepRefine is a research paper introducing an LLM-based reasoning model that refines agent-compiled knowledge bases using reinforcement learning and multi-turn interactions to improve downstream task performance.
Source: https://huggingface.co/papers/2605.10488
Abstract
DeepRefine is an LLM-based reasoning model that refines agent-compiled knowledge bases through multi-turn interactions and targeted updates to improve downstream task performance.
Agent-compiled knowledge bases provide persistent external knowledge for large language model (LLM) agents in open-ended, knowledge-intensive downstream tasks. Yet their quality is systematically limited by incompleteness, incorrectness, and redundancy, manifested as missing evidence or cross-document links, low-confidence or imprecise claims, and ambiguity or coreference-resolution issues. Such defects compound under iterative use, degrading retrieval fidelity and downstream task performance. We present DeepRefine, a general LLM-based reasoning model for agent-compiled knowledge refinement that improves the quality of any pre-constructed knowledge base, guided by user queries, to make it better suited to downstream tasks. DeepRefine performs multi-turn interactions with the knowledge base, conducts abductive diagnosis over the interaction history, localizes likely defects, and executes targeted refinement actions for incremental knowledge base updates. To optimize DeepRefine's refinement policies without gold references, we introduce a Gain-Beyond-Draft (GBD) reward and train the reasoning process end-to-end via reinforcement learning. Extensive experiments demonstrate consistent downstream gains over strong baselines.
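The Gain-Beyond-Draft idea in the abstract can be sketched as a simple reward signal: score the downstream answer produced after refinement against a draft answer produced from the original knowledge base, and reward the difference. This is a minimal illustrative sketch under stated assumptions; the function names, the scorer, and the "[supported]" tagging convention below are all hypothetical and not the paper's actual API.

```python
def gbd_reward(score_fn, draft_answer, refined_answer):
    """Hypothetical Gain-Beyond-Draft reward.

    Rewards the refinement policy by how much the answer derived from the
    refined knowledge base improves over the draft answer from the original
    knowledge base. score_fn maps an answer to a scalar quality score (e.g.
    a judge model's rating), so no gold reference is required.
    """
    return score_fn(refined_answer) - score_fn(draft_answer)


# Toy usage with a trivial scorer (an assumption for illustration) that
# counts claims tagged as supported by retrieved evidence.
score = lambda ans: ans.count("[supported]")
draft = "Claim A [supported]. Claim B."
refined = "Claim A [supported]. Claim B [supported]."
print(gbd_reward(score, draft, refined))  # 1
```

Because the reward is a difference of scores, a refinement step that leaves downstream quality unchanged earns zero, which discourages gratuitous edits to the knowledge base.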
Similar Articles
Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
This paper investigates whether reinforcement learning can improve the direct recall of parametric knowledge in LLMs beyond reasoning tasks. It demonstrates that RL with binary rewards yields significant gains in factual QA benchmarks by redistributing probability mass to unlock latent knowledge rather than acquiring new facts.
OThink-SRR1: Search, Refine and Reasoning with Reinforced Learning for Large Language Models
OThink-SRR1 introduces an iterative Search-Refine-Reason framework trained with GRPO-IR reinforcement learning to reduce retrieval noise and token costs while boosting multi-hop QA accuracy.
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
This paper introduces Deep Reasoning, an inference-time approach that uses structured meta-reasoning to construct task-specific scaffolds for general-purpose agents. The proposed agent, Dolores, outperforms existing methods by distributing cognition across lower-load reasoning threads, reducing hallucinations and improving performance across multiple benchmarks.
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
This paper introduces RubricEM, a reinforcement learning framework that uses rubric-guided policy decomposition and reflection-based meta-policy evolution to train deep research agents for long-form tasks. The resulting RubricEM-8B model demonstrates strong performance on long-form research benchmarks by leveraging stage-aware planning and denser semantic feedback.
@jiqizhixin: Awesome blog! State of RL for reasoning LLMs https://aweers.de/blog/2026/rl-for-llms/…
A comprehensive blog post reviewing the state of reinforcement learning for reasoning LLMs, covering methods from REINFORCE and PPO to GRPO and beyond, with connections to key models like InstructGPT and DeepSeek-R1.