SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Hugging Face Daily Papers 05/28/26, 12:00 AM Papers

self-aware reinforcement-learning agentic-search llm question-answering over-search reward-hacking

Summary

SAAS introduces a reinforcement learning framework that enhances agent self-awareness to reduce unnecessary searches in LLM-based question answering systems, balancing accuracy and computational cost.

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To address this, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code is anonymously released at https://github.com/XMUDeepLIT/SAAS.
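
The abstract names the three components but gives no formulas, so the following is only a minimal sketch of how they might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the `Rollout` structure, the rule that the boundary is zero when any search-disabled rollout is correct and otherwise the fewest searches used by a correct search-enabled rollout, the linear capped penalty, and the two-stage schedule that switches the penalty on after a reasoning-first phase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    """One sampled trajectory for a question (hypothetical structure)."""
    correct: bool      # whether the final answer matched the reference
    num_searches: int  # number of search calls issued in the trajectory


def estimate_search_boundary(
    no_search: List[Rollout], with_search: List[Rollout]
) -> Optional[int]:
    """Estimate how many searches a question needs under the current policy.

    Assumed rule: if any search-disabled rollout is correct, internal
    knowledge suffices and the boundary is 0. Otherwise the boundary is the
    fewest searches used by any correct search-enabled rollout. Returns None
    when no rollout is correct, meaning the boundary is unknown.
    """
    if any(r.correct for r in no_search):
        return 0
    correct_counts = [r.num_searches for r in with_search if r.correct]
    return min(correct_counts) if correct_counts else None


def boundary_aware_reward(
    rollout: Rollout,
    boundary: Optional[int],
    penalty_weight: float,
    max_penalty: float = 0.5,
) -> float:
    """Trajectory-level reward: accuracy minus a capped over-search penalty.

    Assumed form: only correct trajectories with a known boundary are
    penalized, in proportion to the searches issued beyond the boundary.
    """
    accuracy = 1.0 if rollout.correct else 0.0
    if boundary is None or not rollout.correct:
        return accuracy
    excess = max(0, rollout.num_searches - boundary)
    return accuracy - min(max_penalty, penalty_weight * excess)


def search_penalty_weight(
    step: int, reasoning_stage_steps: int, target_weight: float
) -> float:
    """Two-stage schedule (assumed): no search penalty during the
    reasoning-first stage, then the full penalty weight afterwards."""
    return 0.0 if step < reasoning_stage_steps else target_weight
```

In this sketch the penalty is capped below the accuracy reward, so a correct but search-heavy trajectory still scores higher than an incorrect one. That is one plausible way to keep the policy from trading accuracy for fewer searches, which is the kind of reward hacking the staged curriculum is described as avoiding.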

Original Article

Cached at: 06/01/26, 11:20 AM

Source: https://huggingface.co/papers/2605.29796

Get this paper in your agent:

hf papers read 2605.29796

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

GRASP: GRanularity-Aware Search Policy for Agentic RAG

Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging

SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation
