Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Hugging Face Daily Papers 05/29/26, 12:00 AM Papers

long-horizon-search observation-masking context-management agentic-search retrieval model-capacity regime-map

Summary

This paper studies observation masking in long-horizon search agents, finding that accuracy gains follow an asymmetric inverted-U shape depending on the interplay between retriever capability and model capacity, with a collapse when the model is saturated. It provides a mechanistic analysis and a regime map for context management.

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over various agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking follows an asymmetric inverted-U shape when plotted against the model's accuracy without context management: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model's implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but they fail when masking removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories here (https://github.com/i-DeepSearch/observation-masking) to support future research.
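As an illustration of the intervention described above, the following is a minimal sketch of observation masking, not the authors' released scaffold. It assumes a chat-style message list in which tool results carry `role == "tool"`. The function name `mask_stale_observations`, the `keep_last` window, and the placeholder string are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical placeholder text; the paper's scaffold may use something else.
PLACEHOLDER = "[observation masked]"


def mask_stale_observations(
    messages: List[Dict[str, str]],
    keep_last: int = 2,
    placeholder: str = PLACEHOLDER,
) -> List[Dict[str, str]]:
    """Return a copy of `messages` in which every tool observation except
    the `keep_last` most recent ones has its content replaced by a short
    placeholder. Non-tool messages (user, assistant) are left untouched.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # tool_indices[:-0] would be empty, so handle keep_last == 0 explicitly.
    if keep_last > 0:
        stale = set(tool_indices[:-keep_last])
    else:
        stale = set(tool_indices)

    masked = []
    for i, message in enumerate(messages):
        if i in stale:
            masked.append({**message, "content": placeholder})
        else:
            masked.append(dict(message))
    return masked
```

Applied before each model call, a function like this keeps the agent's own actions in context while dropping the bulk of older retrieved pages. Whether that helps or hurts is exactly the regime-dependent question the paper studies.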


Cached at: 06/02/26, 03:23 AM

Source: https://huggingface.co/papers/2606.00408

Abstract

Observation masking in long-horizon search agents shows variable accuracy gains depending on the interaction between retriever capability and model capacity, following an asymmetric inverted-U pattern.

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over various agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking follows an asymmetric inverted-U shape when plotted against the model’s accuracy without context management: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model’s implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but they fail when masking removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories here (https://github.com/i-DeepSearch/observation-masking) to support future research.
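The token-for-turn trade-off can be made concrete with a toy budget model. The sketch below is an assumption-laden illustration, not the paper's analysis: it supposes every turn adds a fixed number of action tokens and observation tokens, and that masking shrinks each observation older than the `keep_last` most recent ones to a fixed placeholder cost. The function `max_turns` and all of its parameters are hypothetical.

```python
from typing import Optional


def max_turns(
    budget: int,
    action_tokens: int,
    obs_tokens: int,
    keep_last: Optional[int] = None,
    placeholder_tokens: int = 0,
) -> int:
    """Count how many complete turns fit in a context of `budget` tokens.

    Toy model (an assumption, not the paper's): each turn contributes
    `action_tokens` for the agent's step and `obs_tokens` for the tool
    result. With masking enabled (`keep_last` is an int), only the most
    recent `keep_last` observations stay at full size; older ones cost
    `placeholder_tokens` each. `keep_last=None` means no masking.
    """
    if action_tokens <= 0:
        raise ValueError("action_tokens must be positive so the loop terminates")

    turns = 0
    while True:
        t = turns + 1
        if keep_last is None:
            full, masked = t, 0
        else:
            full, masked = min(t, keep_last), max(0, t - keep_last)
        used = t * action_tokens + full * obs_tokens + masked * placeholder_tokens
        if used > budget:
            return turns
        turns = t
```

Under this model, masking converts tokens that older observations would have occupied into room for additional turns. It says nothing about whether those extra turns recover more than the masked evidence was worth, which is the part the regime map addresses.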

Datasets citing this paper: 1

i-DeepSearch/observation-masking-eval-logs

Similar Articles

Search Discipline for Long-Horizon Research Agents

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

@omarsar0: // The Memory Curse in LLM Agents // (bookmark it) Long histories apparently degrades agents as they become increasingl…

The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

Constraint-Enhanced Physical Search through Correlation Matching
