OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
Summary
OpenSearch-VL is an open-source framework and accompanying paper presenting a recipe for training frontier multimodal search agents with reinforcement learning, combining specialized data curation with a novel training algorithm.
Source: https://huggingface.co/papers/2605.05185
Abstract
OpenSearch-VL presents an open-source framework for training advanced multimodal search agents using reinforcement learning, featuring specialized data curation, diverse tool environments, and a novel training algorithm that improves performance across multiple benchmarks.
Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we curate a dedicated pipeline that constructs high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. From this pipeline, we derive two training datasets, SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with over 10-point average improvements across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.
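The abstract gives no pseudocode for the data pipeline, but the two text-side stages are easy to picture. Below is a minimal sketch, assuming the Wikipedia hyperlink graph is held in networkx; the helper names (`sample_wiki_path`, `fuzzy_rewrite`) and the usage pattern are hypothetical, not the paper's released implementation.

```python
import random
import networkx as nx  # assumed representation: hyperlink graph as a DiGraph

def sample_wiki_path(graph: nx.DiGraph, hops: int = 3) -> list[str]:
    """Random-walk a multi-hop path over the Wikipedia hyperlink graph.

    Longer paths force multi-step retrieval instead of one-shot lookup,
    which is how path sampling reduces one-step retrieval collapse.
    """
    node = random.choice(list(graph.nodes))
    path = [node]
    for _ in range(hops):
        neighbors = [n for n in graph.successors(node) if n not in path]
        if not neighbors:
            break
        node = random.choice(neighbors)
        path.append(node)
    return path

def fuzzy_rewrite(question: str, entity: str, description: str) -> str:
    """Replace a named entity with an indirect description so the answer
    cannot be found by a single entity lookup (fuzzy entity rewriting)."""
    return question.replace(entity, description)

# Hypothetical usage: the final-hop entity becomes the answer, while an
# intermediate entity is obfuscated into a clue the agent must resolve.
# q = fuzzy_rewrite("What year was Entity A founded?", "Entity A",
#                   "the company whose logo appears in the image")
```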
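The unified tool environment can be read as a single dispatch table that mixes perception tools with search tools behind one calling convention. The sketch below is an assumption about that interface, not the released code; the search backend is stubbed, and the super-resolution tool is stood in for by plain upsampling.

```python
from typing import Callable
from PIL import Image, ImageFilter

# Hypothetical tool registry; the paper's actual interface may differ.
TOOLS: dict[str, Callable] = {}

def tool(name: str):
    """Decorator that registers a callable under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("crop")
def crop(img: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    """Zoom into a region of interest: (left, upper, right, lower)."""
    return img.crop(box)

@tool("sharpen")
def sharpen(img: Image.Image) -> Image.Image:
    return img.filter(ImageFilter.SHARPEN)

@tool("super_resolution")
def super_resolution(img: Image.Image, scale: int = 2) -> Image.Image:
    # Stand-in: bicubic upsampling; a real agent would call an SR model.
    return img.resize((img.width * scale, img.height * scale),
                      Image.Resampling.BICUBIC)

@tool("text_search")
def text_search(query: str) -> list[str]:
    # Stand-in for a real web-search backend.
    raise NotImplementedError("plug in your search API here")

def call_tool(name: str, *args, **kwargs):
    """Single dispatch point the agent loop uses for every tool call."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args, **kwargs)
```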
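The fatal-aware GRPO modification is the most concrete algorithmic claim: tokens generated after a fatal tool failure are masked out of the loss, and advantages on the surviving pre-failure tokens are clamped one-sidedly so useful reasoning is kept without rewarding the failure. A minimal PyTorch sketch under those assumptions (the exact normalization and clamping direction are my reading of the abstract, not confirmed details):

```python
import torch

def fatal_aware_advantages(
    rewards: torch.Tensor,            # (G,) scalar reward per rollout in the group
    loss_mask: torch.Tensor,          # (G, T) 1 for trainable tokens, 0 otherwise
    failure_step: list[int | None],   # token index of first fatal tool failure, else None
) -> tuple[torch.Tensor, torch.Tensor]:
    # 1. GRPO-style group-normalized advantages.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)   # (G,)
    adv = adv.unsqueeze(-1).expand_as(loss_mask).clone()        # (G, T)
    mask = loss_mask.clone()
    for i, t in enumerate(failure_step):
        if t is not None:
            # 2. Mask post-failure tokens so the policy is not punished
            #    for cascading environment errors it did not cause.
            mask[i, t:] = 0
            # 3. One-sided clamp: pre-failure reasoning is never penalized.
            adv[i, :t] = adv[i, :t].clamp(min=0)
    return adv, mask
```

A policy-gradient step would then weight token log-probabilities by `adv * mask`, so a rollout that dies to a cascading tool failure still reinforces its recoverable prefix while contributing nothing after the failure point.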
Get this paper in your agent:
hf papers read 2605.05185
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper (3)
#### OpenSearch-VL/OpenSearch-VL-8B 770k • Updated 1 day ago • 33 • 2
#### OpenSearch-VL/OpenSearch-VL-30B-A3B
#### OpenSearch-VL/OpenSearch-VL-32B 1.14M • Updated 1 day ago
Datasets citing this paper (2)
#### OpenSearch-VL/Search-VL-SFT-36K Preview • Updated 1 day ago • 187 • 3
#### OpenSearch-VL/Search-VL-RL-8K Updated 1 day ago • 48 • 2
Collections including this paper (3)
Similar Articles
@tom_doerr: Fully open sources training data for 30B scale search agents https://github.com/PolarSeeker/OpenSeeker…
OpenSeeker fully open-sources training data and models for 30B-scale ReAct-based search agents, achieving state-of-the-art performance on multiple benchmarks including BrowseComp and Humanity's Last Exam. It is the first purely academic project to reach frontier search benchmark performance while releasing complete training data.
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
This paper introduces an auto-research framework using specialist agents to iteratively refine training recipes through an empirical loop of code execution and feedback. The system autonomously improves performance on tasks like Parameter Golf and NanoChat without human intervention by leveraging lineage feedback.
FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis
This paper introduces FoodCHA, a multi-modal LLM agent framework designed for fine-grained food analysis, addressing challenges in hierarchical consistency and attribute discrimination for dietary monitoring.
SkillOS: Learning Skill Curation for Self-Evolving Agents
This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.
Variational option discovery algorithms
OpenAI researchers introduce VALOR, a variational inference method for option discovery that connects option learning to variational autoencoders, and propose a curriculum learning approach that stabilizes training by dynamically increasing context complexity.