OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
Summary
OpenSearch-VL is an open-source framework and accompanying paper presenting a recipe for training frontier multimodal search agents with reinforcement learning, combining specialized data curation with a novel training algorithm.
Source: https://huggingface.co/papers/2605.05185
Abstract
OpenSearch-VL presents an open-source framework for training advanced multimodal search agents using reinforcement learning, featuring specialized data curation, diverse tool environments, and a novel training algorithm that improves performance across multiple benchmarks.
Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we curate a dedicated pipeline that constructs high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. From this pipeline, we derive two training datasets, SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with over 10-point average improvements across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.
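The abstract gives no pseudocode for the data pipeline, but the two text-side stages are easy to picture. Below is a minimal sketch, assuming the Wikipedia hyperlink graph is held in networkx; the helper names (`sample_wiki_path`, `fuzzy_rewrite`) and the usage pattern are hypothetical, not the paper's released implementation.

```python
import random
import networkx as nx  # assumed representation: hyperlink graph as a DiGraph

def sample_wiki_path(graph: nx.DiGraph, hops: int = 3) -> list[str]:
    """Random-walk a multi-hop path over the Wikipedia hyperlink graph.

    Longer paths force multi-step retrieval instead of one-shot lookup,
    which is how path sampling reduces one-step retrieval collapse.
    """
    node = random.choice(list(graph.nodes))
    path = [node]
    for _ in range(hops):
        neighbors = [n for n in graph.successors(node) if n not in path]
        if not neighbors:
            break
        node = random.choice(neighbors)
        path.append(node)
    return path

def fuzzy_rewrite(question: str, entity: str, description: str) -> str:
    """Replace a named entity with an indirect description so the answer
    cannot be found by a single entity lookup (fuzzy entity rewriting)."""
    return question.replace(entity, description)

# Hypothetical usage: the final-hop entity becomes the answer, while an
# intermediate entity is obfuscated into a clue the agent must resolve.
# q = fuzzy_rewrite("What year was Entity A founded?", "Entity A",
#                   "the company whose logo appears in the image")
```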
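The unified tool environment can be read as a single dispatch table that mixes perception tools with search tools behind one calling convention. The sketch below is an assumption about that interface, not the released code; the search backend is stubbed, and the super-resolution tool is stood in for by plain upsampling.

```python
from typing import Callable
from PIL import Image, ImageFilter

# Hypothetical tool registry; the paper's actual interface may differ.
TOOLS: dict[str, Callable] = {}

def tool(name: str):
    """Decorator that registers a callable under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("crop")
def crop(img: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    """Zoom into a region of interest: (left, upper, right, lower)."""
    return img.crop(box)

@tool("sharpen")
def sharpen(img: Image.Image) -> Image.Image:
    return img.filter(ImageFilter.SHARPEN)

@tool("super_resolution")
def super_resolution(img: Image.Image, scale: int = 2) -> Image.Image:
    # Stand-in: bicubic upsampling; a real agent would call an SR model.
    return img.resize((img.width * scale, img.height * scale),
                      Image.Resampling.BICUBIC)

@tool("text_search")
def text_search(query: str) -> list[str]:
    # Stand-in for a real web-search backend.
    raise NotImplementedError("plug in your search API here")

def call_tool(name: str, *args, **kwargs):
    """Single dispatch point the agent loop uses for every tool call."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args, **kwargs)
```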
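The fatal-aware GRPO modification is the most concrete algorithmic claim: tokens generated after a fatal tool failure are masked out of the loss, and advantages on the surviving pre-failure tokens are clamped one-sidedly so useful reasoning is kept without rewarding the failure. A minimal PyTorch sketch under those assumptions (the exact normalization and clamping direction are my reading of the abstract, not confirmed details):

```python
import torch

def fatal_aware_advantages(
    rewards: torch.Tensor,            # (G,) scalar reward per rollout in the group
    loss_mask: torch.Tensor,          # (G, T) 1 for trainable tokens, 0 otherwise
    failure_step: list[int | None],   # token index of first fatal tool failure, else None
) -> tuple[torch.Tensor, torch.Tensor]:
    # 1. GRPO-style group-normalized advantages.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)   # (G,)
    adv = adv.unsqueeze(-1).expand_as(loss_mask).clone()        # (G, T)
    mask = loss_mask.clone()
    for i, t in enumerate(failure_step):
        if t is not None:
            # 2. Mask post-failure tokens so the policy is not punished
            #    for cascading environment errors it did not cause.
            mask[i, t:] = 0
            # 3. One-sided clamp: pre-failure reasoning is never penalized.
            adv[i, :t] = adv[i, :t].clamp(min=0)
    return adv, mask
```

A policy-gradient step would then weight token log-probabilities by `adv * mask`, so a rollout that dies to a cascading tool failure still reinforces its recoverable prefix while contributing nothing after the failure point.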
Get this paper in your agent:
hf papers read 2605.05185
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper (3)
#### OpenSearch-VL/OpenSearch-VL-8B 770k • Updated 1 day ago • 33 • 2
#### OpenSearch-VL/OpenSearch-VL-30B-A3B
#### OpenSearch-VL/OpenSearch-VL-32B 1.14M • Updated 1 day ago
Datasets citing this paper (2)
#### OpenSearch-VL/Search-VL-SFT-36K Preview • Updated 1 day ago • 187 • 3
#### OpenSearch-VL/Search-VL-RL-8K Updated 1 day ago • 48 • 2
Collections including this paper (3)
Similar Articles
@tom_doerr: Fully open sources training data for 30B scale search agents https://github.com/PolarSeeker/OpenSeeker…
OpenSeeker fully open-sources training data and models for 30B-scale ReAct-based search agents, achieving state-of-the-art performance on multiple benchmarks including BrowseComp and Humanity's Last Exam. It is the first purely academic project to reach frontier search benchmark performance while releasing complete training data.
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
This paper introduces an auto-research framework using specialist agents to iteratively refine training recipes through an empirical loop of code execution and feedback. The system autonomously improves performance on tasks like Parameter Golf and NanoChat without human intervention by leveraging lineage feedback.
FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis
This paper introduces FoodCHA, a multi-modal LLM agent framework designed for fine-grained food analysis, addressing challenges in hierarchical consistency and attribute discrimination for dietary monitoring.
SkillOS: Learning Skill Curation for Self-Evolving Agents
This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.
Variational option discovery algorithms
OpenAI researchers introduce VALOR, a variational inference method for option discovery that connects option learning to variational autoencoders, and propose a curriculum learning approach that stabilizes training by dynamically increasing context complexity.