OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

Hugging Face Daily Papers

Summary

OpenSearch-VL is an open-source framework and paper presenting a recipe for training frontier multimodal search agents with reinforcement learning, featuring specialized data curation and a novel training algorithm.


Source: https://huggingface.co/papers/2605.05185

Abstract

OpenSearch-VL presents an open-source framework for training advanced multimodal search agents using reinforcement learning, featuring specialized data curation, diverse tool environments, and a novel training algorithm that improves performance across multiple benchmarks.

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we build a dedicated pipeline that constructs high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. From this pipeline, we curate two training datasets: SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn, fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with average improvements of over 10 points across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.
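The fatal-aware credit assignment is described only at a high level in the abstract, so the sketch below is one plausible reading rather than the paper's implementation. It assumes standard group-relative GRPO advantages, a per-token loss mask, and a recorded turn index for the first cascading tool failure; all names (Trajectory, fatal_turn, fatal_aware_credit) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class Trajectory:
    token_turn_ids: np.ndarray        # turn index of each generated token
    reward: float                     # scalar outcome reward for the rollout
    fatal_turn: Optional[int] = None  # first turn hit by a cascading tool failure


def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Standard GRPO: advantage of each rollout relative to its group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)


def fatal_aware_credit(group: List[Trajectory]):
    """Per-trajectory (loss_mask, advantage) pairs with fatal-failure handling."""
    adv = grpo_advantages(np.array([t.reward for t in group], dtype=np.float64))
    out = []
    for a, t in zip(adv, group):
        mask = np.ones_like(t.token_turn_ids, dtype=np.float32)
        if t.fatal_turn is not None:
            # Mask every token from the fatal failure onward, so the policy is
            # not trained on turns corrupted by the broken tool environment.
            mask[t.token_turn_ids >= t.fatal_turn] = 0.0
            # One-sided clamp: a failure-induced low reward should not punish
            # the (possibly sound) reasoning that preceded the failure.
            a = max(a, 0.0)
        out.append((mask, float(a)))
    return out


# Toy group of two rollouts; the second hits a fatal tool failure in turn 2.
group = [
    Trajectory(np.array([0, 0, 1, 1, 2]), reward=1.0),
    Trajectory(np.array([0, 0, 1, 2, 2]), reward=0.0, fatal_turn=2),
]
for mask, a in fatal_aware_credit(group):
    print(mask, a)  # second rollout: turn-2 tokens masked, advantage clamped to 0
```

The intuition behind the one-sided clamp: when a tool outage, rather than the policy, caused the low reward, zeroing out the negative advantage on the surviving pre-failure tokens avoids punishing reasoning the agent should keep, while successful rollouts still reinforce theirs as usual.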


Get this paper in your agent:

hf papers read 2605.05185

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 3

- OpenSearch-VL/OpenSearch-VL-8B · 770k · Updated 1 day ago · 33 · 2
- OpenSearch-VL/OpenSearch-VL-30B-A3B
- OpenSearch-VL/OpenSearch-VL-32B · 1.14M · Updated 1 day ago

Datasets citing this paper: 2

- OpenSearch-VL/Search-VL-SFT-36K · Preview · Updated 1 day ago · 187 · 3
- OpenSearch-VL/Search-VL-RL-8K · Updated 1 day ago · 48 · 2

Spaces citing this paper: 0

No Spaces currently link this paper.

Cite arxiv.org/abs/2605.05185 in a Space README.md to link it from this page.
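Concretely, a Space picks up the link as soon as its README.md mentions that arXiv URL. A minimal hypothetical example (the title and sdk values are placeholders):

```markdown
---
title: OpenSearch-VL Demo
sdk: gradio
---

A demo of OpenSearch-VL (paper: https://arxiv.org/abs/2605.05185).
```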

Collections including this paper: 3

Similar Articles

SkillOS: Learning Skill Curation for Self-Evolving Agents

Hugging Face Daily Papers

This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.

Variational option discovery algorithms

OpenAI Blog

OpenAI researchers introduce VALOR, a variational inference method for option discovery that connects option learning to variational autoencoders, and propose a curriculum learning approach that stabilizes training by dynamically increasing context complexity.