Visual-Seeker proposes a visual-native multimodal deep search agent that actively reasons over fine-grained visual details and synthesizes multimodal evidence, achieving state-of-the-art performance on five challenging multimodal search benchmarks.
TreeSeeker is an inference-time framework that organizes deep search as branch-and-return over tree-structured states, using textual UCB signals to balance exploitation, exploration, and pruning. It outperforms strong baselines on deep search benchmarks, showing that explicit branch-and-return control improves multi-step web search.
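As a rough illustration of UCB-guided branch selection, the sketch below uses the standard numeric UCB1 score, not the textual UCB signals TreeSeeker describes. The `Node` class, `select_child`, `maybe_prune`, `backpropagate`, and the pruning threshold are hypothetical names chosen for illustration. Unvisited branches get priority (exploration), branches with high mean reward score well (exploitation), and pruned branches are skipped, so control returns to the parent once none remain.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical search-tree state; value_sum accumulates numeric rewards.
    name: str
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)
    pruned: bool = False

def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    if child.visits == 0:
        return math.inf  # exploration: try every unvisited branch once
    mean = child.value_sum / child.visits  # exploitation term
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def select_child(parent: Node, c: float = math.sqrt(2)):
    live = [ch for ch in parent.children if not ch.pruned]
    if not live:
        return None  # every branch pruned: the caller returns to the parent's parent
    return max(live, key=lambda ch: ucb1(ch, parent.visits, c))

def maybe_prune(node: Node, min_visits: int = 3, threshold: float = 0.2) -> bool:
    # Hypothetical pruning rule: drop a branch once it is visited enough and scores poorly.
    if node.visits >= min_visits and node.value_sum / node.visits < threshold:
        node.pruned = True
    return node.pruned

def backpropagate(path: list, reward: float) -> None:
    # Update every node on the root-to-leaf path after a rollout.
    for node in path:
        node.visits += 1
        node.value_sum += reward
```

With a root visited 4 times, a branch with mean reward 0.8 over 3 visits and a branch with mean reward 0.2 over a single visit, UCB1 with c = √2 selects the less-visited branch, because its exploration bonus outweighs the other branch's higher mean.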
DeepDive is an approach for building deep search agents: it synthesizes question-answer (QA) pairs from knowledge graphs and trains multi-turn browsing with reinforcement learning (GRPO). It also uses entity obfuscation and test-time scaling with tool calls.
DeepDive presents an automated approach to training deep search agents using knowledge graphs for data synthesis and multi-turn reinforcement learning, enabling complex multi-step reasoning and web browsing.
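GRPO scores each rollout relative to the other rollouts sampled for the same prompt, so no learned value model is needed. The sketch below computes these group-relative advantages, assuming a binary exact-match reward on the final answer; the reward function, the example answers, and the helper names are hypothetical, and DeepDive's actual reward design may differ. It uses the population standard deviation; some implementations use the sample standard deviation instead.

```python
import statistics

def exact_match_reward(prediction: str, gold: str) -> float:
    # Hypothetical reward: 1.0 if the normalized final answer matches the gold answer.
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0

def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    # Normalize each reward by the mean and standard deviation of its group.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts for one synthesized question (hypothetical answers).
gold = "Ada Lovelace"
rollouts = ["Ada Lovelace", "Charles Babbage", "ada lovelace ", "Alan Turing"]
rewards = [exact_match_reward(p, gold) for p in rollouts]
advantages = group_relative_advantages(rewards)
```

Rollouts that answer correctly receive positive advantages and the rest negative ones. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.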
MultiSearch is an RL-based framework that generates multiple queries at each reasoning step and explicitly merges the retrieved information, improving the signal-to-noise ratio of the evidence and reasoning accuracy on question-answering tasks.
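The summary does not say how MultiSearch merges the results of its parallel queries. One standard way to fuse several ranked lists is reciprocal rank fusion (RRF), sketched below as a swapped-in illustration rather than MultiSearch's own method; the document IDs and the `top_n` cutoff are hypothetical, and k = 60 is the constant commonly used with RRF. Documents returned by several queries accumulate score while results surfaced by only one query sink, which is one way to raise the signal-to-noise ratio of the merged evidence.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    # Each list holds document IDs ordered best-first for one query.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_n is None else fused[:top_n]

# Results for three queries issued at the same reasoning step (hypothetical IDs).
per_query = [["d1", "d2", "d3"], ["d2", "d4"], ["d2", "d1"]]
merged = reciprocal_rank_fusion(per_query, top_n=3)
```

Here `d2`, retrieved by all three queries, ranks first, and `d3`, seen only once at a low rank, falls below the cutoff.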
Google is expanding its new AI-powered Google Finance service to Europe, featuring enhanced AI research, advanced charting visualizations, and live earnings insights with local language support.
This paper introduces On-Policy Data Evolution (ODE) and a visual-native agent harness to improve multimodal deep search agents. By enabling reusable visual evidence and closed-loop data generation, ODE significantly boosts the performance of Qwen3-VL agents across multiple benchmarks, surpassing Gemini 2.5 Pro.
OpenSearch-VL is an open-source framework and paper introducing a recipe for training frontier multimodal search agents using reinforcement learning, featuring specialized data curation and a novel training algorithm.