@DanKornas: DeepDive is a pattern for deep search agents: synthesize QA from knowledge graphs, then train multi-turn browsing with …
Summary
DeepDive is a pattern for building deep search agents: it synthesizes question-answer (QA) training data from knowledge graphs, then trains multi-turn browsing with reinforcement learning (GRPO). It also uses entity obfuscation, so that questions cannot be answered without searching, and test-time scaling with tool calls.
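To make the GRPO step concrete, here is a minimal sketch of the group-relative advantage that GRPO is built around: several rollouts are sampled for the same question, and each rollout's reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the 0/1 reward values are assumptions for illustration; the source does not say how DeepDive defines its reward.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar per rollout sampled for the same question.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical browsing rollouts for one question.
# Assumed reward: 1.0 if the final answer was correct, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every rollout gets the same reward, the group carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to produce a baseline.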
Cached at: 05/17/26, 07:31 AM
DeepDive is a pattern for deep search agents: synthesize QA from knowledge graphs, then train multi-turn browsing with RL.
Key Ideas:
• KG random-walk data
• entity obfuscation to force search
• GRPO for long-horizon browsing
• test-time scaling with tool calls
Repo below. https://t.co/Ud2NMhzcoA
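The first two key ideas can be illustrated with a toy sketch: walk a few hops through a knowledge graph, then write a question that describes the starting entity vaguely instead of naming it, so the answer has to be found by searching. Everything here is hypothetical (the graph, the `DESCRIPTIONS` table, the function names, the question template); it shows the general idea only and is not taken from the DeepDive repo.

```python
import random

# Toy knowledge graph: entity -> list of (relation, neighbor).
# All entities and relations are placeholders invented for this sketch.
KG = {
    "author_A": [("wrote", "book_B")],
    "book_B": [("published_by", "press_C")],
    "press_C": [("located_in", "city_D")],
    "city_D": [],
}

# Vague descriptions used to obfuscate entity names (also invented).
DESCRIPTIONS = {
    "author_A": "a novelist born in the 1920s",
}


def random_walk(kg, start, hops, rng):
    """Follow up to `hops` randomly chosen edges, returning the triples visited."""
    path = []
    node = start
    for _ in range(hops):
        edges = kg.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        relation, neighbor = rng.choice(edges)
        path.append((node, relation, neighbor))
        node = neighbor
    return path


def synthesize_qa(path, descriptions):
    """Build a multi-hop question that never names the entities on the path."""
    head = path[0][0]
    clue = descriptions.get(head, head)  # falls back to the name if no description exists
    chain = ", then ".join(rel.replace("_", " ") for _, rel, _ in path)
    question = f"Start from {clue}. Follow: {chain}. What entity do you reach?"
    answer = path[-1][2]
    return question, answer


rng = random.Random(0)  # seeded so the walk is reproducible
path = random_walk(KG, "author_A", hops=3, rng=rng)
question, answer = synthesize_qa(path, DESCRIPTIONS)
```

Because the question gives only a vague clue for the start entity and omits every intermediate entity, an agent has to resolve each hop by searching before it can reach the answer, which is the point of obfuscation.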
Similar Articles
@tom_doerr: Trains deep search agents from knowledge graphs https://github.com/THUDM/DeepDive
DeepDive presents an automated approach to training deep search agents using knowledge graphs for data synthesis and multi-turn reinforcement learning, enabling complex multi-step reasoning and web browsing.
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
DeepRefine is a research paper introducing an LLM-based reasoning model that refines agent-compiled knowledge bases using reinforcement learning and multi-turn interactions to improve downstream task performance.
@omarsar0: // Is Grep All You Need? // Pay attention to this one, AI devs. (bookmark it) They find that grep-style text search, whe…
A research paper from PwC finds that grep-style text search, when properly integrated into agent harnesses, can match or beat embedding-based retrieval for coding-agent tasks, suggesting vector databases may not be essential for many use cases.
Introducing deep research
OpenAI launches deep research, an agentic capability in ChatGPT powered by o3 that autonomously conducts multi-step internet research to produce comprehensive analyst-level reports, with expanded access and features as of February 2026.
Mind DeepResearch Technical Report
MindDR is a multi-agent deep research framework using a three-agent architecture (Planning, DeepSearch, Report) and a four-stage training pipeline, achieving competitive performance with ~30B-parameter models on multiple benchmarks. Developed by Li Auto and deployed as an online product, it also introduces MindDR Bench, a 500-query Chinese benchmark for evaluating deep research capabilities.