DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
Summary
DR-Venus-4B is a 4B-parameter deep-research agent trained on only about 10K open samples via agentic SFT and RL with turn-level rewards. It outperforms prior sub-9B agents and narrows the gap to 30B-class models on deep research benchmarks while remaining deployable on edge devices.
Cached at: 04/23/26, 03:35 AM
Source: https://huggingface.co/papers/2604.19859
Published on Apr 21
#2 Paper of the day
Abstract
DR-Venus-4B is a 4-billion-parameter deep research agent trained entirely on open data using agentic supervised fine-tuning and reinforcement learning with turn-level rewards to achieve superior performance on research benchmarks while maintaining edge-scale deployment advantages.
Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open-data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
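The turn-level reward idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "information gain" for a turn is the change in the probability the policy assigns to the ground-truth answer before and after that turn, and that "format-aware regularization" is a fixed penalty on turns whose output is malformed. The function name `turn_level_rewards`, its parameters, and the penalty value are all hypothetical.

```python
from typing import List, Sequence


def turn_level_rewards(
    answer_probs: Sequence[float],
    format_ok: Sequence[bool],
    format_penalty: float = 0.5,  # hypothetical value, not taken from the paper
) -> List[float]:
    """Assign one reward per turn of a multi-turn trajectory.

    answer_probs[0] is the (assumed) probability of the ground-truth answer
    before any turn; answer_probs[t + 1] is that probability after turn t.
    format_ok[t] says whether turn t produced well-formed output.
    """
    if len(answer_probs) != len(format_ok) + 1:
        raise ValueError("need exactly one more probability than turns")

    rewards = []
    for t, ok in enumerate(format_ok):
        # Assumed information gain: how much this turn moved the policy
        # toward (positive) or away from (negative) the correct answer.
        gain = answer_probs[t + 1] - answer_probs[t]
        # Assumed format-aware term: a flat penalty for malformed turns.
        penalty = 0.0 if ok else format_penalty
        rewards.append(gain - penalty)
    return rewards


if __name__ == "__main__":
    probs = [0.1, 0.3, 0.25, 0.75]   # before turn 0, then after turns 0..2
    well_formed = [True, False, True]
    print(turn_level_rewards(probs, well_formed))
```

Under these assumptions the unpenalized gains telescope, so they sum to the total change in answer probability over the trajectory, while each turn still receives its own signal instead of sharing a single outcome reward.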
Get this paper in your agent:
hf papers read 2604.19859
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
@Ex0byt: A must bookmark.. tiny cracked team, 4 H100 nodes, open source 3 stage recipe, trained on 8k synthetic rubric tasks, fu…
A small team trained a frontier-level Deep Research Agent on an academic budget using only 32 H100s and 8K synthetic samples, releasing fully open weights, code, and paper for models from 2B to 35B that match or beat closed frontier agents on key benchmarks.
S1-DeepResearch: Beyond Search, Toward Real-World Long-Horizon Research Agents
This paper introduces S1-DeepResearch-32B, an open-source model and 15K trajectory dataset for deep research agents, achieving state-of-the-art performance across 20 benchmarks by jointly modeling information acquisition, knowledge synthesis, and planning.
@KaiZhang_CS: Check out one of the best open-source search agents trained by @jianxie_ !! glad to see early experience methods work o…
Yu Su's team trained a frontier Deep Research Agent on an academic budget using 8K synthetic samples and RL, releasing fully open training infrastructure and models from 2B to 35B parameters.
Mind DeepResearch Technical Report
MindDR is a multi-agent deep research framework using a three-agent architecture (Planning, DeepSearch, Report) and a four-stage training pipeline, achieving competitive performance with ~30B-parameter models on multiple benchmarks. Developed by Li Auto and deployed as an online product, it also introduces MindDR Bench, a 500-query Chinese benchmark for evaluating deep research capabilities.
Researchers trained a Deep Research agent with 32 H100s and open-sourced everything
Researchers trained a Deep Research agent using 32 H100 GPUs and open-sourced all components, enabling community access and further development.