DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Hugging Face Daily Papers 04/21/26, 12:00 AM Papers

Summary

DR-Venus-4B is a 4B-parameter deep-research agent trained on only 10K open samples via agentic SFT+RL with turn-level rewards, outperforming prior sub-9B agents and rivaling 30B models on research benchmarks while staying deployable on edge devices.

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent with limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open-data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
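The abstract names the ingredients of the turn-level reward (information gain plus format-aware regularization) but not its formula. The following is a minimal sketch of one way such a reward could be computed, not the authors' implementation. It assumes that information gain is the change in the policy's probability of the ground-truth answer from one turn to the next, and that a badly formatted turn costs a fixed penalty. The function names, the `format_penalty` parameter, and the discounting step are all hypothetical.

```python
from typing import List


def turn_level_rewards(
    answer_probs: List[float],
    format_ok: List[bool],
    format_penalty: float = 0.1,
) -> List[float]:
    """Hypothetical turn-level reward: information gain minus a format penalty.

    answer_probs[0] is the assumed probability of the ground-truth answer
    before any turn; answer_probs[t] is the probability after turn t.
    format_ok[t - 1] says whether turn t followed the expected output format.
    """
    if len(answer_probs) != len(format_ok) + 1:
        raise ValueError("need exactly one more probability than turns")
    rewards = []
    for t, ok in enumerate(format_ok, start=1):
        # Assumed definition: how much this turn moved the policy toward the answer.
        info_gain = answer_probs[t] - answer_probs[t - 1]
        # Assumed regularizer: a fixed cost for a malformed turn.
        penalty = 0.0 if ok else format_penalty
        rewards.append(info_gain - penalty)
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Accumulate per-turn rewards backwards so every turn gets its own return."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Under these assumptions every turn receives its own signal instead of sharing one outcome reward at the end of the trajectory, which is the sense in which the abstract speaks of denser supervision and turn-level credit assignment.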

Original Article

Cached at: 04/23/26, 03:35 AM

Source: https://huggingface.co/papers/2604.19859

Published on Apr 21

#2 Paper of the day

Abstract

DR-Venus-4B is a 4-billion-parameter deep research agent trained entirely on open data using agentic supervised fine-tuning and reinforcement learning with turn-level rewards to achieve superior performance on research benchmarks while maintaining edge-scale deployment advantages.

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent with limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open-data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
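The first-stage data recipe (strict cleaning followed by resampling of long-horizon trajectories) is described only at a high level. The sketch below illustrates the general idea under stated assumptions and should not be read as the released pipeline. The record fields `correct`, `well_formed`, and `num_turns`, and the choice to weight each trajectory by its turn count, are hypothetical.

```python
import random
from typing import Dict, List


def clean(trajectories: List[Dict]) -> List[Dict]:
    """Keep only trajectories flagged as both correct and well formed (assumed criteria)."""
    return [t for t in trajectories if t["correct"] and t["well_formed"]]


def resample_long_horizon(
    trajectories: List[Dict],
    n_samples: int,
    seed: int = 0,
) -> List[Dict]:
    """Draw with replacement, weighting each trajectory by its number of turns.

    Longer trajectories are therefore seen more often during SFT; the
    proportional weighting is an illustrative choice, not the paper's scheme.
    """
    rng = random.Random(seed)
    weights = [t["num_turns"] for t in trajectories]
    return rng.choices(trajectories, weights=weights, k=n_samples)
```

A typical use would be `resample_long_horizon(clean(raw), n_samples=len(raw))`, which keeps the training set size fixed while shifting its mass toward long, multi-turn examples.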

Get this paper in your agent:

hf papers read 2604.19859

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

@Ex0byt: A must bookmark.. tiny cracked team, 4 H100 nodes, open source 3 stage recipe, trained on 8k synthetic rubric tasks, fu…

S1-DeepResearch: Beyond Search, Toward Real-World Long-Horizon Research Agents

@KaiZhang_CS: Check out one of the best open-source search agents trained by @jianxie_ !! glad to see early experience methods work o…

Mind DeepResearch Technical Report

Researchers trained a Deep Research agent with 32 H100s and open-sourced everything

Researchers trained a Deep Research agent using 32 H100 GPUs and open-sourced all components, enabling community access and further development.