@Ex0byt: A must bookmark.. tiny cracked team, 4 H100 nodes, open source 3 stage recipe, trained on 8k synthetic rubric tasks, fu…
Summary
A small team trained a near-frontier Deep Research Agent on an academic budget using only 32 H100s (4 nodes) and 8K synthetic samples, releasing fully open weights, code, and paper for models from 2B to 35B. The 35B model is reported to match or beat closed frontier deep research agents on key benchmarks.
Cached at: 06/18/26, 02:17 PM
A must-bookmark: tiny cracked team, 4 H100 nodes, open-source 3-stage recipe, trained on 8K synthetic rubric tasks, fully open weights, code, and paper; out comes a 35B model that matches or beats closed frontier deep research agents on key benchmarks. The traces this model produces are gold.
Yu Su (@ysu_nlp): We trained a ~frontier Deep Research Agent on academic budget
> 32 H100s
> 8K synthetic samples
> fully open training infra + recipe (SFT, mid-training, RL)
> models of diff sizes (2B -> 35B) ready to use out of the box
This is yet another demonstration of how the frontier of
Similar Articles
@KaiZhang_CS: Check out one of the best open-source search agents trained by @jianxie_ !! glad to see early experience methods work o…
Yu Su's team trained a frontier Deep Research Agent on an academic budget using 8K synthetic samples and RL, releasing fully open training infrastructure and models from 2B to 35B parameters.
Researchers trained a Deep Research agent with 32 H100s and open-sourced everything
Researchers trained a Deep Research agent using 32 H100 GPUs and open-sourced all components, so the community can use and build on them.
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
DR-Venus-4B is a 4B-parameter deep-research agent trained on only 10K open samples via agentic SFT+RL with turn-level rewards, outrunning prior sub-9B agents and rivaling 30B models on research benchmarks while staying deployable on edge devices.
@Apodex_AI: Dive in Blog: https://apodex.com/blog/apodex-1.0 Tech report: http://apodex.com/pdf/20260608 Github: https://github.com…
ApodexAI releases Apodex-1.0, a deep-research model that operates as a tool-using ReAct agent. Its heavy-duty mode, Apodex-1.0-H, uses an asynchronous agent team with up to 150 sub-agents and achieves new state-of-the-art results on deep-research benchmarks including BrowseComp, DeepSearchQA, HLE, and FrontierScience, surpassing models like GPT-5.5-pro and Claude-Opus-4.8.
@heyshrutimishra: Apodex 1.0 dropped and the architecture is genuinely different. It's post-trained on Qwen3.5 as a self-evolving system:…
Apodex 1.0 is a self-evolving AI system post-trained on Qwen3.5, achieving SOTA on BrowseComp, DeepSearchQA, and HLE-text. Its 4B mini model outperforms 30B-class models, with an AgentOS runtime for task orchestration. Open weights available.