DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

Hugging Face Daily Papers 06/09/26, 12:00 AM Papers

code-agents dataset software-engineering long-horizon whole-repository-generation fine-tuning llm

Summary

DeNovoSWE is a large-scale dataset for training code agents to generate entire software repositories from documentation, using a sandboxed agentic workflow and difficulty-aware filtering. Fine-tuning Qwen3-30B-A3B on it boosts performance on the BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, we introduce DeNovoSWE, a large-scale dataset for whole-repository generation. DeNovoSWE comprises 4,818 high-quality instances, where each instance requires generating a complete repository from documentation. Our dataset is automatically constructed through a carefully designed sandboxed agentic workflow, enabling scalable curation without human annotation. The workflow follows a "divide and conquer" and critic-repair philosophy. To balance data quality and diversity, we further introduce a difficulty-aware trajectory filtering strategy. Fine-tuning Qwen3-30B-A3B on DeNovoSWE substantially improves long-horizon SWE performance, raising its score on the challenging BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.
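The abstract names a "divide and conquer" and critic-repair philosophy but does not spell out the workflow. The sketch below is only a generic illustration of that pattern, not the authors' pipeline: a task is split into parts, each part is checked by a critic, and flagged parts are repaired until the critic accepts them or a round budget runs out. Every name here (`critic_repair`, `build_parts`, `toy_critic`, `toy_repair`, `max_rounds`) is hypothetical.

```python
from typing import Callable, List, Optional

Critic = Callable[[str], List[str]]        # returns a list of issues (empty = accepted)
Repair = Callable[[str, List[str]], str]   # returns a revised draft


def critic_repair(draft: str, critic: Critic, repair: Repair,
                  max_rounds: int = 3) -> Optional[str]:
    """Return a draft the critic accepts, or None if the budget is exhausted."""
    for round_idx in range(max_rounds + 1):
        issues = critic(draft)
        if not issues:
            return draft
        if round_idx < max_rounds:
            draft = repair(draft, issues)
    return None


def build_parts(parts: List[str], critic: Critic, repair: Repair,
                max_rounds: int = 3) -> Optional[List[str]]:
    """Divide and conquer: handle each part independently, fail if any part fails."""
    accepted = []
    for part in parts:
        result = critic_repair(part, critic, repair, max_rounds)
        if result is None:
            return None  # discard the whole instance (an assumption of this sketch)
        accepted.append(result)
    return accepted


# Toy stand-ins for what would be model calls or test runs in a real workflow.
def toy_critic(text: str) -> List[str]:
    return ["unfinished section"] if "TODO" in text else []


def toy_repair(text: str, issues: List[str]) -> str:
    return text.replace("TODO", "done")


if __name__ == "__main__":
    print(build_parts(["module a: TODO", "module b"], toy_critic, toy_repair))
```

In a real setting the critic would more plausibly be a sandboxed test run or a reviewing model, and the repair step another agent call; the loop structure is the only part this sketch is meant to convey.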

Original Article


Cached at: 06/11/26, 01:41 PM


Source: https://huggingface.co/papers/2606.10728

Abstract

A large-scale dataset called DeNovoSWE is introduced for training code agents to generate entire software repositories from documentation, significantly improving performance on long-horizon software engineering tasks.

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, we introduce DeNovoSWE, a large-scale dataset for whole-repository generation. DeNovoSWE comprises 4,818 high-quality instances, where each instance requires generating a complete repository from documentation. Our dataset is automatically constructed through a carefully designed sandboxed agentic workflow, enabling scalable curation without human annotation. The workflow follows a “divide and conquer” and critic-repair philosophy. To balance data quality and diversity, we further introduce a difficulty-aware trajectory filtering strategy. Fine-tuning Qwen3-30B-A3B on DeNovoSWE substantially improves long-horizon SWE performance, raising its score on the challenging BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.
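The abstract does not define how difficulty-aware trajectory filtering works. As a minimal sketch under stated assumptions, one plausible reading is: estimate each task's difficulty from the pass rate of its sampled trajectories, keep only passing trajectories, and keep fewer of them for easy tasks so they do not dominate the training mix. The `Trajectory` fields, the thresholds, and the caps below are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Trajectory:
    task_id: str
    passed: bool     # did the generated repository pass verification?
    num_steps: int   # length of the agent trajectory


def filter_trajectories(trajs: List[Trajectory],
                        easy_threshold: float = 0.8,
                        easy_cap: int = 1,
                        hard_cap: int = 3) -> List[Trajectory]:
    """Keep passing trajectories, with a smaller per-task cap for easy tasks."""
    by_task: Dict[str, List[Trajectory]] = defaultdict(list)
    for t in trajs:
        by_task[t.task_id].append(t)

    kept: List[Trajectory] = []
    for task_id in sorted(by_task):
        group = by_task[task_id]
        successes = [t for t in group if t.passed]
        if not successes:
            continue  # no verified trajectory to learn from
        pass_rate = len(successes) / len(group)
        # Assumption: a high pass rate means the task is easy.
        cap = easy_cap if pass_rate >= easy_threshold else hard_cap
        # Assumption: prefer shorter trajectories when trimming.
        successes.sort(key=lambda t: t.num_steps)
        kept.extend(successes[:cap])
    return kept


if __name__ == "__main__":
    sample = (
        [Trajectory("easy", True, n) for n in (10, 5, 7, 9)]
        + [Trajectory("hard", True, 30), Trajectory("hard", True, 20),
           Trajectory("hard", False, 40), Trajectory("hard", False, 50)]
        + [Trajectory("unsolved", False, 60)]
    )
    for t in filter_trajectories(sample):
        print(t.task_id, t.num_steps)
```

The cap-per-task design is one simple way to trade quality (only verified trajectories) against diversity (no single easy task contributes many near-identical examples); the actual criteria used for DeNovoSWE may differ.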


Get this paper in your agent:

hf papers read 2606.10728

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Datasets citing this paper: 3

AweAI-Team/DeNovoSWE

AweAI-Team/DeNovoSWE-Trajectory-Filtered

AweAI-Team/DeNovoSWE-Trajectory-Raw


Similar Articles

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Someone did an audit on the new DeepSWE, the results aren't pretty

@xdotli: mini-swe-agent is impressive. 100 lines, one bash tool, same prompt for every model tops on DeepSWE by @datacurve where…

Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

SWE Context Bench just proved something I think a lot of coding agent users already feel
