Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Hugging Face Daily Papers 05/29/26, 12:00 AM Papers

Summary

Introduces Atomic Decomposition and Recombination (ADR), a framework that generates novel and challenging verifiable code tasks by decomposing and recombining atomic elements, enabling scalable reinforcement learning with verifiable rewards for large language models.

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the remarkable coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks that target near the model's edge of competence. Prior studies often rely on heuristic seed expansions for data synthesis, which severely limits both novelty and difficulty. Consequently, the training value of such data fails to scale proportionally with the size of its synthesis. To this end, we propose Atomic Decomposition and Recombination (ADR), a novel framework that generates verifiable code tasks via decomposition into atomic elements and controlled recombination, thereby enabling the generation of genuinely novel and challenging verifiable code tasks. Experiments and analysis demonstrate that ADR achieves superior originality, difficulty, diversity, and test quality over existing baselines, and consistently delivers greater improvements in code ability across RLVR in diverse downstream domains, including algorithmic programming, tool usage, and data science. Our work sheds light on a new paradigm for novel code task synthesis and scalable RLVR training.
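The abstract does not spell out how decomposition and recombination are carried out, so the following is only a toy sketch of the general idea, not the authors' pipeline. It assumes that each seed task can be tagged with a set of "atomic elements" (here, hypothetical skill labels such as hash_map or graph), pools those atoms, and then keeps only the combinations that no single seed task already covers. The names SeedTask, decompose, and recombine are invented for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SeedTask:
    name: str
    atoms: frozenset  # hypothetical "atomic elements", e.g. skills or constraints


def decompose(seeds):
    """Collect the pool of distinct atoms found across all seed tasks."""
    pool = set()
    for seed in seeds:
        pool |= seed.atoms
    return sorted(pool)


def recombine(pool, seeds, k):
    """Return k-atom combinations that no single seed task already covers."""
    return [
        combo
        for combo in combinations(pool, k)
        if not any(set(combo) <= seed.atoms for seed in seeds)
    ]


# Illustrative seed tasks; the atom labels are assumptions, not from the paper.
seeds = [
    SeedTask("two_sum", frozenset({"hash_map", "array_scan"})),
    SeedTask("shortest_path", frozenset({"graph", "priority_queue"})),
]
pool = decompose(seeds)
novel = recombine(pool, seeds, k=2)
for combo in novel:
    print(combo)
```

In this sketch, "novel" simply means that the atom combination crosses seed boundaries. A real system would still need to turn each combination into a task statement with tests, and to control difficulty, which the abstract describes only as "controlled recombination".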

Original Article

Cached at: 06/05/26, 06:07 AM

Source: https://huggingface.co/papers/2605.31058

Abstract

Atomic Decomposition and Recombination (ADR) framework generates novel and challenging verifiable code tasks for scalable reinforcement learning with verifiable rewards in large language models.

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the remarkable coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks that target near the model’s edge of competence. Prior studies often rely on heuristic seed expansions for data synthesis, which severely limits both novelty and difficulty. Consequently, the training value of such data fails to scale proportionally with the size of its synthesis. To this end, we propose Atomic Decomposition and Recombination (ADR), a novel framework that generates verifiable code tasks via decomposition into atomic elements and controlled recombination, thereby enabling the generation of genuinely novel and challenging verifiable code tasks. Experiments and analysis demonstrate that ADR achieves superior originality, difficulty, diversity, and test quality over existing baselines, and consistently delivers greater improvements in code ability across RLVR in diverse downstream domains, including algorithmic programming, tool usage, and data science. Our work sheds light on a new paradigm for novel code task synthesis and scalable RLVR training.
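For readers unfamiliar with what makes a code task "verifiable", a minimal sketch of a test-based reward follows. It assumes a binary reward (1.0 if the candidate passes every test case, 0.0 otherwise) and a task given as a function name plus input/output pairs. The abstract does not describe the reward used in the paper, so this illustrates the general RLVR setting only, and verifiable_reward is a hypothetical helper.

```python
def verifiable_reward(solution_src, entry_point, tests):
    """Return 1.0 if the candidate passes every (args, expected) test, else 0.0."""
    namespace = {}
    try:
        # NOTE: exec runs untrusted code without any sandbox; illustration only.
        exec(solution_src, namespace)
        func = namespace[entry_point]
        passed = all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing entry points, and runtime errors all score zero.
        return 0.0
    return 1.0 if passed else 0.0


# Hypothetical task: implement add(a, b).
tests = [((1, 2), 3), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

reward_good = verifiable_reward(good, "add", tests)
reward_bad = verifiable_reward(bad, "add", tests)
print(reward_good, reward_bad)
```

A reward of this kind is only as trustworthy as its tests, which is one reason the abstract lists test quality among the properties being evaluated.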

Similar Articles

@adithya_s_k: https://x.com/adithya_s_k/status/2054961319179420035

CodeAlchemy: Synthetic Code Rewriting at Scale

AgentV-RL: Scaling Reward Modeling with Agentic Verifier

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR
