Two-Stage Learned Decomposition for Scalable Routing on Multigraphs
Summary
This paper proposes Node-Edge Policy Factorization (NEPF) to address scalability issues in solving Vehicle Routing Problems on multigraphs. It combines pre-encoding edge aggregation with a hierarchical reinforcement learning method to achieve state-of-the-art solution quality with faster training and inference.
View Cached Full Text
Cached at: 05/08/26, 07:14 AM
# Two-Stage Learned Decomposition for Scalable Routing on Multigraphs Source: [https://arxiv.org/abs/2605.05389](https://arxiv.org/abs/2605.05389) [View PDF](https://arxiv.org/pdf/2605.05389) > Abstract:Most neural methods for Vehicle Routing Problems \(VRPs\) are limited to Euclidean settings or simple graphs\. In this work, we instead consider multigraphs, where parallel edges represent distinct travel options with varying trade\-offs \(e\.g\., distance vs time\)\. Few methods are designed for such formulations and those that do exist face major scalability issues\. We mitigate these scalability issues via a Node\-Edge Policy Factorization \(NEPF\) approach, which splits the routing policy into a node permutation stage and an edge selection stage\. To enable the decomposition, we introduce a pre\-encoding edge aggregation scheme and a non\-autoregressive architecture for the edge stage, as well as a hierarchical reinforcement learning method to train the stages jointly\. Our experiments across six VRP variants demonstrate that NEPF matches or outperforms the state\-of\-the\-art in terms of solution quality, while being significantly faster in training and inference\. ## Submission history From: Filip Rydin \[[view email](https://arxiv.org/show-email/4d4fdb48/2605.05389)\] **\[v1\]**Wed, 6 May 2026 19:23:09 UTC \(68 KB\)
Similar Articles
A Unified Knowledge Embedded Reinforcement Learning-based Framework for Generalized Capacitated Vehicle Routing Problems
This paper proposes a unified knowledge-embedded reinforcement learning framework for generalized capacitated vehicle routing problems, combining route-first cluster-second heuristics with dynamic programming to achieve superior solution quality and strong generalization across diverse variants.
GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
This paper introduces GraphDC, a divide-and-conquer multi-agent framework that decomposes graph algorithmic tasks into subgraphs for specialized agents, improving scalability and reasoning performance on complex graph structures.
Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO
This paper identifies surrogate hacking and temporal uncertainty as failure modes in multi-timescale RL, and proposes a Target Decoupling architecture that removes routing from the actor, using the critic for auxiliary representation learning. The method eliminates policy collapse on the LunarLander-v2 benchmark and stably surpasses the 'Environment Solved' threshold without hyperparameter hacking.
GraphPO: Graph-based Policy Optimization for Reasoning Models
GraphPO is a novel graph-based reinforcement learning framework that represents rollouts as a directed acyclic graph, merging semantically equivalent reasoning paths to reduce redundant exploration and improve credit assignment for large reasoning models.
COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space
COAgents is a cooperative multi-agent framework for solving Vehicle Routing Problems that models search as a graph, using specialized agents for node selection, move selection, and jumps to escape local minima. It achieves state-of-the-art results on CVRP and VRPTW benchmarks, reducing the gap to best-known solutions by up to 44% compared to prior learning-based methods.