Two-Stage Learned Decomposition for Scalable Routing on Multigraphs
Summary
This paper proposes Node-Edge Policy Factorization (NEPF), which tackles the scalability issues of neural solvers for Vehicle Routing Problems on multigraphs by splitting the routing policy into a node permutation stage and an edge selection stage. A pre-encoding edge aggregation scheme, a non-autoregressive edge-stage architecture, and a hierarchical reinforcement learning procedure train the two stages jointly, matching or outperforming the state of the art in solution quality with significantly faster training and inference.
# Two-Stage Learned Decomposition for Scalable Routing on Multigraphs

Source: [https://arxiv.org/abs/2605.05389](https://arxiv.org/abs/2605.05389) · [View PDF](https://arxiv.org/pdf/2605.05389)

> Abstract: Most neural methods for Vehicle Routing Problems (VRPs) are limited to Euclidean settings or simple graphs. In this work, we instead consider multigraphs, where parallel edges represent distinct travel options with varying trade-offs (e.g., distance vs time). Few methods are designed for such formulations and those that do exist face major scalability issues. We mitigate these scalability issues via a Node-Edge Policy Factorization (NEPF) approach, which splits the routing policy into a node permutation stage and an edge selection stage. To enable the decomposition, we introduce a pre-encoding edge aggregation scheme and a non-autoregressive architecture for the edge stage, as well as a hierarchical reinforcement learning method to train the stages jointly. Our experiments across six VRP variants demonstrate that NEPF matches or outperforms the state-of-the-art in terms of solution quality, while being significantly faster in training and inference.

## Submission history

From: Filip Rydin [[view email](https://arxiv.org/show-email/4d4fdb48/2605.05389)]

**[v1]** Wed, 6 May 2026 19:23:09 UTC (68 KB)
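To make the factorization concrete, the sketch below shows one way the two stages described in the abstract could fit together in PyTorch. It is an illustration built only from what the abstract states, not the paper's actual architecture: the module sizes, the mean-pooled edge aggregation, the greedy node decoder, and every name (`NEPFSketch`, `edge_scorer`, etc.) are assumptions.

```python
import torch
import torch.nn as nn


class NEPFSketch(nn.Module):
    """Minimal two-stage sketch: (1) aggregate parallel edges before encoding,
    (2) pick a node permutation autoregressively, (3) pick one parallel edge
    per tour leg non-autoregressively. All design details are assumptions."""

    def __init__(self, edge_feats=2, hidden=128, heads=8):
        super().__init__()
        self.edge_proj = nn.Linear(edge_feats, hidden)  # per-edge features, e.g. (distance, time)
        self.node_in = nn.Linear(2, hidden)             # 2-D node coordinates
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, heads, batch_first=True), num_layers=3
        )
        self.edge_scorer = nn.Linear(2 * hidden + edge_feats, 1)

    def aggregate_edges(self, ef, mask):
        """Pre-encoding edge aggregation: pool the K parallel edges of each
        node pair into one summary, so the node stage sees a simple graph."""
        # ef: (B, N, N, K, F) edge features; mask: (B, N, N, K) edge existence
        h = self.edge_proj(ef) * mask.unsqueeze(-1)
        return h.sum(3) / mask.sum(3, keepdim=True).clamp(min=1)  # (B, N, N, H)

    def node_stage(self, coords, agg):
        """Node permutation stage (greedy rollout for brevity; a learned
        policy would sample actions and be trained with RL)."""
        B, N, _ = coords.shape
        emb = self.encoder(self.node_in(coords) + agg.mean(2))  # fold in edge context
        visited = torch.zeros(B, N, dtype=torch.bool)
        cur = torch.zeros(B, dtype=torch.long)  # start every tour at node 0
        tour = [cur]
        for _ in range(N - 1):
            visited[torch.arange(B), cur] = True
            scores = torch.einsum("bd,bnd->bn", emb[torch.arange(B), cur], emb)
            cur = scores.masked_fill(visited, float("-inf")).argmax(-1)
            tour.append(cur)
        return torch.stack(tour, 1), emb  # (B, N) node order

    def edge_stage(self, tour, emb, ef, mask):
        """Non-autoregressive edge stage: score every parallel edge on every
        consecutive leg of the tour in a single parallel pass."""
        B = tour.shape[0]
        b = torch.arange(B).unsqueeze(1)
        src, dst = tour[:, :-1], tour[:, 1:]
        pair = torch.cat([emb[b, src], emb[b, dst]], -1)  # (B, N-1, 2H)
        legs = ef[b, src, dst]                            # (B, N-1, K, F)
        k = legs.shape[2]
        logits = self.edge_scorer(
            torch.cat([pair.unsqueeze(2).expand(-1, -1, k, -1), legs], -1)
        ).squeeze(-1)                                     # (B, N-1, K)
        logits = logits.masked_fill(mask[b, src, dst] == 0, float("-inf"))
        return logits.argmax(-1)  # chosen parallel-edge index per leg

    def forward(self, coords, ef, mask):
        agg = self.aggregate_edges(ef, mask)
        tour, emb = self.node_stage(coords, agg)
        return tour, self.edge_stage(tour, emb, ef, mask)
```

The split mirrors the claimed speed advantage: only the node stage is sequential, while the edge stage resolves all legs at once, so adding parallel travel options does not lengthen the decoding loop. Training the two stages jointly would use the hierarchical reinforcement learning scheme the abstract names, which is beyond this sketch.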
Similar Articles
Near-Future Policy Optimization
Proposes Near-Future Policy Optimization (NPO), a mixed-policy RL method that accelerates convergence by learning from a later checkpoint of the same training run, boosting Qwen3-VL-8B-Instruct performance from 57.88 to 62.84.
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
RAD-2 presents a unified generator-discriminator framework for autonomous driving that combines diffusion-based trajectory generation with RL-optimized reranking, achieving a 56% reduction in collision rate compared to diffusion-based planners. The approach introduces techniques such as Temporally Consistent Group Relative Policy Optimization and a BEV-Warp simulation environment for efficient large-scale training.
Evolved Policy Gradients
OpenAI introduces Evolved Policy Gradients (EPG), a meta-learning approach that learns loss functions through evolution rather than learning policies directly, enabling RL agents to generalize better across tasks by leveraging prior experience similar to how humans transfer skills.
Expert Routing for Communication-Efficient MoE via Finite Expert Banks
The paper introduces an information-theoretic framework for communication-efficient expert routing in sparse mixture-of-experts models, treating the gate as a stochastic channel and deriving practical mutual information estimators to analyze accuracy-rate tradeoffs over finite expert banks.
Reinforcement Learning via Value Gradient Flow
Value Gradient Flow (VGF) presents a scalable approach to behavior-regularized reinforcement learning by formulating it as an optimal transport problem solved through discrete gradient flow, achieving state-of-the-art results on offline RL and LLM RL benchmarks. The method eliminates explicit policy parameterization while enabling adaptive test-time scaling by controlling transport budget.