Two-Stage Learned Decomposition for Scalable Routing on Multigraphs

arXiv cs.LG Papers

Summary

This paper proposes Node-Edge Policy Factorization (NEPF) to address scalability issues in solving Vehicle Routing Problems on multigraphs. It splits the routing policy into a node permutation stage and an edge selection stage, enabled by a pre-encoding edge aggregation scheme, a non-autoregressive edge-stage architecture, and hierarchical reinforcement learning, achieving state-of-the-art solution quality with faster training and inference.

arXiv:2605.05389v1 Announce Type: new Abstract: Most neural methods for Vehicle Routing Problems (VRPs) are limited to Euclidean settings or simple graphs. In this work, we instead consider multigraphs, where parallel edges represent distinct travel options with varying trade-offs (e.g., distance vs time). Few methods are designed for such formulations and those that do exist face major scalability issues. We mitigate these scalability issues via a Node-Edge Policy Factorization (NEPF) approach, which splits the routing policy into a node permutation stage and an edge selection stage. To enable the decomposition, we introduce a pre-encoding edge aggregation scheme and a non-autoregressive architecture for the edge stage, as well as a hierarchical reinforcement learning method to train the stages jointly. Our experiments across six VRP variants demonstrate that NEPF matches or outperforms the state-of-the-art in terms of solution quality, while being significantly faster in training and inference.
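To make the two-stage factorization concrete, here is a minimal illustrative sketch, not the paper's implementation: the learned node and edge policies are replaced by simple heuristics (a nearest-neighbor node permutation and per-leg cheapest-edge selection), and the multigraph, the scalarization weights, and all function names are assumptions for illustration only.

```python
# Hypothetical multigraph: each node pair has parallel edges carrying
# distinct (distance, time) trade-offs, as in the paper's setting.
edges = {
    (0, 1): [(4.0, 10.0), (6.0, 5.0)],
    (0, 2): [(3.0, 8.0)],
    (1, 2): [(2.0, 6.0), (5.0, 3.0)],
    (1, 0): [(4.0, 10.0), (6.0, 5.0)],
    (2, 0): [(3.0, 8.0)],
    (2, 1): [(2.0, 6.0), (5.0, 3.0)],
}

def aggregate(u, v, w_dist=1.0, w_time=0.0):
    """Pre-encoding edge aggregation stand-in: summarize the parallel
    edges between u and v by their cheapest scalarized cost, so the
    node stage only ever sees a simple graph."""
    return min(w_dist * d + w_time * t for d, t in edges[(u, v)])

def node_stage(start, nodes):
    """Stage 1 stand-in: greedy nearest-neighbor node permutation on
    the aggregated simple graph (NEPF learns this policy instead)."""
    tour, remaining = [start], set(nodes) - {start}
    while remaining:
        nxt = min(remaining, key=lambda v: aggregate(tour[-1], v))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

def edge_stage(tour, w_dist=1.0, w_time=0.0):
    """Stage 2 stand-in: pick one parallel edge per leg of the fixed
    tour, each leg independently (mirroring the non-autoregressive
    edge stage)."""
    choices = []
    for u, v in zip(tour, tour[1:]):
        best = min(edges[(u, v)], key=lambda e: w_dist * e[0] + w_time * e[1])
        choices.append(((u, v), best))
    return choices

tour = node_stage(0, {0, 1, 2})   # node permutation first
legs = edge_stage(tour)           # then edge selection per leg
```

The key design point this sketch mirrors is the decomposition itself: because the edge stage only runs once the node order is fixed, the parallel-edge dimension never enters the (expensive) sequential node decisions, which is where the scalability gain comes from.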

# Two-Stage Learned Decomposition for Scalable Routing on Multigraphs
Source: [https://arxiv.org/abs/2605.05389](https://arxiv.org/abs/2605.05389)
[View PDF](https://arxiv.org/pdf/2605.05389)


## Submission history

From: Filip Rydin [[view email](https://arxiv.org/show-email/4d4fdb48/2605.05389)] **[v1]** Wed, 6 May 2026 19:23:09 UTC (68 KB)

Similar Articles

Near-Future Policy Optimization

Hugging Face Daily Papers

Proposes Near-Future Policy Optimization (NPO), a mixed-policy RL method that accelerates convergence by learning from a later checkpoint of the same training run, boosting Qwen3-VL-8B-Instruct performance from 57.88 to 62.84.

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Hugging Face Daily Papers

RAD-2 presents a unified generator-discriminator framework for autonomous driving that combines diffusion-based trajectory generation with RL-optimized reranking, achieving a 56% reduction in collision rate compared to diffusion-based planners. The approach introduces techniques such as Temporally Consistent Group Relative Policy Optimization and a BEV-Warp simulation environment for efficient large-scale training.

Evolved Policy Gradients

OpenAI Blog

OpenAI introduces Evolved Policy Gradients (EPG), a meta-learning approach that learns loss functions through evolution rather than learning policies directly, enabling RL agents to generalize better across tasks by leveraging prior experience similar to how humans transfer skills.

Expert Routing for Communication-Efficient MoE via Finite Expert Banks

arXiv cs.LG

The paper introduces an information-theoretic framework for communication-efficient expert routing in sparse mixture-of-experts models, treating the gate as a stochastic channel and deriving practical mutual information estimators to analyze accuracy-rate tradeoffs over finite expert banks.

Reinforcement Learning via Value Gradient Flow

Hugging Face Daily Papers

Value Gradient Flow (VGF) presents a scalable approach to behavior-regularized reinforcement learning by formulating it as an optimal transport problem solved through discrete gradient flow, achieving state-of-the-art results on offline RL and LLM RL benchmarks. The method eliminates explicit policy parameterization while enabling adaptive test-time scaling by controlling transport budget.