Projecting Latent RL Actions: Towards Generalizable and Scalable Graph Combinatorial Optimization

arXiv cs.AI 05/20/26, 04:00 AM Papers

Summary

This paper introduces projection agents for graph combinatorial optimization using reinforcement learning and graph neural networks, operating in a continuous action embedding space to improve generalization and scalability, and releases the LaGCO-RL library.

arXiv:2605.19721v1

Abstract

Graph combinatorial optimization (GCO) has attracted growing interest, as many NP-hard problems naturally admit graph formulations, yet their combinatorial explosion renders exact methods computationally intractable. Recent advances in Reinforcement Learning (RL) combined with Graph Neural Networks (GNNs) have significantly improved learning-based GCO solvers. However, existing approaches face limitations in both generalization across diverse graph instances and computational scalability as action spaces grow. To address both challenges, we introduce projection agents, a novel RL-GCO approach that operates directly in a continuous GNN-based action embedding space, predicting a desired latent action in a single forward pass and subsequently decoding it into a valid discrete action. Additionally, we enable fair comparison across RL methods through a shared embedding space for both observations and actions. Across diverse benchmarks, our approach achieves up to 16.2x faster inference and up to 40% better generalization than existing solutions using only simple nearest-neighbor decoding, while opening the door to strong RL performance in super-linear decision spaces with multiple interdependent variables. Finally, we release LaGCO-RL, a Python library that automates latent action-space construction and supports existing RL-GCO solutions, promoting reproducibility and adaptation to new GCO benchmarks.
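The decoding step the abstract mentions, mapping a predicted latent action to a valid discrete action by nearest-neighbor lookup, can be illustrated with a small sketch. This is not the LaGCO-RL API and not necessarily the authors' implementation: the function name `decode_nearest_action`, the use of squared Euclidean distance, the validity mask, and the toy 2-D embeddings are all assumptions made for illustration. In the paper's setting the embeddings would come from a GNN and the latent action from the policy's forward pass.

```python
from math import inf
from typing import Optional, Sequence


def decode_nearest_action(
    latent_action: Sequence[float],
    action_embeddings: Sequence[Sequence[float]],
    valid_mask: Optional[Sequence[bool]] = None,
) -> int:
    """Return the index of the valid action whose embedding is closest
    (squared Euclidean distance, an assumed metric) to the latent action."""
    best_index, best_dist = -1, inf
    for i, embedding in enumerate(action_embeddings):
        # Skip actions that are infeasible in the current state.
        if valid_mask is not None and not valid_mask[i]:
            continue
        dist = sum((e - z) ** 2 for e, z in zip(embedding, latent_action))
        if dist < best_dist:
            best_index, best_dist = i, dist
    if best_index < 0:
        raise ValueError("no valid action to decode to")
    return best_index


# Hypothetical toy data: four candidate actions embedded in 2-D.
action_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latent = [0.9, 0.2]  # stand-in for the policy's predicted latent action

unmasked = decode_nearest_action(latent, action_embeddings)
masked = decode_nearest_action(
    latent, action_embeddings, valid_mask=[True, False, True, True]
)
print(unmasked, masked)
```

With no mask the closest embedding is action 1. Once action 1 is marked invalid, the decoder falls back to action 3, the next closest candidate. The policy itself outputs one vector regardless of how many actions exist; only this lookup depends on the number of candidates.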

Similar Articles

GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero

GraphPO: Graph-based Policy Optimization for Reasoning Models

GraphReAct: Reasoning and Acting for Multi-step Graph Inference

ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents

Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization
