Projecting Latent RL Actions: Towards Generalizable and Scalable Graph Combinatorial Optimization
Summary
This paper introduces projection agents for graph combinatorial optimization. The agents combine reinforcement learning with graph neural networks and operate in a continuous action embedding space to improve generalization and scalability. The paper also releases the LaGCO-RL library.
Similar Articles
GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero
GRLO introduces a novel reinforcement learning post-training method that achieves strong generalization across multiple domains (math, code, etc.) from only 5K prompts and 22.7 GPU hours, requiring significantly less compute and data than in-domain RLVR baselines.
GraphPO: Graph-based Policy Optimization for Reasoning Models
GraphPO is a novel graph-based reinforcement learning framework that represents rollouts as a directed acyclic graph, merging semantically equivalent reasoning paths to reduce redundant exploration and improve credit assignment for large reasoning models.
GraphReAct: Reasoning and Acting for Multi-step Graph Inference
This paper introduces GraphReAct, a framework that extends reasoning-acting paradigms to graph-structured data for multi-step inference. It combines topological and semantic retrieval with context refinement to improve performance on graph learning benchmarks.
ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents
ExpGraph is a model-agnostic framework that enables LLM agents to reuse past experiences via a self-evolving graph of skills and failures, improving task performance by 12–21% without retraining the executor.
Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization
This paper proposes an exploration-aware reinforcement learning framework that enables LLM agents to adaptively explore only when uncertainty is high, improving performance on text-based and GUI-based benchmarks.