Metric-Gradient Projection for Stable Multi-Agent Policy Learning

arXiv cs.LG 05/20/26, 04:00 AM Papers

multi-agent reinforcement-learning gradient-projection hodge-projection stability marl metric-gradient

Summary

Introduces HPML, a method that projects the joint update field of a multi-agent system onto its closest metric-gradient component to stabilize multi-agent reinforcement learning. The authors show that the projected dynamics admit a Lyapunov potential, derive equilibrium-gap bounds with an explicit additive non-potentiality term, and report improved stability and normalized return on CTDE benchmarks.

arXiv:2605.18809v1

Abstract: General-sum multi-agent learning is often governed by a stacked update field in which each agent's policy update changes the optimization landscape faced by the others. This coupling can entangle an integrable component of collective improvement with cyclic interaction dynamics, leading to slow or unstable multi-agent learning. Existing approaches, such as regularization, credit assignment, and consensus methods, stabilize MARL through local or algorithmic modifications; HPML complements them by projecting the joint update field onto a metric-gradient component. We introduce HPML (Hodge-Projected Multi-agent Learning), which views the joint update field of a multi-agent system as an element of an $L^2$ space of vector fields and computes a Hodge-type projection onto the closest metric-gradient potential flow. HPML follows the projected component as the update direction, yielding the closest metric-gradient field under the chosen metric and sampling measure. The projection is defined variationally, characterized by a Poisson-type equation, and implemented through graph-based and amortized neural realizations that recover projected directions from samples. We show that the projected dynamics admit a Lyapunov potential and yield equilibrium-gap bounds with an explicit additive non-potentiality term. Controlled experiments validate the geometric mechanism, and CTDE benchmarks show improved stability and normalized return when HPML is used as a plug-in projection layer in MARL pipelines.
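
The abstract describes a projection that is defined variationally, characterized by a Poisson-type equation, and realized on a graph built from samples. The sketch below is not the paper's implementation. It is a generic least-squares Helmholtz-Hodge split on a small grid graph, written under several assumptions that do not come from the source: a Euclidean metric, uniform edge weights, a midpoint rule for edge flows, and the hypothetical names grid_graph, hodge_project, and mixed_field. It illustrates how a sampled update field might be separated into a gradient (potential) part and a cyclic residual.

```python
import numpy as np


def grid_graph(n, lo=-1.0, hi=1.0):
    """Sample an n-by-n grid of points and connect 4-neighbours (assumed sampling)."""
    xs = np.linspace(lo, hi, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    nodes = np.stack([X.ravel(), Y.ravel()], axis=1)
    idx = np.arange(n * n).reshape(n, n)
    along_x = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1)
    along_y = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1)
    edges = np.concatenate([along_x, along_y], axis=0)
    return nodes, edges


def hodge_project(nodes, edges, field):
    """Split sampled edge flows into a gradient part and a residual.

    Solves min_phi ||B phi - flow||^2, whose normal equations
    (B^T B) phi = B^T flow form a graph Poisson equation.
    """
    i, j = edges[:, 0], edges[:, 1]
    F = field(nodes)  # field values at the nodes, shape (num_nodes, dim)
    # Midpoint-rule approximation of the line integral of F along each edge.
    flow = 0.5 * np.sum((F[i] + F[j]) * (nodes[j] - nodes[i]), axis=1)
    # Incidence matrix: (B phi)[e] = phi[j] - phi[i] for edge e = (i, j).
    m, n = len(edges), len(nodes)
    B = np.zeros((m, n))
    B[np.arange(m), i] = -1.0
    B[np.arange(m), j] = 1.0
    # lstsq returns the minimum-norm potential (phi is unique up to a constant).
    phi, *_ = np.linalg.lstsq(B, flow, rcond=None)
    grad_part = B @ phi
    residual = flow - grad_part
    return phi, flow, grad_part, residual, B


def mixed_field(p):
    """Toy update field: a gradient pull toward the origin plus a rotation."""
    rotation = np.stack([-p[:, 1], p[:, 0]], axis=1)
    return -p + 0.5 * rotation


nodes, edges = grid_graph(8)
phi, flow, grad_part, residual, B = hodge_project(nodes, edges, mixed_field)
print("edge-flow norm:     ", np.linalg.norm(flow))
print("gradient-part norm: ", np.linalg.norm(grad_part))
print("residual norm:      ", np.linalg.norm(residual))
```

In this toy setting the residual is orthogonal to every gradient flow on the graph, so following only the gradient part discards the sampled cyclic component. A non-Euclidean metric or a non-uniform sampling measure would enter as edge weights in the least-squares problem, which this sketch omits.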

Similar Articles

Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents

Gradient Extrapolation-Based Policy Optimization

Safe and Generalizable Hierarchical Multi-Agent RL via Constraint Manifold Control

Hybrid Policy Distillation for LLMs

Hölder Policy Optimisation
