Tag
Introduces HPML, a method that projects the joint update field of a multi-agent system onto a metric-gradient component to stabilize and improve multi-agent reinforcement learning. It provides theoretical guarantees and shows improved stability and returns on centralized training with decentralized execution (CTDE) benchmarks.
Proposes MAVIC, a multi-agent reinforcement learning method that corrects value estimates at instruction boundaries so that agents comply with external natural-language instructions while preserving base task performance.