value-correction

Macro-Action Based Multi-Agent Instruction Following through Value Cancellation

arXiv cs.AI · 2026-05-14

Proposes MAVIC, a method for multi-agent reinforcement learning that corrects value estimates at instruction boundaries to enable compliance with external natural language instructions while preserving base task performance.
