Introduces delayed per-step reward attribution with eligibility gating for reinforcement learning in multi-agent language model interactions, achieving first place in the MindGames Arena benchmark at NeurIPS 2025.