Tag
This paper introduces a framework for two-sided matching with temporally extended feedback, formulating it as a partially observable Markov game with costly screening, noisy observations, and evolving latent profiles. The authors present Learn2Match, a multi-agent reinforcement learning benchmark, and show that independent PPO outperforms bandit baselines in social welfare but incurs higher information-friction loss.
This paper presents a distributed approach for constrained multi-agent reinforcement learning that uses state-augmented policy learning and neighbor-to-neighbor consensus over dual variables to satisfy global resource constraints while scaling linearly with the number of agents. Experiments on smart grid demand response demonstrate that consensus coordination is essential for feasibility, scaling to thousands of agents unlike centralized training approaches.
This paper introduces Differentiable Belief-based Opponent Shaping (D-BOS), a first-order method that treats observer beliefs as the shaped state and differentiates through belief update dynamics, allowing optimal strategies to emerge naturally from the environment's reward structure in hidden-role multi-agent settings.
This paper proposes a multi-agent reinforcement learning framework that co-trains an autonomous vehicle and pedestrians with personality-driven jaywalking behavior, achieving a 30% reduction in collisions compared to single-agent approaches and demonstrating more realistic interaction scenarios.
This paper introduces SLIM, a minimal architecture that decouples communication from policy representation in multi-agent reinforcement learning, achieving state-of-the-art performance under bandwidth constraints with minimal degradation.
This paper presents empirical evidence that quantum entanglement provides a measurable advantage in multi-agent reinforcement learning, using the CHSH game and cooperative navigation tasks to demonstrate performance improvements over classical baselines.
The paper introduces Diamond Attention, a method for multi-agent reinforcement learning that uses structured randomness to break symmetry and enable role differentiation among homogeneous agents, achieving perfect coordination in symmetric tasks like the XOR game.