Tag
This paper introduces a framework for two-sided matching with temporally extended feedback, formulating it as a partially observable Markov game with costly screening, noisy observations, and evolving latent profiles. The authors present Learn2Match, a multi-agent reinforcement learning benchmark, and show that independent PPO outperforms bandit baselines in social welfare but incurs higher information-friction loss.