Tag
The paper introduces GPTNT, a benchmark built on Keep Talking and Nobody Explodes that requires two multimodal agents to collaborate in real-time under time pressure and information asymmetry, revealing critical weaknesses in state-of-the-art systems.