Tag
Evalatro is an open benchmark in which LLMs play the real game Balatro through a text-based interface, with fixed seeds and a public leaderboard; the goal is to clear Ante 12. Early results show that models struggle: none has reached the target.