adversarial-uncertainty

Tag

Cards List
#adversarial-uncertainty

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War

arXiv cs.AI · 2d ago Cached

Introduces Age of LLM, a turn-based 1v1 benchmark where LLMs compete on a grid with fog of war and diplomacy, measuring reasoning, reliability, and strategic planning. Findings show a dominance of nuclear rush tactics and a weak link between reliability and winning.

0 favorites 0 likes
← Back to home

Submit Feedback