Introduces Age of LLM, a turn-based 1v1 benchmark in which LLMs compete on a grid with fog of war and diplomacy. It measures reasoning, reliability, and strategic planning. Findings show that nuclear rush tactics dominate and that reliability is only weakly linked to winning.