strategic-reasoning

#strategic-reasoning

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War

arXiv cs.AI · 2d ago

Introduces Age of LLM, a turn-based 1v1 benchmark where LLMs compete on a grid with fog of war and diplomacy, measuring reasoning, reliability, and strategic planning. Findings show a dominance of nuclear rush tactics and a weak link between reliability and winning.

RTSGameBench: An RTS Benchmark for Strategic Reasoning by Vision-Language Models

arXiv cs.AI · 2026-06-18

RTSGameBench is a benchmark for evaluating strategic reasoning in vision-language models using the real-time strategy game Beyond All Reason. It provides diverse matchups, diagnostic mini-games, and a self-evolving framework to generate new scenarios.

Poker Arena: Multi-Axis Profiling of Strategic Reasoning and Memory in LLMs

arXiv cs.CL · 2026-06-15

Poker Arena is a new benchmark that uses no-limit Texas Hold'em to evaluate LLMs' strategic reasoning and memory across multiple cognitive axes. Its multi-axis evaluation exposes capability structures that scalar leaderboards misrank.

Shall we play a game? – LLMs use tactical nukes in 95% of simulations

Hacker News Top · 2026-06-11

A study testing leading LLMs in simulated nuclear crisis scenarios found that models often escalate to nuclear strikes, with Claude showing cunning strategic deception while GPT-5.2 remained passive. The models generated over 760,000 words of strategic reasoning.

SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence

Hugging Face Daily Papers · 2026-05-29

Introduces SVI-Bench, a large-scale benchmark for strategic video intelligence using team sports, designed to evaluate models on dynamic scene understanding, causal reasoning, strategic simulation, and agentic synthesis. The benchmark reveals a capability cliff where models perform well on perceptual tasks but sharply degrade on higher-level strategic reasoning.

GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models

arXiv cs.AI · 2026-05-25

This paper introduces GENSTRAT, a benchmark that uses procedurally generated strategic environments to evaluate LLMs' strategic reasoning across multiple axes, addressing limitations of fixed game suites.

Evaluating Large Language Models in a Complex Hidden Role Game

arXiv cs.CL · 2026-05-25

This paper introduces an open-source framework to evaluate LLMs' reasoning, persuasion, and deception capabilities in the hidden role game Secret Hitler, finding that current models fail at sustained multi-turn manipulation while rule-based agents outperform them.

Counterparty Modeling is Not Strategy: The Limits of LLM Negotiators

arXiv cs.AI · 2026-05-19

A study shows that LLM agents can model counterparty preferences in negotiation but fail to turn that knowledge into bargaining strategies that improve outcomes, which limits their effectiveness in multi-turn negotiations.

Games people — and machines — play: Untangling strategic reasoning to advance AI

MIT News — Artificial Intelligence · 2026-05-05

MIT professor Gabriele Farina is advancing AI decision-making by combining game theory with machine learning, building on his earlier work with the diplomatic AI Cicero.
