@mdeng34: Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT…

X AI KOLs Timeline 05/22/26, 03:28 PM Papers

reasoning llm adaptive-reasoning simulative-reasoning system-ii research efficient-reasoning

Summary

New research introduces SR²AM, which adds a learned configurator that self-regulates when to use simulative reasoning, improving both efficiency and performance in LLMs.

Cached at: 05/23/26, 08:15 PM

Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens.

We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt?

Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure.
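To make the System I / System II contrast concrete, here is a minimal toy sketch. It is not the SiRA implementation: the 1-D world, the pit, the reward, and every function name below are assumptions made up for illustration. The reactive policy picks an action from a fixed heuristic, while the simulative policy rolls candidate plans through a world model and scores the predicted consequences.

```python
from itertools import product
from typing import Tuple

# Hypothetical toy task: walk along a number line to GOAL without landing on PIT.
GOAL, PIT = 4, 2
ACTIONS: Tuple[int, ...] = (-1, 1, 2)


def world_model(state: int, action: int) -> int:
    """Toy deterministic world model: predicts the next state."""
    return state + action


def reward(state: int) -> float:
    """Large penalty for the pit, otherwise negative distance to the goal."""
    return -100.0 if state == PIT else -float(abs(GOAL - state))


def reactive_action(state: int) -> int:
    """System I stand-in: take the biggest step toward the goal, no lookahead."""
    direction = 1 if GOAL >= state else -1
    return max(ACTIONS, key=lambda a: a * direction)


def simulative_action(state: int, horizon: int) -> int:
    """System II stand-in: simulate every plan of length `horizon` with the
    world model, sum the predicted rewards, and return the first action of
    the best-scoring plan."""
    best_plan, best_score = None, float("-inf")
    for plan in product(ACTIONS, repeat=horizon):
        s, score = state, 0.0
        for a in plan:
            s = world_model(s, a)
            score += reward(s)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan[0]


if __name__ == "__main__":
    print("reactive:", reactive_action(0))        # steps straight into the pit
    print("simulative:", simulative_action(0, 2))  # steps around it
```

In this toy, the reactive rule walks into the pit from state 0, while two steps of simulation are enough to find a path around it. Exhaustive enumeration is used only because the action set is tiny; it is a placeholder for whatever search an actual agent would use.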

In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely.
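The role of the configurator can be sketched in the same spirit. In SR²AM the configurator is learned; the version below is a hand-written rule standing in for it, and the toy world, the `PlanConfig` type, and all function names are assumptions for illustration only. The point is the control flow: at each step the configurator decides whether to simulate at all and, if so, how far ahead.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

# Hypothetical toy task: reach GOAL on a number line without landing on PIT.
GOAL, PIT = 4, 2
ACTIONS: Tuple[int, ...] = (-1, 1, 2)


@dataclass(frozen=True)
class PlanConfig:
    simulate: bool  # whether to run simulative reasoning this step
    horizon: int    # how many steps to look ahead (0 when skipping)


def world_model(state: int, action: int) -> int:
    return state + action


def reward(state: int) -> float:
    return -100.0 if state == PIT else -float(abs(GOAL - state))


def reactive_action(state: int) -> int:
    """No lookahead: biggest step toward the goal."""
    direction = 1 if GOAL >= state else -1
    return max(ACTIONS, key=lambda a: a * direction)


def simulative_action(state: int, horizon: int) -> int:
    """Score every plan of length `horizon` via the world model."""
    best_plan, best_score = None, float("-inf")
    for plan in product(ACTIONS, repeat=horizon):
        s, score = state, 0.0
        for a in plan:
            s = world_model(s, a)
            score += reward(s)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan[0]


def configurator(state: int) -> PlanConfig:
    """Hand-written stand-in for the learned configurator (an assumption):
    plan two steps ahead only when the pit is reachable in one action."""
    risky = any(world_model(state, a) == PIT for a in ACTIONS)
    return PlanConfig(True, 2) if risky else PlanConfig(False, 0)


def run_episode(start: int, max_steps: int = 20) -> Tuple[List[int], List[PlanConfig]]:
    """Act until the goal is reached, recording each configurator decision."""
    state, path, configs = start, [start], []
    while state != GOAL and len(configs) < max_steps:
        cfg = configurator(state)
        action = (simulative_action(state, cfg.horizon)
                  if cfg.simulate else reactive_action(state))
        state = world_model(state, action)
        path.append(state)
        configs.append(cfg)
    return path, configs


if __name__ == "__main__":
    path, configs = run_episode(-3)
    print(path)
    print([c.simulate for c in configs])
```

Starting far from the pit, this toy agent skips planning for the first steps and only pays for simulation once the hazard is within reach, which is the allocation idea in miniature.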

Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.

How does self-regulated simulative reasoning perform in practice?

SR²AM-v0.1-8B achieves results competitive with GPT-OSS (120B) and GLM-4.6 (355B).

SR²AM-v1.0-30B is competitive with DeepSeek-V3.2 (685B) and Kimi-K2.5 (1T) while using 𝟮𝟲–𝟵𝟱% fewer reasoning tokens than comparable 30/32B agentic LLMs.

The key finding from RL training: the model learns to plan further ahead (+22.8% horizon) rather than more often (+2% frequency). Allocation, not compression.

This is a prototype using language-based world models. Stay tuned for our next steps on multimodal and physical world models.

The concept of a configurator, which decides when and how deeply to engage a reasoning process, is not specific to planning, but extensible to learning and adaptation going forward.

SR²AM: https://arxiv.org/abs/2605.22138
SiRA: https://arxiv.org/abs/2507.23773
Project: https://sailing-lab.github.io/sr2am-self-regulated-planning…
Code: https://github.com/sailing-lab/sr2am…

SR²AM-v0.1-8B: https://huggingface.co/sailing-lab/SR2AM-v0.1-8B…
SR²AM-v1.0-30B: https://huggingface.co/sailing-lab/SR2AM-v1.0-30B…

Joint work with @jinyuhou0, @larasnevess, @varad0309, @tw_killian, @waterluffy, @ericxing

Exactly! Thank you for the apt summary

Our work is grounded in reinforcement learning (Sutton & Barto), but proposes new insights into the meaning and structure of decision making.

Similar Articles

@pallavishekhar_: Large Reasoning Models (LRMs) Read here: https://outcomeschool.com/blog/large-reasoning-models…

Learning to reason with LLMs

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

@jiqizhixin: Awesome blog! State of RL for reasoning LLMs https://aweers.de/blog/2026/rl-for-llms/…
