@jinyuhou0: On popular benchmarks, our 30B model matches systems 20-30x its size (gpt-5.4-xhigh, DeepSeek-V3.2, Kimi-K2.5), while u…
Summary
A new 30B model matches systems 20-30x its size on popular benchmarks while using up to 95% fewer reasoning tokens than comparable 30/32B agentic LLMs. The savings come from a learned configurator that decides when to simulate, how far ahead to look, and when to skip planning entirely. Model and code are openly available.
Cached at: 05/24/26, 10:27 AM
On popular benchmarks, our 30B model matches systems 20-30x its size (gpt-5.4-xhigh, DeepSeek-V3.2, Kimi-K2.5), while using up to 95% fewer reasoning tokens than comparable 30/32B agentic LLMs.
The trick: don’t just reason less, reason about the right things. A learned configurator decides when to simulate, how far ahead, and when to skip planning entirely.
Efficient reasoning is an allocation problem, not a compression problem.
Model and code are openly available.
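To make the allocation idea concrete, here is a minimal sketch of a configurator that picks a reasoning mode and a lookahead depth per step. It is not the released model's method: the post describes a learned configurator, while this stand-in uses hand-written thresholds. The names `Mode`, `ReasoningConfig`, `configure`, the `uncertainty` input, and every threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ACT = "act"            # skip planning entirely
    PLAN = "plan"          # reason, but do not simulate ahead
    SIMULATE = "simulate"  # roll candidate plans forward before acting


@dataclass(frozen=True)
class ReasoningConfig:
    mode: Mode
    horizon: int  # simulated steps ahead; 0 unless mode is SIMULATE


def configure(uncertainty: float, max_horizon: int = 4) -> ReasoningConfig:
    """Hand-written stand-in for a learned configurator.

    `uncertainty` is an assumed score in [0, 1] for how unsure the agent
    is about its next action. The thresholds are illustrative only.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must be in [0, 1]")
    if uncertainty < 0.2:
        return ReasoningConfig(Mode.ACT, 0)
    if uncertainty < 0.6:
        return ReasoningConfig(Mode.PLAN, 0)
    # Scale lookahead with uncertainty: 0.6 maps to 1 step,
    # 1.0 maps to max_horizon steps.
    frac = (uncertainty - 0.6) / 0.4
    horizon = 1 + round(frac * (max_horizon - 1))
    return ReasoningConfig(Mode.SIMULATE, horizon)


if __name__ == "__main__":
    for u in (0.1, 0.4, 0.6, 1.0):
        print(u, configure(u))
```

In this sketch, reasoning tokens are spent only when the configurator selects `PLAN` or `SIMULATE`, so easy steps cost nothing extra. That is the sense in which efficiency becomes a question of allocation rather than compression.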
Mingkai Deng (@mdeng34): Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens.
We study a related but more structural question: what kind of reasoning should we
Similar Articles
@mdeng34: Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT…
New research introduces SR²AM, a configurator that self-regulates when to use simulative reasoning, improving efficiency and performance in LLMs.
@witcheer: can’t believe gpt-oss-20b perfs on 8GB vRAM 21B total params, 3.6B active (MoE). OpenAI, Apache 2.0. uses only 1.8 GB V…
A new open-source MoE model, gpt-oss-20b (21B total, 3.6B active), runs on only 1.8GB VRAM and achieves perfect scores on agentic coding tasks, outperforming other local models like Gemma and Qwen.
Liquid AI reveals 8B-A1B MoE trained on 38T
Liquid AI released LFM2.5-8B-A1B, an edge MoE model trained on 38T tokens with a 128K context window, improved tool calling, and reasoning capabilities, available on Hugging Face.
Microsoft's new MAI models
Microsoft announced two new LLMs: MAI-Thinking-1 (35B reasoning model) and MAI-Code-1-Flash (5B code model), both trained on enterprise-grade, clean data without third-party distillation, with MAI-Thinking-1 claimed to be preferred over Sonnet 4.6 in blind evaluations.
@berryxia: Small model, big wisdom? It's now real! A 7B small model now acts as the boss of top large models like GPT-5, Claude Sonnet 4, Gemini 2.5 Pro. A new paper shows an RL-trained 7B model learned to write natural language subtasks, assign them to different models, precisely...
A new paper proposes training a 7B small model via reinforcement learning as a task scheduler, automatically decomposing subtasks and assigning them to top models like GPT-5 and Claude. It surpasses individual frontier models on several hard benchmarks, demonstrating that end-to-end reward learning can effectively replace manual prompt engineering and multi-agent pipeline design.