@stevibe: Which LLMs actually love to think? Tested 7 models on 5 math problems, measured reasoning length. The think winners: bo…
Summary
Benchmarked 7 LLMs on 5 math problems; Qwen3.5 27B and 35B A3B generated the longest reasoning chains, reaching over 10k tokens on a single question.
Cached at: 04/21/26, 07:25 PM
Which LLMs actually love to think? Tested 7 models on 5 math problems, measured reasoning length. The think winners: both Qwen3.5 models (27B and 35B A3B) — massive overthinkers, up to 10k+ tokens on a single question. Plot twists: Kimi K2.6 feels verbose, actually one of…
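The measurement behind the post can be sketched in a few lines: collect each model's reasoning traces per problem, count tokens, and rank models by average "thinking" length. The sketch below uses a naive whitespace split as a stand-in for a real tokenizer, and the sample traces are placeholder strings, not actual model output.

```python
# Sketch of the benchmark's measurement step: rank models by how long
# their reasoning traces are, on average, across a set of problems.
# Assumption: a whitespace split approximates token counts well enough
# for a relative comparison; a real run would use each model's tokenizer.

from statistics import mean

def reasoning_length(trace: str) -> int:
    """Approximate token count of one reasoning trace (whitespace split)."""
    return len(trace.split())

def rank_by_thinking(traces_by_model: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Return (model, mean reasoning length) pairs, longest thinkers first."""
    averages = {
        model: mean(reasoning_length(t) for t in traces)
        for model, traces in traces_by_model.items()
    }
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder traces for two hypothetical models on two problems each.
sample = {
    "model-a": ["step one step two step three", "first check the parity then sum"],
    "model-b": ["answer is 42", "it is 7"],
}

for model, avg in rank_by_thinking(sample):
    print(f"{model}: {avg:.1f} tokens on average")
```

In a real run, `traces_by_model` would hold the extracted "thinking" segments from each model's responses to the 5 problems; the ranking logic stays the same.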
Similar Articles
How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem
This empirical study evaluates LLMs on the Equivalence Class Problem to assess long-chain reasoning capabilities, finding that non-reasoning models fail while reasoning models struggle with specific structural difficulties.
Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms
This paper investigates how large language models perform arithmetic operations by analyzing internal mechanisms through early decoding, revealing that proficient models exhibit a clear division of labor between attention and MLP modules in reasoning tasks.
Less Is More: Cognitive Load and the Single-Prompt Ceiling in LLM Mathematical Reasoning
Empirical study on LLM formal-math reasoning finds a single-prompt ceiling: accuracy plateaus in the 60–79% range regardless of prompt size, driven by undecidability, model fragility, and distribution mismatch.
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning
Study reveals that answer tokens in thinking LLMs follow a structured self-reading pattern—forward drift plus focus on key anchors—during quantitative reasoning, and proposes a training-free SRQ steering method to exploit this for accuracy gains.
Learning to reason with LLMs
OpenAI publishes an article exploring reasoning techniques with LLMs through cipher-decoding examples, demonstrating step-by-step problem-solving approaches and pattern recognition in language models.