overthinking

#overthinking

CAT: Confidence-Adaptive Thinking for Efficient Reasoning of Large Reasoning Models

arXiv cs.CL · yesterday

CAT introduces a framework that leverages model self-certainty signals to autonomously adjust reasoning length based on problem difficulty, reducing overthinking and improving inference efficiency for large reasoning models.

Know When to Stop: Segment-Level Credit Assignment for Reducing Overthinking

arXiv cs.CL · yesterday

This paper introduces DASH, a method that uses intermediate answer commitments within reasoning traces to assign segment-level credit, reducing overthinking behaviors and improving accuracy on competition-level math benchmarks.

NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests.

Reddit r/LocalLLaMA · 2026-06-22

NEX-N2-mini, a fine-tune of Qwen3.5-MoE, reportedly fixes the overthinking seen in Qwen 3.5 and 3.6 models, according to the poster's own tests.

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

arXiv cs.CL · 2026-06-17

This paper introduces Dynamic Rollout Editing (DRE), a training-time intervention to reduce overthinking in GRPO-style reinforcement learning for reasoning models. DRE edits successful trajectories by preserving the solution-reachable prefix and preferring verified shorter edits, weakening the preference for unnecessary thinking.

@sheriyuo: This paper proposes ASAG, Attention-State Adaptive Generation, a training-free, plug-and-play stopping framework for re…

X AI KOLs Timeline · 2026-06-16

ASAG uses attention entropy to detect when reasoning is unproductive, stopping early to improve accuracy and reduce token generation. Experiments on Qwen3-8B show a 4.4% accuracy gain and over 40% fewer generated tokens.
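The summary does not say how ASAG computes or thresholds attention entropy, so the following is only a generic sketch of an entropy-based stopping rule, not the paper's method. It assumes (hypothetically) that high entropy over a run of steps signals unproductive reasoning; the names `attention_entropy` and `should_stop`, the `threshold`, and the `patience` window are all illustrative.

```python
import math
from typing import List, Sequence


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one attention row, normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0:  # skip zero weights, since 0 * log(0) is taken as 0
            entropy -= p * math.log(p)
    return entropy


def should_stop(entropies: List[float], threshold: float, patience: int) -> bool:
    """Assumed rule: stop once the last `patience` steps all exceed `threshold`."""
    if len(entropies) < patience:
        return False
    return all(h > threshold for h in entropies[-patience:])
```

A decoding loop could append one entropy value per generated step and end the reasoning phase when `should_stop` returns true. The direction of the test (high rather than low entropy) is an assumption and may be the opposite of what the paper uses.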

DyCon: Dynamic Reasoning Control via Evolving Difficulty Modeling

arXiv cs.AI · 2026-06-08

This paper introduces DyCon, a training-free framework that uses step-level embeddings to model evolving task difficulty and dynamically control reasoning depth in Large Reasoning Models, effectively reducing overthinking and improving efficiency without sacrificing accuracy.

Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models

arXiv cs.AI · 2026-06-03

This paper introduces a prefix-level trajectory evaluation protocol to distinguish harmful overthinking from verbose but harmless overthinking in large reasoning models, showing that continued reasoning after reaching the correct answer can destabilize performance. The authors find that early stopping improves accuracy by up to 21% on multimodal benchmarks, and identify logical drift and visual reinterpretation as key causes of correctness deviations.

Quantized Reasoning Models Think They Need to Think Longer, but They Do Not

arXiv cs.LG · 2026-06-02

This paper reveals that aggressive post-training quantization of reasoning models leads to increased overthinking errors, where models reach correct intermediate answers but fail to finalize them. A simple logit penalty on overthinking markers reduces chain-of-thought length by 12-23% while improving accuracy, especially for quantized models.
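As a rough illustration of what a logit penalty on marker tokens can look like, here is a minimal sketch. The summary does not give the paper's marker set or penalty value, so the token ids, the `penalty` of 0.5, and the function names below are hypothetical.

```python
from typing import Iterable, List, Sequence


def penalize_marker_logits(
    logits: Sequence[float], marker_ids: Iterable[int], penalty: float
) -> List[float]:
    """Return a copy of `logits` with `penalty` subtracted at each marker token id."""
    adjusted = list(logits)
    for token_id in set(marker_ids):
        if 0 <= token_id < len(adjusted):  # ignore ids outside the vocabulary
            adjusted[token_id] -= penalty
    return adjusted


def greedy_pick(logits: Sequence[float]) -> int:
    """Index of the largest logit (greedy decoding for a single step)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy 4-token vocabulary; id 1 stands in for a hypothetical marker such as "Wait".
logits = [1.0, 2.5, 2.4, 0.3]
adjusted = penalize_marker_logits(logits, marker_ids=[1], penalty=0.5)
```

In this toy case the marker token is the greedy choice before the penalty and token 2 is the greedy choice afterwards, which is the intended effect: the model is nudged away from reopening its reasoning without the marker being banned outright.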

SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning

arXiv cs.AI · 2026-06-01

SLAT is a segment-level adaptive trimming framework for chain-of-thought reasoning that suppresses redundant segments, cutting reasoning length by 50% while maintaining accuracy.

I trained TIME: short context-triggered thinking on Qwen model instead of overthinking

Reddit r/LocalLLaMA · 2026-05-18

A personal project led to an ACL 2026 paper introducing TIME, a method that trains Qwen3 models to engage in short, context-triggered thinking rather than excessive reasoning. The work uses QLoRA and a four-phase curriculum, and all data and code are released as open source.

Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness

arXiv cs.LG · 2026-05-13

This paper introduces the VPG-EA framework, which uses variational inference and posterior guidance to improve the reasoning efficiency of large language models by addressing the 'overthinking' phenomenon in chain-of-thought generation.

Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training

arXiv cs.AI · 2026-05-11

This paper introduces Implicit Compression Regularization (ICR), a method to address LLM overthinking during RL post-training by guiding models toward concise yet accurate reasoning trajectories.
