The Scaling Properties of Implicit Deductive Reasoning in Transformers

Summary

This research examines how deep Transformers with bidirectional masking achieve implicit deductive reasoning comparable to explicit chain-of-thought methods. The study demonstrates that algorithmically aligned models can scale reasoning capabilities across diverse graph topologies and problem widths.

We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
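
As an illustration of the "bidirectional prefix mask" mentioned above, the sketch below builds the standard prefix-LM attention pattern: positions inside the prefix attend to each other in both directions, and later positions attend causally. The source does not spell out the paper's exact mask, so the function name `prefix_attention_mask` and the parameter `prefix_len` are assumptions made for this example.

```python
def prefix_attention_mask(seq_len, prefix_len):
    """Return a seq_len x seq_len boolean matrix (list of lists).

    mask[i][j] is True when query position i may attend to key position j.
    Assumption: the first `prefix_len` positions form the prefix (for
    example, the encoded rules and facts) and see each other
    bidirectionally; all remaining positions are causal.
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must lie in [0, seq_len]")
    return [
        [
            # causal rule: never look at a later position ...
            (j <= i)
            # ... unless both positions sit inside the prefix
            or (i < prefix_len and j < prefix_len)
            for j in range(seq_len)
        ]
        for i in range(seq_len)
    ]


mask = prefix_attention_mask(seq_len=5, prefix_len=3)
for row in mask:
    print(" ".join("1" if allowed else "0" for allowed in row))
```

With `prefix_len=0` the function reduces to an ordinary causal mask, which makes the two settings easy to compare.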

Cached at: 05/08/26, 02:27 PM

Source: https://huggingface.co/papers/2605.04330

Published on May 5 · Submitted by Enrico (https://huggingface.co/envomp) on May 8

Abstract

Deep Transformers with bidirectional masking exhibit implicit deductive reasoning capabilities comparable to explicit chain-of-thought methods across various graph structures and problem sizes.

We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
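
To make the task concrete, here is a minimal sketch of deduction over propositional Horn clauses using synchronous forward chaining. It is not the paper's data generator or model. The rule encoding and the name `forward_chain` are assumptions chosen for illustration. The round in which an atom is first derived gives an informal notion of proof depth, which is the quantity "depth extrapolation" refers to.

```python
def forward_chain(facts, rules):
    """Return {atom: depth} for every atom provable from `facts`.

    `rules` is a list of (body, head) pairs, read as the Horn clause
    body_1 AND ... AND body_n -> head. Depth is the synchronous round in
    which the atom is first derived; given facts have depth 0.
    """
    depth = {atom: 0 for atom in facts}
    round_no = 0
    while True:
        round_no += 1
        # Fire every rule whose body is already proven, all at once.
        newly_derived = {
            head
            for body, head in rules
            if head not in depth and all(atom in depth for atom in body)
        }
        if not newly_derived:
            return depth  # fixed point: nothing more can be proven
        for atom in newly_derived:
            depth[atom] = round_no


# Hypothetical toy problem: "e" needs three rounds, "y" is unprovable.
rules = [
    (("a",), "c"),
    (("b", "c"), "d"),
    (("d",), "e"),
    (("x",), "y"),
]
depths = forward_chain({"a", "b"}, rules)
print(sorted(depths.items()))
print("y provable:", "y" in depths)
```

An explicit chain-of-thought would write out intermediate atoms such as `c` and `d` before answering whether `e` is provable, whereas implicit reasoning has to reach the same answer within a single forward pass.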

Similar Articles

The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason

arXiv cs.LG

A comprehensive spectral analysis across 11 LLMs revealing that transformers exhibit phase transitions in hidden activation spaces during reasoning versus factual recall, with seven fundamental phenomena including spectral compression, instruction-tuning reversal, and perfect correctness prediction (AUC=1.0) based solely on spectral properties.

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

arXiv cs.CL

Proposes ProxyCoT, a training framework that improves long-context reasoning in large language models by first obtaining chain-of-thought reasoning traces on short proxy contexts (via reinforcement learning or distillation) and then grounding them in full long contexts through supervised fine-tuning. Experiments show consistent improvements over baselines with reduced computational cost.

Transformers Linearly Represent Highly Structured World Models

arXiv cs.LG

This paper demonstrates that transformers trained on Sudoku solving traces build structured world models organized by domain constraints, and identifies a sparse, monosemantic circuit responsible for the naked-single decision rule. The work provides a fully interpretable algorithmic account of transformer reasoning on a combinatorial task.

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

arXiv cs.CL

This paper introduces Deep Reasoning, an inference-time approach that uses structured meta-reasoning to construct task-specific scaffolds for general-purpose agents. The proposed agent, Dolores, outperforms existing methods by distributing cognition across lower-load reasoning threads, reducing hallucinations and improving performance across multiple benchmarks.