Thinking Before Constraining: A Unified Decoding Framework for Large Language Models
Summary
A new hybrid decoding framework called In-Writing is proposed, which delays constraint application until after a trigger token, combining free-form reasoning with structured generation for improved accuracy in classification and reasoning tasks.
Source: https://huggingface.co/papers/2601.07525
Abstract
A hybrid approach called In-Writing is proposed that combines free-form reasoning with structured generation by delaying constraint application until after a trigger token is generated, improving accuracy in classification and reasoning tasks.
Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, namely In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and only applies structured decoding after a trigger token is generated, explicitly decoupling reasoning from formatting. We establish that our trigger-token strategies are able to virtually eradicate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
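The two-phase idea described in the abstract can be illustrated with a minimal, library-free sketch. It is not the authors' implementation (their code is in the linked repository). Everything in it is a hypothetical assumption chosen for illustration: the `ScoreFn` model interface, greedy token selection, word-level tokens, the trigger string `ANSWER:`, the `<end>` marker and the scripted `toy_model`. It also does not reproduce the paper's trigger-token strategies against premature triggering. It simply decodes freely until the first trigger, then restricts every following token so the output must spell out one of a fixed set of labels.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical model interface (not from the paper): given the tokens so far,
# return a score for each candidate next token.
ScoreFn = Callable[[Sequence[str]], Dict[str, float]]

END = "<end>"  # hypothetical end-of-answer marker appended to every label


def pick(scores: Dict[str, float], allowed: Optional[List[str]] = None) -> str:
    """Greedy choice; if `allowed` is given, only those tokens are eligible."""
    if allowed is None:
        return max(scores, key=scores.get)
    # Tokens the model did not score get -inf, so a valid token is always returned.
    return max(allowed, key=lambda t: scores.get(t, float("-inf")))


def allowed_next(prefix: List[str], targets: List[List[str]]) -> List[str]:
    """Tokens that keep `prefix` on a path toward some complete target label."""
    n = len(prefix)
    return sorted({t[n] for t in targets if len(t) > n and t[:n] == prefix})


def in_writing_decode(score_fn: ScoreFn, prompt: List[str], trigger: str,
                      labels: List[List[str]],
                      max_free: int = 64) -> Tuple[List[str], List[str]]:
    """Sketch: free-form reasoning, then label-constrained decoding after `trigger`."""
    context = list(prompt)
    reasoning: List[str] = []
    # Phase 1: unconstrained reasoning until the trigger token appears.
    for _ in range(max_free):
        tok = pick(score_fn(context))
        context.append(tok)
        if tok == trigger:
            break
        reasoning.append(tok)
    else:
        context.append(trigger)  # assumption: force the trigger if the budget runs out
    # Phase 2: constrained decoding; each step may only extend a valid label.
    targets = [lab + [END] for lab in labels]  # assumes `labels` is non-empty
    answer: List[str] = []
    while not answer or answer[-1] != END:
        tok = pick(score_fn(context), allowed_next(answer, targets))
        context.append(tok)
        answer.append(tok)
    return reasoning, answer[:-1]  # drop the END marker


def toy_model(context: Sequence[str]) -> Dict[str, float]:
    """Scripted stand-in for an LLM (hypothetical): reasons, then emits the trigger."""
    script = ["the", "review", "praises", "the", "plot", "ANSWER:"]
    step = len(context) - 1  # the prompt below is a single token
    if step < len(script):
        return {script[step]: 1.0}
    # Unconstrained, the favourite next token would be the off-format "Maybe".
    return {"Maybe": 2.0, "positive": 1.5, "negative": 0.5, END: 1.0}


reasoning, answer = in_writing_decode(
    toy_model, ["Review: great film"], "ANSWER:", [["positive"], ["negative"]])
print(answer)  # → ['positive']
```

In this toy run the unconstrained favourite after the trigger is "Maybe", but because the second phase only admits tokens that extend a valid label, the decoder returns `positive`, while the reasoning tokens before the trigger are left untouched. A real system would apply the same restriction by masking logits over a subword vocabulary rather than filtering a small score dictionary.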
Similar Articles
Enhanced and Efficient Reasoning in Large Learning Models
This paper proposes a method for improving reasoning in large language models by recoding data to explicitly represent relationships, enabling efficient principled reasoning with polynomial-time learnability for relational rules, which addresses hallucinations and supports sound reasoning across multiple calls.
COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models
COFT is a training-free decoding method that applies token-level fairness control and conformal calibration to reduce bias in chain-of-thought reasoning of large language models, achieving 30-55% bias reduction with minimal computational overhead.
Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms
This paper investigates how large language models perform arithmetic operations by analyzing internal mechanisms through early decoding, revealing that proficient models exhibit a clear division of labor between attention and MLP modules in reasoning tasks.
Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models
This paper introduces a white-box diagnostic framework that localizes instruction hierarchy failures in reasoning language models into identification, conflict resolution, and response realization stages. It evaluates several models and proposes two training-free self-monitoring mechanisms that reduce non-compliance by 81–99%.
Learning to reason with LLMs
OpenAI publishes an article exploring reasoning techniques with LLMs through cipher-decoding examples, demonstrating step-by-step problem-solving approaches and pattern recognition in language models.