Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Hugging Face Daily Papers 05/28/26, 12:00 AM Papers

llm decoding reasoning constrained-decoding structured-generation trigger-token

Summary

A new hybrid decoding framework called In-Writing is proposed, which delays constraint application until after a trigger token, combining free-form reasoning with structured generation for improved accuracy in classification and reasoning tasks.

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, namely In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and only applies structured decoding after a trigger token is generated, explicitly decoupling reasoning from formatting. We establish that our trigger-token strategies are able to virtually eradicate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.

Original Article


Cached at: 05/29/26, 11:01 AM


Source: https://huggingface.co/papers/2601.07525

Abstract

A hybrid approach called In-Writing is proposed that combines free-form reasoning with structured generation by delaying constraint application until after a trigger token is generated, improving accuracy in classification and reasoning tasks.

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, namely In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and only applies structured decoding after a trigger token is generated, explicitly decoupling reasoning from formatting. We establish that our trigger-token strategies are able to virtually eradicate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
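The two-phase idea in the abstract (reason freely, then constrain only after a trigger token) can be illustrated with a toy sketch. This is not the authors' implementation: the trigger string `<answer>`, the `decode` function, and the scripted `toy_model` are all hypothetical names chosen for illustration, and the paper's strategies for avoiding premature triggering are not modeled here.

```python
from typing import Callable, List, Sequence

# Hypothetical trigger token; the one used by In-Writing is not stated in the abstract.
TRIGGER = "<answer>"


def decode(
    propose: Callable[[List[str]], Sequence[str]],
    allowed: Sequence[str],
    max_steps: int = 32,
) -> List[str]:
    """Decode freely until TRIGGER appears, then restrict the next token to `allowed`.

    `propose(prefix)` stands in for a language model: given the tokens so far,
    it returns candidate next tokens ranked best first.
    """
    out: List[str] = []
    for _ in range(max_steps):
        candidates = propose(out)
        if not candidates:
            break
        token = candidates[0]  # unconstrained phase: always take the top candidate
        out.append(token)
        if token == TRIGGER:
            # Constrained phase: keep the best-ranked candidate that is allowed,
            # falling back to the first allowed label if none is proposed.
            ranked = propose(out)
            choice = next((t for t in ranked if t in allowed), allowed[0])
            out.append(choice)
            break
    return out


def toy_model(prefix: List[str]) -> Sequence[str]:
    """Scripted stand-in for an LLM: emits a short rationale, then the trigger."""
    script = ["The", "review", "sounds", "happy", TRIGGER]
    if len(prefix) < len(script):
        return [script[len(prefix)], "filler"]
    # After the trigger, the top-ranked token is not a valid label on purpose.
    return ["maybe", "positive", "negative"]


result = decode(toy_model, allowed=["positive", "negative"])
print(" ".join(result))
```

In this sketch the reasoning tokens are never filtered, so the constraint cannot cut the rationale short; only the token after the trigger is forced into the label set, which is why the out-of-format candidate `maybe` is skipped.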

Similar Articles

Enhanced and Efficient Reasoning in Large Learning Models

COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms

Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models

Learning to reason with LLMs
