Latent Reasoning with Normalizing Flows
Summary
Proposes NF-CoT, a latent reasoning framework that uses normalizing flows to model continuous thoughts in LLMs. It preserves the advantages of autoregressive generation and improves pass rates on code-generation benchmarks while reducing intermediate-reasoning cost.
Cached at: 06/05/26, 06:07 AM
Source: https://huggingface.co/papers/2606.06447
Abstract
Latent reasoning framework using normalizing flows preserves autoregressive generation advantages while enabling efficient, probabilistic intermediate computation in large language models.
Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.
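The abstract's claim that a normalizing flow gives exact likelihoods and left-to-right sampling can be illustrated with a toy autoregressive affine flow. This is only a minimal sketch and not the paper's NF head: each "thought" is a single scalar rather than a vector, and `conditioner` is a hypothetical hand-written function standing in for whatever the LLM backbone would compute from earlier positions. All function names and constants here are assumptions made for illustration.

```python
import math
import random


def conditioner(prefix):
    """Hypothetical causal conditioner: maps earlier thoughts to (shift, log_scale).

    In a real model this role would be played by a learned network; here it is
    a fixed toy function that only looks at values before the current position.
    """
    if not prefix:
        return 0.0, 0.0
    mean = sum(prefix) / len(prefix)
    return 0.5 * mean, 0.3 * math.tanh(mean)


def log_prob(xs):
    """Exact log-likelihood of a sequence via the change-of-variables formula."""
    total = 0.0
    for t, x in enumerate(xs):
        shift, log_scale = conditioner(xs[:t])
        u = (x - shift) * math.exp(-log_scale)  # map the thought to base noise
        # Standard-normal log-density of u, plus log|du/dx| = -log_scale.
        total += -0.5 * (u * u + math.log(2.0 * math.pi)) - log_scale
    return total


def sample(length, rng):
    """Left-to-right sampling: each step depends only on earlier thoughts."""
    xs = []
    for _ in range(length):
        shift, log_scale = conditioner(xs)
        u = rng.gauss(0.0, 1.0)
        xs.append(shift + math.exp(log_scale) * u)
    return xs


def to_noise(xs):
    """Invert the flow: recover the base noise that produced each thought."""
    us = []
    for t, x in enumerate(xs):
        shift, log_scale = conditioner(xs[:t])
        us.append((x - shift) * math.exp(-log_scale))
    return us


if __name__ == "__main__":
    thoughts = sample(4, random.Random(0))
    print("sampled thoughts:", thoughts)
    print("exact log-likelihood:", log_prob(thoughts))
```

Because every step is an invertible affine map whose parameters depend only on the prefix, the same model supports both sampling and exact density evaluation. A tractable log-probability of this kind is what a policy-gradient estimator needs, which is the property the abstract highlights.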
Similar Articles
Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning
This paper identifies a 'concept bottleneck' in the CoCoNuT latent reasoning paradigm where hidden states are overwritten across passes, and proposes AGCLR, which adds a gated persistent memory stream to retain intermediate facts. Evaluations on GSM8K, HotpotQA, and ProsQA using GPT-2 show consistent improvements, especially on multi-hop tasks.
ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces
Introduces ReasoningFlow, a framework to capture discourse structures of large language model reasoning traces as directed acyclic graphs, enabling fine-grained analysis of reasoning behaviors like self-reflection and backtracking. Based on manual and automatic annotation of thousands of traces, it reveals structural similarities across models and that most erroneous steps do not contribute to final answers.
Adaptive Latent Agentic Reasoning
This paper introduces Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework for LLM agents that uses compact latent reasoning for routine turns and selectively escalates to explicit chain-of-thought for harder decisions, achieving up to 84.6% token reduction while maintaining task accuracy.
Tools as Continuous Flow for Evolving Agentic Reasoning
This paper introduces FlowAgent, a novel framework that reconceptualizes tool chaining as continuous trajectory generation using conditional flow matching to improve robustness in long-horizon agentic reasoning.
NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning
This paper introduces NoisyCoconut, an inference-time method that improves LLM reliability by injecting noise into latent trajectories to generate diverse reasoning paths. The approach enables models to abstain when uncertain, significantly reducing error rates in mathematical reasoning tasks without requiring retraining.