When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

Hugging Face Daily Papers 05/06/26, 12:00 AM Papers

Summary

This paper introduces Side-by-Side Interleaved Reasoning, a method for controlling disclosure timing in autoregressive models to improve accuracy and efficiency. It demonstrates improved performance on benchmarks using Qwen3 models by interleaving private reasoning with partial disclosures.

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments that bias subsequent generations. We introduce Side-by-Side (SxS) Interleaved Reasoning, which makes disclosure timing a controllable decision within standard autoregressive generation. SxS interleaves partial disclosures with continued private reasoning in the same context, but releases content only when it is supported by the reasoning so far. To learn such pacing without incentivizing filler, we construct entailment-aligned interleaved trajectories by matching answer prefixes to supporting reasoning prefixes, then train with SFT to acquire the dual-action semantics and RL to recover reasoning performance under the new format. Across two Qwen3 architectures/scales (MoE Qwen3-30B-A3B, dense Qwen3-4B) and both in-domain (AIME25) and out-of-domain (GPQA-Diamond) benchmarks, SxS improves accuracy--content-latency Pareto trade-offs under token-level proxies such as inter-update waiting.
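The "inter-update waiting" proxy mentioned above can be illustrated with a small sketch. The abstract does not give the metric's exact definition or the delimiters SxS uses, so the channel labels (`"think"`, `"speak"`), the function names, and the counting rule below are all assumptions made for illustration, not the paper's implementation.

```python
from typing import List, Tuple

# Hypothetical stream representation: each token carries a channel label.
# "think" = private reasoning, "speak" = disclosed content. These labels are
# assumptions; the abstract does not specify the actual format.
Token = Tuple[str, str]  # (channel, text)


def inter_update_waits(tokens: List[Token]) -> List[int]:
    """Count the private tokens that precede each disclosed segment."""
    waits: List[int] = []
    pending = 0          # private tokens seen since the last disclosure
    in_speak = False     # True while inside a run of disclosed tokens
    for channel, _ in tokens:
        if channel == "think":
            pending += 1
            in_speak = False
        elif channel == "speak":
            if not in_speak:  # a new disclosed segment starts here
                waits.append(pending)
                pending = 0
                in_speak = True
        else:
            raise ValueError(f"unknown channel: {channel!r}")
    return waits


def first_content_latency(tokens: List[Token]) -> int:
    """Index of the first disclosed token (len(tokens) if none is disclosed)."""
    for i, (channel, _) in enumerate(tokens):
        if channel == "speak":
            return i
    return len(tokens)


# Same reasoning budget (6 private tokens), two disclosure schedules.
monolithic = [("think", "r")] * 6 + [("speak", "a")] * 2
interleaved = (
    [("think", "r")] * 2 + [("speak", "a")]
    + [("think", "r")] * 4 + [("speak", "a")]
)

print(inter_update_waits(monolithic), first_content_latency(monolithic))
print(inter_update_waits(interleaved), first_content_latency(interleaved))
```

With the same six private tokens, the monolithic schedule makes the reader wait for all of them before any content appears, while the interleaved schedule splits that wait into shorter gaps. This is the kind of trade-off the "silence tax" framing describes.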

Original Article

Cached at: 05/08/26, 08:00 AM

Source: https://huggingface.co/papers/2605.03314

Abstract

Side-by-Side Interleaved Reasoning enables controlled disclosure timing in autoregressive models, improving accuracy and efficiency through interleaved private reasoning and delayed content release.

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments that bias subsequent generations. We introduce Side-by-Side (SxS) Interleaved Reasoning, which makes disclosure timing a controllable decision within standard autoregressive generation. SxS interleaves partial disclosures with continued private reasoning in the same context, but releases content only when it is supported by the reasoning so far. To learn such pacing without incentivizing filler, we construct entailment-aligned interleaved trajectories by matching answer prefixes to supporting reasoning prefixes, then train with SFT to acquire the dual-action semantics and RL to recover reasoning performance under the new format. Across two Qwen3 architectures/scales (MoE Qwen3-30B-A3B, dense Qwen3-4B) and both in-domain (AIME25) and out-of-domain (GPQA-Diamond) benchmarks, SxS improves accuracy--content-latency Pareto trade-offs under token-level proxies such as inter-update waiting.
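The trajectory-construction step (matching answer prefixes to supporting reasoning prefixes) could look roughly like the sketch below. It assumes that answer chunks are released in their original order and that a `supports` judge decides whether a reasoning prefix entails a chunk. The function `align_trajectory`, the `supports` signature, and the substring-based toy judge are hypothetical stand-ins; the abstract does not describe the actual entailment model or the matching algorithm.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, str]  # (channel, text); channel labels are assumptions


def align_trajectory(
    reasoning_steps: List[str],
    answer_chunks: List[str],
    supports: Callable[[List[str], str], bool],
) -> List[Segment]:
    """Place each answer chunk right after the shortest reasoning prefix
    that supports it, keeping the chunks in their original order."""
    trajectory: List[Segment] = []
    next_chunk = 0
    for i, step in enumerate(reasoning_steps):
        trajectory.append(("think", step))
        prefix = reasoning_steps[: i + 1]
        # Release every chunk that the reasoning so far already supports.
        while next_chunk < len(answer_chunks) and supports(
            prefix, answer_chunks[next_chunk]
        ):
            trajectory.append(("speak", answer_chunks[next_chunk]))
            next_chunk += 1
    # Chunks that were never judged supported are released at the end,
    # which falls back to ordinary think-then-answer behaviour.
    for chunk in answer_chunks[next_chunk:]:
        trajectory.append(("speak", chunk))
    return trajectory


# Toy data and a toy judge: a chunk counts as supported once a required
# fact appears verbatim in the reasoning prefix. A real pipeline would use
# an entailment model here; this substring check is only a placeholder.
reasoning = [
    "Let x be the unknown; 2x + 3 = 11.",
    "Subtract 3: 2x = 8.",
    "Divide by 2: x = 4.",
]
answer = ["Set up the equation 2x + 3 = 11.", "So x = 4."]
required_fact: Dict[str, str] = {answer[0]: "2x + 3 = 11", answer[1]: "x = 4"}


def toy_supports(prefix: List[str], chunk: str) -> bool:
    return required_fact[chunk] in " ".join(prefix)


trajectory = align_trajectory(reasoning, answer, toy_supports)
channels = [channel for channel, _ in trajectory]
print(channels)
```

In this toy run the first answer chunk is released after the first reasoning step, and the final chunk only after the step that derives x = 4, so no content is disclosed before the reasoning backs it.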

Get this paper in your agent:

hf papers read 2605.03314

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

When to Think Deeply: Inhibitory Deliberation for LLM Reasoning

Multi-Turn Reasoning When Context Arrives in Pieces: Scalable Sharding and Memory-Augmented RL

Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

Learning to Refine Hidden States for Reliable LLM Reasoning
