Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Hugging Face Daily Papers 05/12/26, 12:00 AM Papers

Summary

This paper proposes Multi-Stream LLMs, which transition from sequential message-based instruction tuning to parallel stream processing. This approach allows language models to simultaneously read, think, and generate across multiple concurrent data flows, addressing bottlenecks in autonomous agent applications.

The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents function on message exchange formats, successively exchanging messages with users, systems, themselves (i.e., chain-of-thought), and tools in a single stream of computation. This single-stream bottleneck in chat models leads to a number of limitations: the agent cannot act (generate output) while reading and, conversely, cannot react to new information while writing. Similarly, the agent cannot act while thinking and cannot think while reading or acting on information. In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple, parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.
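The lockstep behaviour described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the paper's implementation: the stream names (`user`, `thought`, `output`), the helper `run_multi_stream`, and the rule-based `toy_step` that stands in for a real forward pass are all hypothetical. It shows one thing only: at each timestep every input stream is read and every output stream may emit a token, and each emitted token depends solely on earlier timesteps.

```python
from typing import Callable, Dict, List, Optional

Token = Optional[str]   # None means "nothing on this stream at this timestep"
Step = Dict[str, Token]  # one timestep: stream name -> token


def run_multi_stream(
    inputs: Dict[str, List[Token]],
    output_streams: List[str],
    step_fn: Callable[[List[Step], str], Token],
) -> List[Step]:
    """Advance all streams in lockstep, one timestep per (toy) forward pass."""
    n_steps = max(len(tokens) for tokens in inputs.values())
    history: List[Step] = []
    for t in range(n_steps):
        # Read: collect whatever each input stream carries at timestep t.
        step: Step = {
            name: tokens[t] if t < len(tokens) else None
            for name, tokens in inputs.items()
        }
        # Write: every output stream may emit a token. step_fn only sees
        # `history`, i.e. timesteps strictly earlier than t (causal).
        for name in output_streams:
            step[name] = step_fn(history, name)
        history.append(step)
    return history


def toy_step(history: List[Step], stream: str) -> Token:
    """Hypothetical stand-in for the model: simple rules over the previous step."""
    if not history:
        return None
    prev = history[-1]
    if stream == "thought" and prev.get("user"):
        return f"noted:{prev['user']}"
    if stream == "output" and prev.get("thought"):
        return f"ack:{prev['thought']}"
    return None


# The user sends "hi", pauses, then sends "stop" while the model is answering.
history = run_multi_stream(
    inputs={"user": ["hi", None, "stop", None]},
    output_streams=["thought", "output"],
    step_fn=toy_step,
)
for t, step in enumerate(history):
    print(t, step)
```

At timestep 2 the toy model emits an output token in the same step in which the new user token `stop` arrives, and it reacts to that token on the thought stream one step later. A single-stream message format would have to finish one message before starting the other.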

Original Article

Cached at: 05/13/26, 12:14 PM

Paper page - Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Source: https://huggingface.co/papers/2605.12460

Abstract

Language models can be enhanced by transitioning from sequential message-based instruction-tuning to parallel stream processing, enabling simultaneous reading and generation across multiple concurrent data flows.

The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents function on message exchange formats, successively exchanging messages with users, systems, themselves (i.e., chain-of-thought), and tools in a single stream of computation. This single-stream bottleneck in chat models leads to a number of limitations: the agent cannot act (generate output) while reading and, conversely, cannot react to new information while writing. Similarly, the agent cannot act while thinking and cannot think while reading or acting on information. In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple, parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.

Get this paper in your agent:

hf papers read 2605.12460

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O

ProactiveLLM: Learning Active Interaction for Streaming Large Language Models

@jonasgeiping: We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based e…

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Liberating LLM Capabilities in Full-Duplex Speech Models
