See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
Summary
This paper presents a method for dense latent communication between heterogeneous agents in multi-agent systems using aligned KV-cache transformation, achieving better performance than text-based methods at lower computational cost.
Cached at: 06/12/26, 06:54 PM
Source: https://huggingface.co/papers/2606.13594
Abstract
Heterogeneous multi-agent systems can effectively transfer knowledge through aligned KV-cache communication, achieving better performance than text-based methods with reduced computational costs.
Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real “mind reading” and transfer both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense contextual knowledge preservation. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six directions of {Qwen3-4B, 8B, 14B} and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer where prior methods collapse.
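The abstract does not say what form the cross-model cache transformation takes. Purely as an illustration of the general idea, the sketch below assumes one linear projection per receiver layer for keys and for values, plus a fixed mapping from receiver layers to sender layers. The function names, toy dimensions, layer mapping, and random weights are all hypothetical and are not taken from the paper; the two-phase training (reconstruction, then generation) that would fit these weights is omitted.

```python
import random


def make_projection(d_in, d_out, seed=0):
    """Random d_in x d_out matrix standing in for a learned projection (assumption)."""
    rng = random.Random(seed)
    scale = d_in ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(d_out)] for _ in range(d_in)]


def project(vectors, weight):
    """Apply a linear map to each per-token vector: (T, d_in) -> (T, d_out)."""
    d_out = len(weight[0])
    return [
        [sum(x * weight[i][j] for i, x in enumerate(vec)) for j in range(d_out)]
        for vec in vectors
    ]


def transform_cache(sender_cache, layer_map, w_k, w_v):
    """Map a sender KV cache into a receiver-shaped KV cache.

    sender_cache: list over sender layers of (keys, values), each T x d_sender.
    layer_map:    for each receiver layer, the sender layer it reads from.
    w_k, w_v:     one projection matrix per receiver layer.
    """
    return [
        (project(sender_cache[s][0], w_k[r]), project(sender_cache[s][1], w_v[r]))
        for r, s in enumerate(layer_map)
    ]


# Toy setup: 3 sender layers of width 4, 2 receiver layers of width 6, 5 tokens.
T, d_sender, d_receiver = 5, 4, 6
rng = random.Random(1)
sender_cache = [
    (
        [[rng.random() for _ in range(d_sender)] for _ in range(T)],
        [[rng.random() for _ in range(d_sender)] for _ in range(T)],
    )
    for _ in range(3)
]
layer_map = [0, 2]  # hypothetical choice: receiver layer 0 <- sender 0, layer 1 <- sender 2
w_k = [make_projection(d_sender, d_receiver, seed=10 + r) for r in range(len(layer_map))]
w_v = [make_projection(d_sender, d_receiver, seed=20 + r) for r in range(len(layer_map))]

receiver_cache = transform_cache(sender_cache, layer_map, w_k, w_v)
# Layers, tokens, and per-token width of the transformed keys.
print(len(receiver_cache), len(receiver_cache[0][0]), len(receiver_cache[0][0][0]))
```

In the context-unaware setting described above, the receiver would decode from such a transformed cache without seeing the original input, which is why the abstract stresses dense preservation of contextual knowledge rather than sparse steering signals.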
Similar Articles
Latent Cache Flow: Model-to-Model Communication Without Text
The paper introduces Latent Cache Flow (LCF), a method for efficient model-to-model communication by exchanging compressed KV caches instead of text, reducing adapter size and enabling cross-context communication.
Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems
This paper presents a unified framework for latent communication in LLM-based multi-agent systems, categorizing methods by what information is communicated, sender-receiver alignment, and fusion technique, and reviews eighteen representative methods from 2024-2026.
Stateful Inference for Low-Latency Multi-Agent Tool Calling
This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
This paper introduces PACT, a method for structuring agent-to-agent communication in multi-agent LLM systems that uses compact action-state records to reduce token consumption while maintaining or improving task performance, with demonstrated gains on SWE-agent and OpenHands.
Enabling KV Caching of Shared Prefix for Diffusion Language Models
This paper proposes BiCache, a novel KV caching technique for shared prefixes in diffusion language models, which avoids accuracy collapse by dynamically reusing cached keys and values in shallow layers and achieves 36.3%–98.3% throughput improvement.