See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

Hugging Face Daily Papers

Summary

This paper presents a method for dense latent communication between heterogeneous multi-agent systems using aligned KV-cache transformation, achieving better performance than text-based methods with lower computational costs.

Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading" and transfer both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense contextual knowledge preservation. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six directions of {Qwen3-4B, 8B, 14B} and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer where prior methods collapse.
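
The abstract names a lightweight cross-model cache transformation trained first for reconstruction, but this page gives no architectural details. As a rough illustration only, the sketch below assumes the simplest possible form: a single affine map from a sender's cache vectors to a receiver's cache dimension, fit by gradient descent on a mean-squared reconstruction loss. The names `project`, `reconstruction_loss`, and `fit_reconstruction`, the toy dimensions, and the synthetic data are all hypothetical; the second (generation) phase would need real models and is not shown.

```python
import random

def project(W, b, vec):
    """Map one sender-side cache vector into the receiver's dimension (affine map)."""
    return [sum(w * x for w, x in zip(row, vec)) + bias for row, bias in zip(W, b)]

def reconstruction_loss(W, b, sender, receiver):
    """Mean squared error between projected sender vectors and receiver vectors."""
    total, count = 0.0, 0
    for s, r in zip(sender, receiver):
        for p, t in zip(project(W, b, s), r):
            total += (p - t) ** 2
            count += 1
    return total / count

def fit_reconstruction(sender, receiver, lr=0.5, steps=500):
    """Assumed phase-one analogue: plain gradient descent on the reconstruction MSE."""
    d_s, d_r = len(sender[0]), len(receiver[0])
    W = [[0.0] * d_s for _ in range(d_r)]
    b = [0.0] * d_r
    n = len(sender) * d_r  # number of terms averaged in the loss
    for _ in range(steps):
        grad_W = [[0.0] * d_s for _ in range(d_r)]
        grad_b = [0.0] * d_r
        for s, r in zip(sender, receiver):
            pred = project(W, b, s)
            for i in range(d_r):
                err = 2.0 * (pred[i] - r[i]) / n
                grad_b[i] += err
                for j in range(d_s):
                    grad_W[i][j] += err * s[j]
        for i in range(d_r):
            b[i] -= lr * grad_b[i]
            for j in range(d_s):
                W[i][j] -= lr * grad_W[i][j]
    return W, b

# Toy stand-in for paired caches: in this synthetic setup the "receiver" vectors
# are an exact linear function of the "sender" vectors, which real caches are not.
rng = random.Random(0)
d_sender, d_receiver, n_tokens = 4, 6, 32
hidden_W = [[rng.uniform(-1, 1) for _ in range(d_sender)] for _ in range(d_receiver)]
sender_cache = [[rng.uniform(-1, 1) for _ in range(d_sender)] for _ in range(n_tokens)]
receiver_cache = [project(hidden_W, [0.0] * d_receiver, s) for s in sender_cache]

zero_W = [[0.0] * d_sender for _ in range(d_receiver)]
loss_before = reconstruction_loss(zero_W, [0.0] * d_receiver, sender_cache, receiver_cache)
W, b = fit_reconstruction(sender_cache, receiver_cache)
loss_after = reconstruction_loss(W, b, sender_cache, receiver_cache)
```

Here `loss_after` should be far below `loss_before` because the toy data is exactly linear. Real key and value tensors also differ across models in layer count and head layout, which a single affine map does not address.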

Cached at: 06/12/26, 06:54 PM

Source: https://huggingface.co/papers/2606.13594

Similar Articles

Stateful Inference for Low-Latency Multi-Agent Tool Calling

arXiv cs.LG

This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.

Enabling KV Caching of Shared Prefix for Diffusion Language Models

arXiv cs.LG

This paper proposes BiCache, a novel KV caching technique for shared prefixes in diffusion language models, which avoids accuracy collapse by dynamically reusing cached keys and values in shallow layers and achieves 36.3%–98.3% throughput improvement.