mechanism-analysis

Rethinking the Role of Efficient Attention in Hybrid Architectures

arXiv cs.CL · 16h ago

This paper systematically analyzes the role of efficient attention modules in hybrid language model architectures. It finds that different designs converge in long-context performance under sufficient training, and that long-range retrieval is carried primarily by full attention, while efficient attention shapes the optimization trajectory. It also reveals a 'Large-Window Laziness' phenomenon.

Agentic Transformers Provably Learn to Search via Reinforcement Learning

arXiv cs.LG · 2026-06-02

This paper theoretically studies how transformer-based policies acquire search capabilities from reinforcement learning training dynamics in a stochastic tree environment. It shows that a two-head transformer can implement depth-first search and that this mechanism emerges naturally from sparse reward signals under a depth-wise curriculum.
