positional-encoding

#positional-encoding

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding

arXiv cs.CL · 5d ago

LazyAttention introduces a novel attention mechanism that defers positional encoding to enable zero-copy, position-agnostic KV cache reuse across multiple requests. The approach reduces time-to-first-token by 1.37× and increases throughput by 1.40× compared to Block-Attention in RAG settings with skewed document distributions.
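To make the idea of deferred positional encoding concrete, here is a minimal sketch, not the paper's algorithm. It assumes a 2-D RoPE-style rotation and a hypothetical `kv_cache` dictionary that stores position-free keys per document; the names `rope`, `kv_cache`, and `keys_for_request` are illustrative only. Keys are stored once without positions and rotated only when a request reads them, so the same cached entry can sit at different offsets in different requests.

```python
import math

def rope(vec, pos, theta=0.1):
    """Rotate a 2-D vector by pos * theta (a single RoPE frequency pair)."""
    x, y = vec
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return (x * c - y * s, x * s + y * c)

# Hypothetical cache: keys computed once per document, with no position applied.
kv_cache = {
    "doc_a": [(1.0, 0.0), (0.0, 1.0)],
    "doc_b": [(0.5, 0.5)],
}

def keys_for_request(doc_order):
    """Assemble keys for one request, applying the rotation only at read time."""
    keys, pos = [], 0
    for doc in doc_order:
        for k in kv_cache[doc]:
            keys.append(rope(k, pos))  # the cached entry itself is never modified
            pos += 1
    return keys

# The same cached documents are reused in two different orders.
req1 = keys_for_request(["doc_a", "doc_b"])  # doc_b's key lands at position 2
req2 = keys_for_request(["doc_b", "doc_a"])  # doc_b's key lands at position 0
```

The sketch ignores everything beyond a single key projection (multiple layers, cross-document attention, memory layout), which is where the actual engineering effort in such systems lies.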


@tilderesearch: https://x.com/tilderesearch/status/2061771450168889432

X AI KOLs Timeline · 2026-06-02

Wall Attention generalizes diagonal forget gates to softmax attention, enabling state-of-the-art zero-shot length extrapolation from 4k to 160k+ context and outperforming RoPE and FoX in pretraining. It is released as a drop-in replacement with open-source Triton kernels.


Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention

arXiv cs.LG · 2026-05-27

This paper proposes Energy-Gated Attention (EGA) and Morlet Positional Encoding (MoPE) to address missing inductive biases in transformer attention: token salience and scale-adaptive locality. Experiments on TinyShakespeare show superadditive gains when combined, highlighting complementarity.


@gordic_aleksa: new in-depth blog post time: "Inside the Transformer: The Life of a Token", a deep dive into a modern dense transformer, i…

X AI KOLs Timeline · 2026-05-26

An in-depth blog post exploring the inner workings of modern dense transformers, covering topics such as YaRN for positional information, hybrid attention for long context lengths, soft capping, QK normalization, and transformer math including FLOPs/token formulas and cluster sizing.
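Two of the topics listed above can be illustrated in a few lines. This is a sketch under stated assumptions, not the blog post's own formulas: soft capping is shown in its common `cap * tanh(x / cap)` form, and the FLOPs/token estimate uses the widely quoted rule of thumb of roughly 6 FLOPs per parameter per token for training a dense model. The function names and the default `cap` value are illustrative.

```python
import math

def soft_cap(logit, cap=30.0):
    """Squash a logit smoothly into (-cap, cap); near zero it is almost the identity."""
    return cap * math.tanh(logit / cap)

def train_flops_per_token(n_params):
    """Rule-of-thumb training cost: ~2N FLOPs forward plus ~4N backward per token.

    Assumption: dense model, attention FLOPs over the sequence are ignored.
    """
    return 6 * n_params

small = soft_cap(0.1)      # stays very close to 0.1
large = soft_cap(1000.0)   # bounded by the cap
cost_7b = train_flops_per_token(7_000_000_000)
```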


@YouJiacheng: > Directly applying RoPE rotation to KV will leak positional information into value matrix V. This is also documented on Scientific Spaces (科学空间): https://kexue.fm/…

X AI KOLs Timeline · 2026-05-07

The post notes that applying RoPE rotation directly to the KV cache, rather than to the keys alone, leaks positional information into the value matrix V.
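A small numeric sketch shows why rotating values behaves differently from rotating keys. It assumes a single 2-D RoPE frequency pair and a hypothetical `rotate` helper; it is an illustration, not code from the post. When both query and key are rotated, their dot product depends only on the relative offset. A rotated value, however, enters the attention output through a plain weighted sum, so nothing cancels its absolute rotation.

```python
import math

def rotate(vec, pos, theta=0.5):
    """Apply a 2-D RoPE rotation by angle pos * theta."""
    x, y = vec
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return (x * c - y * s, x * s + y * c)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

q, k, v = (1.0, 2.0), (0.5, -1.0), (3.0, 4.0)

# Query/key scores depend only on the offset between positions (here 2 in both cases).
score_a = dot(rotate(q, 5), rotate(k, 3))
score_b = dot(rotate(q, 12), rotate(k, 10))

# With a single key the softmax weight is 1, so the output equals the value itself.
# If the value was rotated too, the output changes with the absolute position.
out_at_3 = rotate(v, 3)
out_at_10 = rotate(v, 10)
```

Here `score_a` and `score_b` agree up to floating-point error, while `out_at_3` and `out_at_10` differ even though the underlying value vector is the same.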


@ZhihuFrontier: DeepSeek-V4 RoPE Design In-Depth Analysis. Key technical insights curated from Zhihu contributor kaiyuan. Core Pain Point…

X AI KOLs Timeline · 2026-05-07

This post provides an in-depth technical analysis of the RoPE (Rotary Position Embedding) design in DeepSeek-V4, focusing on how it handles token compression and shared KV caches in the CSA and HCA modules.
