High Dimensional, Dynamic Rotary Positional Embedding [P]
Summary
Introduces HDD-RoPE, an extension of rotary positional embeddings (RoPE) that uses high-dimensional chunks and data-dependent rotation rates, and reports faster convergence than xPos on TinyStories.
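For orientation, here is a minimal sketch of the standard RoPE baseline that the summary says HDD-RoPE extends. It is not the HDD-RoPE method: it uses the usual 2-dimensional chunks and fixed rotation rates, whereas the summary describes higher-dimensional chunks and data-dependent rates. The function names `rope` and `score` and the default `base` of 10000 are assumptions made for illustration.

```python
import math

def rope(vec, pos, base=10000.0):
    """Standard RoPE on one vector: rotate each 2-D chunk by pos * rate_i.

    Baseline sketch only (assumed names); rates are fixed, not data-dependent.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("dimension must be even")
    out = []
    for i in range(dim // 2):
        # Fixed rotation rate for chunk i: base^(-2i/dim)
        theta = pos * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[2 * i], vec[2 * i + 1]
        # Apply a 2x2 rotation to the pair (a, b)
        out.extend([a * c - b * s, a * s + b * c])
    return out

def score(q, k, m, n):
    """Attention logit between query q at position m and key k at position n."""
    return sum(x * y for x, y in zip(rope(q, m), rope(k, n)))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.1]
# The logit depends only on the offset m - n, not on absolute positions.
print(abs(score(q, k, 3, 7) - score(q, k, 10, 14)) < 1e-9)
```

Because each chunk undergoes a pure rotation, vector norms are preserved and the query-key logit is a function of the relative offset only. That is the property any extension of RoPE would presumably need to keep or deliberately relax.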
Similar Articles
RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways
This paper proposes RoVE, a parameter-free modification to Rotary Position Embeddings that makes value pathways position-sensitive by rotating values simultaneously with keys, transforming RoPE attention into attentive convolution. Experiments on GPT-2 models show consistent gains in few-shot in-context learning, out-of-distribution perplexity, and long-context retrieval.
RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably
This paper provides a theoretical proof that Rotary Positional Embeddings (RoPE) in Transformer-based language models lose their locality bias and ability to distinguish token order in long contexts, with attention scores becoming no better than random. The authors show that increasing the RoPE base trades off position vs. token distinction and that multi-head, multi-layer architectures cannot compensate for this fundamental limitation.
HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue
HyPE introduces a hypergraph-based persona encoder that models high-order relations among persona attributes via category-aware hyperedges and persistent edge embeddings, achieving consistent improvements over flat pooling baselines on PersonaChat across multiple backbone models.
PJ-RoPE: A Fourier-Jet-Affine Position Space for Relative Attention
PJ-RoPE unifies RoPE's Fourier phase, Jordan-RoPE's finite jets, and ALiBi's affine recency into a single learnable relative-position space, and studies task-driven selection of sectors.
@ZhihuFrontier: DeepSeek-V4 RoPE Design In-Depth Analysis. Key technical insights curated from Zhihu contributor kaiyuan. Core Pain Point…
This article provides an in-depth technical analysis of the RoPE (Rotary Positional Embedding) design in DeepSeek-V4, focusing on how it handles token compression and shared KV caches in CSA and HCA modules.