positional-encoding

#positional-encoding

Frontier Language Models Struggle to Copy: Text Can Be Better Viewed in 2D

arXiv cs.CL ↗ · 2026-07-20

This paper argues that frontier LLMs struggle with exact-copying tasks because of limitations in their positional encodings. It proposes 2D-RoPE, a method that arranges text on a 2D grid to enable perfect copying, and reports advantages in both synthetic settings and large-scale pretraining.

#positional-encoding

Fingerprint, Not Blueprint: How Positional Schemes Set the Default Spectral Algebra of Attention

arXiv cs.LG ↗ · 2026-07-09

This paper investigates the spectral properties of the QK operator in attention heads, showing that the positional scheme (RoPE, learned absolute, ALiBi) sets a default spectral algebra that acts as a fingerprint consolidated after function rather than a hard constraint.

#positional-encoding

Observable- and Positional-Encoding-Dependent Symmetry Readout from Neural Network Weights

arXiv cs.LG ↗ · 2026-07-07

This paper shows that the geometric symmetry visible from neural network weights depends on the positional encoding and readout observable, and validates this using MLPs trained on 2D signed distance functions with multiple symmetry groups.

#positional-encoding

@agopal42: Presenting PoPE today at #ICML2026! We revisit RoPE through the lens of content-position entanglement, and show how pol…

X AI KOLs Timeline ↗ · 2026-07-06

A new paper introduces PoPE, a positional encoding that decouples content from position, addressing what the authors describe as a fundamental flaw in RoPE, which is used in many LLMs such as Qwen, Gemma, and DeepSeek. It was presented at ICML 2026.

#positional-encoding

The Wiola Architecture for Efficient Small Language Models

arXiv cs.AI ↗ · 2026-07-03

Wiola is a novel Small Language Model architecture introducing five independently designed components—SRPE, GCLA, ATM, DSFF, and WiolaRMSNorm—aimed at improving efficiency and coherence, released in sizes from 120M to 1.5B parameters and integrated with HuggingFace Transformers.

#positional-encoding

@CamilleRoux: A well-made explanation of the inner workings of LLMs: tokens, embeddings, positional encoding, attention, fee…

X AI KOLs Timeline ↗ · 2026-06-14

This tweet shares a well-made explanation of the internal workings of LLMs, covering tokens, embeddings, positional encoding, attention, and feed-forward networks, via a blog post by 0xkato.

#positional-encoding

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding

arXiv cs.CL ↗ · 2026-06-04

LazyAttention introduces a novel attention mechanism that defers positional encoding to enable zero-copy, position-agnostic KV cache reuse across multiple requests. The approach reduces time-to-first-token by 1.37× and increases throughput by 1.40× compared to Block-Attention in RAG settings with skewed document distributions.

#positional-encoding

@tilderesearch: https://x.com/tilderesearch/status/2061771450168889432

X AI KOLs Timeline ↗ · 2026-06-02

Wall Attention generalizes diagonal forget gates to softmax attention, enabling state-of-the-art length extrapolation from 4k to 160k+ context zero-shot and outperforming RoPE and FoX in pretraining. It is released as a drop-in replacement with open-source Triton kernels.

#positional-encoding

Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention

arXiv cs.LG ↗ · 2026-05-27

This paper proposes Energy-Gated Attention (EGA) and Morlet Positional Encoding (MoPE) to address missing inductive biases in transformer attention: token salience and scale-adaptive locality. Experiments on TinyShakespeare show superadditive gains when combined, highlighting complementarity.

#positional-encoding

@gordic_aleksa: new in-depth blog post time: Inside the Transformer: The Life of a Token a deep dive into a modern dense transformer, i…

X AI KOLs Timeline ↗ · 2026-05-26

An in-depth blog post exploring the inner workings of modern dense transformers, covering topics such as YaRN for positional information, hybrid attention for long context lengths, soft capping, QK normalization, and transformer math including FLOPs/token formulas and cluster sizing.

#positional-encoding

@YouJiacheng: > Directly applying RoPE rotation to KV will leak positional information into value matrix V (also documented on 科学空间, "Scientific Spaces") https://kexue.fm/…

X AI KOLs Timeline ↗ · 2026-05-07

A social media post discusses the technical implication of applying RoPE rotation directly to KV caches, noting that it leaks positional information into the value matrix V.

#positional-encoding

@ZhihuFrontier: DeepSeek-V4 RoPE Design In-Depth Analysis Key technical insights curated from Zhihu contributor kaiyuan Core Pain Point…

X AI KOLs Timeline ↗ · 2026-05-07

This article provides an in-depth technical analysis of the RoPE (Rotary Position Embedding) design in DeepSeek-V4, focusing on how it handles token compression and shared KV caches in the CSA and HCA modules.
