RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

Hugging Face Daily Papers 05/15/26, 12:00 AM Papers

rope position-encoding long-context attention-mechanism llm-limitations theoretical-analysis empirical-study

Summary

This paper proves that RoPE-based attention fails to distinguish token positions and identity in long contexts, explaining LLM failures within advertised context lengths. Experimental verification shows models optimized for retrieval struggle on simple list tasks.

We identify intrinsic limitations of Rotary Positional Embeddings (RoPE) in Transformer-based long-context language models. Our theoretical analysis abstracts away from the specific content of the context and depends only on its length. We prove that as context length increases, RoPE-based attention becomes unpredictable and loses two properties that are central to its effectiveness. First, it loses its locality bias: RoPE is no more likely to favor nearer positions than substantially farther ones. Second, it loses consistency in token relevance: a key vector that receives a higher attention score than an alternative at one position may receive a lower score at another. In both cases, the probability of failure approaches 0.5, no better than random guessing. We further prove that the attention score can remain unchanged when a key token is moved to a different position, or even replaced by a different token, indicating a failure to distinguish positions or tokens. Adjusting the RoPE base trades off distinguishing positions against distinguishing tokens but cannot preserve both at the same time. Increasing the RoPE base hyperparameter, a common practice in today's long-context models, helps distinguish different tokens, but inevitably sacrifices the ability to distinguish positions. Our empirical analysis shows that multi-head, multi-layer architectures are insufficient to overcome these limitations. Our findings suggest that fundamentally new mechanisms for encoding position and token order may be needed in future Transformer long-context language models.
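For reference, the dependence of a RoPE attention score on position can be written out in the standard RoPE formulation. This is a sketch in notation assumed here, not necessarily the paper's own: $q, k \in \mathbb{R}^d$ are the query and key vectors before rotation, $m$ and $n$ are their positions, $b$ is the RoPE base, and consecutive coordinates are paired into complex numbers.

```latex
\[
\theta_j = b^{-2j/d}, \qquad j = 0, 1, \dots, \tfrac{d}{2} - 1
\]
\[
s(m, n) \;=\; \operatorname{Re} \sum_{j=0}^{d/2-1}
  \tilde q_j \, \overline{\tilde k_j} \, e^{\, i (m - n)\, \theta_j},
\qquad
\tilde q_j = q_{2j} + i\, q_{2j+1}, \quad
\tilde k_j = k_{2j} + i\, k_{2j+1}
\]
```

Under these assumptions the score depends on the two positions only through the offset $m - n$, and it is a sum of $d/2$ sinusoids in that offset. A larger base $b$ lowers every $\theta_j$ with $j \ge 1$, so the score varies more slowly with the offset, which is consistent with the trade-off described in the abstract. The paper's proofs are not reproduced here.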

Original Article

Cached at: 05/20/26, 06:39 PM

Source: https://huggingface.co/papers/2605.15514

LLMs often fail on inputs well within their advertised context lengths. We show that these failures are not merely engineering issues but stem from intrinsic limitations of RoPE in long contexts.

Main finding: In long contexts, RoPE-based attention frequently assigns the same attention weight to a token even when it is moved to different positions. Similarly, it can assign the same attention weight to different tokens at the same position.

In this sense, RoPE attention distinguishes neither where a token appears nor what token appears there, hence the title.
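To make the quantity under discussion concrete, here is a minimal NumPy sketch of a single-head RoPE attention score (pre-softmax). It is not the paper's code. The helper names (`rope_frequencies`, `rope_rotate`, `rope_score`, `scores_by_distance`), the interleaved pairing of coordinates, and the default base of 10000 are all assumptions made for illustration.

```python
import numpy as np


def rope_frequencies(dim, base=10000.0):
    """Per-pair rotation frequencies theta_j = base^(-2j/dim)."""
    j = np.arange(dim // 2)
    return base ** (-2.0 * j / dim)


def rope_rotate(x, pos, base=10000.0):
    """Rotate a 1-D vector x (even length) as RoPE would at position pos.

    Assumes the interleaved layout: coordinates (2j, 2j+1) form one pair.
    """
    angle = pos * rope_frequencies(x.shape[-1], base)
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


def rope_score(q, k, q_pos, k_pos, base=10000.0):
    """Unnormalized attention score between a rotated query and key."""
    return float(rope_rotate(q, q_pos, base) @ rope_rotate(k, k_pos, base))


def scores_by_distance(q, k, max_dist, base=10000.0):
    """Scores for the same key placed 0..max_dist positions behind the query."""
    return np.array([rope_score(q, k, d, 0, base) for d in range(max_dist + 1)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k = rng.standard_normal(64), rng.standard_normal(64)
    curve = scores_by_distance(q, k, 4096)
    # Inspect how the score for one fixed key varies as it moves away.
    print(curve[:8])
```

Plotting `curve` for different `base` values is one way to look at how a fixed key's score oscillates with distance. The sketch makes no claim about how often scores coincide; that is what the paper's analysis addresses.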

We prove these results theoretically and verify them empirically. While the theoretical analysis focuses on a single attention head, we complement it with experiments on real multi-layer, multi-head LLMs. The experiments confirm failures predicted by our theory: LLMs optimized for needle-in-a-haystack-style retrieval will inevitably struggle on a very simple task that asks for the k-th item in a list.
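The "k-th item in a list" task can be illustrated with a small probe generator. This is a hypothetical sketch, not the paper's benchmark: the prompt wording, the item format, and the names `make_kth_item_prompt` and `is_correct` are all invented here.

```python
import random


def make_kth_item_prompt(n_items, k, seed=0):
    """Build a hypothetical probe: an unnumbered list and a question about item k.

    Returns (prompt, expected_answer). k is 1-based.
    """
    if not 1 <= k <= n_items:
        raise ValueError("k must be in 1..n_items")
    rng = random.Random(seed)
    # Random identifiers, so the answer cannot be guessed from content alone.
    items = [f"item-{rng.randrange(10**6):06d}" for _ in range(n_items)]
    prompt = (
        "Here is a list:\n"
        + "\n".join(items)
        + f"\n\nWhat is item number {k} in the list (counting from 1)?"
    )
    return prompt, items[k - 1]


def is_correct(model_output, expected):
    """Lenient check: the expected identifier appears in the model's reply."""
    return expected in model_output


if __name__ == "__main__":
    prompt, expected = make_kth_item_prompt(n_items=1000, k=437)
    print(prompt[:120], "...")
    print("expected:", expected)
```

Because the items carry no numbering, answering requires tracking order rather than matching a distinctive string, which is what separates this probe from needle-in-a-haystack retrieval.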

My personal takeaway: advertised context lengths should be interpreted with care. Future long-context LMs may require rethinking how position and token order are represented. With current architectures, agentic frameworks that break long contexts into shorter ones may be a more effective way to work around the intrinsic limitations of RoPE.

Similar Articles

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

@samhogan: RLMs pretty much solved context btw You can shove tens of millions of tokens into a good RLM harness and it just works.…

@ickma2311: Efficient AI Lecture 15: Long-Context LLM Long context is not just a bigger prompt window. The key question is: which p…

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks
