Tag
This paper proposes RoVE, a parameter-free modification to Rotary Position Embeddings that makes value pathways position-sensitive by rotating values simultaneously with keys, transforming RoPE attention into attentive convolution. Experiments on GPT-2 models show consistent gains in few-shot in-context learning, out-of-distribution perplexity, and long-context retrieval.
PJ-RoPE unifies RoPE's Fourier phase, Jordan-RoPE's finite jets, and ALiBi's affine recency into a single learnable relative-position space, and studies task-driven selection of sectors.
This paper proves that RoPE-based attention fails to distinguish token positions and identity in long contexts, explaining LLM failures within advertised context lengths. Experimental verification shows models optimized for retrieval struggle on simple list tasks.