Tag
This post summarizes Efficient AI Lecture 15 on long-context LLMs, covering RoPE position interpolation for context extension, the needle-in-haystack evaluation, and StreamingLLM's attention sink phenomenon and KV cache eviction strategy.
SEGA is a training-free method that improves high-resolution text-to-image generation by adaptively scaling attention across RoPE components based on spatial-frequency structure during denoising steps.
This paper proves that RoPE-based attention fails to distinguish token positions and identity in long contexts, explaining LLM failures within advertised context lengths. Experimental verification shows models optimized for retrieval struggle on simple list tasks.
A social media post discusses the technical implication of applying RoPE rotation directly to KV caches, noting that it leaks positional information into the value matrix V.
This article provides an in-depth technical analysis of the RoPE (Rotary Positional Embedding) design in DeepSeek-V4, focusing on how it handles token compression and shared KV caches in CSA and HCA modules.