rope

#rope

@ickma2311: Efficient AI Lecture 15: Long-Context LLM Long context is not just a bigger prompt window. The key question is: which p…

X AI KOLs Timeline · 2026-05-25

This post summarizes Efficient AI Lecture 15 on long-context LLMs, covering RoPE position interpolation for context extension, the needle-in-haystack evaluation, and StreamingLLM's attention sink phenomenon and KV cache eviction strategy.
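
As a rough illustration of the position-interpolation idea named in this summary, the sketch below scales token positions by `train_len / target_len` before computing RoPE angles, so that positions in a longer window map back into the range seen during training. This is the commonly described linear form, not necessarily the exact variant covered in the lecture; the function names, the base of 10000, and the 2048/8192 lengths are assumptions chosen for the example.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one position; scale < 1 applies linear position interpolation."""
    return [(position * scale) * base ** (-2 * i / head_dim)
            for i in range(head_dim // 2)]

def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base, scale)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Assumed lengths for illustration only.
train_len, target_len = 2048, 8192
scale = train_len / target_len  # 0.25

# The last position of the extended window lands inside the trained range.
effective_last_position = (target_len - 1) * scale

# The query-key score still depends only on the relative offset.
q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -0.7, 0.9, 0.4]
score_near = dot(apply_rope(q, 10, scale=scale), apply_rope(k, 4, scale=scale))
score_far = dot(apply_rope(q, 110, scale=scale), apply_rope(k, 104, scale=scale))
```

Here `effective_last_position` stays below `train_len`, and `score_near` and `score_far` agree up to floating-point error because both pairs are six positions apart.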

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Hugging Face Daily Papers · 2026-05-21

SEGA is a training-free method that improves high-resolution text-to-image generation by adaptively scaling attention across RoPE components based on spatial-frequency structure during denoising steps.

RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

Hugging Face Daily Papers · 2026-05-15

This paper proves that RoPE-based attention fails to distinguish token positions and identity in long contexts, explaining LLM failures within advertised context lengths. Experimental verification shows models optimized for retrieval struggle on simple list tasks.

@YouJiacheng: > Directly applying RoPE rotation to KV will leak positional information into value matrix V. This is also documented on 科学空间 (Scientific Spaces) https://kexue.fm/…

X AI KOLs Timeline · 2026-05-07

This post notes that applying RoPE rotation directly to KV leaks positional information into the value matrix V, and points to a related write-up on kexue.fm.
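
One way to picture the leak, as a minimal sketch rather than the post's own argument: if a single vector serves as both key and value and is rotated by its absolute position, the query-key score still depends only on the relative offset, but the value that gets read out changes with the absolute position. The helper names, the vectors, and the positions below are hypothetical.

```python
import math

def rotate_pairs(vec, position, base=10000.0):
    """Apply a RoPE-style rotation to consecutive pairs of vec at the given position."""
    d = len(vec)
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.5, -0.3, 0.8]
kv = [0.2, -0.7, 0.9, 0.4]  # one shared vector used as both key and value (assumed setup)

# Same relative offset (2) at two different absolute locations.
score_a = dot(rotate_pairs(q, 5), rotate_pairs(kv, 3))
score_b = dot(rotate_pairs(q, 9), rotate_pairs(kv, 7))

# The rotated vector, read as a value, differs between the two locations.
value_a = rotate_pairs(kv, 3)
value_b = rotate_pairs(kv, 7)
value_gap = max(abs(a - b) for a, b in zip(value_a, value_b))
```

The two scores match up to floating-point error, while `value_gap` is clearly non-zero: the attention weights are position-relative, but the rotated value carries absolute position into the output.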

@ZhihuFrontier: DeepSeek-V4 RoPE Design In-Depth Analysis Key technical insights curated from Zhihu contributor kaiyuan Core Pain Point…

X AI KOLs Timeline · 2026-05-07

This article analyzes the RoPE (Rotary Position Embedding) design in DeepSeek-V4, focusing on how it handles token compression and shared KV caches in the CSA and HCA modules.
