Tag
Raymond Chen revisits a unidirectional rotation algorithm for swapping adjacent memory blocks, explaining its recursive approach and performance characteristics.
OSCAR is an offline spectral covariance-aware rotation method for 2-bit KV cache quantization that aligns quantization with attention covariance structures, achieving high accuracy and efficiency for long-context LLM serving.
GoodfireAI found that neural networks perform math by rotating shapes, uncovering a shape-rotating calculator inside an LLM that the model uses for more than just math.