Tag
Key-Value Means (KVM) is a novel attention mechanism that combines the strengths of transformers and RNNs with controllable computational complexity and memory usage. It supports fixed-size or growing state, offers subquadratic prefill time and sublinear state growth, and can be implemented without custom kernels.