Tag
Introduces FlashMemory DeepSeek-V4 Retriever, a lightweight model that sparsifies DeepSeek-V4's CSA KV-cache by predicting which chunks will be attended to next, keeping only ~10-15% on-device while matching full-attention performance.