flashmemory

#flashmemory

@karminski3: Magic! DeepSeekV4 context memory compressed to 1/10! Everyone knows DeepSeekV4 supports 1M context and is heavily optimized. To actually use 1M context, VRAM usage is only about 10GB (compared to DeepSeek-V3.2 which needs about…


FlashMemory-DeepSeek-V4 proposes an inference paradigm called Lookahead Sparse Attention (LSA), in which a neural memory indexer predicts which parts of the context upcoming tokens will need. This reduces physical KV cache usage to 13.5% of the full-context baseline while improving average accuracy by 0.6%. The indexer is trained with a decoupled strategy that does not require loading the base model, which significantly reduces training cost.
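To make the idea of an index-driven sparse cache concrete, here is a minimal toy sketch in plain Python. It is not the LSA implementation: the summary does not describe the indexer's architecture, so the per-entry `scores` below simply stand in for whatever relevance the trained indexer would predict, and the names `select_entries`, `sparse_attention`, and `KEEP_PER_MILLE` are hypothetical. The only number taken from the summary is the 13.5% retention ratio.

```python
import math

# Retention budget from the summary: 13.5% of the full-context KV cache.
# Stored as an integer per-mille value to avoid floating-point rounding.
KEEP_PER_MILLE = 135


def select_entries(scores, keep_per_mille=KEEP_PER_MILLE):
    """Return the sorted positions of the highest-scoring cache entries.

    `scores` is a stand-in for the indexer's predicted relevance of each
    cached position (an assumption; the real indexer is a trained network).
    """
    # Ceiling division, with at least one entry always retained.
    budget = max(1, (len(scores) * keep_per_mille + 999) // 1000)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def sparse_attention(query, keys, values, kept):
    """Scaled dot-product softmax attention restricted to `kept` positions."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, keys[i])) / scale for i in kept]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, kept)) / total
        for d in range(dim)
    ]
```

With 200 cached positions, `select_entries` keeps 27 of them, and attention is then computed over those 27 entries only. In a real system the saving comes from never holding the discarded entries in fast memory, which this toy version does not model.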


