sparse-attention

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Hugging Face Daily Papers · yesterday

Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.

Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache

arXiv cs.LG · 2d ago

This paper introduces Louver, a novel index structure for KV cache retrieval that reformulates sparse attention as a range searching problem, guaranteeing zero false negatives and improving efficiency over existing methods.

@lateinteraction: guess what NVIDIA used here for an "attention-based encoder-decoder to retrieve directly from its own internal represen…

X AI KOLs Following · 5d ago

NVIDIA used late interaction, a form of sparse attention, in an attention-based encoder-decoder that retrieves directly from its own internal representations.

MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference

Hugging Face Daily Papers · 5d ago

The paper introduces MISA, a method that applies a mixture-of-experts approach to the indexer heads in sparse attention mechanisms, significantly reducing computational costs for long-context LLM inference while maintaining performance.

Lightning Unified Video Editing via In-Context Sparse Attention

Hugging Face Daily Papers · 2026-05-06

This paper introduces In-context Sparse Attention (ISA), a framework that significantly reduces computational costs in video editing by pruning redundant context and using dynamic query grouping. The authors demonstrate the method's effectiveness with LIVEditor, achieving near-lossless acceleration and state-of-the-art results on multiple video editing benchmarks.
