linear-attention

#linear-attention

Rethinking State Tracking in Recurrent Models Through Error Control Dynamics

Hugging Face Daily Papers · 2026-05-08

This paper argues that robust state tracking in recurrent models depends on error control dynamics rather than just expressive capacity, proving that affine recurrent networks suffer from accumulating errors that limit their effective horizon.


#linear-attention

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

Hugging Face Daily Papers · 2026-05-07

The paper introduces Momentum DeltaNet (MDN), a linear attention model that uses stepwise momentum and parallel algorithms to improve training efficiency and performance over models like Mamba2.


#linear-attention

@xenovacom: Opus 4.7 just wrote a custom WebGPU kernel that runs Qwen3.5 up to 13x faster using a fused LinearAttention op! Agentic…

X AI KOLs Following · 2026-04-23

Opus 4.7 auto-generated a custom WebGPU kernel that accelerates Qwen3.5 inference up to 13× via fused LinearAttention, now shipping in Transformers.js v4.2.0.


#linear-attention

Moonshot open-sourced FlashKDA, CUTLASS kernels for Kimi Delta Attention, up to 2.22x over the Triton baseline on H20

Reddit r/LocalLLaMA · 2026-04-22

MoonshotAI released FlashKDA, a set of open-source CUTLASS kernels for Kimi Delta Attention that run up to 2.22x faster than the Triton baseline on H20 GPUs.


#linear-attention

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Papers with Code Trending · 2025-09-29

SANA-Video is a small diffusion model that generates high-resolution, long videos efficiently using linear attention and a constant-memory KV cache, achieving competitive performance at dramatically lower cost and higher speed than existing models.

