Opus 4.7 auto-generated a custom WebGPU kernel that accelerates Qwen3.5 inference by up to 13× via fused LinearAttention, now shipping in Transformers.js v4.2.0.
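For context on what a fused linear-attention kernel computes, here is a minimal NumPy sketch of the causal linear-attention recurrence. This is not the WebGPU kernel shipped in Transformers.js, and the feature map `phi` (elu-style) is an illustrative assumption; the point is that a running (d_k × d_v) state replaces the T × T score matrix, which is what a fused kernel keeps in fast memory and updates in place per token.

```python
import numpy as np

def linear_attention(q, k, v):
    """Causal linear attention via a running state, O(T * d_k * d_v).

    Illustrative sketch only, not the shipped WebGPU kernel. A fused
    kernel keeps the (d_k x d_v) state S resident in fast memory and
    updates it in place per token instead of materializing the full
    T x T attention-score matrix.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    phi = lambda x: np.maximum(x, 0) + 1.0  # assumed positive feature map
    S = np.zeros((d_k, d_v))   # running sum of outer(phi(k_t), v_t)
    z = np.zeros(d_k)          # running sum of phi(k_t), for normalization
    out = np.empty((T, d_v))
    for t in range(T):
        kt, qt = phi(k[t]), phi(q[t])
        S += np.outer(kt, v[t])
        z += kt
        out[t] = (qt @ S) / (qt @ z + 1e-6)
    return out
```

The recurrence is algebraically equal to masked, softmax-free attention with scores `phi(q) @ phi(k).T`, which is why the linear form can be checked against the quadratic one on small inputs.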
MoonshotAI released FlashKDA, open-source CUTLASS kernels for Kimi Delta Attention that deliver up to a 2.22× speedup over Triton on H20 GPUs.
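As a rough illustration of the delta-rule state update that delta-attention kernels fuse, here is a NumPy sketch. It is not FlashKDA itself (the real kernels are CUTLASS/CUDA and include gating and other details not shown); the key-normalization step and per-token `beta` learning rate are assumptions for the sketch. Per token, the state S is corrected toward the new (k, v) pair rather than summed, so a stale value for a repeated key is overwritten:

```python
import numpy as np

def delta_attention(q, k, v, beta):
    """Delta-rule linear attention recurrence (illustrative sketch).

    Not the FlashKDA CUTLASS kernel; it shows the state update such
    kernels fuse. Per token, S is nudged toward the new (k, v) pair
    with learning rate beta_t (the classic delta rule):
        S_t = S_{t-1} - beta_t * outer(k_t, k_t @ S_{t-1} - v_t)
    With unit-norm k_t and beta_t = 1, k_t @ S_t == v_t exactly.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        kt = k[t] / (np.linalg.norm(k[t]) + 1e-6)  # assumed unit-norm keys
        S = S - beta[t] * np.outer(kt, kt @ S - v[t])
        out[t] = q[t] @ S
    return out
```

Unlike the plain additive linear-attention update, the correction term `kt @ S - v[t]` lets the state replace an old association instead of accumulating it, at the cost of a sequential dependency that makes efficient kernels harder to write.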