sparsity

#sparsity

Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers

arXiv cs.CL · 2026-05-26

This paper introduces a grammatically-guided sparse attention mechanism for Transformers, aiming to improve efficiency and interpretability by leveraging linguistic structure.

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Hugging Face Daily Papers · 2026-05-26

RT-Lynx proposes exploiting activation sparsity rather than weight sparsity to accelerate diffusion models, achieving up to a 1.55× speedup on linear layers while maintaining generation quality. The paper has been accepted at ICML 2026.

Two-Valued Symmetric Circulant Matrices: Applications in Deep Learning

arXiv cs.LG · 2026-05-19

This paper proposes the Two-Valued Symmetric Circulant Matrix (TVSCM), a highly sparse architecture that uses only two weights per layer. On the MNIST and MIT-BIH arrhythmia datasets it achieves over 80× parameter reduction while maintaining comparable accuracy, which makes it well suited to edge and tiny-ML platforms.

φ-Balancing for Mixture-of-Experts Training

arXiv cs.LG · 2026-05-18

This paper proposes φ-balancing, a principled framework for load balancing in Mixture-of-Experts models that directly targets population-level expert balance using convex duality and mirror descent. It achieves more stable expert utilization and outperforms prior methods on reasoning and code-generation benchmarks.

Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

arXiv cs.CL · 2026-05-18

This paper analyzes neural activation patterns across six LLM architectures on cognitive tasks, revealing differences in attention entropy and sparsity between encoder and decoder models.

@hardmaru: The human brain is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLM…

X AI KOLs Timeline · 2026-05-08

This paper introduces TwELL and Hybrid sparse formats with custom CUDA kernels to efficiently leverage unstructured sparsity in LLMs, achieving over 20% faster training and inference on H100 GPUs while reducing energy and memory usage.