learnable-allocation

ConSA: Controllable Sparsity in Hybrid Attention via Learnable Allocation

arXiv cs.CL · 2d ago

ConSA is a framework that learns how to assign full attention and sliding-window attention within a hybrid model so that a user-specified sparsity target is met. The assignment is trained with L0 regularization, and the target is enforced through an augmented Lagrangian constraint. It shows consistent gains over rule-based baselines on LLMs at the 0.6B and 1.7B scales.
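To make the mechanism concrete, here is a minimal sketch of how L0-style gates and an augmented Lagrangian term can be combined to steer an allocation toward a sparsity target. It is not the paper's implementation. It assumes the hard-concrete gate of Louizos et al. (2018) as the L0 relaxation, treats an open gate as full attention and a closed gate as sliding-window attention, and defines sparsity as the expected fraction of closed gates. The unit each gate controls, the function names, and the update rule are all hypothetical, and the task loss is left out so that only the constraint term is shown.

```python
import math

# Hard-concrete constants from Louizos et al. (2018); assumed here, not taken from ConSA.
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prob_open(log_alpha):
    """Probability that a hard-concrete gate is non-zero (unit keeps full attention)."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def expected_sparsity(log_alphas):
    """Expected fraction of units assigned to sliding-window attention."""
    return 1.0 - sum(prob_open(a) for a in log_alphas) / len(log_alphas)


def constraint_penalty(log_alphas, target, lam, rho):
    """Augmented Lagrangian term: lam * gap + (rho / 2) * gap^2."""
    gap = expected_sparsity(log_alphas) - target
    return lam * gap + 0.5 * rho * gap * gap


def step(log_alphas, target, lam, rho, lr):
    """One gradient step on the penalty alone, followed by dual ascent on lam."""
    n = len(log_alphas)
    gap = expected_sparsity(log_alphas) - target
    coeff = lam + rho * gap  # derivative of the penalty w.r.t. the sparsity
    updated = []
    for a in log_alphas:
        p = prob_open(a)
        grad = coeff * (-(p * (1.0 - p)) / n)  # chain rule through the sigmoid
        updated.append(a - lr * grad)
    new_gap = expected_sparsity(updated) - target
    return updated, lam + rho * new_gap


def allocate(log_alphas):
    """Deterministic allocation: stretch, clip, and threshold each gate."""
    kinds = []
    for a in log_alphas:
        gate = min(1.0, max(0.0, sigmoid(a) * (ZETA - GAMMA) + GAMMA))
        kinds.append("full" if gate > 0.5 else "sliding_window")
    return kinds
```

In an actual training loop the gradient of the language-modeling loss would be added to the gate update, so the multiplier `lam` trades task quality against the sparsity target rather than driving the gates on its own.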
