Tag
ConSA is a framework that learns optimal assignment between full attention and sliding-window attention under a user-specified sparsity target, using L0 regularization and augmented Lagrangian constraint. It demonstrates consistent gains over rule-based baselines on LLMs at 0.6B and 1.7B scales.