full-attention

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization

Hugging Face Daily Papers · 2026-06-18

HydraHead is a novel attention hybridization architecture that combines full (softmax) attention and linear attention at the level of individual heads. It uses interpretability-driven selection and scale-normalized fusion to achieve superior long-context performance with reduced training overhead.
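To make "hybridization at the head level" concrete, here is a minimal, dependency-free sketch of a causal multi-head layer in which each head is routed either to full softmax attention or to kernelized linear attention. It is a generic illustration under stated assumptions, not the HydraHead implementation: the summary does not say how heads are selected, so the per-head `head_types` list is hand-assigned here, and the per-head RMS normalization in the hypothetical `rms_normalize` helper is only a stand-in for the paper's "scale-normalized fusion". The feature map `elu(x) + 1` is one common choice for linear attention, also assumed.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # shape: [seq_len][head_dim]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_softmax_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Full scaled dot-product attention with a causal mask: O(n^2) in sequence length."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [dot(qi, k[j]) / math.sqrt(d) for j in range(i + 1)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(i + 1)) / total
                    for c in range(len(v[0]))])
    return out


def elu_plus_one(x: Sequence[float]) -> List[float]:
    """Positive feature map phi(x) = elu(x) + 1 (assumed choice)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def causal_linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Kernelized linear attention with running sums: O(n) in sequence length."""
    dk, dv = len(k[0]), len(v[0])
    state = [[0.0] * dv for _ in range(dk)]  # running sum of phi(k_j) v_j^T
    norm = [0.0] * dk                        # running sum of phi(k_j)
    out = []
    for qi, kj, vj in zip(q, k, v):
        fk = elu_plus_one(kj)
        for a in range(dk):
            norm[a] += fk[a]
            for c in range(dv):
                state[a][c] += fk[a] * vj[c]
        fq = elu_plus_one(qi)
        denom = dot(fq, norm)  # strictly positive because phi(.) > 0
        out.append([sum(fq[a] * state[a][c] for a in range(dk)) / denom
                    for c in range(dv)])
    return out


def rms_normalize(rows: Matrix, eps: float = 1e-6) -> Matrix:
    """Hypothetical fusion step: rescale each token's head output to unit RMS."""
    out = []
    for r in rows:
        rms = math.sqrt(sum(x * x for x in r) / len(r) + eps)
        out.append([x / rms for x in r])
    return out


def hybrid_multihead(heads: List[Tuple[Matrix, Matrix, Matrix]],
                     head_types: List[str]) -> Matrix:
    """Route each head to full or linear attention, normalize, then concatenate features."""
    per_head = []
    for (q, k, v), kind in zip(heads, head_types):
        attn = causal_softmax_attention if kind == "full" else causal_linear_attention
        per_head.append(rms_normalize(attn(q, k, v)))
    seq_len = len(per_head[0])
    return [sum((h[t] for h in per_head), []) for t in range(seq_len)]


# Usage: one full-attention head and one linear-attention head over 3 tokens.
q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
k = [[0.2, 0.1], [-0.4, 0.3], [0.1, 0.1]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = hybrid_multihead([(q, k, v), (q, k, v)], ["full", "linear"])
print(len(out), len(out[0]))  # 3 4
```

The normalization step matters because softmax and linear attention produce outputs on different scales. Without some rescaling, one head type could dominate the concatenated features. How HydraHead actually balances the two is described in the paper, not here.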
