full-attention


HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization

Hugging Face Daily Papers · 2026-06-18

HydraHead is an attention hybridization architecture that combines Full and Linear Attention at the head level. It relies on interpretability-driven selection and scale-normalized fusion, and reports superior long-context performance with reduced training overhead.
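
To make the idea of head-level hybridization concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the summary above does not say how heads are selected or how their outputs are fused. The sketch assumes a hand-chosen boolean mask (`full_head_mask`, hypothetical) in place of interpretability-driven selection, and assumes that "scale-normalized fusion" can be approximated by RMS-normalizing each head's output before concatenation. The function names and the `elu(x) + 1` feature map for linear attention are likewise illustrative choices.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Causal full (softmax) attention for one head; q, k, v have shape (T, d)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def linear_attention(q, k, v, eps=1e-6):
    """Causal linear attention for one head, using the feature map elu(x) + 1."""
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))

    qf, kf = phi(q), phi(k)
    T, d = q.shape
    state = np.zeros((d, v.shape[1]))  # running sum of outer(k_t, v_t)
    norm = np.zeros(d)                 # running sum of k_t
    out = np.empty_like(v)
    for t in range(T):
        state += np.outer(kf[t], v[t])
        norm += kf[t]
        out[t] = (qf[t] @ state) / (qf[t] @ norm + eps)
    return out


def rms_normalize(x, eps=1e-6):
    """Rescale each token's vector to unit root-mean-square (assumed fusion step)."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)


def hybrid_heads(q, k, v, full_head_mask):
    """Mix full and linear attention per head.

    q, k, v: arrays of shape (H, T, d).
    full_head_mask: boolean array of shape (H,); True means the head uses
    full attention. The mask is a stand-in for the paper's selection step.
    Returns an array of shape (T, H * d).
    """
    outputs = []
    for h, use_full in enumerate(full_head_mask):
        attend = softmax_attention if use_full else linear_attention
        outputs.append(rms_normalize(attend(q[h], k[h], v[h])))
    return np.concatenate(outputs, axis=-1)
```

A short usage example, with random inputs and an arbitrary choice of which heads keep full attention:

```python
rng = np.random.default_rng(0)
H, T, d = 4, 6, 8
q, k, v = (rng.standard_normal((H, T, d)) for _ in range(3))
out = hybrid_heads(q, k, v, np.array([True, False, False, True]))
print(out.shape)  # (6, 32)
```

Normalizing each head before concatenation keeps the two head types on a comparable scale, which is one plausible reading of why a fusion step would need scale normalization at all.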
