Tag
HydraHead is a novel attention hybridization architecture that combines Full and Linear Attention at the head level, achieving superior long-context performance with reduced training overhead via interpretability-driven selection and scale-normalized fusion.