Tag: #grouped-query-attention

Architecture, Not Scale: Circuit Localization in Large Language Models

arXiv cs.CL · 2d ago

This paper challenges the assumption that mechanistic interpretability becomes harder as models scale, showing that architecture (specifically Grouped Query Attention vs. Multi-Head Attention) matters more than parameter count for circuit localization and stability.
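The GQA-vs.-MHA distinction the summary points to is purely structural: in GQA, groups of query heads share a single key/value head instead of each query head having its own. Below is a minimal sketch of that grouping in PyTorch; the function name, weight shapes, and dimensions are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of Grouped Query Attention (GQA) vs. Multi-Head Attention (MHA).
# All names and sizes here are hypothetical, chosen only to illustrate the grouping.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, w_q, w_k, w_v, n_q_heads, n_kv_heads):
    """x: (batch, seq, d_model).

    With n_kv_heads == n_q_heads this reduces to ordinary MHA; with
    n_kv_heads < n_q_heads, each group of query heads shares one K/V head (GQA).
    """
    b, s, d = x.shape
    head_dim = d // n_q_heads

    # Project and split into heads: queries get n_q_heads, keys/values get n_kv_heads.
    q = (x @ w_q).view(b, s, n_q_heads, head_dim).transpose(1, 2)   # (b, Hq, s, hd)
    k = (x @ w_k).view(b, s, n_kv_heads, head_dim).transpose(1, 2)  # (b, Hkv, s, hd)
    v = (x @ w_v).view(b, s, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each K/V head across its group of query heads.
    group_size = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)

    # Standard scaled dot-product attention per head.
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    out = F.softmax(scores, dim=-1) @ v                             # (b, Hq, s, hd)
    return out.transpose(1, 2).reshape(b, s, d)

# Example: 8 query heads sharing 2 K/V heads (hypothetical sizes).
d_model, n_q, n_kv = 64, 8, 2
x = torch.randn(1, 5, d_model)
w_q = torch.randn(d_model, d_model)
w_k = torch.randn(d_model, n_kv * (d_model // n_q))
w_v = torch.randn(d_model, n_kv * (d_model // n_q))
y = grouped_query_attention(x, w_q, w_k, w_v, n_q, n_kv)  # shape (1, 5, 64)
```

The sketch only shows the structural difference in attention-head wiring; the paper's claim is about how that wiring, rather than parameter count, affects where circuits localize and how stable they are.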
