Tag: #attention-sinks

A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models

arXiv cs.CL · yesterday

This paper identifies the 'Massive Emergence Layer' where extreme activations in LLMs originate and propagate, and proposes a method to mitigate their rigidity, improving performance on tasks such as math reasoning and instruction following.
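As background for the summary above, here is a minimal, hypothetical sketch (not the paper's method) of how one might locate the layer where massive activations first appear, using Hugging Face transformers. The model name and the peak-to-median threshold are illustrative assumptions; the 1000x-median style criterion follows earlier work on massive activations.

```python
# Hypothetical sketch: scan per-layer hidden states for "massive activations",
# i.e., values orders of magnitude larger than the typical activation,
# to see at which layer they first emerge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the paper studies larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tok("Massive activations tend to appear early.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors of shape [1, seq, hidden]
for layer_idx, h in enumerate(out.hidden_states):
    abs_h = h.abs()
    peak = abs_h.max().item()
    median = abs_h.median().item()
    # Flag a layer if its peak activation dwarfs the median magnitude.
    # The exact threshold (1000x median) is an assumption, not the paper's.
    if median > 0 and peak / median > 1000:
        print(f"layer {layer_idx}: peak={peak:.1f}, median={median:.3f} -> massive")
```

Running this on a real model would print the earliest layer whose peak activation crosses the threshold, which is the kind of origin point the paper's 'Massive Emergence Layer' refers to.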
