Tag: #attention-sinks

A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models

arXiv cs.CL · yesterday

This paper identifies the 'Massive Emergence Layer' where extreme activations in LLMs originate and propagate, and proposes a method to mitigate their rigidity, improving performance on tasks such as math reasoning and instruction following.
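As background for the summary above, here is a minimal, hypothetical sketch (not the paper's method) of how one might locate the layer where massive activations first appear, using Hugging Face transformers. The model name and the peak-to-median threshold are illustrative assumptions; the 1000x-median style criterion follows earlier work on massive activations.

```python
# Hypothetical sketch: scan per-layer hidden states for "massive activations",
# i.e., values orders of magnitude larger than the typical activation,
# to see at which layer they first emerge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the paper studies larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tok("Massive activations tend to appear early.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors of shape [1, seq, hidden]
for layer_idx, h in enumerate(out.hidden_states):
    abs_h = h.abs()
    peak = abs_h.max().item()
    median = abs_h.median().item()
    # Flag a layer if its peak activation dwarfs the median magnitude.
    # The exact threshold (1000x median) is an assumption, not the paper's.
    if median > 0 and peak / median > 1000:
        print(f"layer {layer_idx}: peak={peak:.1f}, median={median:.3f} -> massive")
```

Running this on a real model would print the earliest layer whose peak activation crosses the threshold, which is the kind of origin point the paper's 'Massive Emergence Layer' refers to.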
