Tag
This paper demonstrates that cosine similarity is a poor proxy for assessing layer importance in LLMs, and proposes using the actual accuracy drop from layer removal as a more robust metric.