layer-normalization

The Curse of Depth in Large Language Models

Lobsters Hottest · 10h ago

This paper introduces the Curse of Depth in LLMs: the deeper layers contribute far less than expected because Pre-Layer Normalization (Pre-LN) lets the output variance grow unchecked with depth. The authors propose LayerNorm Scaling to mitigate this and report consistent improvements in both pre-training and fine-tuning for model sizes up to 7B parameters.
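The variance mechanism described above can be illustrated with a toy simulation. This is a minimal Python sketch, not the authors' code. It makes three assumptions: each "sublayer" is a hypothetical random linear map standing in for attention or the FFN; LayerNorm has no learnable parameters; and the LayerNorm output is multiplied by 1/√ℓ (ℓ being the 1-based layer index), which reflects our reading of how the paper defines LayerNorm Scaling. The sketch tracks the variance of the residual stream in a Pre-LN stack with and without that factor.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Parameter-free LayerNorm: zero mean, unit variance across features."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def variance(x):
    """Population variance across the feature dimension."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def random_linear(d, rng):
    """Hypothetical sublayer stand-in: d x d matrix with N(0, 1/d) entries."""
    std = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(d)]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def residual_variances(num_layers, d, use_ln_scaling, seed=0):
    """Return the residual-stream variance after each block of a toy Pre-LN stack.

    Each block computes x <- x + F(LN(x) * s_l), where s_l = 1 for plain Pre-LN
    and s_l = 1 / sqrt(l) when use_ln_scaling is True (assumed LayerNorm Scaling).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    history = [variance(x)]
    for layer in range(1, num_layers + 1):
        h = layer_norm(x)
        if use_ln_scaling:
            # Shrink the normalized input to this layer's sublayer by 1/sqrt(l).
            h = [v / math.sqrt(layer) for v in h]
        f = matvec(random_linear(d, rng), h)
        # Pre-LN residual update: the un-normalized stream keeps accumulating.
        x = [xi + fi for xi, fi in zip(x, f)]
        history.append(variance(x))
    return history


if __name__ == "__main__":
    plain = residual_variances(32, 64, use_ln_scaling=False)
    scaled = residual_variances(32, 64, use_ln_scaling=True)
    print(f"plain Pre-LN final variance: {plain[-1]:.1f}")
    print(f"LN-scaled final variance:    {scaled[-1]:.1f}")
```

In plain Pre-LN, each block adds a roughly unit-variance update to the residual stream, so the variance grows about linearly with depth. With the 1/√ℓ factor, the variance added at layer ℓ is roughly 1/ℓ, so the total grows only about logarithmically. Because the toy sublayer is linear, scaling its input is equivalent to scaling its output. In a real Transformer block with nonlinear sublayers the two are not equivalent.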
