layer-normalization


The Curse of Depth in Large Language Models

Lobsters Hottest ↗ · 13h ago Cached

This paper introduces the Curse of Depth in LLMs: deep layers contribute less than expected because Pre-Layer Normalization (Pre-LN) lets output variance grow excessively with depth. The authors propose LayerNorm Scaling to mitigate this and report consistent improvements in pre-training and fine-tuning across model sizes up to 7B.
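
The summary does not give the scaling rule, so the following is only a minimal sketch of the idea. It assumes the layer-normalization output at layer l (1-indexed) is multiplied by 1/sqrt(l), which damps variance in deeper layers. The function names `layer_norm`, `scaled_layer_norm`, and `variance` are hypothetical, and learned gain and bias are omitted for brevity.

```python
import math


def layer_norm(x, eps=1e-5):
    """Plain layer normalization over a list of floats (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def scaled_layer_norm(x, layer_index, eps=1e-5):
    """Layer norm with its output scaled by 1/sqrt(layer_index).

    ASSUMPTION: the 1/sqrt(l) factor and 1-based indexing are illustrative
    choices, not taken from the summary above.
    """
    if layer_index < 1:
        raise ValueError("layer_index must be >= 1")
    scale = 1.0 / math.sqrt(layer_index)
    return [scale * v for v in layer_norm(x, eps)]


def variance(x):
    """Population variance of a list of floats."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


x = [1.0, 2.0, 3.0, 4.0]
v1 = variance(scaled_layer_norm(x, 1))  # first layer: scale factor is 1
v4 = variance(scaled_layer_norm(x, 4))  # fourth layer: scale factor is 1/2
```

Under this assumption the variance of the normalized output at layer 4 is one quarter of that at layer 1, since scaling a vector by 1/sqrt(l) divides its variance by l.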


