The article presents a newly identified spectral ratio between MLP and attention norms that predicts geometric stability in transformer models; keeping the ratio within the range 0.5–2 is reported to prevent rank collapse.
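A minimal sketch of how such a per-layer ratio could be measured, assuming a GPT-style pre-norm block and using Frobenius norms of the two residual-branch outputs as a stand-in for the article's spectral ratio (the exact definition is not given in this summary); the module structure below is illustrative, not the article's code:

```python
# Sketch: per-layer ratio of MLP output norm to attention output norm.
# Assumption: activation-norm ratio stands in for the article's spectral ratio.
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy pre-norm transformer block, used only to illustrate the measurement."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        mlp_out = self.mlp(self.ln2(x))
        x = x + mlp_out
        # Frobenius norms of the two residual-branch outputs for this layer.
        ratio = mlp_out.norm() / (attn_out.norm() + 1e-8)
        return x, ratio.item()

torch.manual_seed(0)
blocks = [Block() for _ in range(4)]
x = torch.randn(2, 16, 64)  # (batch, seq, d_model)
for i, blk in enumerate(blocks):
    x, r = blk(x)
    # Per the article's claim, roughly [0.5, 2] is the stable regime.
    flag = "ok" if 0.5 <= r <= 2.0 else "outside stable range"
    print(f"layer {i}: mlp/attn norm ratio = {r:.2f} ({flag})")
```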
This paper identifies the 'Massive Emergence Layer', the layer where extreme activations in LLMs originate and begin to propagate, and proposes a method to mitigate the rigidity these activations introduce, improving model performance on tasks such as math reasoning and instruction following.
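A minimal sketch of locating such an emergence layer, assuming access to per-layer hidden states (e.g. via `output_hidden_states=True` in Hugging Face transformers); the spike-over-median detection heuristic and the `find_emergence_layer` helper are assumptions for illustration, not the paper's method:

```python
# Sketch: find the first layer whose peak |activation| spikes far above the
# baseline of earlier layers, as a rough proxy for where massive activations emerge.
import torch

def find_emergence_layer(hidden_states, spike_factor=10.0):
    """Return the index of the first layer whose peak |activation| exceeds
    spike_factor times the median peak of all earlier layers."""
    peaks = [h.abs().max().item() for h in hidden_states]
    for i in range(1, len(peaks)):
        baseline = torch.tensor(peaks[:i]).median().item()
        if peaks[i] > spike_factor * baseline:
            return i, peaks
    return None, peaks

# Synthetic example: modest activations, with an injected spike at layer 5
# standing in for the onset of massive activations.
torch.manual_seed(0)
hs = [torch.randn(2, 16, 64) for _ in range(8)]
hs[5] = hs[5] * 100
layer, peaks = find_emergence_layer(hs)
print(f"emergence layer: {layer}; per-layer peaks: {[f'{p:.1f}' for p in peaks]}")
```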