#mlp

I Found a Hidden Ratio in Transformers That Predicts Geometric Stability [R]

Reddit r/MachineLearning · yesterday

The post describes a spectral ratio between MLP and attention norms that predicts geometric stability in transformer models, with an optimal range of 0.5–2 that reportedly prevents rank collapse.
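The summary does not define the ratio precisely; a minimal sketch, assuming it means the ratio of the largest singular values (spectral norms) of a layer's MLP and attention weight matrices, checked against the claimed 0.5–2 band:

```python
# Hedged sketch: one possible reading of the post's "spectral ratio"
# between MLP and attention norms. The choice of spectral norm and of
# which weight matrices to compare are assumptions, not the post's spec.
import numpy as np

def spectral_norm(w: np.ndarray) -> float:
    """Largest singular value of a weight matrix."""
    return float(np.linalg.norm(w, 2))

def mlp_attention_ratio(w_mlp: np.ndarray, w_attn: np.ndarray) -> float:
    """Ratio of MLP to attention spectral norms for one layer."""
    return spectral_norm(w_mlp) / spectral_norm(w_attn)

rng = np.random.default_rng(0)
w_mlp = rng.normal(size=(512, 512)) / np.sqrt(512)   # stand-in MLP weight
w_attn = rng.normal(size=(512, 512)) / np.sqrt(512)  # stand-in attention weight

r = mlp_attention_ratio(w_mlp, w_attn)
stable = 0.5 <= r <= 2.0  # the post's claimed "stable" band
```

With both stand-in matrices drawn from the same scaled Gaussian, the ratio lands near 1, inside the claimed band.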


A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models

arXiv cs.CL · yesterday

This paper identifies the 'Massive Emergence Layer', where extreme activations in LLMs originate and begin to propagate, and proposes a method to mitigate their rigidity, improving model performance on tasks such as math reasoning and instruction following.
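The paper's exact criterion for a "massive" activation is not given in this summary; a minimal detection sketch, assuming the common heuristic from prior work of flagging activations whose magnitude exceeds a large multiple of the median absolute activation:

```python
# Hedged sketch: flagging outlier "massive activations" in a hidden-state
# tensor. The 100x-median threshold is an assumed heuristic, not
# necessarily this paper's definition.
import numpy as np

def find_massive_activations(hidden: np.ndarray, factor: float = 100.0) -> np.ndarray:
    """Return (token, dim) indices of activations whose magnitude
    exceeds `factor` times the median absolute activation."""
    mags = np.abs(hidden)
    threshold = factor * np.median(mags)
    return np.argwhere(mags > threshold)

rng = np.random.default_rng(1)
h = rng.normal(size=(4, 8))   # toy (tokens, hidden_dim) activations
h[2, 5] = 500.0               # inject one extreme activation
idx = find_massive_activations(h)
```

Running the detector per layer and noting the first layer at which such indices appear is one way to locate where the extreme activations originate.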
