#super-weights

I Figured Out What Causes 'Super Weights'

Reddit r/ArtificialInteligence · 5h ago

Argues that "super weights" in large language models arise from the interaction between softmax and attention, which creates a "Nothing Dump" token that serves as a stable reference point. Removing these weights severely degrades model performance.
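One way to read the "Nothing Dump" idea: softmax attention weights are non-negative and must sum to 1, so an attention head cannot attend to "nothing." If a single token gets a very large score, it absorbs almost all of the leftover mass. The toy sketch below shows only that softmax property. The scores are made up, and treating key 0 as the "dump" token is an illustrative assumption. None of this is taken from the post or from any real model.

```python
import math

def softmax(scores):
    """Numerically stable softmax: returns non-negative weights that sum to 1."""
    m = max(scores)                       # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores for one query over several keys.
# Key 0 stands in for an assumed "Nothing Dump" token with a very high score;
# the remaining keys are content tokens that are equally irrelevant to this query.
with_dump = softmax([8.0, 0.1, 0.0, -0.1])

# Same irrelevant content tokens with no dump token: the mass that must sum
# to 1 is forced onto them instead.
without_dump = softmax([0.1, 0.0, -0.1])

print([round(w, 3) for w in with_dump])
print([round(w, 3) for w in without_dump])
```

With the dump token present, nearly all attention goes to key 0, and the content tokens contribute almost nothing. Without it, the same irrelevant tokens split the weight roughly evenly.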
