super-weights

I Figured Out What Causes 'Super Weights'

Reddit r/ArtificialInteligence · 4h ago

The post argues that super weights in large language models arise from the interaction between softmax and attention, which creates a 'Nothing Dump' token that serves as a stable reference point. According to the post, removing these weights cripples model performance.
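As a rough illustration of the softmax side of that claim (not the post's own analysis), the sketch below shows that softmax always distributes a total attention mass of 1, so a query with nothing relevant to attend to still has to put its weight somewhere. The scores, the `sink_score` value, and the idea of placing the dump token at position 0 are all assumptions made for this example. The sketch does not model super weights themselves.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores for one query over four content tokens,
# none of which is relevant (all scores equally low).
content_scores = [-2.0, -2.0, -2.0, -2.0]
without_sink = softmax(content_scores)

# Same query, plus a hypothetical "dump" token at position 0 whose score
# is much higher than any content token (assumed value, for illustration).
sink_score = 4.0
with_sink = softmax([sink_score] + content_scores)

sink_mass = with_sink[0]           # attention absorbed by the dump token
content_mass = sum(with_sink[1:])  # attention left for the content tokens

print("without sink:", [round(w, 3) for w in without_sink])
print("sink mass:", round(sink_mass, 3), "content mass:", round(content_mass, 3))
```

Without the extra token, the weights are forced to spread evenly over irrelevant tokens. With it, nearly all of the mass lands on the dump token and the content tokens are left almost untouched.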
