lm-head

Token Geometry

arXiv cs.LG · yesterday

The paper introduces Ember, a lightweight optimizer for embedding and LM-head matrices that exploits gradient geometry to improve efficiency and performance across supervised finetuning, RL, and pretraining, while using far less optimizer state than Adam.
