gradient-norm

Tag

Cards List
#gradient-norm

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

arXiv cs.LG · yesterday Cached

This paper investigates the effectiveness of top-1 collapse rate as a stability monitor for short-horizon LoRA fine-tuning of discrete diffusion language models, finding it has zero precision, and proposes max gradient norm as a more reliable alternative with higher precision and F1 score on LLaDA-family models.

0 favorites 0 likes
← Back to home

Submit Feedback