TRL v1.4 is released, featuring chunked NLL loss for SFT to reduce VRAM usage and first-class integration with OpenReward for GRPO.
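The VRAM saving from a chunked NLL loss comes from never materializing the full (sequence, vocabulary) logits matrix at once: logits are computed and reduced chunk by chunk over the sequence. A minimal NumPy sketch of the idea (an illustration of the general technique, not TRL's actual implementation; the function and parameter names are hypothetical):

```python
import numpy as np

def chunked_nll(hidden, lm_head, targets, chunk_size=1024):
    """Mean NLL over a sequence, materializing logits one chunk at a time.

    hidden:  (seq_len, d_model) final hidden states
    lm_head: (vocab, d_model) output projection weights
    targets: (seq_len,) token ids
    """
    total, count = 0.0, 0
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start:start + chunk_size]   # (chunk, d_model)
        t = targets[start:start + chunk_size]  # (chunk,)
        logits = h @ lm_head.T                 # (chunk, vocab); freed each iteration
        # numerically stable log-softmax
        logits = logits - logits.max(axis=1, keepdims=True)
        logprobs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        total += -logprobs[np.arange(len(t)), t].sum()
        count += len(t)
    return total / count
```

Peak memory for the logits drops from O(seq_len x vocab) to O(chunk_size x vocab), while the returned loss is identical to the unchunked computation.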
This paper investigates where and why output diversity collapses during post-training of language models, analyzing three OLMo 3 lineages (Think, Instruct, RL-Zero) across multiple tasks and metrics. The authors find that diversity collapse is driven primarily by training data composition and becomes embedded in the model weights during training, so it cannot be fully addressed at inference time alone.
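Output diversity in such analyses is often quantified with n-gram statistics over a set of samples. A minimal sketch of distinct-n, one common diversity metric (shown as background; the paper's own metric suite may differ):

```python
def distinct_n(samples, n=2):
    """Fraction of unique n-grams across a set of tokenized samples.

    samples: list of token sequences (lists of token ids or strings)
    Returns a value in [0, 1]; higher means more diverse output.
    """
    ngrams = [
        tuple(toks[i:i + n])
        for toks in samples
        for i in range(len(toks) - n + 1)
    ]
    return len(set(ngrams)) / max(len(ngrams), 1)
```

A collapsed model that repeats the same completion scores near the minimum, since duplicated samples contribute no new n-grams.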