training-method

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

Hugging Face Daily Papers ↗ · 2026-05-28 Cached

GDSD is a reinforcement learning method for diffusion language models that distills denoisers directly from advantage-guided self-teachers, avoiding the biases of ELBO-based likelihood surrogates. It reports accuracy gains of up to 19.6% on planning, math, and coding benchmarks over prior state-of-the-art methods.

@maximelabonne: This is so neat! Dynamic Fine-Tuning (DFT) reweights the SFT loss by the model's own token probability, which creates a…

X AI KOLs Following ↗ · 2026-05-20 Cached

Dynamic Fine-Tuning (DFT) reweights the SFT loss by the model's own token probability, which creates a feedback loop. The method described in the tweet also adds a forward KL term that penalizes tokens the base model finds likely but the policy has pushed toward zero probability. The tweet is skeptical of how SFT papers hold up in practice but praises the attempt.
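The reweighting idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method's reference implementation: the function names (`sft_loss`, `dft_style_loss`, `forward_kl`) are hypothetical, the per-token weight is assumed to be treated as a constant (detached) in a real autograd framework, and how the KL term is scaled and combined with the main loss is not specified in the summary above, so it is left out.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sft_loss(logits_per_token, targets):
    """Plain SFT: mean negative log-likelihood of the target tokens."""
    nll = [-math.log(softmax(l)[t]) for l, t in zip(logits_per_token, targets)]
    return sum(nll) / len(nll)

def dft_style_loss(logits_per_token, targets):
    """Each token's NLL scaled by the model's own probability for that token.

    Assumption: in a framework with autograd, the weight p would be
    detached so gradients flow only through log(p).
    """
    terms = []
    for l, t in zip(logits_per_token, targets):
        p = softmax(l)[t]
        terms.append(-p * math.log(p))
    return sum(terms) / len(terms)

def forward_kl(base_probs, policy_probs):
    """KL(base || policy): large when the base model finds a token likely
    but the policy assigns it near-zero probability.

    Assumes every policy probability is strictly positive.
    """
    return sum(
        b * math.log(b / q)
        for b, q in zip(base_probs, policy_probs)
        if b > 0
    )
```

Because the weight is a probability (at most 1), a token the model already considers unlikely contributes less to the DFT-style loss than to plain SFT, which is one way to read the feedback loop mentioned in the tweet.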

@daniel_mac8: babe, wake up. new continual learning breakthrough just dropped. fast-slow training (fst) treats model params as "slow"…

X AI KOLs Timeline ↗ · 2026-05-17 Cached

This tweet announces Fast-Slow Training (FST), a new continual learning method that treats model parameters as slow weights and optimized context as fast weights, reportedly outperforming weights-only training on math, code, and general reasoning benchmarks.
