sft

#sft

@cjzafir: Fine-tune your first AI model today. Run GPT4o level model and run on your phone or laptop. @OpenBMB released 15M sampl…

X AI KOLs Following · 5d ago

OpenBMB released UltraData-SFT-2605, a 15M-sample high-quality SFT dataset for fine-tuning AI models like MiniCPM5-1B to run on phones or laptops.

@AdinaYakup: OpenBMB just released an impressive SFT dataset UltraData-SFT-2605 15M+ high quality samples Deep Thinking + Non-thinki…

X AI KOLs Following · 5d ago

OpenBMB releases UltraData-SFT-2605, a large-scale dataset with over 15 million high-quality samples for supervised fine-tuning (SFT) of reasoning LLMs, covering deep thinking, non-thinking, math, code, knowledge, instruction following, and multilingual data.

Learnability-Informed Fine-Tuning of Diffusion Language Models

arXiv cs.CL · 2026-05-25 Cached

We propose LIFT, a learnability-informed fine-tuning algorithm for diffusion language models that aligns training with token difficulty and time step, achieving substantial gains on reasoning benchmarks.

I fine-tuned an LLM to be C-3PO to test which training data format works best for persona injection [P]

Reddit r/MachineLearning · 2026-05-23 Cached

An experiment comparing three Supervised Fine-Tuning data formats (demonstrations, first-person statements, synthetic documents) for injecting a C-3PO persona into Qwen3-4B, finding first-person statements best for generalization and synthetic documents best for factual knowledge.

@maximelabonne: This is so neat! Dynamic Fine-Tuning (DFT) reweights the SFT loss by the model's own token probability, which creates a…

X AI KOLs Following · 2026-05-20 Cached

Dynamic Fine-Tuning (DFT) is introduced as a method that reweights the SFT loss using the model's own token probability, creating a feedback loop, and adds forward KL to penalize tokens the base model finds likely but the policy has pushed toward zero probability. The tweet expresses skepticism about SFT papers in practice but praises the attempt.
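The two ingredients named in this summary can be illustrated with a minimal, framework-free sketch. It assumes per-token logits over a tiny vocabulary; the function names are hypothetical and this is not the implementation from the tweet or any paper. In an autograd framework the probability weight would presumably be treated as a constant (detached), which plain Python cannot express.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sft_token_loss(policy_logits, target):
    # Standard SFT loss for one token: negative log-likelihood of the target.
    return -math.log(softmax(policy_logits)[target])

def dft_token_loss(policy_logits, target):
    # DFT-style reweighting: scale the token's NLL by the model's own
    # probability of that token. Assumption: under autograd, p would be
    # detached so it acts only as a weight.
    p = softmax(policy_logits)[target]
    return -p * math.log(p)

def forward_kl(base_logits, policy_logits):
    # KL(base || policy): large when the base model assigns high probability
    # to a token that the policy has pushed toward zero.
    p_base = softmax(base_logits)
    p_policy = softmax(policy_logits)
    return sum(b * math.log(b / q) for b, q in zip(p_base, p_policy) if b > 0)
```

Because the weight is the model's own probability, tokens the model already finds likely contribute more to the loss than unlikely ones, which is the feedback loop the summary refers to.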

@omarsar0: https://x.com/omarsar0/status/2057114824467792189

X AI KOLs Following · 2026-05-20 Cached

This article describes using Fireworks Agent to automate the fine-tuning of a small open-weight model to generate wiki-style summaries, enabling a self-improving agent loop where model training becomes a callable step.

@anyscalecompute: LLM post-training is the new baseline. Picking the wrong method or GPU config is how you waste a 36-hour run. Introduci…

X AI KOLs Following · 2026-05-15 Cached

Anyscale introduces a new Agent Skill for LLM post-training that automatically selects the optimal fine-tuning method (SFT, DPO, GRPO, etc.) and generates ready-to-launch configs, helping avoid wasted GPU runs.

I taught my 1B to follow instructions. It got worse at following instructions...

Reddit r/LocalLLaMA · 2026-05-14

The author trained 1B, 2B, and 3B models with the same SFT recipe and observed that instruction-following (IFEval) regressed for the 1B and 2B models but improved for the 3B, possibly due to different learning rates or model capacity.

@percyliang: For the next Marin model, we are putting together a new data mix. Currently we have 18T tokens, but could use more. So …

X AI KOLs Following · 2026-05-13 Cached

Percy Liang announces that the team is compiling a new data mix for the next Marin model and requests high-quality token data for pre-training, mid-training, and SFT.

@QGallouedec: TRL v1.4 is out! two things I'm excited about: → chunked NLL loss for SFT. Way less VRAM, same loss, often faster. Qwen…

X AI KOLs Following · 2026-05-09 Cached

TRL v1.4 is released, featuring chunked NLL loss for SFT to reduce VRAM usage and first-class integration with OpenReward for GRPO.
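The general idea behind a chunked NLL loss can be sketched as follows. This is an illustration only, not TRL's implementation: it assumes the memory saving comes from projecting hidden states to vocabulary logits one chunk of positions at a time, so the full positions-by-vocabulary logits table is never held at once. All names and the `chunk_size` parameter are hypothetical.

```python
import math

def log_prob_of_target(logits, target):
    # log softmax(logits)[target], computed with a stable log-sum-exp.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[target] - lse

def project(hidden, lm_head):
    # One logit per vocabulary row: dot(hidden, row).
    return [sum(h * w for h, w in zip(hidden, row)) for row in lm_head]

def full_nll(hiddens, lm_head, targets):
    # Baseline: materialize logits for every position, then average the NLL.
    all_logits = [project(h, lm_head) for h in hiddens]
    total = -sum(log_prob_of_target(l, t) for l, t in zip(all_logits, targets))
    return total / len(targets)

def chunked_nll(hiddens, lm_head, targets, chunk_size=2):
    # Same loss, but only chunk_size rows of logits exist at any time.
    total = 0.0
    for start in range(0, len(hiddens), chunk_size):
        chunk_h = hiddens[start:start + chunk_size]
        chunk_t = targets[start:start + chunk_size]
        chunk_logits = [project(h, lm_head) for h in chunk_h]
        total -= sum(log_prob_of_target(l, t)
                     for l, t in zip(chunk_logits, chunk_t))
    return total / len(targets)
```

Both functions sum the same per-token terms, so the result matches up to floating-point rounding; only the peak size of the intermediate logits differs.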

Where does output diversity collapse in post-training?

arXiv cs.CL · 2026-04-20 Cached

This paper investigates where and why output diversity collapses during post-training of language models, analyzing three OLMo 3 lineages (Think, Instruct, RL-Zero) across multiple tasks and metrics. The authors find that diversity collapse is determined primarily by training data composition and becomes embedded in the model weights during training, so it cannot be addressed at inference time alone.
