dense-moe

#dense-moe

@oneill_c: 1/ We fine-tune a lot of customer models, so we decided to systematically try and figure out some best practices for fi…

X AI KOLs Following ↗ · 5d ago Cached

The thread reports findings from a systematic study of fine-tuning best practices. To eliminate confounders, the experiments vary one SFT lever at a time across dense and MoE models of up to 235B parameters, using four real-world customer datasets with custom evals.


