Tag
Google Devs discusses using dataset distillation to train smaller models on clean, structured examples to stabilize parser outputs and teach models repeatable behavior.
Cursor AI announced three updates from its Compile keynote, including training a new model in collaboration with SpaceX.
A tweet recommending 'The Smol Training Playbook' on Hugging Face, a resource that demystifies model training for beginners.
An educational overview of knowledge distillation, covering its history, core concepts like softmax and temperature, types, scaling laws, and practical examples including DeepSeek-R1.
GLM-5.2 counters reward hacking by detecting and blocking suspicious tool calls instead of penalizing the model, which avoids the obfuscation seen with other methods.
At its first conference, Cursor released a 1.5T-parameter model trained from scratch, Origin (positioned as a direct GitHub alternative), and an iOS app, exceeding market expectations.
Merve (@mervenoyann) shares day-two findings from a pipeline that uses multiple small VLMs as judges for road sign detection, achieving mAP@50 = 0.8028 with only 1.3k examples. The thread compares model rejection rates and discusses dataset shrinking, super-specific prompts, and plans to generalize the library.
A thread proposing a method for creating a community AI model using crowdsourced compute via Branch-Train-Stitch to build a Mixture-of-Experts model from independently trained submodels, with discussion of hardware requirements, participant involvement, and technical challenges.
This article explains the technical principles of knowledge distillation in machine learning, pointing out that merely collecting output dialogues from ChatGPT/Claude cannot achieve effective distillation due to the lack of probability distribution information, and discusses the limitations of using generated data in SFT and pre-training.
A technical analysis of two approaches to building self-evolving AI agents: model-based (via architectures such as SSMs or transformers with fast-weight updates, plus training methods) and harness-based (via memory or a meta-harness that can rewrite itself). The author provides practical recommendations for different audiences.
A discussion on how AI models perform best with harnesses developed by their own creators, as third-party harnesses may cause underperformance despite strong benchmarks, citing examples like Claude Code for Claude and Codex for GPT.
A new AI model is being trained on over 100 trillion tokens, at least double the 27-50 trillion tokens typically used to pretrain other models such as Kimi, Mimo, and DeepSeek.
Explains how frontier AI training uses up to 2,048 GPUs by splitting work across five dimensions, demystifying model training frameworks.
While developing Evot, the author found that the official Claude Code approach gets the most out of Anthropic's Opus model, because the agent harness's behavior patterns are baked into the weights during training rather than produced by prompt engineering alone. The author expects future agent harness competition to push behavior down into the model layer.
Cursor released Composer 2.5, a major update to its AI coding assistant featuring improved intelligence, behavior, and training via targeted reinforcement learning and increased compute, built on Moonshot's Kimi K2.5.
A new training method achieves a 2-3x speedup by letting models learn more flexibly in the early stages, an approach the author likens to homeschooling versus factory education.
A talk by @mervenoyann argues that open-source models like GLM 5.1 have caught up to closed models, and shows how Hugging Face's ecosystem enables agents to train models, run inference, and build workflows.
Marin AI researchers, led by William Barr Held, introduce Delphi, a methodology that pretrains small models to accurately predict the training outcomes of larger 25B-parameter runs. This research aims to establish predictable scaling for more efficient open-source AI model development.
ml-intern has processed over 1M messages in 3 weeks, enabling accelerated ML research with user projects including model training, architecture replication, and automation tasks.
Fireworks AI announces its training platform in preview, allowing developers to train, fine-tune, and deploy custom AI models with full ownership of data and weights.