@natolambert: New podcast with @finbarrtimbers! We survey the latest post-training recipes, from GLM 5.1, Kimi K2.6, DeepSeek V4, Xia…
Summary
Nathan Lambert and Finbarr Timbers discuss the latest post-training recipes for large language models, including DeepSeek V4, GLM 5.1, Kimi K2.6, and the industry shift to multi-teacher on-policy distillation.
Cached at: 06/17/26, 01:44 AM
New podcast with @finbarrtimbers! We survey the latest post-training recipes, from GLM 5.1, Kimi K2.6, DeepSeek V4, Xiaomi MiMo V2.5, Nemotron Ultra, etc. and discuss:
- Why the industry slowly shifted to multi-teacher on-policy distillation (MOPD).
- Where an Olmo-style recipe would need improvement
- How post-training works in, and suits, larger organizational efforts
- Career advice in the foothills of the singularity
- and other topics
I heard y’all wanted me to start doing this, so making some time when I’m in funemployment!
Chapters:
00:00 Introduction & Olmo reflections
06:28 Post-train recipes review (history)
23:00 2026’s model recipes (MiMo Flash, DeepSeek V4, GLM 5, Kimi K2.6, etc.)
39:05 Open-ended post-training discussions
48:22 Career advice in the LLM race
Links below, please follow @interconnectsai and like and subscribe and buy my book?
Similar Articles
@cjzafir: Models that I'm using daily: > Codex 5.5 high (fast) > Deepseek v4 pro via API > Kimi 2.6 via API Models that I am fine…
User shares a personal list of AI models they use daily (Codex 5.5, Deepseek v4 pro, Kimi 2.6) and for fine-tuning (Qwen 3.5 variants, Gemma4 E4B, GPT-oss 20B), aiming to fine-tune Small Language Models into Expert Language Models.
@DJLougen: Proud to introduce a new 27B post-trained model After being impressed by both Fable and Kimi 2.7 Coder, I wanted to see…
Introduces a new 27B post-trained model that distills positives from Fable and Kimi 2.7 Coder, with links to download.
Deepseek, kimi etc..
Mentions of AI models Deepseek and Kimi, possibly discussing recent updates or comparisons.
@ziv_ravid: 1/I read the Nemotron 3 Ultra report and it's interesting to compare their post-training to DeepSeek V4's. Both now do …
The tweet compares the post-training methods of Nemotron 3 Ultra and DeepSeek V4, noting both use multiple specialist teachers and on-policy distillation into a single student, but differ in support overlap.
@tom_doerr: Trained on 13M hours of mixed audio and text data https://github.com/MoonshotAI/Kimi-Audio…
Trained on 13M hours of mixed audio and text data (https://t.co/SvoKmvzphI, https://t.co/UlKN3OiqG8). Repository: MoonshotAI/Kimi-Audio, source https://github.com/MoonshotAI/Kimi-Audio. Models: Kimi-Audio-7B (https://huggingface.co/moonshotai/Kimi-Audio-7B) and Kimi-Audio-7B-Instruct (https://huggingface.co/moonshotai/Kimi-Audio-7B-Instruct)