@natashajaques: Really enjoyed reading the Microsoft MAI-Thinking-1 "Building a Hill Climbing Machine" paper. Amazing they publicly rel…
Summary
Natasha Jaques praises the Microsoft MAI-Thinking-1 paper for fully disclosing the training recipe for a frontier model, highlighting the token distribution across pre-training, mid-training, and RL post-training phases, and noting that Yann LeCun's cake analogy was prescient.
Cached at: 06/10/26, 01:51 PM
Really enjoyed reading the Microsoft MAI-Thinking-1 “Building a Hill Climbing Machine” paper. Amazing they publicly released all the info needed to train a frontier model, down to hparams.
I also thought this was pretty telling:
- pre-training: 30 trillion tokens
- mid-training (SFT on STEM/math/code data): 3.55 trillion tokens
- RL post-training: 150 billion tokens

Looks like @ylecun was right all along with the cake analogy.
Obviously I still think something like RL (optimizing for long-term goals) is fundamental to what we think of as intelligence. But it’s not the volume of learning signal, it’s the optimization on top of an already reasonable predictive model.
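To make the proportions concrete, here is a minimal sketch that turns the three token counts quoted in the post into shares of the total training budget. The counts are taken as stated in the post, not checked against the paper, and the names `PHASE_TOKENS` and `phase_shares` are hypothetical, chosen only for illustration.

```python
# Token counts per training phase, as quoted in the post (not verified against the paper).
PHASE_TOKENS = {
    "pre-training": 30e12,       # 30 trillion
    "mid-training": 3.55e12,     # 3.55 trillion
    "RL post-training": 150e9,   # 150 billion
}


def phase_shares(tokens):
    """Return each phase's fraction of the total token count (hypothetical helper)."""
    total = sum(tokens.values())
    return {name: count / total for name, count in tokens.items()}


shares = phase_shares(PHASE_TOKENS)
for name, share in shares.items():
    print(f"{name:>18}: {share:.2%}")

# Ratio of pre-training tokens to RL post-training tokens.
pre_to_rl = PHASE_TOKENS["pre-training"] / PHASE_TOKENS["RL post-training"]
print(f"pre-training / RL post-training: {pre_to_rl:.0f}x")
```

Under these numbers, pre-training accounts for roughly 89% of all tokens, mid-training for about 10.5%, and RL post-training for under 0.5%, with pre-training seeing about 200 times as many tokens as RL.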
Similar Articles
@raydistributed: Congratulations to the Microsoft AI team on MAI-Thinking-1! Exciting to see Ray used in multiple parts of frontier-mode…
Microsoft AI announces MAI-Thinking-1, a 35B active/1T total MoE reasoning model competitive on STEM and coding tasks, developed using Ray for distributed training and orchestration.
@maximelabonne: That's so cool! The same team at @Meituan_LongCat wrote Skill0, where they propose an RL recipe for skill internalizati…
The tweet highlights a paper by the Meituan team on Skill0, an RL recipe for skill internalization, and references a related paper on self-distilled agentic RL.
@dair_ai: https://x.com/dair_ai/status/2056018543850754283
A roundup of the top AI papers from May 11-17, covering Lighthouse Attention for long-context pretraining, a comparison of grep vs embedding retrieval for coding agents, and mechanistic interpretability work revealing a geometric calculator in LLMs.
@harshbhatt7585: https://x.com/harshbhatt7585/status/2063593933314113587
The author shares learnings from training a 160M parameter LLM from scratch, experimenting with architectures like multi-token prediction and hierarchical reasoning models. They emphasize the importance of fast iteration, simplifying ideas, and understanding why architectures work.
@_lamaahmad: We (@CedricWhitney, @SandhiniAgarwal, @EstherTetruas, @OliviaGWatkins2, @dgrobinson) wrote about nuances we’ve observed…
OpenAI researchers share lessons learned from working with third parties on frontier model evaluations, highlighting the importance of considering the evaluation harness and potential validity issues like reward hacking, contamination, and sandbagging.