@natashajaques: Really enjoyed reading the Microsoft MAI-Thinking-1 "Building a Hill Climbing Machine" paper. Amazing they publicly rel…

X AI KOLs Following 06/10/26, 12:04 AM Papers

microsoft hill-climbing-machine frontier-model training reinforcement-learning research-paper

Summary

Natasha Jaques praises the Microsoft MAI-Thinking-1 paper for fully disclosing the training recipe for a frontier model, highlighting the token distribution across pre-training, mid-training, and RL post-training phases, and noting that Yann LeCun's cake analogy was prescient.

Original Article

Cached at: 06/10/26, 01:51 PM

Really enjoyed reading the Microsoft MAI-Thinking-1 “Building a Hill Climbing Machine” paper. Amazing they publicly released all the info needed to train a frontier model, down to hparams.

I also thought this was pretty telling:

- pre-training: 30 trillion tokens
- mid-training (SFT on STEM/math/code data): 3.55 trillion tokens
- RL post-training: 150 billion tokens

Looks like @ylecun was right all along with the cake analogy.

Obviously I still think something like RL (optimizing for long term goals) is fundamental to what we think of as intelligence. But it’s not the volume of learning signal, it’s the optimization on top of an already reasonable predictive model.
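
To make the proportions in the post concrete, here is a minimal sketch that turns the three quoted token counts into shares of the total. The figures are taken as stated in the post, not checked against the paper. The dictionary keys and the `phase_shares` helper are illustrative names, and treating tokens from the three phases as directly comparable is an assumption.

```python
# Token counts as quoted in the post (not verified against the paper).
# Phase names are illustrative labels, not identifiers from the paper.
TOKENS = {
    "pre-training": 30 * 10**12,
    "mid-training": 355 * 10**10,      # 3.55 trillion
    "RL post-training": 150 * 10**9,
}


def phase_shares(tokens):
    """Return each phase's fraction of the total token count."""
    total = sum(tokens.values())
    return {phase: count / total for phase, count in tokens.items()}


shares = phase_shares(TOKENS)
for phase, share in shares.items():
    print(f"{phase}: {share:.2%}")

# How many pre-training tokens there are per RL token.
ratio = TOKENS["pre-training"] / TOKENS["RL post-training"]
print(f"pre-training / RL token ratio: {ratio:.0f}x")
```

Under these numbers, pre-training accounts for roughly 89% of all tokens, mid-training about 10.5%, and RL under half a percent, with 200 pre-training tokens for every RL token. That is the sense in which the RL stage is a thin layer by volume.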

Similar Articles

@raydistributed: Congratulations to the Microsoft AI team on MAI-Thinking-1! Exciting to see Ray used in multiple parts of frontier-mode…

@maximelabonne: That's so cool! The same team at @Meituan_LongCat wrote Skill0, where they propose an RL recipe for skill internalizati…

@dair_ai: https://x.com/dair_ai/status/2056018543850754283

@harshbhatt7585: https://x.com/harshbhatt7585/status/2063593933314113587

@_lamaahmad: We (@CedricWhitney, @SandhiniAgarwal, @EstherTetruas, @OliviaGWatkins2, @dgrobinson) wrote about nuances we’ve observed…
