100 Trillion+ Pretraining data??? This is the largest dataset I've seen a model being trained on.

Reddit r/LocalLLaMA 06/01/26, 04:38 AM Models

Summary

A new AI model is being trained on over 100 trillion tokens, at least double the 27-50 trillion tokens typically used to pretrain other models such as Kimi, Mimo, and DeepSeek.

https://preview.redd.it/oss7g2gnll4h1.png?width=894&format=png&auto=webp&s=5d4295707a700ed7541c274b8be8ad75bbd0903d

Usually we see 27-50 trillion tokens in most models: Kimi, Mimo, DeepSeek. They seem to have doubled the pretraining data. MiniMax-M2.5 was around 27T tokens. Looking at Mimo and DeepSeek:

- 27T for Mimo-v2.5-Pro (1 trillion parameters)
- 48T for the smaller Mimo-v2.5 model, which is multimodal
- 32T for DeepSeek V4 Flash and Pro

I find it difficult to believe this model will be much bigger than the previous M2 series models. Training on this much data is already very expensive, and a much bigger model would require far more resources on top of that. M3 seems likely to be under 500B params.
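To put rough numbers on the resource argument, here is a minimal sketch using the common rule of thumb that training compute is about 6 × N × D FLOPs, where N is the number of parameters active per token and D is the number of training tokens. The token budgets are the ones quoted above. The active-parameter counts are hypothetical placeholders picked only for illustration, not figures for any real model, and the rule ignores attention cost and hardware utilization.

```python
# Back-of-the-envelope training compute with the C ~= 6 * N * D approximation.
# N = parameters active per token (for MoE models, the active subset),
# D = number of training tokens. This is a rough estimate, not a measurement.

TRILLION = 1e12
BILLION = 1e9


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate total training FLOPs for one pass over `tokens` tokens."""
    return 6.0 * active_params * tokens


# Token budgets quoted in the post.
token_budgets = {
    "27T": 27 * TRILLION,
    "32T": 32 * TRILLION,
    "48T": 48 * TRILLION,
    "100T": 100 * TRILLION,
}

# ASSUMPTION: hypothetical active-parameter counts, for illustration only.
hypothetical_active_params = [10 * BILLION, 30 * BILLION]


def compute_table():
    """Return (token_label, active_params, flops) for every combination."""
    rows = []
    for label, tokens in token_budgets.items():
        for n in hypothetical_active_params:
            rows.append((label, n, training_flops(n, tokens)))
    return rows


if __name__ == "__main__":
    for label, n, flops in compute_table():
        print(f"{label} tokens, {n / BILLION:.0f}B active params: {flops:.2e} FLOPs")
```

Under this approximation compute scales linearly in both N and D, so going from 27T to 100T tokens multiplies the cost by about 3.7 at a fixed model size. Scaling the active parameter count up at the same time multiplies it again, which is the reason for doubting that the model is also much bigger.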


