@percyliang: For the next Marin model, we are putting together a new data mix. Currently we have 18T tokens, but could use more. So …

X AI KOLs Following · 05/13/26, 01:15 PM · Models

Summary

Percy Liang announces that the Marin team is assembling a new data mix for its next model. The team currently has 18T tokens and is asking for more high-quality data for pre-training, mid-training, and SFT.

For the next Marin model, we are putting together a new data mix. Currently we have 18T tokens, but could use more. So if you are sitting on some secret stash of high quality tokens, please let us know! Pre-training, mid-training, SFT data all welcome. https://t.co/49DBdzvYXE

Cached at: 05/13/26, 06:25 PM

Similar Articles

@eliebakouch: one of my favorite projects is Marin from the stanford folks, they have a scientific approach to training, are ready to…

X AI KOLs Following

Marin is an open-source framework from Stanford for reproducible foundation model research, covering data curation, tokenization, training, and evaluation; it was used to train an 8B parameter model that outperforms Llama 3.1 8B.

@WilliamBarrHeld: To train better open models, we need predictable scaling. Delphi is Marin’s first step: we pretrained many small models…

X AI KOLs Following

Marin AI researchers, led by William Barr Held, introduce Delphi, a methodology that pretrains small models to accurately predict the training outcomes of larger 25B-parameter runs. This research aims to establish predictable scaling for more efficient open-source AI model development.

@percyliang: Not only do we want to train a good model, we want to know it'll be good before we even start training. About a month a…

X AI KOLs Following

The Marin team pre-registered a predicted loss of 2.252 for a 129B parameter MoE model training run, and the actual result landed at 2.234, demonstrating accurate loss prediction before training.

Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models

Reddit r/singularity

Nous Research releases Token Superposition Training (TST), a method that speeds up LLM pre-training by up to 2.5x across models from 270M to 10B parameters, reducing wall-clock time without altering architecture or data.

Want to build a custom model

Reddit r/LocalLLaMA

A user discusses building a small autocomplete model (25M parameters) as a learning project, notes hardware constraints (32 GB VRAM) and data requirements (~100M tokens), and seeks advice on datasets and data formatting for autocomplete-style training.
