@aakashgupta: Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today's 1…
Summary
Andrej Karpathy claimed to Dwarkesh Patel that a 1B-parameter model trained on ultra-clean data could match today's 1.8T-parameter frontier models, implying 1,800× effective compression.
View Cached Full Text
Cached at: 04/22/26, 11:28 AM
Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today’s 1.8 trillion parameter frontier. That is a 1,800x compression claim. The math behind it is more defensible than it sounds. When researchers at frontier labs
Similar Articles
@draecomino: Cerebras sets a new record: a one trillion parameter model @ 1,000 tokens/s
Cerebras announces it is running Kimi K2.6, a trillion parameter model, at approximately 1,000 tokens per second in enterprise trials, claiming the fastest frontier model performance ever measured by Artificial Analysis.
A 4b model is now beating 30b ones at web research and the reason is not size
A 4 billion parameter open model from the Apodex family outperforms 30 billion parameter models on web research benchmarks, attributed to careful training data and self-verification techniques rather than raw scale, suggesting a more democratic trajectory for AI capability.
@eliebakouch: very nice release by @OpenAI! a 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion sc…
OpenAI released a 1.5B-parameter MoE model with only 50M active parameters that can filter private data from trillion-token datasets while maintaining 128k context length.
@heyrobinai: THE ENTIRE AI INDUSTRY JUST GOT HUMILIATED a tiny model trained in just a few hours on a single graphics card is planni…
Yann LeCun's team releases LeWorldModel, a tiny 15M-parameter physics model trained on a single GPU in hours that outperforms billion-dollar foundation models in planning speed and physical plausibility, challenging the dominant scaling paradigm.
@vintcessun: Pretraining can be this cost-effective? Train a usable 1B base model from scratch for ~$1000, slashing compute and data by hundreds of times. The key isn't brute-force compute, but hierarchical recursive architecture plus latent space reasoning, combined with PrefixLM packing and FA3 to maximize efficiency. Sounds insane, but the paper and code are open-sourced.
HRM-Text released a 1B-parameter base model, claiming it can be pretrained from scratch for only ~$1000, reducing compute and data volume by hundreds of times. It employs efficient techniques such as hierarchical recursive architecture, latent space reasoning, and PrefixLM packing. The paper and code are open-sourced.