New SOTA 1B model? HRM-text

Reddit r/LocalLLaMA 05/19/26, 09:34 PM Models

hierarchical-reasoning language-model 1b-parameters sota efficient-training latent-space sapient-intelligence

Summary

HRM-text is a 1B-parameter hierarchical reasoning language model proposed by Sapient Intelligence. It reasons in an internal latent space instead of emitting intermediate tokens, and is claimed to outperform most models of the same size at extremely low training cost.

Saw this video by them. Seems interesting, but tbh the benchmarks seem too good to be true. I'm not super knowledgeable on how models think, so can anyone more knowledgeable explain what exactly is happening, and its pros and cons?

GitHub: https://github.com/sapientinc/HRM-Text
Hugging Face: https://huggingface.co/sapientinc/HRM-Text-1B

I'm not affiliated with them in any way, just saw the video on YouTube.

Cached at: 05/19/26, 10:50 PM

TL;DR: HRM-text is a 1B-parameter hierarchical reasoning language model proposed by Sapient Intelligence. By performing human-like high-dimensional reasoning in an internal latent space (without intermediate linguistic tokens), it achieves performance surpassing most models of similar size at extremely low training cost (about $1000, completed within a day).

## The Limitations of Chain-of-Thought: Why Models Must "Speak" to Think

Current mainstream large language models rely on **Chain-of-Thought** (CoT): the model must explicitly "say" each step of reasoning before moving to the next. This is like a person skydiving into a mining area to find gold, with only a metal detector. After landing, they can only dig 20 times (corresponding to a Transformer's 20 layers), then must report intermediate findings and continue digging with new context. If any step goes wrong, the entire reasoning derails.

This mechanism requires massive training data, and as tasks become harder, more intermediate tokens are used, slowing inference and raising costs. Essentially, CoT is: "dig 20 times → output context → dig 20 more → repeat until solved."

## Inspiration from Human Thinking: Think First, Then Speak

Human thinking does not start with language. Thoughts often begin as an abstract, high-dimensional information flow: a continuous, efficient conceptual space that lets us quickly explore complex relationships. Only afterward do we "compress" these rich thoughts into lower-dimensional language. Aphasia patients have impaired language ability but can still perform advanced abstract reasoning, showing that language is just a communication tool, not a container for thought.

## How HRM-text Works

**HRM-text** (developed by Sapient Intelligence) is inspired by the brain and employs **hierarchical reasoning**. Rather than jumping straight to language output, it places the problem into an internal **latent-space mind map**.
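
The "dig, report, dig again" picture can be made concrete with a small toy in plain Python. This is only an illustration of the analogy, not how any real LLM or HRM-text works: the "layer" here is a single fixed-point step toward the solution of x = cos(x), and the "token" is the state rounded to one decimal place. Both choices, and all function names, are assumptions made up for this example.

```python
import math

def layer(x):
    """One 'dig': a single fixed-point step toward the solution of x = cos(x)."""
    return math.cos(x)

def run_block(x, depth):
    """A stack of `depth` layers applied back to back."""
    for _ in range(depth):
        x = layer(x)
    return x

def cot_style(x0, depth, rounds, decimals=1):
    """After each block, squeeze the state through a coarse 'token' (rounding)."""
    x, tokens = x0, []
    for _ in range(rounds):
        x = round(run_block(x, depth), decimals)  # lossy report, then continue from it
        tokens.append(x)
    return x, tokens

def latent_style(x0, depth, rounds):
    """Same total compute, but the state is carried between blocks at full precision."""
    x = x0
    for _ in range(rounds):
        x = run_block(x, depth)
    return x

reference = run_block(1.0, 500)  # converged fixed point, used as ground truth
cot_answer, tokens = cot_style(1.0, depth=20, rounds=3)
latent_answer = latent_style(1.0, depth=20, rounds=3)

print("tokens emitted:", tokens)
print("CoT-style error:   ", abs(cot_answer - reference))
print("latent-style error:", abs(latent_answer - reference))
```

In this toy, the token bottleneck throws away whatever progress does not fit in one decimal place, while the second loop keeps refining without ever leaving its internal state. Real chain-of-thought tokens carry far more information than a rounded number, so the size of the gap here should not be read as a benchmark.
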
In this space, words are transformed into high-dimensional vectors, like a precise map containing coordinates, distances, relationships, and other information that cannot normally be expressed in language. The model has two systems:

- **H (High-level system)**: The high-level commander, like the gold seeker. It oversees the global picture, plans the overall route, has a slower cycle, and sees the entire layout.
- **L (Low-level system)**: The low-level executor, like the gold panner. It explores specific locations and reports findings, has a faster cycle, and handles detailed computations.

H and L form a hierarchical team: H guides the direction, L performs exploration. If L only finds gravel, H adjusts the plan; if signs of gold appear, H directs deeper digging. This loop continuously updates the internal map until the solution is clear. The entire process happens internally, without intermediate tokens: HRM "thinks" before it "speaks."

## Extremely Low Training Cost with Remarkable Efficiency

The HRM-text 1B parameter version uses roughly the same order of magnitude of training tokens as **Llama 2 2.2 3B**, but only **1/1900** of **Qwen 3.5 2B**; its training compute (FLOPs) is about **1/600** of Qwen 3.5 2B. In other words, it achieves results comparable to similar models with a fraction of the training resources.

### Benchmark Performance

On common benchmarks such as MATH, DROP, Arc Challenge, and MMLU, when plotted as "training tokens vs score", HRM-text sits in the top-left corner, representing the least training data and the best performance. Switching to the compute chart yields the same conclusion.

Sapient Intelligence claims that it is now possible to train a model entirely from scratch, **at a cost of roughly $1000, within a single day**, that rivals SOTA (state-of-the-art) models in the same category that cost thousands of times more in both training and inference.
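
As a rough illustration of the H/L loop described under "How HRM-text Works", here is a toy sketch in plain Python that follows the gold-digging analogy. It is not Sapient's implementation. The names (`gold`, `l_step`, `h_step`, `think_then_speak`), the cycle counts, and the search policy are all assumptions made up for this example, and the real model would update learned high-dimensional vectors rather than a (position, radius) pair. What the sketch does show is the two-timescale structure: several fast L steps per slow H update, with nothing emitted until the loop ends.

```python
STEPS_PER_CYCLE = 5   # fast L steps per slow H update (hypothetical value)
CYCLES = 20           # number of slow H updates (hypothetical value)

def gold(pos):
    """Hypothetical terrain: the richest spot is at pos = 3.7."""
    return -(pos - 3.7) ** 2

def l_step(z_l, z_h, terrain):
    """Fast L update: probe the next spot in H's region and keep the best find."""
    center, radius = z_h
    best_pos, best_val, k = z_l
    frac = (k % STEPS_PER_CYCLE) / (STEPS_PER_CYCLE - 1)  # 0, 0.25, ..., 1
    pos = center - radius + 2 * radius * frac
    val = terrain(pos)
    if val > best_val:
        best_pos, best_val = pos, val
    return (best_pos, best_val, k + 1)

def h_step(z_h, z_l):
    """Slow H update: recentre the plan on L's best find and dig a narrower area."""
    _, radius = z_h
    return (z_l[0], radius / 2)

def think_then_speak(terrain, start=0.0, radius=8.0):
    z_h = (start, radius)             # H state: the current plan (where, how wide)
    z_l = (start, terrain(start), 0)  # L state: best find so far plus a step counter
    for _ in range(CYCLES):
        for _ in range(STEPS_PER_CYCLE):
            z_l = l_step(z_l, z_h, terrain)
        z_h = h_step(z_h, z_l)
    return z_h[0]                     # the only value that leaves the loop

answer = think_then_speak(gold)
print(f"answer={answer:.4f} after {CYCLES * STEPS_PER_CYCLE} silent L steps")
```

The design choice worth noticing is that `l_step` reads H's state on every step while `h_step` runs only once per cycle. That matches the "slower cycle" and "faster cycle" wording above, but whether HRM-text schedules its updates exactly this way is not stated in the post.
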
## The Significance of a Paradigm Shift

HRM-text marks the beginning of a new paradigm: instead of relying on stacking GPUs, massive electricity, and huge budgets to scale models, we can achieve **lower cost, better performance, and faster iteration** by mimicking the efficient thought structure of the human brain. Sapient Intelligence believes this opens new possibilities on the path to Artificial General Intelligence (AGI): concepts once deemed too expensive to pursue are back on the table. They are convinced that this path leads to more efficient, more powerful, and ultimately the most streamlined general intelligence.

## Source

- YouTube video: https://youtu.be/U6K2MP6VseM
- GitHub: sapientinc/HRM-Text (https://github.com/sapientinc/HRM-Text)
- Hugging Face: sapientinc/HRM-Text-1B (https://huggingface.co/sapientinc/HRM-Text-1B)

Similar Articles

@Sapient_Int: Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performanc…

sapientinc/HRM-Text-1B

HRM Seems To Be Going Off Right Now

HRM-Text: Trained on only 1k$ and 40b tokens with brain inspired hierarchical latent architecture

@vintcessun: Pretraining can be this cost-effective? Train a usable 1B base model from scratch for ~$1000, slashing compute and data by hundreds of times. The key isn't brute-force compute, but hierarchical recursive architecture plus latent space reasoning, combined with PrefixLM packing and FA3 to maximize efficiency. Sounds insane, but the paper and code are open-sourced.
