@oneill_c: https://x.com/oneill_c/status/2054604986269802579


Summary

The article argues that serious AI companies are moving from wrapping general models to training their own specialised models on proprietary interaction data. Its case is that specialised models now routinely match or beat frontier models on in-distribution agentic tasks, with better unit economics.

https://t.co/ERAhd412li

Cached at: 05/13/26, 08:26 PM

Why every serious AI company is training its own model

For the last five years, I have spent all day, every day, taking a general LLM and teaching it to do specific things. At the beginning, this meant teaching GPT-2 to do modular addition. Now, it means teaching trillion-plus-parameter models to do tasks that sometimes take hours.

2024 was the year of the “wrappers.” Cursor, the canonical example, surpassed GitHub Copilot by wrapping big lab models and becoming the go-to for AI-assisted coding. Then in 2025, Cursor shipped Composer. The foundations are open-source Kimi, but the magic came from a model fully post-trained in-house. They didn’t do this to save on API calls. They did it because they had figured out something the market still hasn’t fully priced in: the reward signal for being good at coding inside Cursor lives inside Cursor, and nowhere else.

Cursor has been the most visible example of something that is now increasingly the strategy for the entire app layer. Get closer to your users to understand when your model works, then build even better models and products. This is what allows you to spin the flywheel.

The pattern is now too consistent to be coincidence. Any company whose product is a long-horizon agentic loop is moving off the labs and onto models trained against its own interaction data. Decagon, Abridge, OpenEvidence, Hippocratic, Intercom, Chroma, Pinterest, Cognition, Lovable, Notion, Harvey, Gamma, World Labs, and all frontier companies are training their own models on top of open weights. At Baseten, we help this wave of companies train their main agents, getting them off frontier APIs and onto specialised models.

Sutton’s bitter lesson, that general methods which leverage scale trump hand-built human domain expertise, does not save the big labs here. The standard pushback is that generalisation eventually beats specialisation: pre-training scale wins, so just wait for the next base model. That argument applies when you are scaling compute against a fixed objective. But most objectives are not fixed.

The objectives for “good code completion in this user’s repo” or “good clinical note for this physician’s patient panel” are moving targets. Correctness is discovered through product iteration. No amount of next-token prediction on static corpora produces it. Only RL against outcomes (accepted vs. rejected completions, agent trajectories that succeeded vs. failed in real workflows) produces it. And those outcomes only exist where the product runs. This is the axis where specialisation beats generality, and it’s the axis the remaining frontier headroom sits on.
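As a rough illustration of what "outcomes" can look like as training data, here is a minimal sketch that turns logged accept/reject events into preference pairs, one common input format for preference-based post-training. This is not how any company named here does it. The `Interaction` record, its fields, and `build_preference_pairs` are hypothetical names invented for this example, and the log entries are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical log record; real products log far richer signals."""
    prompt: str       # context the model saw
    completion: str   # what the model proposed
    accepted: bool    # assumed outcome signal: did the user keep it?


def build_preference_pairs(logs):
    """Pair every accepted completion with every rejected one per prompt."""
    by_prompt = defaultdict(lambda: {"accepted": [], "rejected": []})
    for item in logs:
        key = "accepted" if item.accepted else "rejected"
        by_prompt[item.prompt][key].append(item.completion)

    pairs = []
    for prompt, group in by_prompt.items():
        for chosen in group["accepted"]:
            for rejected in group["rejected"]:
                pairs.append(
                    {"prompt": prompt, "chosen": chosen, "rejected": rejected}
                )
    return pairs


# Made-up example logs: two outcomes for one prompt, one for another.
logs = [
    Interaction("def add(a, b):", "    return a + b", True),
    Interaction("def add(a, b):", "    return a - b", False),
    Interaction("def sub(a, b):", "    return a - b", True),
]
pairs = build_preference_pairs(logs)
```

A prompt with only accepted (or only rejected) completions yields no pair, which is why the signal depends on having enough real usage per context, and why it only exists where the product runs.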

Over the last year, the empirical case has become undeniable: for a fixed capability budget, a specialised open-source (OS) model now routinely matches or beats a frontier model on in-distribution agentic tasks, and the gap widens as the task gets longer-horizon and more tool-use-dependent. This is the same direction the frontier is heading, but by a different mechanism. We are seeing unit economics improve by close to an order of magnitude.

The labs cannot follow, and the reason is organisational. Frontier labs are organised to serve one model to many customers. Specialisation requires the inverse: many models built for segmented customers, co-designed with the serving stack and the customer’s data loop. The thing that makes a lab good at pre-training (centralised runs, one-model serving economics, research-lab org structure) is in active tension with what makes a specialisation business good. Fine-tuning APIs are a sideline because they have to be a sideline. This is the subject of much debate with friends over walks and meals, but it’s the reality I see. I point them to the fact that OpenAI just deprecated their fine-tuning API.

Treating specialisation as a first-class business would mean conceding that pre-training scale is not the binding constraint on real-world value, which contradicts the thesis that underwrites their entire capital structure. They can hire domain experts; it does not help, because 98% of what makes OpenEvidence or Abridge good is not medical knowledge, but the feedback loops they have built into the product.

What every company in the wave has figured out is that the only defensibility that survives the collapse of software cost is owning a model trained on signal no one else can see. Every user session generates training data. Every training run produces a better model. Every better model attracts more users and more data. The flywheel turns inside the product loop, and the labs, for all their scale, are on the outside of it. Every single product has an incredibly detailed surface of what constitutes “good”; companies now care about model UX, which is decided at training time and cannot be prompted away (e.g. level of tool calling or search depth, parallelism of tool calling, etc.).

The question every app layer company is now asking is no longer ‘how do we use AI?’ It is ‘how do we resist commodification to deliver better results for customers?’ The answer is specialised models based on your unique understanding of who you serve every day. The big labs can’t do it, but you can.
