@mfpiccolo: Kaffu's "rich man's toy" line is one of the sharpest things I've read on harnesses this year. He's right about the symp…

X AI KOLs Timeline 06/01/26, 03:46 PM News

agent-engineering harness composability ai-architecture software-bloat fine-tuning

Summary

The tweet discusses the problem of bloat in AI agent harnesses, agreeing with Kaffu's critique that harnesses become "rich man's toys," and advocates for a composable architecture of small, replaceable workers to reduce drift and keep systems cheap and debuggable.

Original Article

Cached at: 06/02/26, 01:52 AM

Kaffu’s “rich man’s toy” line is one of the sharpest things I’ve read on harnesses this year. He’s right about the symptom. I’d push back on one part of the diagnosis.

The bloat drift he names, agent engineering quietly turning into software engineering, is real. Every harness team I’ve talked to hits it around month nine. The framework you started with grows features it didn’t need, the system prompt swells, the retrieval layer doubles, the cost-per-task triples. Codex and Claude Code keep getting better, and you start to wonder what you’re building.

My extension: the drift is structural. It happens because the unit of work in a framework-shaped harness is the whole framework. To add a capability you grow the framework. To change a behaviour you fork the framework. The bloat has nowhere else to go.

When the unit shrinks to one narrow worker, one typed function, one job, drift loses its surface area. A retrieval worker that’s wrong gets replaced, not extended. The math-based reranker Kaffu is right to advocate for becomes a worker that registers rerank::score. The fine-tuned RoBERTa becomes a worker that registers embed::generate. They sit next to the LLM provider worker on the same bus. The system stays cheap by being composable.
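A minimal sketch of that shape, not any particular framework's API. The Bus class, the llm::complete id, and the three stand-in handlers are all hypothetical; only the rerank::score and embed::generate ids come from the text above. The point it tries to show is that a wrong worker is swapped by re-registering its function id, with nothing else touched.

```python
from typing import Callable, Dict, List


class Bus:
    """Hypothetical in-process bus: one function id maps to exactly one worker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., object]] = {}

    def register(self, function_id: str, handler: Callable[..., object]) -> None:
        # Registering an existing id replaces the old worker outright.
        self._handlers[function_id] = handler

    def call(self, function_id: str, **payload: object) -> object:
        if function_id not in self._handlers:
            raise KeyError(f"no worker registered for {function_id}")
        return self._handlers[function_id](**payload)


def overlap_score(query: str, docs: List[str]) -> List[float]:
    """Stand-in for a math-based reranker: share of query terms found in each doc."""
    terms = set(query.lower().split())
    if not terms:
        return [0.0 for _ in docs]
    return [len(terms & set(doc.lower().split())) / len(terms) for doc in docs]


def exact_match_score(query: str, docs: List[str]) -> List[float]:
    """A second, stricter reranker used only to demonstrate replacement."""
    return [1.0 if query.lower() in doc.lower() else 0.0 for doc in docs]


def toy_embed(text: str) -> List[float]:
    """Stand-in for a fine-tuned encoder; returns a trivial 2-d vector."""
    return [float(len(text)), float(len(text.split()))]


def stub_llm(prompt: str) -> str:
    """Stand-in for an LLM provider worker."""
    return f"[stub completion for: {prompt}]"


bus = Bus()
bus.register("rerank::score", overlap_score)
bus.register("embed::generate", toy_embed)
bus.register("llm::complete", stub_llm)  # hypothetical id, not from the source

docs = ["a cheap composable harness", "a monument"]
before = bus.call("rerank::score", query="cheap harness", docs=docs)

# The reranker is judged wrong, so it is replaced rather than extended.
bus.register("rerank::score", exact_match_score)
after = bus.call("rerank::score", query="cheap harness", docs=docs)
```

Callers only ever know the function id, so the swap changes one registration and no call sites.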

Put simply, everything becomes a worker.

This doesn’t make harnesses economically valuable on its own. Kaffu’s deeper point stands. Most of what teams ship is fancy on paper and useless in production. The framework era encouraged that because the unit it sold was always too big.

I don’t know what the economically valuable harness looks like at steady state. I think it looks small. Small enough that every part is replaceable, every part is debuggable, every part is observable through an observability worker, and every part is benchmarkable against a 100-line fine-tuned alternative. The harness as a slider, not a monument.
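One way to read "benchmarkable" and "observable" at the level of a single worker, sketched under loud assumptions: the classify::ticket id, both candidate workers, and the four test cases are invented for illustration, and the keyword rules merely stand in for a harness path and a small fine-tuned model.

```python
import time
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]


def observed(function_id: str, handler: Worker, log: List[Dict[str, object]]) -> Worker:
    """Wrap a worker so that each call appends a timing record to `log`."""

    def wrapper(text: str) -> str:
        start = time.perf_counter()
        result = handler(text)
        log.append({"function_id": function_id, "seconds": time.perf_counter() - start})
        return result

    return wrapper


def accuracy(handler: Worker, cases: List[Tuple[str, str]]) -> float:
    """Fraction of labelled cases the worker gets right."""
    hits = sum(1 for text, expected in cases if handler(text) == expected)
    return hits / len(cases)


def harness_worker(text: str) -> str:
    """Hypothetical stand-in for the larger harness path."""
    lowered = text.lower()
    return "billing" if "invoice" in lowered or "refund" in lowered else "other"


def small_worker(text: str) -> str:
    """Hypothetical stand-in for the small fine-tuned alternative."""
    return "billing" if "invoice" in text.lower() else "other"


# Invented labelled cases; a real benchmark would use held-out production data.
cases = [
    ("Invoice is wrong", "billing"),
    ("Refund please", "billing"),
    ("App crashes on login", "other"),
    ("Where is my invoice?", "billing"),
]

log: List[Dict[str, object]] = []
results = {
    name: accuracy(observed("classify::ticket", worker, log), cases)
    for name, worker in [("harness", harness_worker), ("small", small_worker)]
}
```

Because both candidates answer the same function id with the same signature, the comparison is a dictionary of scores plus a call log, and whichever wins can be registered in place of the other.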

For the love of the game.

Similar Articles

@mfpiccolo: https://x.com/mfpiccolo/status/2060069083878408689

@dair_ai: // State-Externalizing Harnesses // A new paradigm is emerging on how to effectively build agents and harnesses. If the…

Observation: the best agent harness for each model will be from the model developer themselves

@oran_ge: Every team in the future will be doing harness engineering, and everyone needs to understand this framework. Although there are some non-consensus points, this is a good review.

The Cost of Overfitting the Harness (2 minute read)
