@a1zhang: Good harness designs can get around extreme token costs when information is structured. There's really no need to feed …

Summary

A discussion of how harness design can cut token costs by structuring information instead of feeding everything into a language model's context. The cited example is an RLM agent that processes ~80k lines of CloudWatch service logs (roughly 8 million tokens' worth) while spending only 32k "active" tokens over 53 steps.

Good harness designs can get around extreme token costs when information is structured. There's really no need to feed everything into a language model's context all the time. We've conflated naively throwing everything into context with bitter-lesson pilled scaling for too long. A good harness goes a long way!

Cached at: 06/15/26, 11:08 PM

diego 🧞‍♂️ (@diblacksmith): My RLM agent can effortlessly process ~80k lines of service logs from CloudWatch in a single go. that’s worth like 8 million tokens.

The cool part is, after 53 steps, it had spent only 32k “active” tokens* (not through the full 8MM yet atp, more like half).

That’s nothing for
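
To make the idea concrete, here is a minimal Python sketch of a harness that keeps the full log outside the model's context and only hands back small slices on request. It is not the quoted agent's implementation: `LogHarness`, `grep`, `window`, and `estimate_tokens` are hypothetical names, the log lines are synthetic, and the 4-characters-per-token estimate is a rough assumption. For scale, the quoted figures work out to about 100 tokens per log line (8 million / 80k), and 32k active tokens is about 0.4% of the 8 million total.

```python
import re
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption: roughly 4 characters per token)."""
    return max(1, len(text) // 4)


@dataclass
class LogHarness:
    """Hypothetical harness: holds the whole log out of context, serves slices."""
    lines: list[str]
    active_tokens: int = 0  # tokens actually returned to the model so far

    def total_tokens(self) -> int:
        """Estimated cost of pasting the entire log into context."""
        return sum(estimate_tokens(line) for line in self.lines)

    def _emit(self, selected: list[str]) -> str:
        """Join the selected lines and charge them to the active-token budget."""
        out = "\n".join(selected)
        if out:
            self.active_tokens += estimate_tokens(out)
        return out

    def grep(self, pattern: str, limit: int = 20) -> str:
        """Return at most `limit` matching lines, prefixed with line numbers."""
        rx = re.compile(pattern)
        hits = [f"{i}: {line}" for i, line in enumerate(self.lines) if rx.search(line)]
        return self._emit(hits[:limit])

    def window(self, start: int, count: int = 10) -> str:
        """Return `count` consecutive lines starting at index `start`."""
        chunk = self.lines[start:start + count]
        return self._emit([f"{start + i}: {line}" for i, line in enumerate(chunk)])


# Synthetic stand-in for ~80k lines of service logs, with one planted error.
logs = [f"2026-01-01T00:00:{i % 60:02d} INFO request {i} ok" for i in range(80_000)]
logs[41_237] = "2026-01-01T00:00:17 ERROR request 41237 timeout upstream=payments"

harness = LogHarness(logs)
hits = harness.grep(r"ERROR")            # the model asks a narrow question...
context = harness.window(41_235, 5)      # ...then inspects a small neighbourhood
print(f"active: {harness.active_tokens}, full log: {harness.total_tokens()}")
```

In a real agent the model would decide which `grep` and `window` calls to issue at each step; the point of the sketch is only that the context pays for the slices returned, not for the whole log.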
