@gneubig: We've found this sort of "sidekick" architecture to be very effective at cutting LLM spend because it allows you to do …


Summary

Graham Neubig shares a sidekick architecture for reducing LLM costs by delegating simple tasks to a smaller agent, with a 200-line example using the OpenHands SDK. This approach is also used in Cognition's Devin Fusion hybrid-model harness.

We've found this sort of "sidekick" architecture to be very effective at cutting LLM spend because it allows you to do context control and not spend expensive tokens on simple tasks. Here's a 200-line example of how to do it in the OpenHands SDK :) https://gist.github.com/neubig/412ab8df8e6fd0b2bdf10602d77f9d86…

Cached at: 06/30/26, 07:41 AM


Cognition (@cognition): Devin Fusion uses a hybrid-model harness built around two ideas:

First, a “sidekick” agent: a smaller agent runs in parallel with the frontier agent. The frontier agent delegates work, monitors progress, and keeps ownership of planning, ambiguity, and final review.

This lets
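To make the delegation pattern concrete, here is a minimal sketch in plain Python. It is not the linked gist, and it uses neither the OpenHands SDK nor Cognition's harness. Every name in it (`Agent`, `SidekickHarness`, `fake_frontier`, `fake_sidekick`) is hypothetical, the "models" are stub functions, and token use is approximated by counting words. For simplicity the sidekick runs sequentially, not in parallel as the quoted post describes. The sketch shows only the two ideas from the posts: the frontier agent keeps planning and final review, and the sidekick sees just the subtask it is handed.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A "model" is just a function from prompt text to reply text.
# Real LLM clients would replace these stand-ins (assumption for illustration).
Model = Callable[[str], str]


@dataclass
class Agent:
    name: str
    model: Model
    tokens_used: int = 0  # crude proxy: whitespace-separated words in prompts

    def run(self, prompt: str) -> str:
        self.tokens_used += len(prompt.split())
        return self.model(prompt)


@dataclass
class SidekickHarness:
    frontier: Agent
    sidekick: Agent
    history: List[str] = field(default_factory=list)  # frontier-side context

    def delegate(self, subtask: str) -> str:
        # Context control: the sidekick receives only the subtask text,
        # never the frontier agent's accumulated history.
        result = self.sidekick.run(subtask)
        # Only a one-line record of the result returns to the frontier context.
        self.history.append(f"{subtask} -> {result}")
        return result

    def solve(self, goal: str) -> str:
        self.history.append(f"goal: {goal}")
        # The frontier agent owns planning: one subtask per non-empty line.
        plan = self.frontier.run(f"PLAN: {goal}")
        subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
        for subtask in subtasks:
            self.delegate(subtask)
        # The frontier agent also owns the final review.
        return self.frontier.run("REVIEW:\n" + "\n".join(self.history))


def fake_frontier(prompt: str) -> str:
    """Hypothetical stub for an expensive model."""
    if prompt.startswith("PLAN:"):
        return "list files\ncount lines"
    return f"approved: {prompt.count('->')} results"


def fake_sidekick(prompt: str) -> str:
    """Hypothetical stub for a cheap model."""
    return f"done({prompt})"


if __name__ == "__main__":
    harness = SidekickHarness(
        frontier=Agent("frontier", fake_frontier),
        sidekick=Agent("sidekick", fake_sidekick),
    )
    print(harness.solve("summarize repo"))
    print("sidekick prompt words:", harness.sidekick.tokens_used)
```

In this sketch the cost saving would come from `delegate`: the cheaper model's prompt stays small because it never inherits the frontier agent's history, and the frontier agent's context grows only by one short line per delegated subtask.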
