How I stopped context window bloat in continuous Anthropic agent loops (Opus + Sonnet architecture)

Reddit r/AI_Agents 06/09/26, 11:01 AM Tools

agentic-loop context-window claude-opus claude-sonnet kv-caching memory-compaction multi-agent

Summary

A developer shares an architectural pattern to manage context window bloat in continuous Anthropic agent loops, using KV caching, dynamic tool schema loading, and decoupling executor/advisor roles with Claude 3.5 Sonnet and Claude 3 Opus.

I’ve been spending a lot of time deploying multi-agent architectures, and one of the biggest bottlenecks in running continuous agentic loops is hitting context limits and the resulting API latency spikes. I wanted to share an architectural pattern that has been working well for me to manage memory and compute using Claude 3 Opus and 3.5 Sonnet. Here are the three main components of the setup: * **KV Prompt Caching for Latency:** Instead of sending the full system prompt on every turn, I'm utilizing KV caching to isolate latency. The core instructions and static context stay cached, which significantly speeds up the loop iteration. * **Defer Loading Tool Schemas:** Stuffing the initial context with every possible tool schema is what usually causes bloat. I shifted to dynamically loading tool schemas only when the agent's initial routing dictates they might be needed. * **The "Advisor Strategy" (Decoupling roles):** To balance cost and reasoning, I decoupled the execution and advisory layers. I use Claude 3.5 Sonnet as the high-speed "Executor" for standard routing and tool calling. When the logic gets too complex or an error needs debugging, the context (after going through a memory compaction/summarization step) is routed to Opus, which acts purely as the "Advisor" before handing control back to Sonnet. I'd love to hear how you all are handling memory compaction and long-running transcripts in your own agent loops. Are you doing summarize-and-replace, or something else?

Original Article

How I stopped context window bloat in continuous Anthropic agent loops (Opus + Sonnet architecture)

Similar Articles

I built a context window optimization framework for coding agents — open source + paper

Are bigger context windows actually the wrong direction for agents?

What I'm learning trying to ensure context continuity for different agents across different sessions

Effective harnesses for long-running agents

We cut our agent's context window in half, and it got better. kinda didnt expect that

Submit Feedback

Similar Articles

I built a context window optimization framework for coding agents — open source + paper

Are bigger context windows actually the wrong direction for agents?

What I'm learning trying to ensure context continuity for different agents across different sessions

Effective harnesses for long-running agents

We cut our agent's context window in half, and it got better. kinda didnt expect that