I think long context agents are failing in a very boring way

Reddit r/artificial News

Summary

An opinion piece arguing that long context windows don't equate to memory and that agent failures are often mundane, like forgetting constraints or rereading files, emphasizing that reliability depends on context architecture decisions.

I think people overestimate what a large context window actually buys you. A 200K-token window is not memory. It just means the agent has more space to bury the thing that mattered. The failures are usually boring too: the agent rereads the same file, forgets an earlier constraint, picks a tool that is technically valid but wrong for the job, then outputs something that looks fine until you compare it with the original task. A lot of “agent reliability” work is really context architecture work: deciding what to load, what to drop, what to compress, and what to repeat before the next step.
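To make the "load, drop, compress, repeat" idea concrete, here is a rough sketch of what a per-step context builder might look like. Everything in it is hypothetical and for illustration only: `ContextBuilder`, `rough_tokens`, and `compress` are made-up names, token counts are approximated by word counts, and "compression" is plain truncation where a real system would more likely summarize.

```python
from dataclasses import dataclass, field


def rough_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


def compress(text: str, max_words: int = 8) -> str:
    # Placeholder compression: keep the first few words only.
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


@dataclass
class ContextBuilder:
    task: str
    constraints: list[str]
    budget: int            # max rough tokens allowed per step
    keep_recent: int = 2   # newest notes that stay uncompressed
    notes: list[str] = field(default_factory=list)
    files_read: set[str] = field(default_factory=set)

    def record_file(self, path: str, content: str) -> bool:
        """Store a file once; return False if it was already read."""
        if path in self.files_read:
            return False
        self.files_read.add(path)
        self.notes.append(f"[file {path}] {content}")
        return True

    def record_note(self, text: str) -> None:
        self.notes.append(text)

    def build(self) -> str:
        # Repeat: the task and every constraint are restated on each step.
        pinned = [f"TASK: {self.task}"]
        pinned += [f"CONSTRAINT: {c}" for c in self.constraints]
        used = sum(rough_tokens(p) for p in pinned)

        kept = []
        # Walk notes from newest to oldest.
        for age, note in enumerate(reversed(self.notes)):
            # Load recent notes in full, compress older ones.
            text = note if age < self.keep_recent else compress(note)
            cost = rough_tokens(text)
            if used + cost > self.budget:
                break  # Drop this note and everything older.
            kept.append(text)
            used += cost

        # Restore chronological order after the pinned block.
        return "\n".join(pinned + list(reversed(kept)))


if __name__ == "__main__":
    cb = ContextBuilder(
        task="fix the failing parser test",
        constraints=["do not edit files under tests/"],
        budget=40,
    )
    cb.record_file("parser.py", "def parse(s): return s.split(',')")
    cb.record_file("parser.py", "def parse(s): return s.split(',')")  # ignored
    cb.record_note("parse() breaks on quoted commas")
    print(cb.build())
```

The design choice worth noticing is that constraints are never subject to the budget: they are paid for first, and history gets whatever is left. The duplicate check in `record_file` is a crude guard against the rereading failure, and it assumes files do not change between reads.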