This article critiques the trend of ever-larger context windows in LLMs, arguing they don't solve enterprise knowledge problems due to retrieval degradation, data volume, and lack of structure. It advocates for knowledge modeling layers that map relationships and intent before retrieval.
Every few months there's a new announcement about extended context: 128K, 200K, 1M tokens. The implied promise is that you'll eventually just stuff your entire company's knowledge into context and get perfect answers. Here's why this doesn't work the way people expect, even at very large context lengths.

Problem 1: Retrieval quality degrades with context length. There's solid evidence that LLMs use information from the middle of very long contexts less reliably than information near the start or end, the "lost in the middle" problem. Doubling the context window doesn't double reliable working memory.

Problem 2: Enterprise data doesn't fit in a context window. A mid-sized company's meaningful operational data (contracts, emails, meeting notes, internal policies) is easily hundreds of gigabytes. Even with unlimited context, you'd still have a selection problem: which tokens are actually relevant to this query?

Problem 3: Raw documents are the wrong representation. Even if you could fit everything in context, a flat document dump doesn't encode the relationships and temporal structure that make institutional knowledge useful. A 2024 contract amendment is more important than the 2019 baseline for most queries, but the model has no way of knowing that without explicit metadata.

Scaling context windows won't solve the core problem. What matters is how knowledge is modeled before retrieval even happens. Enterprise information needs to be mapped through relationships, intent, and source lineage first, so the model receives information that's already structured around meaning and decision-making instead of raw chunks of text.

You can already see this in how some newer knowledge layer platforms are positioning themselves: tools like 60x, Glean, or Kagi's internal search work less like gigantic scratchpads and more like infrastructure for modeling and routing knowledge across a company. They still use RAG and long-context models under the hood, but the emphasis is on building a graph or schema of what the organization knows, where it lives, and which version should win when information conflicts.

The more you look into enterprise AI systems, the more it feels like the real race is happening underneath the model layer. Bigger windows help at the margins, but they're not a replacement for the knowledge layer.
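To make the "which version should win" idea concrete, here is a minimal Python sketch of resolving conflicting versions before anything reaches the model. It is not how any of the tools named above work. The `Record` type, the `supersedes` field, the function names, and the sample contract records are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical knowledge-layer record: text plus explicit metadata."""
    doc_id: str
    subject: str                      # what the document is about
    effective: date                   # when this version took effect
    text: str
    supersedes: Optional[str] = None  # doc_id of the version this one replaces


def current_versions(records):
    """Return one record per subject: the newest one that nothing supersedes."""
    superseded = {r.supersedes for r in records if r.supersedes}
    winners = {}
    for r in records:
        if r.doc_id in superseded:
            continue  # an explicit lineage link says a later version replaced it
        best = winners.get(r.subject)
        if best is None or r.effective > best.effective:
            winners[r.subject] = r
    return winners


def build_context(records, subject):
    """Format only the winning version, with its lineage, for the model."""
    winner = current_versions(records).get(subject)
    if winner is None:
        return ""
    lineage = f" (supersedes {winner.supersedes})" if winner.supersedes else ""
    header = f"[{winner.doc_id}, effective {winner.effective.isoformat()}{lineage}]"
    return f"{header}\n{winner.text}"


# Invented sample data: a 2019 baseline and a 2024 amendment.
records = [
    Record("msa-2019", "acme-msa", date(2019, 3, 1), "Payment terms: net 60."),
    Record("msa-2024-a1", "acme-msa", date(2024, 6, 15),
           "Payment terms: net 30.", supersedes="msa-2019"),
]

print(build_context(records, "acme-msa"))
```

With this data, only the 2024 amendment is passed along, tagged with the record it replaces. The point of the sketch is that the decision comes from metadata resolved ahead of retrieval, so the model never has to guess which of two conflicting passages is current.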
An analysis of why advertised large context windows for LLMs are misleading, as effective attention drops off around 100k tokens, and practical advice for developers to keep sessions in the 'smart zone' by using artifacts and handoffs.
The author questions whether the focus on expanding context windows for AI agents is counterproductive, arguing that accumulated junk slows down long sessions and suggests keeping working context small with external memory.
An opinion piece arguing that long context windows don't equate to memory and that agent failures are often mundane, like forgetting constraints or rereading files, emphasizing that reliability depends on context architecture decisions.
Repowise is an open-source tool that indexes codebases into five intelligence layers—dependency graph, git history, auto-generated docs, architectural decisions, and code health—and exposes them to AI coding agents via MCP tools for more accurate context and fewer tool calls.
A developer shares an architectural pattern to manage context window bloat in continuous Anthropic agent loops, using KV caching, dynamic tool schema loading, and decoupling executor/advisor roles with Claude 3.5 Sonnet and Claude 3 Opus.