@MaximeRivest: current llm architecture is stupid (if not stupid its, at least, wasteful). take these 3 prompts of 4 context chunks: […

X AI KOLs Following News

Summary

A tweet criticizes current LLM architecture for wasteful recomputation due to order-dependent context, and proposes encoding context units separately to enable order-invariant, efficient caching and generation.


Cached at: 06/10/26, 03:53 PM

Current LLM architecture is stupid (or, if not stupid, at least wasteful).

Take these 3 prompts, each built from the same 4 context chunks:

[file A][file B][task][tool specs]

[tool specs][file B][file A][task]

[task][file B][file A][tool specs]

Their order should be irrelevant: inside each of these context chunks order is crucial, BUT NOT outside them.

This breaks the cache and forces us to recompute a lot when a single file in a codebase changes, when only the generation task changes (not the files or the tools), etc.
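A minimal sketch of why reordering hurts, assuming a prefix-style KV cache in a causal model, where the cached state for a chunk depends on everything before it and is therefore only reusable while the leading chunks are identical. The helper name `reusable_prefix` and the chunk labels are illustrative, not from any library.

```python
def reusable_prefix(cached, new):
    """Count the leading chunks shared by both prompts.

    Under a prefix cache, only these chunks can skip recomputation.
    """
    n = 0
    for old_chunk, new_chunk in zip(cached, new):
        if old_chunk != new_chunk:
            break
        n += 1
    return n


# The three orderings from the text: same chunks, different order.
p1 = ["file A", "file B", "task", "tool specs"]
p2 = ["tool specs", "file B", "file A", "task"]
p3 = ["task", "file B", "file A", "tool specs"]

print(reusable_prefix(p1, p2))  # 0: nothing is reusable
print(reusable_prefix(p1, p3))  # 0: nothing is reusable

# Editing the first file invalidates every chunk after it as well.
edited = ["file A (edited)", "file B", "task", "tool specs"]
print(reusable_prefix(p1, edited))  # 0

# Changing only the task keeps the two files but loses the tool specs.
new_task = ["file A", "file B", "new task", "tool specs"]
print(reusable_prefix(p1, new_task))  # 2
```

In this toy model, a change anywhere costs a recompute of that chunk and everything after it, even though the later chunks are byte-for-byte unchanged.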

Also, it's harder to cheaply trim context to the essentials, and harder to retrieve context chunks and show only the relevant ones in a way that saves compute.

It should be possible to encode these context chunks so that something like this works:

u1 = encode(Unit(name="file_a.py", content=...))
u2 = encode(Unit(name="file_b.py", content=...))
u3 = encode(Unit(name="tool_specs.yaml", content=...))

model.generate(
    task="provide diff for fixing file_a.py; file_b.py is irrelevant",
    ctx_units=[u1, u2, u3],
)
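To make the caching behaviour this interface implies concrete, here is a toy sketch. Everything in it is an assumption for illustration: `UnitCache` and `context_key` are hypothetical names, and the "encoding" is just a SHA-256 digest standing in for whatever per-unit representation a trained model would produce. It only shows the bookkeeping: units are encoded independently, re-encoded only when their own content changes, and identified as a set rather than a sequence.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    content: str


class UnitCache:
    """Caches one encoding per (name, content) pair, independent of other units."""

    def __init__(self):
        self._store = {}
        self.misses = 0  # number of times a unit actually had to be encoded

    def encode(self, unit):
        if unit not in self._store:
            self.misses += 1
            # Stand-in for a real encoder: a digest of this unit alone.
            payload = (unit.name + "\n" + unit.content).encode("utf-8")
            self._store[unit] = hashlib.sha256(payload).hexdigest()
        return self._store[unit]


def context_key(encoded_units):
    """Order-invariant identifier for a collection of encoded units."""
    return frozenset(encoded_units)


cache = UnitCache()
file_a = Unit("file_a.py", "x = 1")
file_b = Unit("file_b.py", "y = 2")
tools = Unit("tool_specs.yaml", "tools: []")

u1, u2, u3 = (cache.encode(u) for u in (file_a, file_b, tools))

# Editing file_a.py re-encodes only that unit; file_b.py is a cache hit.
u1_new = cache.encode(Unit("file_a.py", "x = 2"))
cache.encode(file_b)
print(cache.misses)  # 4: three initial encodes plus the edited file

# The same units in a different order map to the same context.
print(context_key([u1, u2, u3]) == context_key([u3, u1, u2]))  # True
```

The open question from the text is untouched by this sketch: whether a model can be trained to generate well from units encoded in isolation.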

In this case u2 (file_b) would already have been encoded, and its impact on the FLOPs for the task should quickly fizzle out as the early layers of the neural net figure out that it's irrelevant to the task.

And while u1 and u3 are both relevant, their order is not.

Has anybody trained something like that?

It feels like rich late interaction for generation.
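For readers unfamiliar with the term, late interaction in retrieval usually means encoding the query and each document separately and comparing them only at the end, as in ColBERT-style MaxSim scoring. The sketch below shows that retrieval scoring rule, not the generation architecture the tweet asks about. The 2-d vectors are invented toy values, not real embeddings.

```python
def maxsim(query_vecs, unit_vecs):
    """Late-interaction score: each query vector takes its best match
    among the unit's vectors, and the best matches are summed."""

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, u) for u in unit_vecs) for q in query_vecs)


# Toy "token embeddings", made up for illustration only.
task = [[1.0, 0.0], [0.9, 0.1]]
units = {
    "file_a.py": [[1.0, 0.0], [0.8, 0.2]],
    "file_b.py": [[0.0, 1.0], [0.1, 0.9]],
}

# Each unit is scored against the task on its own, so unit order is irrelevant.
scores = {name: maxsim(task, vecs) for name, vecs in units.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['file_a.py', 'file_b.py']
```

Because each unit's vectors are computed without seeing the task or the other units, they can be cached and reused across tasks, which is the property the tweet wants carried over to generation.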
