Making LLM context assembly programmable

Reddit r/AI_Agents 06/10/26, 02:21 PM Tools

Summary

RAMPART is a Python library that makes LLM context assembly programmable: developers register named blocks of context, which are assembled and placed before the model's first token. The post reports that clustering a critical block with content-adjacent neighbours lifts task success by tens of percentage points across five models from three labs, and that tool access can be controlled by evicting tool schemas so the model never sees them.

Most agent frameworks today treat the system prompt as a static file read at startup. Skills, tools, and rules get concatenated into one block and held fixed for the whole session, even though we know from the lost-in-the-middle work that where instructions sit in context matters as much as what they say. So why not make context assembly itself an explicit, programmable step?

That's what RAMPART does. It's a Python library that turns the prompt-construction step into a registry of named blocks that runs before the model's first token, at zero prompt-token cost. Existing SKILL.md and CLAUDE.md files import without modification. The deployment is pure in-RAM: no database, and latency is bounded by a method call.

Compile-time placement and the structural relationship between blocks and the task query both affect task success. Grouping a critical block with content-adjacent neighbours lifts performance by tens of percentage points where single-block placement fails. This pattern replicates across five models from three labs. Block clustering raises Mistral-7B's mean pass rate roughly fivefold at the hardest registry size, and a smaller model with the intervention outperforms a larger sibling without it in the mid-registry zone.

Tool access control via schema eviction is straightforward here: the model never sees what was removed, so no policy instruction is required. Some interesting possibilities for zero-token coordination among multiple agents emerge as well.
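To make the idea concrete, here is a minimal sketch of a block registry with cluster-aware placement and tool schema eviction. It is not RAMPART's actual API: the names `Block`, `ContextRegistry`, `register`, `evict_tool`, and `assemble`, as well as the `priority` and `cluster` fields, are assumptions invented for illustration. The ordering rule (groups sorted by their highest-priority member) is also just one plausible choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    """A named piece of context. All fields are illustrative assumptions."""
    name: str
    text: str
    priority: int = 0               # higher-priority groups are placed earlier
    cluster: Optional[str] = None   # blocks sharing a label are placed adjacently


@dataclass
class ContextRegistry:
    """Hypothetical in-RAM registry; not RAMPART's real interface."""
    blocks: Dict[str, Block] = field(default_factory=dict)
    tools: Dict[str, dict] = field(default_factory=dict)

    def register(self, block: Block) -> None:
        self.blocks[block.name] = block

    def register_tool(self, name: str, schema: dict) -> None:
        self.tools[name] = schema

    def evict_tool(self, name: str) -> None:
        # Access control by omission: the schema is simply never sent.
        self.tools.pop(name, None)

    def visible_tools(self) -> List[dict]:
        return list(self.tools.values())

    def assemble(self) -> str:
        # Group blocks by cluster; an unclustered block forms its own group.
        groups: Dict[Tuple[str, str], List[Block]] = {}
        for block in self.blocks.values():
            key = ("cluster", block.cluster) if block.cluster else ("block", block.name)
            groups.setdefault(key, []).append(block)
        # Order groups by their highest-priority member, then order members.
        ordered = sorted(groups.values(), key=lambda g: -max(b.priority for b in g))
        parts = [
            f"## {b.name}\n{b.text}"
            for group in ordered
            for b in sorted(group, key=lambda b: -b.priority)
        ]
        return "\n\n".join(parts)


registry = ContextRegistry()
registry.register(Block("style", "Answer concisely.", priority=1))
registry.register(Block("deploy_rule", "Never deploy on Fridays.", priority=5, cluster="deploy"))
registry.register(Block("misc", "Use UTC timestamps.", priority=3))
registry.register(Block("deploy_steps", "Run tests, then tag a release.", cluster="deploy"))
registry.register_tool("read_file", {"name": "read_file", "parameters": {}})
registry.register_tool("drop_table", {"name": "drop_table", "parameters": {}})
registry.evict_tool("drop_table")

system_prompt = registry.assemble()      # built before any model call
tool_schemas = registry.visible_tools()  # the evicted tool is absent
```

In this sketch the low-priority `deploy_steps` block is pulled up next to the critical `deploy_rule` block because they share a cluster label, which mirrors the grouping effect described above. All of the work happens in ordinary Python before the request is sent, so the assembly logic itself adds no tokens to the prompt.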


Similar Articles

RAMPART: Registry-based Agentic Memory with Priority-Aware Runtime Transformation

@neural_avb: Shipping the latest fast-rlm fast-rlm lets your LLMs work inside a RLM harness, exploring massive contexts inside a REP…

Built an agent loop where the LLM writes Python code to generate articulated 3D CAD models. You can pin a part to edit it

@samhogan: RLMs pretty much solved context btw You can shove tens of millions of tokens into a good RLM harness and it just works.…

@MaximeRivest: current llm architecture is stupid (if not stupid its, at least, wasteful). take these 3 prompts of 4 context chunks: […
