Making LLM context assembly programmable

Reddit r/AI_Agents Tools

Summary

RAMPART is a Python library that makes LLM context assembly programmable: developers register named blocks of context that are assembled before the model's first token. The post reports that grouping a critical block with content-adjacent neighbours lifts task success by tens of percentage points across five models, and that tool access can be controlled by evicting tool schemas from the context.

Most agent frameworks today treat the system prompt as a static file read at startup. Skills, tools, and rules get concatenated into one block and held fixed for the whole session, even though we know from the lost-in-the-middle work that where instructions sit in context matters as much as what they say. So why not make context assembly itself an explicit, programmable step?

That's what RAMPART does. It's a Python library that turns prompt construction into a registry of named blocks that runs before the model's first token, at zero prompt-token cost. Existing SKILL.md and CLAUDE.md files import without modification. Deployment is purely in-RAM: no database, and latency is bounded by a method call.

Both compile-time placement and the structural relationship between blocks and the task query affect task success. Grouping a critical block with content-adjacent neighbours lifts performance by tens of percentage points in cases where single-block placement fails. This pattern replicates across five models from three labs. Block clustering raises Mistral-7B's mean pass rate roughly fivefold at the hardest registry size, and a smaller model with the intervention outperforms a larger sibling without it in the mid-registry zone.

Tool access control falls out naturally via schema eviction: the model never sees what was removed, so no policy instruction is required. Some interesting possibilities for zero-token coordination among multiple agents emerge as well.
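
To make the registry idea concrete, here is a minimal sketch of what a named-block registry with grouping and schema eviction could look like. This is not RAMPART's actual API: `Block`, `BlockRegistry`, `register`, `register_tool`, `evict_tool`, `compile`, and `tool_schemas` are hypothetical names chosen for illustration, and the "higher priority goes first" ordering rule is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    name: str
    text: str
    priority: int = 0            # assumed convention: higher priority is placed earlier
    group: Optional[str] = None  # blocks sharing a group are kept adjacent


class BlockRegistry:
    """Hypothetical in-RAM registry; an illustration, not RAMPART's real interface."""

    def __init__(self):
        self._blocks = {}  # name -> Block, in registration order
        self._tools = {}   # tool name -> schema dict

    def register(self, name, text, priority=0, group=None):
        self._blocks[name] = Block(name, text, priority, group)

    def register_tool(self, name, schema):
        self._tools[name] = schema

    def evict_tool(self, name):
        # Removing the schema means the model never sees the tool at all.
        self._tools.pop(name, None)

    def tool_schemas(self):
        return list(self._tools.values())

    def compile(self):
        # Cluster blocks: grouped blocks share a cluster, ungrouped ones stand alone.
        clusters = {}
        for block in self._blocks.values():
            key = ("group", block.group) if block.group else ("block", block.name)
            clusters.setdefault(key, []).append(block)
        # Rank each cluster by its highest-priority member so neighbours travel with it.
        # sorted() is stable, so ties keep registration order.
        ordered = sorted(clusters.values(),
                         key=lambda members: -max(b.priority for b in members))
        parts = []
        for members in ordered:
            for block in sorted(members, key=lambda b: -b.priority):
                parts.append(block.text)
        return "\n\n".join(parts)


reg = BlockRegistry()
reg.register("style", "Use concise answers.", priority=1)
reg.register("sql_rules", "Never run DROP TABLE.", priority=5, group="sql")
reg.register("sql_examples", "Example: SELECT id FROM users LIMIT 5;", group="sql")
reg.register_tool("run_sql", {"name": "run_sql"})
reg.register_tool("delete_file", {"name": "delete_file"})
reg.evict_tool("delete_file")

prompt = reg.compile()
print(prompt)
print(reg.tool_schemas())
```

In this sketch the low-priority `sql_examples` block is pulled up next to the critical `sql_rules` block because they share a group, and `delete_file` is simply absent from the schemas handed to the model, so no instruction telling the model to avoid it is needed.
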

Similar Articles

RAMPART: Registry-based Agentic Memory with Priority-Aware Runtime Transformation

arXiv cs.CL

RAMPART is a compile-time memory model and in-RAM block registry for LLM-based agents that uses five composable primitives to manage context assembly with priority-aware ordering and eviction. Experiments across multiple 7-14B models show that block grouping, relevance gating, and schema eviction significantly improve task success rates and reduce prompt token costs.

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

Hugging Face Daily Papers

RAMP is a production-grounded evaluation framework for LLM agents that exposes significant capability degradation invisible to static benchmarks, showing task completion rates collapsing from 100% to 20% across serial workflows. The framework assesses 15 mainstream models on realistic compiler-construction workloads with complex toolchain interactions and staged recovery mechanisms.