Tag
RAMPART is a compile-time memory model and in-RAM block registry for LLM-based agents that uses five composable primitives to manage context assembly with priority-aware ordering and eviction. Experiments across multiple models in the 7B to 14B parameter range show that block grouping, relevance gating, and schema eviction significantly improve task success rates and reduce prompt token costs.
This paper proposes using language models as selective surrogates for optimizing GPU kernel runtime, presenting a novel approach to performance forecasting.
SkillSmith is a boundary-first compiler-runtime framework that extracts fine-grained operational boundaries from LLM agent skills, enabling agents to dynamically access only relevant components, reducing solve-stage token usage by 57.44% and thinking iterations by 42.99% on the SkillsBench benchmark.