Tag
RAMP is a production-grounded evaluation framework for LLM agents that exposes significant capability degradation invisible to static benchmarks, showing task completion rates collapsing from 100% to 20% across serial workflows. The framework assesses 15 mainstream models on realistic compiler-construction workloads with complex toolchain interactions and staged recovery mechanisms.