TRACE is a monitoring framework for long-horizon LLM agent trajectories that uses a Triage-Inspect-Judge loop to connect evidence across temporally distant actions, achieving high recall and F1 on evasive sabotage detection tasks.
Skill-RM is a unified reward-modeling framework that treats reward computation as a structured agentic task, enabling dynamic evidence aggregation and consistent evaluation across diverse applications; it outperforms traditional judge baselines.