TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories
Summary
TraceGraph is a graph-based framework that constructs shared decision landscapes from multi-model agent trajectories, enabling diagnosis of failure regions and improvement via trap-aware recovery pipelines.
Cached at: 06/01/26, 09:27 AM
# TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories

Source: [https://arxiv.org/abs/2605.31308](https://arxiv.org/abs/2605.31308) [View PDF](https://arxiv.org/pdf/2605.31308)

> Abstract: Agent benchmarks increasingly record rich interaction trajectories, yet evaluation often reduces each rollout to a pass rate or reward score. We introduce TraceGraph, a graph-based framework that turns released multi-model agent trajectories into shared decision landscapes. For each task, TraceGraph builds a graph over observable action-observation states from pooled rollouts before model identity is introduced. It then overlays outcome-informed productive cores and trap regions, and summarizes each rollout with three events: Access, Trap exposure, and Repair. Across trajectories spanning five benchmark splits, TraceGraph profiles reveal navigation differences hidden by aggregate scores and show that splits differ in whether they reward avoiding traps or recovering from them. The same TraceGraph landscape also motivates a trap-aware recovery pipeline for SWE-bench: a runtime detector fires on states matching historical trap regions, then lightweight continuation policies are evaluated from the same prefix. On fired states, the best pooled single-factor policy raises official resolved rate from 40.4% to 43.5% on the per-provider fired subset and from 41.0% to 44.8% on common-fired instances, with provider-specific active components. Overall, TraceGraph provides a process vocabulary for asking what agent benchmarks test, where models diverge on a shared landscape, and how failure regions can guide downstream improvement.

## Submission history

From: Junjie Nian

**[v1]** Fri, 29 May 2026 13:40:31 UTC (1,139 KB)
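The abstract describes the landscape only at a high level, so the following is a minimal sketch of one possible reading, not the paper's implementation. Every definition here is an assumption: a state is an exact-match `(action, observation)` pair; productive cores and trap regions are states whose visiting rollouts succeed at or above / at or below fixed rates (`min_visits`, `core_rate`, and `trap_rate` are hypothetical parameters); Access means reaching a core state, Trap exposure means entering a trap state, and Repair means reaching a core state after the first trap entry. The function names (`build_graph`, `overlay_regions`, `profile`, `model_profiles`, `trap_detector`) are illustrative only. As in the abstract, model identity is ignored while the graph and regions are built and only enters at the per-model summary.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical state key: one observable (action, observation) step,
# assumed already normalized so identical steps from different models collide.
State = tuple[str, str]


@dataclass
class Rollout:
    model: str           # model identity: NOT used to build the landscape
    states: list[State]  # ordered action-observation states
    success: bool        # task outcome, e.g. resolved / not resolved


def build_graph(rollouts):
    """Pool every rollout into one directed state graph, ignoring model identity."""
    edges = defaultdict(set)
    for r in rollouts:
        for a, b in zip(r.states, r.states[1:]):
            edges[a].add(b)
    return edges


def overlay_regions(rollouts, min_visits=2, core_rate=0.75, trap_rate=0.25):
    """Assumed overlay: label states by the success rate of rollouts that visit them."""
    visits, wins = defaultdict(int), defaultdict(int)
    for r in rollouts:
        for s in set(r.states):  # count each rollout at most once per state
            visits[s] += 1
            wins[s] += int(r.success)
    core, trap = set(), set()
    for s, n in visits.items():
        if n < min_visits:
            continue  # too little evidence to label this state
        rate = wins[s] / n
        if rate >= core_rate:
            core.add(s)
        elif rate <= trap_rate:
            trap.add(s)
    return core, trap


def profile(rollout, core, trap):
    """Assumed events: Access = reaches a core state, Trap = enters a trap state,
    Repair = reaches a core state after first entering a trap state."""
    first_trap = next((i for i, s in enumerate(rollout.states) if s in trap), None)
    access = any(s in core for s in rollout.states)
    exposed = first_trap is not None
    repaired = exposed and any(s in core for s in rollout.states[first_trap + 1:])
    return {"access": access, "trap": exposed, "repair": repaired}


def model_profiles(rollouts, core, trap):
    """Only now bring model identity back in: average each event flag per model."""
    sums = defaultdict(lambda: {"access": 0, "trap": 0, "repair": 0})
    counts = defaultdict(int)
    for r in rollouts:
        counts[r.model] += 1
        for event, flag in profile(r, core, trap).items():
            sums[r.model][event] += int(flag)
    return {m: {e: v / counts[m] for e, v in s.items()} for m, s in sums.items()}


def trap_detector(prefix, trap):
    """Runtime check: fire when the latest state exactly matches a historical trap state."""
    return bool(prefix) and prefix[-1] in trap


# Toy landscape: two models, five rollouts on one task.
start, edit = ("open", "repo"), ("edit", "ok")
loop, fix = ("run_tests", "timeout"), ("run_tests", "pass")
rollouts = [
    Rollout("model_a", [start, edit, fix], True),
    Rollout("model_b", [start, loop, edit, fix], True),
    Rollout("model_a", [start, loop, loop], False),
    Rollout("model_b", [start, loop], False),
    Rollout("model_b", [start, loop, loop], False),
]
graph = build_graph(rollouts)
core, trap = overlay_regions(rollouts)
per_model = model_profiles(rollouts, core, trap)
```

On this toy data the `("run_tests", "timeout")` state becomes the only trap, and `model_b` shows more trap exposure than `model_a` but also the only repair, a small-scale version of the avoid-versus-recover distinction the abstract draws. A real system would likely need fuzzier state matching than exact tuple equality; this sketch leaves that step out.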
Similar Articles
TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety
This paper proposes TRACE, a trajectory-level safety detection method for long-horizon LLM agents that compresses full trajectory evidence into a latent state to better aggregate dispersed risk signals, achieving state-of-the-art accuracy on multiple benchmarks.
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
This paper introduces AgentAtlas, a framework that goes beyond outcome-only leaderboards for LLM agents by proposing a six-state control-decision taxonomy and a nine-category trajectory-failure taxonomy to evaluate agent behavior more comprehensively.
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration
GraphBit is a graph-based agentic framework that uses deterministic DAG orchestration with a Rust engine to eliminate hallucinations and infinite loops. It achieves 67.6% accuracy on the GAIA benchmark with zero framework-induced errors and low latency.
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
StraTA proposes strategic trajectory abstraction for long-horizon LLM agents, using hierarchical GRPO-style rollout with diverse strategy sampling and critical self-judgment to improve sample efficiency and final performance over frontier models and prior RL baselines.
"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration
This paper introduces CoTrace, a framework for goal-level attribution in human-AI collaboration that analyzes how large language models shape goals by contributing concrete requirements and indirect influences in dialogue turns.