"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration
Summary
Introduces CoTrace, a framework for goal-level attribution in human-AI collaboration that traces, across dialogue turns, how large language models shape users' goals, both by contributing concrete requirements and through indirect influence.
Source: https://huggingface.co/papers/2605.21363
Abstract
CoTrace, a goal-level attribution framework, is introduced to analyze how large language models contribute to goal shaping in human-AI collaboration. It reveals that although models account for only 11-26% of goal-shaping contributions, they play a significant role in introducing concrete requirements and in making indirect contributions.
As large language models (LLMs) increasingly shape how users form, refine, and extend their goals, attributing contributions in human-AI collaboration becomes critical for users calibrating their own reliance and for evaluators assessing AI-assisted work. Yet existing methods focus on final artifacts, missing the process through which goals themselves are jointly shaped. We introduce a goal-level attribution framework, CoTrace, that decomposes explicit goals into verifiable requirements and traces both direct contributions and indirect influences across dialogue turns. Applying CoTrace to 638 real-world collaboration logs, we find that while models account for only 11-26% of goal-shaping contribution, they contribute substantially more to introducing lower-level concrete requirements, and they make various kinds of indirect contributions. Through controlled simulations, we show that interaction design choices significantly affect model goal-shaping behavior. In a user study, exposing participants to goal-level analyses shifts their perceived contributions by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand their own AI-assisted work.
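The abstract does not specify CoTrace's data model or scoring rule. As a rough illustration only, the sketch below assumes each verifiable requirement is tagged with the party that first stated it, the dialogue turn, a two-level abstraction label ("high" vs. "concrete"), and an optional tag for which party's earlier turn prompted it (a stand-in for indirect influence). The names `Requirement`, `direct_shares`, and `indirect_model_influence` are hypothetical, the log is toy data, and simply counting requirements is a simplification, not the paper's metric.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Literal, Optional

Party = Literal["user", "model"]
Level = Literal["high", "concrete"]  # assumed two-level abstraction label


@dataclass(frozen=True)
class Requirement:
    """One verifiable requirement decomposed from an explicit goal (hypothetical schema)."""
    text: str
    origin: Party                          # party that first stated the requirement
    turn: int                              # dialogue turn where it first appeared
    level: Level                           # abstraction level of the requirement
    influenced_by: Optional[Party] = None  # party whose earlier turn prompted it, if any


def direct_shares(reqs: list[Requirement], level: Optional[Level] = None) -> dict[str, float]:
    """Fraction of requirements each party introduced directly, optionally for one level."""
    pool = [r for r in reqs if level is None or r.level == level]
    if not pool:
        return {"user": 0.0, "model": 0.0}
    counts = Counter(r.origin for r in pool)  # Counter returns 0 for absent parties
    return {p: counts[p] / len(pool) for p in ("user", "model")}


def indirect_model_influence(reqs: list[Requirement]) -> int:
    """Count user-stated requirements that an earlier model turn prompted."""
    return sum(1 for r in reqs if r.origin == "user" and r.influenced_by == "model")


# Toy collaboration log (invented for illustration, not from the paper's data).
log = [
    Requirement("Draft a cover letter for a data-science role", "user", 1, "high"),
    Requirement("Keep it under one page", "user", 1, "concrete"),
    Requirement("Mention the forecasting project", "model", 2, "concrete"),
    Requirement("Use a confident tone", "user", 3, "concrete", influenced_by="model"),
]

print(direct_shares(log))              # {'user': 0.75, 'model': 0.25}
print(direct_shares(log, "concrete"))  # model share rises to one third at the concrete level
print(indirect_model_influence(log))   # 1
```

Even in this toy log, the model's direct share grows when only concrete requirements are counted, and one user-stated requirement would be missed entirely by a direct-only count, which mirrors the kind of pattern the abstract reports.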
Similar Articles
We measured how AI capabilities INTERACT as models scale. Below 3.5B, reasoning and truthfulness fight. Above it, they cooperate. The transition is engineerable. (2 papers + interactive dashboard + 7 falsifiable predictions)
Researchers discovered a critical scale (~3.5B parameters) where the trade-off between reasoning and truthfulness in AI models flips from antagonistic to cooperative. They provide a framework, interactive dashboard, and open-source steering tool to identify and correct misaligned outputs at small scales.
@martin_casado: This tackles a very hard, very important problem in AI systems. Basically how do you expose your traces at scale to age…
A tweet by Martin Casado highlighting a solution to the difficult problem of exposing traces at scale to AI agents, balancing cost and AI leverage.
Beyond the Black Box: Interpretability of Agentic AI Tool Use
This paper introduces a mechanistic interpretability toolkit using Sparse Autoencoders and linear probes to monitor internal model states before AI agents invoke tools, aiming to improve diagnostics and safety in enterprise workflows.
TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories
TraceGraph is a graph-based framework that constructs shared decision landscapes from multi-model agent trajectories, enabling diagnosis of failure regions and improvement via trap-aware recovery pipelines.
Most “agentic AI” conversations feel too abstract. Here is how my agentic research system looks like
The author shares a practical breakdown of an agentic research system they built to identify and evaluate AI use cases within companies. The system uses six agents for discovery, evaluation, and context extraction, emphasizing human-in-the-loop decision-making over full autonomy.