SWE-chat: Coding Agent Interactions From Real Users in the Wild
Summary
SWE-chat introduces a 6,000-session dataset of real-world coding agent interactions, revealing that only 44% of agent-generated code survives in commits and highlighting inefficiencies and security issues in current AI-assisted development.
Cached at: 04/23/26, 03:35 AM
Paper page - SWE-chat: Coding Agent Interactions From Real Users in the Wild
Source: https://huggingface.co/papers/2604.20779
Abstract
SWE-chat presents a large-scale dataset of real coding agent interactions that reveals significant inefficiencies and challenges in current AI-assisted development practices.
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contains 6,000 sessions, comprising more than 63,000 user prompts and 355,000 agent tool calls. SWE-chat is a living dataset; our collection pipeline automatically and continually discovers and processes sessions from public repositories. Leveraging SWE-chat, we provide an initial empirical characterization of real-world coding agent usage and failure modes. We find that coding patterns are bimodal: in 41% of sessions, agents author virtually all committed code (“vibe coding”), while in 23%, humans write all code themselves. Despite rapidly improving capabilities, coding agents remain inefficient in natural settings. Just 44% of all agent-produced code survives into user commits, and agent-written code introduces more security vulnerabilities than code authored by humans. Furthermore, users push back against agent outputs (through corrections, failure reports, and interruptions) in 44% of all turns. By capturing complete interaction traces with human vs. agent code authorship attribution, SWE-chat provides an empirical foundation for moving beyond curated benchmarks towards an evidence-based understanding of how AI agents perform in real developer workflows.
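To make the headline metrics concrete, here is a minimal sketch of how a code survival rate and the bimodal authorship split could be computed from per-session line counts. This is an illustration under stated assumptions, not the paper's pipeline: the `Session` fields, the pooled (rather than per-session averaged) survival rate, and the 0.95 cutoff standing in for "virtually all committed code" are all hypothetical choices made here, and the toy numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical per-session counts; the paper's actual schema is not given here.
    agent_lines_written: int    # lines the agent produced during the session
    agent_lines_committed: int  # of those, lines that survive into the user's commits
    human_lines_committed: int  # committed lines authored by the human


def survival_rate(sessions):
    """Pooled fraction of all agent-produced lines that survive into commits."""
    written = sum(s.agent_lines_written for s in sessions)
    kept = sum(s.agent_lines_committed for s in sessions)
    return kept / written if written else 0.0


def authorship_mode(s, vibe_threshold=0.95):
    """Bucket a session by the agent's share of committed lines.

    vibe_threshold is an assumed stand-in for "virtually all committed code".
    """
    total = s.agent_lines_committed + s.human_lines_committed
    if total == 0:
        return "no_commit"
    share = s.agent_lines_committed / total
    if share >= vibe_threshold:
        return "vibe_coding"
    if share == 0:
        return "human_only"
    return "mixed"


# Invented toy data: 88 of 200 agent-written lines are committed.
sessions = [Session(100, 60, 0), Session(50, 0, 40), Session(50, 28, 30)]
print(survival_rate(sessions))                  # 0.44
print([authorship_mode(s) for s in sessions])   # ['vibe_coding', 'human_only', 'mixed']
```

Pooling counts across sessions weights long sessions more heavily than averaging per-session rates would; which aggregation the authors use is not stated in the abstract.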