swe-agent

@xdotli: mini-swe-agent is impressive. 100 lines, one bash tool, same prompt for every model tops on DeepSWE by @datacurve where…

X AI KOLs Timeline · 2d ago

mini-swe-agent is a minimal, open-source SWE-agent implementation that, according to the post, tops the DeepSWE benchmark with just 100 lines of code, a single bash tool, and the same prompt for every model. The team also open-sourced mini-swe-code for interactive use and mini-swe-acp as an evaluation harness across benchmarks.


AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

Hugging Face Daily Papers · 2026-05-13

AgentLens is a framework for process-level assessment of software engineering agent trajectories. It reveals that over 10% of passing trajectories exhibit "Lucky Pass" behavior. It also introduces AgentLens-Bench, a dataset annotated with quality scores, and shows that ranking models by quality score can shift their rankings significantly.
