@xdotli: mini-swe-agent is impressive. 100 lines, one bash tool, same prompt for every model tops on DeepSWE by @datacurve where…

X AI KOLs Timeline Tools

Summary

mini-swe-agent is a minimal, open-source SWE-agent implementation that tops DeepSWE benchmarks with just 100 lines of code and a single bash tool. The team also open-sourced mini-swe-code for interactive use and mini-swe-acp for evaluation harness across benchmarks.

mini-swe-agent is impressive. 100 lines, one bash tool, same prompt for every model tops on DeepSWE by @datacurve where it matches or beats the vendors' own harnesses. So we open-sourced two things around it: - mini-swe-code: play with it in @opencode's TUI, one command: mini-opencode --attach - mini-swe-acp: run it as an eval harness on any benchmark via @benchflow_ai (ACP) hats off to @KLieret @jyangballin @ArpandeepKhatua and the SWE-agent team. repo in and welcome our new MTS intern @bingran_bry who recently joined @benchflow_ai from quantum physics PhD program at Berkeley!
Original Article
View Cached Full Text

Cached at: 06/12/26, 06:54 AM

mini-swe-agent is impressive.

100 lines, one bash tool, same prompt for every model

tops on DeepSWE by @datacurve where it matches or beats the vendors’ own harnesses.

So we open-sourced two things around it:

  • mini-swe-code: play with it in @opencode’s TUI, one command: mini-opencode –attach
  • mini-swe-acp: run it as an eval harness on any benchmark via @benchflow_ai (ACP)

hats off to @KLieret @jyangballin @ArpandeepKhatua and the SWE-agent team. repo in

and welcome our new MTS intern @bingran_bry who recently joined @benchflow_ai from quantum physics PhD program at Berkeley!

Similar Articles

Someone did an audit on the new DeepSWE, the results aren't pretty

Reddit r/singularity

DeepSWE is a new benchmark for evaluating AI coding agents on real-world software engineering tasks from active open-source repositories, comprising 113 tasks across TypeScript, Go, Python, JavaScript, and Rust with isolated environments and program-based verifiers.

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Hugging Face Daily Papers

SWE-Explore introduces a benchmark for evaluating coding agents' repository exploration capabilities, requiring ranked lists of relevant code regions within line budgets. Experiments show agentic exploration outperforms traditional retrieval, and line-level coverage remains a key differentiator.