Nex-AGI releases Nex-N2, an open-source agentic model series (Nex-N2-Pro and Nex-N2-mini) with an Agentic Thinking framework that unifies reasoning, tool use, and environment execution, achieving top-tier performance on agentic and coding benchmarks.
Ante is a lightweight, self-contained terminal agent harness written in Rust, designed to be fast and dependency-free. It topped Terminal-Bench 2.0 and, although still in preview and not yet open-sourced, remains highly responsive to user feedback.
A Meta paper shows that coding agents improve significantly when they reuse short summaries of past attempts instead of raw logs, with strong gains on SWE-Bench and Terminal-Bench using Claude Opus 4.5.
The Qwen3.6-35B-A3B and Qwen3.5-9B models are now officially on the Terminal-Bench 2.0 leaderboard. On the 35B variant, little-coder achieves 24.6%, surpassing Gemini 2.5 Pro and Qwen3-Coder-480B, while the 9B result shows that sub-10B local models can compete on hard agentic benchmarks.
HALO uses RLMs to optimize AI agent harnesses by analyzing execution traces and suggesting improvements, achieving gains of 10% or more on several benchmarks, including Terminal-Bench and AppWorld.