Tag
The AgentScope team introduces PawBench, a benchmark for evaluating the combined performance of models and agent harnesses, analyzing 4,050 test cells to show that harness choice can be as impactful as model upgrades.
Browser Use 0.13.0 is a complete rewrite in Rust, providing custom LLM and browser harnesses optimized for state-of-the-art models, replacing the previous GPT-4-centric version.
Browser Use Terminal is a Rust TUI for browser agents that allows users to automate browser tasks from the terminal with a new LLM harness that is 2x cheaper and 2x faster than Browser Harness.
The author updates a custom harness and UI bridge tool to run the Qwen 3.6 model on GitHub Copilot Codex via llama.cpp on a local RTX 5090. The post details implemented features, fixed bugs, and remaining limitations in achieving parity with native OpenAI models.
The author recounts debugging an agent loop that turned out to be caused by the harness truncating tool outputs, not by a model failure, and argues that agent infrastructure is currently less reliable than the models it runs.
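The truncation failure in the last entry can be illustrated with a minimal sketch. It is not taken from the post: the function names, the character limit, and the marker text are all hypothetical. It assumes the loop arises because the model cannot tell that its tool result was cut short, so it keeps reissuing the same call.

```python
def truncate_tool_output(text: str, limit: int = 2000) -> str:
    """Keep the first `limit` characters and append a visible marker when anything is cut.

    Hypothetical helper: the limit and marker wording are illustrative assumptions.
    """
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    # An explicit marker lets the model see that the result is incomplete
    # instead of treating a clipped result as the full answer.
    marker = f"\n[harness: output truncated, {omitted} characters omitted]"
    return text[:limit] + marker


def is_repeating(calls: list[tuple[str, str]], window: int = 3) -> bool:
    """Return True if the last `window` (tool name, arguments) calls are identical.

    Hypothetical guard a harness could use to flag a suspected loop.
    """
    if len(calls) < window:
        return False
    recent = calls[-window:]
    return all(call == recent[0] for call in recent)
```

A harness that truncates silently gives neither the model nor the person debugging any signal that data is missing. Marking the cut and checking for repeated identical calls are two inexpensive ways to surface the problem early.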