Tag
The AgentScope team introduces PawBench, a benchmark for evaluating the combined performance of models and agent harnesses, analyzing 4,050 test cells to show that harness choice can be as impactful as model upgrades.
This paper introduces enhancements to the AgentScope platform, featuring an actor-based distributed mechanism and flexible environment support, to make very large-scale multi-agent simulations scalable, efficient, and user-friendly.