#automated-benchmarking

STAGE-Claw: Automated State-based Agent Benchmarking for Realistic Scenarios

arXiv cs.AI · 14h ago

This paper introduces STAGE-Claw, an automated framework for building and evaluating realistic personal-agent scenarios in state-based computing environments, enabling scalable evaluation of LLM-powered agents.
