automated-benchmarking

Tag

STAGE-Claw: Automated State-based Agent Benchmarking for Realistic Scenarios

arXiv cs.AI · 16h ago

This paper introduces STAGE-Claw, an automated framework for building and evaluating realistic personal-agent scenarios in state-based computing environments, enabling scalable, state-based evaluation of LLM-powered agents.
