state-based-evaluation

STAGE-Claw: Automated State-based Agent Benchmarking for Realistic Scenarios

arXiv cs.AI · 12h ago

This paper introduces STAGE-Claw, an automated framework that builds realistic personal-agent scenarios in state-based computing environments and uses them for scalable, state-based evaluation of LLM-powered agents.
