state-based-evaluation


STAGE-Claw: Automated State-based Agent Benchmarking for Realistic Scenarios

arXiv cs.AI · 12h ago

This paper introduces STAGE-Claw, an automated framework for building and evaluating realistic personal-agent scenarios in state-based computing environments, enabling scalable evaluation of LLM-powered agents.
