Tag
This paper introduces STAGE-Claw, an automated framework for building and evaluating realistic personal-agent scenarios in state-based computing environments, enabling scalable, state-based evaluation of LLM-powered agents.