state-conflict


ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

arXiv cs.AI · 2026-05-15

ClawForge is a generator-backed benchmark framework for executable command-line workflows under state conflict, evaluating LLM agents on tasks with pre-existing partial, stale, or conflicting artifacts across 17 scenarios.
