Tag
ClawForge is a generator-backed benchmark framework for executable command-line workflows under state conflict, evaluating LLM agents on tasks with pre-existing partial, stale, or conflicting artifacts across 17 scenarios.