Introduces Dialogue-SWE-Bench, a benchmark for evaluating how well coding agents resolve software engineering problems through dialogue with a user. Also proposes a persona-grounded user simulator and a schema-guided agent that improves the agent's dialogue capabilities.