user-simulator

Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

arXiv cs.CL · yesterday

Introduces Dialogue-SWE-Bench, a benchmark that evaluates how well coding agents resolve software engineering problems through dialogue with a user. The paper also proposes a persona-grounded user simulator and a schema-guided agent that improves dialogue capabilities.

Your AI Agent is one bad prompt away from ruining your brand (And why traditional QA is useless)

Reddit r/AI_Agents · 4d ago

The post argues that traditional chatbot QA is broken because it tests only happy paths. It proposes an AI-powered user simulator that attacks the bot with diverse personas and edge cases to find vulnerabilities before deployment.
