user-simulator


Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

arXiv cs.CL · yesterday

Introduces Dialogue-SWE-Bench, a benchmark for evaluating coding agents' ability to resolve software engineering problems through dialogue with a user. Proposes a persona-grounded user simulator and a schema-guided agent that improves dialogue capabilities.


Your AI Agent is one bad prompt away from ruining your brand (And why traditional QA is useless)

Reddit r/AI_Agents · 4d ago

Argues that traditional chatbot QA is broken because it tests only happy paths, and proposes an AI-powered user simulator that attacks the bot with diverse personas and edge cases to find vulnerabilities before deployment.
