repository-level-tasks

SWE-Together: Evaluating Coding Agents in Interactive User Sessions

Hugging Face Daily Papers · 3d ago

SWE-Together is a multi-turn coding benchmark built from real user-agent interactions. It uses a reactive LLM simulator to evaluate agents on both final correctness and interaction efficiency.
