repository-level-tasks


SWE-Together: Evaluating Coding Agents in Interactive User Sessions

Hugging Face Daily Papers · 3d ago

SWE-Together is a multi-turn coding benchmark created from real user-agent interactions, featuring a reactive LLM simulator to evaluate agents based on both final correctness and interaction efficiency.
