Tag
This paper introduces a multi-turn interactive framework for evaluating reasoning, in which LLMs must query a hidden environment and integrate partial observations. It is instantiated as a benchmark of 474 executable games across five difficulty levels, which discriminates between models and exposes differences in their reasoning.
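To make the query-and-observe loop concrete, here is a minimal sketch of one possible interaction. It is not one of the benchmark's games: the number-guessing environment, the names `HiddenNumberEnv` and `solve`, and the binary-search policy standing in for the LLM are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HiddenNumberEnv:
    """Hypothetical hidden environment: the secret is never revealed directly."""
    secret: int
    turns: int = 0  # number of queries issued so far

    def query(self, guess: int) -> str:
        """Return a partial observation about the secret for one query."""
        self.turns += 1
        if guess < self.secret:
            return "higher"
        if guess > self.secret:
            return "lower"
        return "correct"


def solve(env: HiddenNumberEnv, low: int, high: int, max_turns: int) -> Optional[int]:
    """Stand-in for the model: narrow [low, high] using each observation."""
    for _ in range(max_turns):
        if low > high:
            return None  # observations were inconsistent with the range
        guess = (low + high) // 2
        observation = env.query(guess)
        if observation == "correct":
            return guess
        if observation == "higher":
            low = guess + 1
        else:
            high = guess - 1
    return None  # turn budget exhausted


env = HiddenNumberEnv(secret=37)
answer = solve(env, low=1, high=100, max_turns=10)
```

In a setup like this, both the final answer and the number of turns used (`env.turns`) could be scored, which is one way a multi-turn evaluation can separate solvers that integrate observations well from those that do not.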
HypoAgent is an agentic framework for interactive abductive hypothesis generation over knowledge graphs. It integrates three agents to handle evolving user intents and fine-grained diagnosis, and achieves state-of-the-art performance.