Tag
This paper empirically tests whether LLM agents with GNN tools exercise judgment or blindly obey the tool, finding that agents agree with the GNN 97.6–99.2% of the time and that stronger backbones defer even more. The cost of this deference does not shrink with capability, and selective invocation remains an open problem.
SpatialClaw is a training-free framework that uses code as an action interface to enable flexible, stateful spatial reasoning in vision-language models, achieving superior performance across diverse 3D/4D spatial reasoning tasks.
This paper presents SPADER, a reinforcement learning framework for multi-answer QA that uses step-wise peer advantage for credit assignment and diversity-aware exploration rewards to improve recall of long-tail entities, achieving better performance on several benchmarks.
This paper introduces CoCoDA, a framework that uses a co-evolving compositional Directed Acyclic Graph (DAG) to manage tool libraries for augmented agents. It enables small language models to efficiently retrieve and compose tools, allowing an 8B model to match or exceed the performance of a 32B model on reasoning benchmarks.