Tag
This paper proposes Robust-TO, an agentic video understanding framework that integrates per-frame trustworthiness to address the Blind Trust Problem, achieving significant accuracy gains under realistic perturbations.
Robust-TO addresses the Blind Trust Problem in video reasoning by integrating per-frame trustworthiness into an agentic framework, improving accuracy under realistic perturbations through calibrated evidence weighting and reliability-aware reasoning.
Qwen-Image-Agent proposes a unified agentic framework that addresses the context gap in text-to-image generation by integrating planning, reasoning, searching, and memory mechanisms. It introduces IA-Bench for evaluation and achieves state-of-the-art performance.
OmniPath is a multi-modal agentic framework that combines OpenStreetMap network topology with aerial LiDAR data to audit wheelchair accessibility by analyzing physical barriers like slope and surface discontinuities at high resolution, validated against field surveys.
Eve, a new agentic framework from Vercel, is being described as 'Next.js for agents' for its file-based approach to tools, skills, and evals, enabling rapid agent building with TypeScript.
Introduces ToolGrad, an agentic framework that generates, evaluates, and refines tool-use trajectories using textual 'gradients', achieving a near-100% pass rate and lower cost for dataset generation. Accepted at ACL 2026.
RL-Index proposes a reinforcement learning-based agentic indexing framework that shifts reasoning from query time to the indexing stage by augmenting documents with LLM-generated rationales, improving retrieval effectiveness and reducing online latency.
AlloSpatial is an agentic framework that enhances spatial reasoning in foundation models by converting egocentric observations into structured allocentric representations, using cognitive mapping and tool-use reasoning. It improves performance by 5-18% on benchmarks and outperforms larger models through cold-start reinforcement learning.
ProSPy is a profiling-driven SQL-Python agentic framework for enterprise text-to-SQL that structures reasoning into four stages: automatic profiling, schema pruning, a dialect-agnostic SQL interface, and Python-based analysis. It achieves execution accuracies of 60.15% and 60.51% on Spider 2.0-Lite and Spider 2.0-Snow with Claude-4.5-Opus, outperforming strong baselines.
QueryAgent-R1 is an agentic framework that bridges query generation and product retrieval in e-commerce using reinforcement learning and memory abstraction, improving query CTR by 2.9% and CVR by 3.1% in online tests.
A new Google paper introduces LEAP, an agentic framework that enables general LLMs to solve formal math problems by planning proofs and checking each step, raising performance from under 10% to 70% on the Lean IMO benchmark and solving all 2025 Putnam problems.
Evo is an open-source tool that provides semi-autonomous agents to optimize codebases through parallel experimentation, using tree search and multiple subagents to autonomously discover improvements to metrics.
LEAP is an agentic framework that enables general-purpose LLMs to achieve state-of-the-art performance in formal theorem proving in Lean, solving all 12 problems from the 2025 Putnam Competition and boosting formal solve rates from below 10% to 70% on a new benchmark (Lean-IMO-Bench), surpassing specialized systems.
MapAgent is an industrial-grade agentic framework that combines vision-language processing with constraint-aware reasoning to automatically produce specification-compliant lane-level maps, achieving over 95% automation in Baidu Maps for more than 360 cities.
MOSAIC introduces a structured agentic framework for automated data science that uses memory-grounded model selection and workflow construction, validated on financial time-series tasks. It outperforms AutoML and agentic baselines.
HypoAgent is an agentic framework for interactive abductive hypothesis generation over knowledge graphs, integrating three agents to handle evolving user intents and fine-grained diagnosis, achieving state-of-the-art performance.
ACO System is an open-source multi-agent framework that autonomously manages the software development pipeline from GitHub Issue to merged PR using six specialized AI agents, with a deterministic architect gate to prevent bad PRs.
This paper introduces RACE-Sched, an asynchronous agentic framework that decouples real-time reactive scheduling from deliberative LLM-based reasoning to handle dynamic job shop scheduling problems, outperforming DRL and other baselines.
Introduces Multi-Agent Residual In-Context Learning (MARICL), an agentic framework that uses LLM agents to analyze residuals from a base model on tabular data, hypothesize missing structure, and produce explicit correction terms via textual gradient optimization. Across nine benchmarks, MARICL consistently improves over its base model and demonstrates mechanistic generalization in cell-free protein predictions.
Research Math Agents (RMA) is an agentic framework for automated reasoning on research-level mathematical problems, achieving state-of-the-art results on the First Proof benchmark by solving 8 out of 10 problems, outperforming strong baselines like GPT-5.2R and Aletheia.