Tag
This paper presents an agentic framework that uses coding agents to generate physically plausible world simulations from natural language prompts, outperforming video-based models in physical accuracy and instruction fidelity.
GraphBit is a graph-based agentic framework that uses deterministic DAG orchestration with a Rust engine to eliminate hallucinations and infinite loops. It achieves 67.6% accuracy on GAIA benchmarks with zero framework-induced errors and low latency.
This paper proposes an agentic framework using LangChain agents for population-scale mental health screening, focusing on depression detection from clinical transcripts. The framework incrementally locks validated stages and uses proxy-guided evaluation to ensure trustworthiness and adaptability.
Nexus is a multi-agent framework that decomposes time series forecasting into specialized stages and uses LLMs to integrate numerical patterns with contextual information, achieving state-of-the-art results on benchmarks.
Physics-intern is an agentic framework for theoretical physics that improves Gemini 3.1 Pro's performance on the CritPt benchmark from 17.7% to 31.4%, a new state of the art.
PresentAgent-2 is an agentic framework that generates presentation videos from user queries by conducting research, creating multimodal slides, and producing interactive content across single, discussion, and interaction modes.
This paper introduces AutoLLMResearch, an agentic framework that automates the configuration of expensive LLM experiments by learning from low-fidelity environments and extrapolating to high-cost settings. It aims to reduce computational waste and reliance on expert intuition in scalable LLM research.
This paper introduces FoodCHA, a multi-modal LLM agent framework designed for fine-grained food analysis, addressing challenges in hierarchical consistency and attribute discrimination for dietary monitoring.
Chat2Workflow introduces a benchmark and agentic framework for generating executable visual workflows from natural language, showing that current LLMs still struggle with industrial-grade automation even though they capture user intent.
This paper introduces Discover and Prove (DAP), an open-source agentic framework for automated theorem proving in Lean 4 that tackles 'Hard Mode' problems where the answer must be discovered independently before formal proof construction. The work releases new Hard Mode benchmark variants and achieves state-of-the-art results while revealing a significant gap between LLM answer accuracy (>80%) and formal prover success (<10%).
MM-WebAgent is a hierarchical agentic framework that generates coherent and visually consistent webpages by coordinating AIGC-based element generation through joint optimization of layout and multimodal content. The paper introduces a benchmark and multi-level evaluation protocol, demonstrating improvements over code-generation and agent-based baselines.