ORACLE is a new agentic framework for early scam anticipation from streaming app usage trajectories. It uses a self-evolving context manager and on-policy self-distillation to detect scams from partial observations over multiple apps and days.
ClinSeekAgent is an automated agentic framework that enables large language models to actively acquire and synthesize multimodal clinical evidence from raw data sources, improving decision-making accuracy in both text-only and multimodal tasks. It introduces the ClinSeek-Bench benchmark and a distilled model ClinSeek-35B-A3B that achieves strong performance on agentic clinical reasoning.
RTL-BenchMT is an agentic framework that automatically identifies and revises flawed cases and detects overfitting in RTL generation benchmarks, reducing human maintenance effort in EDA research.
Lean Refactor presents a retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs, achieving significant compression and compilation-time reduction.
Code-as-Room is an MLLM-based agentic framework that generates 3D indoor rooms by converting top-down images into executable Blender code, using a structured execution harness with cross-stage memory to maintain context.
Google's Nexus paper proposes an agentic framework that incorporates contextual events alongside numerical data for time series forecasting, achieving an 86.6% MAPE reduction on Zillow tests compared to direct chain-of-thought prompting.
This paper presents an agentic framework that uses coding agents to generate physically plausible world simulations from natural language prompts, outperforming video-based models in physical accuracy and instruction fidelity.
GraphBit is a graph-based agentic framework that uses deterministic DAG orchestration with a Rust engine to eliminate hallucinations and infinite loops. It achieves 67.6% accuracy on the GAIA benchmark with zero framework-induced errors and low latency.
This paper proposes an agentic framework using LangChain agents for population-scale mental health screening, focusing on depression detection from clinical transcripts. The framework incrementally locks validated stages and uses proxy-guided evaluation to ensure trustworthiness and adaptability.
Nexus introduces a multi-agent framework that decomposes time series forecasting into specialized stages, integrating numerical patterns and contextual information using LLMs, achieving state-of-the-art results on benchmarks.
Physics-intern is an agentic framework for theoretical physics that improves Gemini 3.1 Pro's performance on the CritPt benchmark from 17.7% to 31.4%, achieving a new state-of-the-art.
PresentAgent-2 is an agentic framework that generates presentation videos from user queries by conducting research, creating multimodal slides, and producing interactive content across single, discussion, and interaction modes.
This paper introduces AutoLLMResearch, an agentic framework that automates the configuration of expensive LLM experiments by learning from low-fidelity environments and extrapolating to high-cost settings. It aims to reduce computational waste and reliance on expert intuition in scalable LLM research.
This paper introduces FoodCHA, a multi-modal LLM agent framework designed for fine-grained food analysis, addressing challenges in hierarchical consistency and attribute discrimination for dietary monitoring.
Chat2Workflow introduces a benchmark and agentic framework for generating executable visual workflows from natural language, showing that current LLMs capture user intent but still struggle with industrial-grade automation.
This paper introduces Discover and Prove (DAP), an open-source agentic framework for automated theorem proving in Lean 4 that tackles 'Hard Mode' problems where the answer must be discovered independently before formal proof construction. The work releases new Hard Mode benchmark variants and achieves state-of-the-art results while revealing a significant gap between LLM answer accuracy (>80%) and formal prover success (<10%).
MM-WebAgent is a hierarchical agentic framework that generates coherent and visually consistent webpages by coordinating AIGC-based element generation through joint optimization of layout and multimodal content. The paper introduces a benchmark and multi-level evaluation protocol, demonstrating improvements over code-generation and agent-based baselines.