Articles from arXiv
This paper introduces RGAO, a retrieval-guided adaptive orchestration framework for multi-agent code generation that dynamically selects the agent topology based on code complexity. RGAO also provides a formal budget algebra with provable resource conservation, and it significantly reduces routing errors relative to baseline methods.
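To make "provable resource conservation" concrete, here is a minimal sketch. It assumes that a budget is an integer token count that a parent agent splits among sub-agents. The `Budget` class and its `split` method are hypothetical illustrations, not RGAO's actual algebra. The only point is that an exact integer largest-remainder split keeps the children's budgets summing to the parent's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical integer token budget (illustration only, not RGAO's algebra)."""
    tokens: int

    def __add__(self, other: "Budget") -> "Budget":
        # Merging budgets (e.g. returning unused tokens) adds them exactly.
        return Budget(self.tokens + other.tokens)

    def split(self, weights: list[int]) -> list["Budget"]:
        """Split this budget among sub-agents in proportion to integer weights.

        Uses integer largest-remainder rounding, so the child budgets always
        sum exactly to the parent budget (the conservation property).
        """
        if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
            raise ValueError("weights must be non-negative and not all zero")
        total_w = sum(weights)
        shares = [self.tokens * w // total_w for w in weights]
        remainders = [self.tokens * w % total_w for w in weights]
        leftover = self.tokens - sum(shares)  # always < len(weights)
        # Give the leftover tokens to the children with the largest remainders.
        order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
        for i in order[:leftover]:
            shares[i] += 1
        return [Budget(s) for s in shares]


children = Budget(100).split([1, 1, 1])
print([c.tokens for c in children])  # → [34, 33, 33]
assert sum(children, Budget(0)) == Budget(100)
```

Integer arithmetic is used instead of floats so that rounding can never create or destroy tokens.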
This paper introduces TGS-RAG, a bidirectional verification and completion framework that combines text-based and graph-based Retrieval-Augmented Generation (RAG) to improve multi-hop reasoning accuracy.
The article introduces Prober.ai, a web-based writing environment that uses LLM-constrained personas to provide inquiry-based feedback for argumentative writing, aiming to prevent cognitive outsourcing. Developed as a hackathon prototype, the system gates revision suggestions behind student reflection to preserve critical thinking skills.
This paper proposes a causal framework for probing internal visual representations in Multimodal Large Language Models, revealing differences in how entities and abstract concepts are encoded. The study highlights that increasing model depth is crucial for encoding abstract concepts and uncovers a disconnect between perception and reasoning in current MLLMs.
This paper introduces BeliefMem, a novel memory paradigm for LLM agents that stores multiple candidate conclusions with probabilities to handle partial observability and reduce self-reinforcing errors. Empirical evaluations show it outperforms deterministic baselines on LoCoMo and ALFWorld benchmarks.
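The core idea of storing several candidate conclusions with probabilities, rather than one fixed fact, can be illustrated with the minimal sketch below. The `BeliefMemory` class, its methods, and the Bayes-style multiply-and-renormalize update are assumptions chosen for illustration. They are not BeliefMem's actual storage format or update rule.

```python
class BeliefMemory:
    """Hypothetical memory keeping a probability distribution over candidate
    conclusions per key (illustration only, not BeliefMem's implementation)."""

    def __init__(self) -> None:
        self._beliefs: dict[str, dict[str, float]] = {}

    def add_candidates(self, key: str, candidates: dict[str, float]) -> None:
        """Store candidate conclusions with (unnormalized) prior weights."""
        self._beliefs.setdefault(key, {}).update(candidates)
        self._normalize(key)

    def observe(self, key: str, likelihoods: dict[str, float]) -> None:
        """Bayes-style update: scale each candidate by how well it explains a
        new observation, then renormalize. Candidates missing from
        `likelihoods` are treated as having likelihood 1.0."""
        beliefs = self._beliefs[key]  # KeyError if the key was never stored
        for cand in beliefs:
            beliefs[cand] *= likelihoods.get(cand, 1.0)
        self._normalize(key)

    def distribution(self, key: str) -> dict[str, float]:
        return dict(self._beliefs[key])

    def best(self, key: str) -> str:
        beliefs = self._beliefs[key]
        return max(beliefs, key=beliefs.get)

    def _normalize(self, key: str) -> None:
        beliefs = self._beliefs[key]
        total = sum(beliefs.values())
        if total <= 0:
            raise ValueError("beliefs must have positive total weight")
        for cand in beliefs:
            beliefs[cand] /= total


mem = BeliefMemory()
mem.add_candidates("keys_location", {"drawer": 1.0, "table": 1.0})
mem.observe("keys_location", {"drawer": 0.2, "table": 0.8})
print(mem.best("keys_location"))  # → table
```

The agent keeps "drawer" as a live hypothesis instead of overwriting it. A later observation can therefore still revive it, which is one way to avoid locking in an early error.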
AlphaCrafter is a full-stack multi-agent framework for cross-sectional quantitative trading that uses specialized agents for factor mining, screening, and trading to adapt to evolving market conditions.
This paper proposes a locality-aware approach for identifying private classes and a reliable optimal transport-based method (ReOT) for domain adaptation under extreme label shift, where a central difficulty is distinguishing classes shared across domains from classes private to one domain.
This paper introduces BitCal-TTS, a runtime controller that improves accuracy and reduces premature halting in quantized reasoning models by calibrating confidence signals during test-time scaling.
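The general pattern of halting on a *calibrated* confidence signal is shown in the sketch below. It swaps in standard temperature scaling, `sigmoid(logit / T)`, as the calibration step and assumes that `T` has been fit offline on held-out data. BitCal-TTS may calibrate differently. The function names and the 0.9 threshold are hypothetical.

```python
import math


def calibrated_confidence(logit: float, temperature: float) -> float:
    """Temperature-scaled confidence sigmoid(logit / T).

    T > 1 softens an overconfident signal (as quantized models may emit);
    T < 1 sharpens an underconfident one.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def steps_until_halt(step_logits: list[float], temperature: float,
                     threshold: float = 0.9) -> int:
    """Return the reasoning step at which the controller would stop, i.e. the
    first step whose calibrated confidence reaches `threshold`; if none does,
    use the full step budget."""
    for step, logit in enumerate(step_logits, start=1):
        if calibrated_confidence(logit, temperature) >= threshold:
            return step
    return len(step_logits)


# Per-step confidence logits from a hypothetical quantized reasoning model.
logits = [2.5, 3.5, 5.0]
print(steps_until_halt(logits, temperature=1.0))  # → 1
print(steps_until_halt(logits, temperature=2.0))  # → 3
```

With the raw signal (T = 1), the controller stops after the first step. After softening the overconfident signal (T = 2), it keeps reasoning until the evidence is stronger, which is the kind of premature halting the summary describes reducing.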
This position paper argues that AI agents are a production technology rather than a labor input, proposing a 'Compute-Anchored Wage' bound where human wages are determined by compute capital costs rather than labor supply elasticity.
This paper introduces SPARK, a self-play reinforcement learning framework that leverages knowledge graphs derived from scientific literature to improve relational reasoning in vision-language models.
This paper introduces AgenticRAG, a framework from Microsoft that enhances enterprise knowledge base retrieval by equipping LLMs with tools for iterative search, document navigation, and analysis. It demonstrates significant improvements in recall and factuality over standard RAG pipelines on multiple benchmarks.
This research introduces the Housing Potential Common Data Model (HPCDM) to integrate diverse datasets for housing analysis and demonstrates its application through a City Digital Twin pilot.
This paper introduces FoodCHA, a multi-modal LLM agent framework designed for fine-grained food analysis, addressing challenges in hierarchical consistency and attribute discrimination for dietary monitoring.
FinRAG-12B is a 12B-parameter LLM optimized for retrieval-augmented generation in banking, featuring a unified training framework that improves answer quality, citation grounding, and calibrated refusal. The model outperforms GPT-4.1 in citation grounding and is deployed across over 40 financial institutions with significant cost and latency advantages.
This paper introduces LANTERN, a framework for multi-source neurosymbolic transfer in reinforcement learning that uses LLMs to generate task automata and adaptive gating to improve sample efficiency.
This paper introduces the Functional Intentionality Test (FIT) and FIT-Eval framework to quantify the degree of intentional-like behavior in agentic AI systems for governance and accountability purposes.
This paper presents an agentic system using Large Language Models to automate the discovery of exchange-correlation functionals in Density Functional Theory, achieving improvements over human-designed baselines while highlighting challenges with benchmark overfitting.
This paper introduces 'authorization propagation' as a distinct security challenge in multi-agent AI systems, arguing that identity governance must be treated as infrastructure to maintain authorization invariants across autonomous agent interactions.
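One authorization invariant that such systems commonly rely on is attenuation: authority may only narrow as it is delegated from agent to agent. The sketch below enforces that invariant at delegation time and records the delegation chain for auditing. The `Grant` class, the scope strings, and the choice of this particular invariant are assumptions for illustration, not the paper's proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical authorization grant held by an agent (illustration only)."""
    agent: str
    scopes: frozenset[str]
    parent: Optional["Grant"] = None

    def delegate(self, to_agent: str, scopes: set[str]) -> "Grant":
        """Delegate a subset of this grant's scopes to another agent.

        Invariant: a child grant never holds a scope its parent lacks,
        so authority can only shrink as it propagates.
        """
        requested = frozenset(scopes)
        excess = requested - self.scopes
        if excess:
            raise PermissionError(f"{self.agent} cannot delegate {sorted(excess)}")
        return Grant(to_agent, requested, parent=self)

    def chain(self) -> list[str]:
        """Agents along the delegation path, root first (for audit logs)."""
        node, path = self, []
        while node is not None:
            path.append(node.agent)
            node = node.parent
        return path[::-1]


root = Grant("orchestrator", frozenset({"read:crm", "write:crm"}))
researcher = root.delegate("researcher", {"read:crm"})
summarizer = researcher.delegate("summarizer", {"read:crm"})
print(summarizer.chain())  # → ['orchestrator', 'researcher', 'summarizer']
# researcher.delegate("summarizer", {"write:crm"}) would raise PermissionError.
```

Because every grant links to its parent, the invariant can be checked at each hop, and any action can be traced back to the root authority that permitted it.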
This paper introduces a Probabilistic Graphical Model framework to causally audit LLM safety mechanisms, revealing that standard observational metrics overestimate demographic bias by ignoring context toxicity.
This paper introduces 'constant-context skill learning,' a framework that moves procedural knowledge from prompts into model weights to reduce token usage and improve privacy for LLM agents. The method achieves strong performance on benchmarks like ALFWorld and WebShop while significantly reducing inference costs.