arXiv

Articles from arXiv

Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation

arXiv cs.AI · yesterday Cached

This paper introduces RGAO, a retrieval-guided adaptive orchestration framework for multi-agent code generation that dynamically selects topology based on code complexity. It provides a formal budget algebra ensuring provable resource conservation while significantly reducing routing errors compared to baseline methods.

Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG

arXiv cs.AI · yesterday Cached

This paper introduces TGS-RAG, a bidirectional verification and completion framework that synergizes text-based and graph-based Retrieval-Augmented Generation to improve multi-hop reasoning accuracy.

Prober.ai: Gated Inquiry-Based Feedback via LLM-Constrained Personas for Argumentative Writing Development

arXiv cs.AI · yesterday Cached

The article introduces Prober.ai, a web-based writing environment that uses LLM-constrained personas to provide inquiry-based feedback for argumentative writing, aiming to prevent cognitive outsourcing. Developed as a hackathon prototype, the system gates revision suggestions behind student reflection to preserve critical thinking skills.

Causal Probing for Internal Visual Representations in Multimodal Large Language Models

arXiv cs.AI · yesterday Cached

This paper proposes a causal framework for probing internal visual representations in Multimodal Large Language Models, revealing differences in how entities and abstract concepts are encoded. The study highlights that increasing model depth is crucial for encoding abstract concepts and uncovers a disconnect between perception and reasoning in current MLLMs.

Belief Memory: Agent Memory Under Partial Observability

arXiv cs.AI · yesterday Cached

This paper introduces BeliefMem, a novel memory paradigm for LLM agents that stores multiple candidate conclusions with probabilities to handle partial observability and reduce self-reinforcing errors. Empirical evaluations show it outperforms deterministic baselines on LoCoMo and ALFWorld benchmarks.

AlphaCrafter: A Full-Stack Multi-Agent Framework for Cross-Sectional Quantitative Trading

arXiv cs.AI · yesterday Cached

AlphaCrafter is a full-stack multi-agent framework for cross-sectional quantitative trading that uses specialized agents for factor mining, screening, and trading to adapt to evolving market conditions.

Locality-aware Private Class Identification for Domain Adaptation with Extreme Label Shift

arXiv cs.AI · yesterday Cached

This paper proposes a locality-aware private class identification approach and a reliable optimal transport-based method (ReOT) to address domain adaptation challenges under extreme label shift, particularly distinguishing shared from private classes.

BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models

arXiv cs.AI · yesterday Cached

This paper introduces BitCal-TTS, a runtime controller that improves accuracy and reduces premature halting in quantized reasoning models by calibrating confidence signals during test-time scaling.

Who Prices Cognitive Labor in the Age of Agents? A Position on Compute-Anchored Wages

arXiv cs.AI · yesterday Cached

This position paper argues that AI agents are a production technology rather than a labor input, and proposes a 'Compute-Anchored Wage' bound under which human wages are determined by compute capital costs rather than by labor supply elasticity.

SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs

arXiv cs.AI · yesterday Cached

This paper introduces SPARK, a self-play reinforcement learning framework that leverages knowledge graphs derived from scientific literature to improve relational reasoning in vision-language models.

AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases

arXiv cs.AI · yesterday Cached

This paper introduces AgenticRAG, a framework from Microsoft that enhances enterprise knowledge base retrieval by equipping LLMs with tools for iterative search, document navigation, and analysis. It demonstrates significant improvements in recall and factuality over standard RAG pipelines on multiple benchmarks.

Housing Potential Common Data Model and City Digital Twin

arXiv cs.AI · yesterday Cached

This research introduces the Housing Potential Common Data Model (HPCDM) to integrate diverse datasets for housing analysis and demonstrates its application through a City Digital Twin pilot.

FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis

arXiv cs.AI · yesterday Cached

This paper introduces FoodCHA, a multi-modal LLM agent framework designed for fine-grained food analysis, addressing challenges in hierarchical consistency and attribute discrimination for dietary monitoring.

FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking

arXiv cs.AI · yesterday Cached

FinRAG-12B is a 12B-parameter LLM optimized for retrieval-augmented generation in banking, featuring a unified training framework that improves answer quality, citation grounding, and calibrated refusal. The model outperforms GPT-4.1 in citation grounding and is deployed across over 40 financial institutions with significant cost and latency advantages.

LANTERN: LLM-Augmented Neurosymbolic Transfer with Experience-Gated Reasoning Networks

arXiv cs.AI · yesterday Cached

This paper introduces LANTERN, a framework for multi-source neurosymbolic transfer in reinforcement learning that uses LLMs to generate task automata and adaptive gating to improve sample efficiency.

Intentionality is a Design Decision: Measuring Functional Intentionality for Accountable AI Systems

arXiv cs.AI · yesterday Cached

This paper introduces the Functional Intentionality Test (FIT) and FIT-Eval framework to quantify the degree of intentional-like behavior in agentic AI systems for governance and accountability purposes.

Agentic Discovery of Exchange-Correlation Density Functionals

arXiv cs.AI · yesterday Cached

This paper presents an agentic system using Large Language Models to automate the discovery of exchange-correlation functionals in Density Functional Theory, achieving improvements over human-designed baselines while highlighting challenges with benchmark overfitting.

Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure

arXiv cs.AI · yesterday Cached

This paper introduces 'authorization propagation' as a distinct security challenge in multi-agent AI systems, arguing that identity governance must be treated as infrastructure to maintain authorization invariants across autonomous agent interactions.

The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias

arXiv cs.AI · yesterday Cached

This paper introduces a Probabilistic Graphical Model framework to causally audit LLM safety mechanisms, revealing that standard observational metrics overestimate demographic bias by ignoring context toxicity.

From History to State: Constant-Context Skill Learning for LLM Agents

arXiv cs.AI · yesterday Cached

This paper introduces 'constant-context skill learning,' a framework that moves procedural knowledge from prompts into model weights to reduce token usage and improve privacy for LLM agents. The method achieves strong performance on benchmarks like ALFWorld and WebShop while significantly reducing inference costs.
