Articles from arXiv
This paper identifies a capacity-induced failure mode in physics-informed neural networks (PINNs) where overparameterized networks develop functional modularity that hinders convergence, and proposes Modular-Sparsity Synchronization (ModSync), a framework that penalizes task-exclusive connections to maintain cross-objective interaction and achieve state-of-the-art accuracy.
BIM-Edit is a benchmark for evaluating LLMs on natural-language editing of Building Information Models (BIM) in IFC format. Results show a substantial gap: the best model achieves an average score of only 49.5% across geometric, semantic, and topological metrics.
This paper introduces RACL, a reasoning-agent control layer that improves metaheuristic optimization by learning from operational memory to control internal search behavior, and reports cost improvements in vehicle routing tests.
This paper proposes an adaptive, subject-aware prompt routing framework for LLM-based high-school tutoring, using 14 pedagogical features to switch strategies. A/B testing with 359 students shows improved efficiency and conversion rates over static baselines.
ScaffoldAgent introduces a utility-guided dynamic outline optimization framework for open-ended deep research, using expansion, contraction, and revision operations to improve long-form report generation and factual grounding.
This paper proposes a novel architecture integrating multi-head attention with the Soft Actor-Critic algorithm for porosity prediction and process parameter optimization in additive manufacturing, achieving faster convergence and higher rewards than standard RL methods.
This paper introduces a framework that combines flow-based generative editing with evolutionary algorithms to optimize in residual space, enabling controllable data editing under non-differentiable objectives. It is validated on MorphoMNIST and crystal data.
This paper presents Process-Verified Reinforcement Learning, using the Lean proof assistant as a process oracle to provide fine-grained tactic-level feedback during training, improving theorem proving performance.
This paper evaluates multi-agent orchestration architectures (DAG Plan and Execute, ReAct) at enterprise scale and introduces a Task Manager for continuous event-driven operation, showing improvements in latency and correctness.
This paper introduces Reward as an Agent and DynDiff-GRPO to address reward hacking and limited exploration in reinforcement learning for embodied world models, achieving significant accuracy gains.
This paper proposes an automatic generation pipeline to create a large-scale training dataset (RAINbow) for DialNav, a dialog-based vision-and-language navigation task. Combined with dual-strategy training and a localization model, it achieves substantial gains over the baseline.
This paper identifies an embodiment gap in humanoid co-speech motion generation caused by human-centric pipelines, and proposes PhysDrift, an embodiment-aware framework that directly predicts executable humanoid joint trajectories from speech, improving speech-motion alignment and physical plausibility.
This paper explores autotelic AI, where agents generate their own goals, and discusses implications for intrinsic motivation, embeddedness, and the dissolution of the self boundary. It proposes a framework that extends to a quantum formulation, non-dual philosophy, and an LLM-based instantiation.
This paper proposes eCNNTO, a CNN with residual connections that accelerates density-based topology optimization by predicting near-optimal densities from early iteration histories. It achieves up to a 97% reduction in iterations and generalizes well across boundary conditions, geometries, and mesh resolutions.
This paper proposes Multi-Agent Transactive Memory (MATM), a framework for population-level storage and retrieval of agent-generated trajectories that improves task performance and reduces interaction steps in interactive environments such as ALFWorld and WebArena.
MetaResearcher is a framework for training deep research agents with self-reflective reinforcement learning in adversarial virtual environments, addressing the limitations of static environments and fact-retrieval-only tasks.
This paper presents a systematic review and benchmark of 24 black-box uncertainty estimation methods for large language models across 4 models and 4 dataset settings, finding that no single method dominates but hybrid methods that combine multiple uncertainty signals perform well.
TelcoAgent is a foundation-model-based framework for scalable and explainable multi-KPM forecasting in 5G networks, using automated 3GPP knowledge graph construction and a time-series foundation model for zero-shot prediction.
This paper proposes a human-on-the-loop orchestration framework for AI-assisted legal discovery, introducing a taxonomy of agentic failures and a four-layer verification architecture to reduce privilege-waiver risk.
CombEval is a dynamic benchmark for evaluating combinatorial counting in large language models, using typed specifications to generate problems with solver-verified answers. It tests 11 LLMs under direct and code-augmented settings and finds brittleness on ordered objects, indistinguishable elements, relative constraints, and nested dependencies.
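To make the idea of a solver-verified answer concrete for a counting problem with indistinguishable elements, here is a minimal sketch. It is not CombEval's generator or solver; the function names and the example multiset "AABBC" are hypothetical, chosen only to show a closed-form count being checked against brute-force enumeration.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def multiset_permutations_formula(items):
    """Count distinct orderings of a multiset: n! / (c1! * c2! * ...)."""
    total = factorial(len(items))
    for count in Counter(items).values():
        # Each group of identical items is overcounted count! times.
        total //= factorial(count)
    return total


def multiset_permutations_bruteforce(items):
    """Enumerate all orderings and drop duplicates (hypothetical checker)."""
    return len(set(permutations(items)))


# Hypothetical problem instance: arrangements of the letters A, A, B, B, C.
items = "AABBC"
closed_form = multiset_permutations_formula(items)
brute_force = multiset_permutations_bruteforce(items)
print(closed_form, brute_force)
```

Both functions should agree on any small input; a disagreement would flag an error in either the formula or the enumeration, which is the kind of check a verified answer key depends on.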