Tag
ASALT introduces adaptive state- and observation-level adapters for lateral transfer in multi-agent reinforcement learning. They enable effective strategy transfer between domains whose state spaces have mismatched dimensionalities, while reducing negative transfer.
This paper introduces CALIBER, a method for calibrating confidence in reasoning language models. It elicits confidence estimates both before and after reasoning and matches each supervision target to the information state at that point. CALIBER reduces Expected Calibration Error (ECE) by up to 52.5% and achieves strong Brier scores and AUROC across multiple benchmarks.
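CALIBER's results are reported in terms of ECE and Brier score. As a reference point, here is a minimal sketch of both metrics. It assumes each answer has a confidence in [0, 1] and a binary correctness label. The equal-width binning and the default of 10 bins are illustrative choices, not settings taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: bin-size-weighted mean of |accuracy - mean confidence|."""
    n = len(conf)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        # Confidence 1.0 would map to index n_bins, so clamp it into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared difference between stated confidence and the 0/1 outcome."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)
```

For example, confidences `[0.9, 0.8, 0.3, 0.6]` with correctness `[1, 1, 0, 0]` give an ECE of 0.3 and a Brier score of 0.125. Equal-mass (quantile) bins are a common alternative to equal-width bins and can give different ECE values on the same data.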
This paper introduces causal reinforcement learning (CRL), unifying causal inference and reinforcement learning under a structural causal model framework, and explores novel learning settings such as generalized policy learning and counterfactual learning.
Announces an arXiv note describing a mathematical symmetry that connects the classic MLP to the Gated MLP. The note offers a theoretical link between the two architectures rather than only an empirical performance comparison.
Two recent arXiv papers found that GPT-5.4 and Claude Opus 4.6 use a metaprogramming strategy when handling unfamiliar programming languages. Rather than writing code in the target language directly, they write Python that generates the target code and debug it locally. According to the papers, this strategy is a key factor separating top-tier agents from average ones, and strategy sophistication matters more than model parameter count.
This paper proposes a thermodynamic measure of intelligence defined as 'rare-valid lift' and argues that recursive self-simulation is necessary and nearly sufficient for high thermodynamic intelligence, making intelligence measurable on a universal scale.
linXiv is a local-first desktop application for managing academic papers. It supports discovering, organizing, and visualizing papers from sources such as arXiv. It also integrates a SQLite database, AI annotation, Obsidian notes, and a paper network graph.
autoarxiv turns an arXiv paper into runnable code when you change the paper's URL to autoarxiv.org. An AI agent from alphaXiv reads the paper, clones the repository, sets up its dependencies, and runs a minimal reproduction to verify the paper's claims, logging every step live.
This paper evaluates multi-agent orchestration architectures (DAG Plan and Execute, ReAct) at enterprise scale. It also introduces a Task Manager for continuous, event-driven operation and reports improvements in latency and correctness.
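To illustrate the "execute" half of a DAG plan-and-execute loop, here is a minimal sketch. It assumes the planner has already produced tasks with explicit dependencies. The executor runs them in topological order using Kahn's algorithm and passes each task the results of the tasks it depends on. The task names and the `run_task` callback are hypothetical, and this is not the paper's implementation.

```python
from collections import deque
from typing import Callable, Dict, List


def execute_dag(
    deps: Dict[str, List[str]],
    run_task: Callable[[str, Dict[str, object]], object],
) -> Dict[str, object]:
    """Run every task after its dependencies (Kahn's algorithm).

    deps maps each task to the tasks it depends on; every task, including
    ones with no dependencies, must appear as a key.
    """
    indegree = {task: len(ds) for task, ds in deps.items()}
    dependents: Dict[str, List[str]] = {task: [] for task in deps}
    for task, ds in deps.items():
        for d in ds:
            dependents[d].append(task)

    ready = deque(task for task, k in indegree.items() if k == 0)
    results: Dict[str, object] = {}
    while ready:
        task = ready.popleft()
        # Hand the task only the outputs of its own dependencies.
        results[task] = run_task(task, {d: results[d] for d in deps[task]})
        for nxt in dependents[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(results) != len(deps):
        raise ValueError("plan contains a cycle")
    return results


# Hypothetical plan: "report" needs both the search hits and the summary.
plan = {
    "search": [],
    "fetch": ["search"],
    "summarize": ["fetch"],
    "report": ["search", "summarize"],
}
```

Kahn's algorithm also detects cyclic plans, which a planner model can occasionally emit. A production orchestrator would likely dispatch all tasks in `ready` concurrently instead of one at a time.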
This paper identifies an embodiment gap in humanoid co-speech motion generation caused by human-centric pipelines, and proposes PhysDrift, an embodiment-aware framework that directly predicts executable humanoid joint trajectories from speech, improving speech-motion alignment and physical plausibility.
TelcoAgent is a foundation-model-based framework for scalable and explainable multi-KPM forecasting in 5G networks. It uses automated 3GPP knowledge graph construction and a time-series foundation model for zero-shot prediction.
This paper proposes a segment combination strategy for automatically classifying research methods in academic papers by partitioning full-text content. Experiments on an annotated corpus from Library and Information Science journals show that methodological information is unevenly distributed, with middle-to-late segments having higher discriminative power.
This paper introduces RPCL, a training-only framework for robust pair confidence learning in multimodal emotion-cause pair extraction, which improves discriminative separation of gold pairs from hard negatives and achieves significant gains in Pair F1 and AUPRC on three datasets.
VisualSkill proposes a hierarchical multimodal skill library for computer-use agents (CUAs) that combines text and figures. By retaining the visual information needed for GUI interaction, it achieves a 15.3-point absolute improvement over text-only baselines on CUA benchmarks.
This paper systematically evaluates assumptions about LLM persona prompting and identifies 'persona manifold collapse,' where richer persona descriptions reduce behavioral diversity and simulation fidelity. The findings show that simple age-gender personas often outperform more detailed profiles.
QSignAI is a production-deployed, open-source platform that generates unique identity signatures. It combines quantum randomness, post-processed with a Toeplitz two-source extractor, with an AI bot on Telegram, demonstrating a bidirectional relationship between artificial intelligence and quantum science.
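A minimal sketch of the Toeplitz-hashing step behind such an extractor follows. The bits from one source (length n + m - 1) define an m x n Toeplitz matrix, which is multiplied over GF(2) with the n bits from the other source. This is a simplified illustration under that assumption, not QSignAI's construction. In a real deployment, the output length m must be derived from the min-entropy bounds of both sources, which this sketch does not compute.

```python
from typing import List


def toeplitz_extract(seed_bits: List[int], input_bits: List[int], m: int) -> List[int]:
    """Multiply a Toeplitz matrix built from seed_bits by input_bits over GF(2).

    seed_bits must have length n + m - 1, where n = len(input_bits);
    the result has m bits.
    """
    n = len(input_bits)
    if len(seed_bits) != n + m - 1:
        raise ValueError("need len(input_bits) + m - 1 seed bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            # Entry T[i][j] depends only on i - j: constant along diagonals (Toeplitz).
            acc ^= seed_bits[i - j + n - 1] & input_bits[j]
        out.append(acc)
    return out
```

Because the map is linear over GF(2), extracting from `x XOR y` equals the XOR of the two separate extractions. This linearity is a convenient sanity check for an implementation.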
Proposes the Human-AI Coevolution Dynamics Framework (HACD-H) as a formal model of human-AI interaction, integrating emotional adaptation, relational organization, social memory, and personality consistency. Results show social intelligence emerges from long-term social cognitive coevolution.
Firecrawl released a research index of AI/ML papers designed for autonomous research agents. The company describes it as state of the art and claims 18% better recall on arXivQA than competing indexes.
MM++ is a fully unsupervised, post-hoc framework for out-of-distribution detection. It fuses features from discriminative intermediate layers via top-K gated feature fusion and uses a regularized tied covariance matrix for scale-invariant distance estimation.
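The distance half of this idea can be sketched as a classic Mahalanobis out-of-distribution score with a tied covariance. One covariance matrix is pooled over all components and regularized as Sigma + reg * I so that it stays invertible. The score is the minimum squared Mahalanobis distance to any component mean. This sketch assumes component labels are supplied by the caller, for example from a clustering step, which is not shown. The layer selection and gated fusion from MM++ are not reproduced, and `reg=1e-3` is an arbitrary illustrative value.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def _solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_tied_gaussian(features: Sequence[Vector], labels: Sequence[int], reg: float = 1e-3):
    """Per-component means plus one shared covariance, regularized as Sigma + reg * I."""
    d = len(features[0])
    groups: Dict[int, List[Vector]] = {}
    for x, c in zip(features, labels):
        groups.setdefault(c, []).append(x)
    means = {c: [sum(col) / len(xs) for col in zip(*xs)] for c, xs in groups.items()}
    cov = [[0.0] * d for _ in range(d)]
    for x, c in zip(features, labels):
        diff = [xi - mi for xi, mi in zip(x, means[c])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / len(features)
    for i in range(d):
        cov[i][i] += reg
    return means, cov


def ood_score(x: Vector, means, cov) -> float:
    """Minimum squared Mahalanobis distance to any component mean (higher = more OOD)."""
    best = float("inf")
    for mu in means.values():
        diff = [xi - mi for xi, mi in zip(x, mu)]
        sol = _solve(cov, diff)  # Sigma^{-1} (x - mu) without forming the inverse
        best = min(best, sum(a * b for a, b in zip(diff, sol)))
    return best


# Toy in-distribution data: two clusters, centred at (0, 0) and (10, 10).
train = [[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0],
         [9.0, 10.0], [11.0, 10.0], [10.0, 9.0], [10.0, 11.0]]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
means, cov = fit_tied_gaussian(train, train_labels)
```

With this toy data, a point midway between the clusters, such as `[5.0, 5.0]`, scores far higher than a point near either mean. Without the `reg` term the Mahalanobis distance is unchanged by rescaling individual features, which is one sense in which it is scale-invariant. The regularizer trades a little of that invariance for numerical stability.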
This paper introduces the Valid-Answer-Invalid-Reasoning (VAIR) benchmark to expose the production-evaluation gap in AI reasoning models, where models can generate correct answers but fail to detect flawed reasoning, revealing answer confirmation bias.