arxiv-preprint

#arxiv-preprint

Getting good predictions without data cleaning (Why "Garbage In, Garbage Out" is sometimes a trap)

Reddit r/artificial · 2026-05-13

This arXiv preprint challenges the "Garbage In, Garbage Out" heuristic, arguing that aggressive manual data cleaning can limit predictive performance on high-dimensional tabular data by discarding the dimensions needed to triangulate latent drivers.

Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents

arXiv cs.AI · 2026-05-12

This paper introduces Vigil, an evaluation framework for embodied agents that disentangles task execution success from the agent's ability to correctly recognize and report task completion.

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

arXiv cs.AI · 2026-05-12

This paper introduces a critique-and-routing controller for multi-agent LLM systems that formulates coordination as a sequential decision problem. It uses policy gradients to optimize the controller for iterative refinement, outperforming baselines while reducing reliance on top-tier models.

Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare

arXiv cs.AI · 2026-05-12

This paper presents a structured framework for benchmarking generative, multimodal, and agentic AI in healthcare, addressing the gap between high benchmark scores and real-world clinical reliability, safety, and relevance.

Path-Based Gradient Boosting for Graph-Level Prediction

arXiv cs.LG · 2026-05-12

This paper introduces PathBoost, a gradient tree boosting method for graph-level prediction that uses path-based features to compete with graph neural networks while offering better interpretability.

EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation

arXiv cs.AI · 2026-05-11

This paper introduces EnvSimBench, a benchmark for evaluating Large Language Models' ability to simulate environments for agent training. It identifies a 'state change cliff' in current LLMs and proposes a constraint-driven pipeline to reduce hallucinations and costs.

AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

arXiv cs.AI · 2026-05-11

This paper proposes AGWM, an affordance-grounded world model that uses a dynamic prerequisite graph to track action executability in environments with compositional prerequisites. Experiments show it reduces prediction error and improves generalization compared to standard world models.

Belief Memory: Agent Memory Under Partial Observability

arXiv cs.AI · 2026-05-08

This paper introduces BeliefMem, a novel memory paradigm for LLM agents that stores multiple candidate conclusions with probabilities to handle partial observability and reduce self-reinforcing errors. Empirical evaluations show it outperforms deterministic baselines on LoCoMo and ALFWorld benchmarks.

Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning

arXiv cs.LG · 2026-05-08

This paper introduces Adaptive Q-Chunking (AQC), a reinforcement learning method that dynamically selects action chunk sizes to balance reactive control and long-horizon planning. It achieves state-of-the-art results on OGBench and Robomimic, enhancing the performance of large-scale VLA models in robotics tasks.

Information Theoretic Adversarial Training of Large Language Models

arXiv cs.LG · 2026-05-08

This paper introduces WARDEN, a distributionally robust adversarial training framework for large language models that uses f-divergence to dynamically reweight adversarial examples, significantly reducing attack success rates while maintaining computational efficiency.
