appworld

#appworld

Not All Skills Help: Measuring and Repairing Agent Knowledge

arXiv cs.CL · yesterday

This paper shows that naive skill accumulation in LLM agents can cause performance regressions, because skills that help on some tasks hurt on others. The authors propose Assay, a framework that measures each skill's causal contribution and applies per-task masking, achieving state-of-the-art results on AppWorld and τ-bench without weight updates.

MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction

arXiv cs.AI · 2026-05-12

This paper introduces MIND-Skill, a framework that automates the generation of high-quality, reusable agent skills through multi-agent induction and deduction, with quality guarantees via TextGrad optimization.

Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents

arXiv cs.AI · 2026-05-11

This paper introduces HCL-GP, a dynamic policy-learning framework that integrates generalized planning and hierarchical task decomposition to enable LLM-based agents to learn and reuse executable policy components, significantly improving performance on the AppWorld benchmark.
