#deepseek-v3

Not All Skills Help: Measuring and Repairing Agent Knowledge

arXiv cs.CL · yesterday

This paper shows that naively accumulating skills in LLM agents can cause performance regressions, because a skill that helps on some tasks can hurt on others. The authors propose Assay, a framework that measures each skill's causal contribution and masks skills on a per-task basis, achieving state-of-the-art results on AppWorld and τ-bench without weight updates.
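The summary does not say how Assay estimates a skill's contribution, so the following is only a minimal sketch of the general idea, using leave-one-out ablation as a stand-in technique. The function names, the `evaluate` callback, the skill and task names, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, FrozenSet, Iterable

# Hypothetical evaluator: runs the agent on `task` with only `active` skills
# enabled and returns a score (here, a count of passed checks).
Evaluator = Callable[[str, FrozenSet[str]], float]


def leave_one_out_contributions(
    tasks: Iterable[str], skills: Iterable[str], evaluate: Evaluator
) -> Dict[str, Dict[str, float]]:
    """For each task, estimate each skill's contribution as
    score(all skills) - score(all skills except that one)."""
    full = frozenset(skills)
    contributions: Dict[str, Dict[str, float]] = {}
    for task in tasks:
        baseline = evaluate(task, full)
        contributions[task] = {
            skill: baseline - evaluate(task, full - {skill}) for skill in full
        }
    return contributions


def per_task_mask(
    contributions: Dict[str, Dict[str, float]], threshold: float = 0.0
) -> Dict[str, FrozenSet[str]]:
    """Keep, for each task, only skills whose contribution exceeds `threshold`."""
    return {
        task: frozenset(s for s, delta in per_skill.items() if delta > threshold)
        for task, per_skill in contributions.items()
    }


# Toy stand-in for real agent runs (assumed numbers, for illustration only).
TOY_EFFECTS = {
    "book_flight": {"retry_on_error": 2, "verbose_planning": -1},
    "cancel_order": {"retry_on_error": 0, "verbose_planning": 3},
}


def toy_evaluate(task: str, active: FrozenSet[str]) -> float:
    # Base score of 5 plus the additive effect of each enabled skill.
    return 5 + sum(TOY_EFFECTS[task].get(skill, 0) for skill in active)


if __name__ == "__main__":
    skills = ["retry_on_error", "verbose_planning"]
    deltas = leave_one_out_contributions(TOY_EFFECTS.keys(), skills, toy_evaluate)
    for task, kept in per_task_mask(deltas).items():
        print(task, sorted(kept))
```

In this toy setup, `verbose_planning` lowers the score on `book_flight` but raises it on `cancel_order`, so the mask drops it for the first task and keeps it for the second. The agent's weights are never touched; only the set of enabled skills changes per task. Leave-one-out ignores interactions between skills, which a real measurement procedure may need to account for.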
