gpt-4.1

#gpt-4.1

Not All Skills Help: Measuring and Repairing Agent Knowledge

arXiv cs.CL · yesterday

This paper shows that naive skill accumulation in LLM agents can cause performance regressions, because skills that help on some tasks hurt on others. The authors propose Assay, a framework that measures each skill's causal contribution and applies per-task masking, achieving state-of-the-art results on AppWorld and τ-bench without weight updates.
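The summary does not say how Assay estimates a skill's contribution, so the following is only a generic leave-one-out ablation sketch of the idea, not the paper's method. Every name here (`leave_one_out_contributions`, `mask_for_task`, `toy_score`, the skill and task names, and the numbers in the toy scorer) is hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable

# Hypothetical scorer type: (task, active skills) -> task score.
Score = Callable[[str, FrozenSet[str]], float]

def leave_one_out_contributions(task: str, skills: Iterable[str], score: Score) -> Dict[str, float]:
    """Estimate each skill's contribution as the score drop when it alone is removed."""
    full = frozenset(skills)
    base = score(task, full)
    return {s: base - score(task, full - {s}) for s in full}

def mask_for_task(task: str, skills: Iterable[str], score: Score) -> FrozenSet[str]:
    """Keep only the skills whose removal lowers the score on this task."""
    contributions = leave_one_out_contributions(task, skills, score)
    return frozenset(s for s, delta in contributions.items() if delta > 0)

# Toy scorer with made-up effects: each skill helps one task and hurts the other.
def toy_score(task: str, active: FrozenSet[str]) -> float:
    effects = {
        "booking": {"date_parser": 0.3, "sql_hints": -0.2},
        "reporting": {"date_parser": -0.1, "sql_hints": 0.4},
    }
    return 0.5 + sum(effects[task].get(s, 0.0) for s in active)

skills = ["date_parser", "sql_hints"]
booking_mask = mask_for_task("booking", skills, toy_score)
reporting_mask = mask_for_task("reporting", skills, toy_score)
```

In this toy setup each task ends up with a different active skill set, which is the sense in which a per-task mask can avoid regressions without changing model weights. Leave-one-out ignores interactions between skills, so it is a simplification of any real causal measurement.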


#gpt-4.1

GPT Guesses Between 1 and 100

Hacker News Top · 2026-05-25

An experiment in which GPT-4.1 is asked 10,000 times to pick a random number between 1 and 100. The resulting distribution is then compared against a uniform baseline to measure bias.
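One common way to compare such a sample against a uniform baseline is a Pearson chi-square goodness-of-fit statistic. The summary does not state which analysis the experiment used, so this is a sketch under that assumption. The function name `chi_square_uniform` is hypothetical, and both samples below are synthetic stand-ins produced by a seeded pseudo-random generator, not model output or results from the experiment.

```python
import random
from collections import Counter

def chi_square_uniform(samples, low=1, high=100):
    """Pearson chi-square statistic of integer samples against a uniform distribution on [low, high]."""
    k = high - low + 1
    expected = len(samples) / k  # expected count per value under uniformity
    counts = Counter(samples)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(low, high + 1))

# Synthetic stand-in data (assumption): a seeded PRNG replaces the model's answers.
rng = random.Random(0)
uniform_like = [rng.randint(1, 100) for _ in range(10_000)]

# Deliberately skewed sample: every answer is one of a few arbitrary values.
skewed = [rng.choice([7, 37, 42, 73]) for _ in range(10_000)]

stat_uniform = chi_square_uniform(uniform_like)
stat_skewed = chi_square_uniform(skewed)
```

With 100 possible values the statistic has 99 degrees of freedom, so a sample that really is uniform should give a value in the neighborhood of 99, while a strongly biased sample gives a far larger one.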

