capability-analysis

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

arXiv cs.AI · 2026-06-01

This paper analyzes two capabilities in self-evolving LLM agents: harness-updating and harness-benefit. It finds that harness-updating is flat across base capability levels, while harness-benefit is non-monotonic, with mid-tier models benefiting most.

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

arXiv cs.LG · 2026-05-20

This paper introduces a population coupling trend and h-field diagnostic to analyze the relationship between coding and reasoning capabilities across frontier AI models, finding that capabilities cooperate but with varying emphasis per lab. It provides a playbook for measurement and predicts benchmark saturation trends.
