self-evolving-agents

#self-evolving-agents

@qingke_ai: https://x.com/qingke_ai/status/2076115489219297455

X AI KOLs Timeline ↗ · 2026-07-12 Cached

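Princeton postdoc Shilong Liu proposes a three-tier taxonomy of self-evolving agents: iterative artifact optimization, agent harness self-improvement, and model learning without gold answers. The post systematically surveys the related concepts and frontier work.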

The Blind Curator: How a Biased Judge Silently Disables Skill Retirement in Self-Evolving Agents

arXiv cs.CL ↗ · 2026-07-09 Cached

This paper investigates how a biased LLM judge silently disables skill retirement in self-evolving agents, showing that false-pass bias across a sharp threshold prevents contribution-based retirement and that the failure is universal across domains, detectable only through a defect-injection audit.
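
The idea of a defect-injection audit can be illustrated with a minimal sketch: hand the judge outputs that are known to be defective and measure how often it passes them anyway. The function name `false_pass_rate`, the boolean `judge` callable, and the sample outputs are hypothetical and not taken from the paper, whose actual audit procedure may differ.

```python
from typing import Callable, Sequence


def false_pass_rate(judge: Callable[[str], bool],
                    defective_outputs: Sequence[str]) -> float:
    """Fraction of known-defective outputs that the judge passes.

    `judge` is a hypothetical callable returning True for "pass".
    Every item in `defective_outputs` is assumed to carry an injected
    defect, so any pass counts as a false pass.
    """
    if not defective_outputs:
        raise ValueError("need at least one injected defect")
    passed = sum(1 for out in defective_outputs if judge(out))
    return passed / len(defective_outputs)


# Illustrative stand-ins for real judges.
def lenient_judge(text: str) -> bool:
    return True  # passes everything, including defects


def keyword_judge(text: str) -> bool:
    return "BUG" not in text  # only catches defects that announce themselves


injected = ["BUG: off by one", "BUG: null deref", "silently wrong result"]
lenient_rate = false_pass_rate(lenient_judge, injected)
keyword_rate = false_pass_rate(keyword_judge, injected)
```

A rate near 1.0 would suggest that failure signals never reach whatever retirement logic depends on the judge.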

A Taxonomy of Self-evolving Agents (15 minute read)

TLDR AI ↗ · 2026-07-09 Cached

Shilong Liu proposes a taxonomy classifying self-evolving agents into artifact optimization, harness self-improvement, and model learning, providing a common language for emerging agent research.

@rohanpaul_ai: Great paper on Self-evolving agents. Enterprise agents cannot truly improve until their messy daily work becomes safe l…

X AI KOLs Timeline ↗ · 2026-07-03 Cached

The paper proposes a mechanism that lets enterprise agents improve by safely converting messy daily work into learning data through a data proxy and control layer; AREAL2.0 demonstrates online RL from real interaction traces.

Self-Evolving Agents with Anytime-Valid Certificates

arXiv cs.AI ↗ · 2026-07-02 Cached

This paper introduces SEA, an architecture for self-evolving agents that confines self-modification to a steering adapter and versioned harness around a frozen base model, using anytime-valid gates to audit modifications against a fixed error budget. Experiments on SWE-bench Verified with four base models show that the suite provides a +4 to +5% improvement on strong base models while preventing regressions.

Metis: Bridging Text and Code Memory for Self-Evolving Agents

arXiv cs.CL ↗ · 2026-06-24 Cached

Metis presents a controlled study comparing text and code memory for self-evolving agents, finding they have complementary trade-offs. It proposes a hierarchical dual-representation memory system that improves task accuracy by up to 20.6% and reduces execution cost by up to 22.8% on the AppWorld benchmark.

SEAGym: An Evaluation Environment for Self-Evolving LLM Agents

arXiv cs.AI ↗ · 2026-06-17 Cached

SEAGym is a new evaluation environment for self-evolving LLM agents that measures agent harness updates across training, validation, test, replay, and cost records, providing complementary signals about the evolution process.

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

Hugging Face Daily Papers ↗ · 2026-06-16 Cached

OPD-Evolver proposes a self-evolving agent framework using slow-fast co-evolution and on-policy self-distillation to enhance memory management and policy learning, outperforming existing methods like ReasoningBank and Skill0 across multi-domain benchmarks.

@qinzytech: https://x.com/qinzytech/status/2066585405479371092

X AI KOLs Timeline ↗ · 2026-06-15 Cached

A technical analysis of two approaches to building self-evolving AI agents: model-based (through architectures such as SSMs or transformers with fast-weight updates, and through training methods) and harness-based (through memory or a meta-harness that can rewrite itself). The author provides practical recommendations for different audiences.

PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents

arXiv cs.AI ↗ · 2026-06-09 Cached

PACE introduces an anytime-valid commit gate for self-evolving agents that replaces greedy acceptance with a sequential hypothesis test, controlling false-commit probability and reducing churn while matching performance with lower variance.
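
To make "anytime-valid" concrete, here is a minimal sketch of one standard construction, a betting-style test martingale. It is not PACE's actual test. It assumes each evaluation is a paired comparison recorded as 1 when the candidate beats the incumbent and 0 otherwise, and that the null hypothesis is a win probability of at most 0.5. The names `anytime_valid_commit`, `alpha`, and `bet` are illustrative.

```python
from typing import Iterable


def anytime_valid_commit(wins: Iterable[int],
                         alpha: float = 0.05,
                         bet: float = 0.5) -> bool:
    """Commit as soon as a betting wealth process reaches 1/alpha.

    Under the null (win probability <= 0.5) each factor
    1 + bet * (+1 or -1) has expectation <= 1, so the wealth is a
    nonnegative supermartingale. Ville's inequality then bounds the
    probability of ever reaching 1/alpha by alpha, no matter when the
    stream is stopped.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if not 0.0 < bet < 1.0:
        raise ValueError("bet must be in (0, 1)")
    wealth = 1.0
    threshold = 1.0 / alpha
    for won in wins:
        wealth *= 1.0 + bet * (1 if won else -1)
        if wealth >= threshold:
            return True  # enough evidence: accept the modification
    return False  # evidence never crossed the bar: keep the incumbent


clear_winner = anytime_valid_commit([1] * 8)       # 1.5**8 > 20
too_early = anytime_valid_commit([1] * 7)          # 1.5**7 < 20
coin_flip = anytime_valid_commit([1, 0] * 50)      # wealth shrinks per pair
```

Greedy acceptance would commit on the first win; the gate above waits until the accumulated evidence clears a bar tied to the false-commit budget `alpha`.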

Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments

arXiv cs.CL ↗ · 2026-06-08 Cached

This paper introduces FinEvolveBench, a benchmark for financial sentiment prediction, and Tree-of-Experience (ToE), a structured experience-management method for LLM agents in low-repetition tasks with implicit rewards. Experiments show that ToE outperforms general-purpose experience mechanisms in such challenging settings.

Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

Hugging Face Daily Papers ↗ · 2026-06-05 Cached

Socratic-SWE introduces a closed-loop self-evolution framework for software engineering agents that leverages historical solving traces to generate targeted repair tasks, achieving 50.40% on SWE-bench Verified after three iterations.

Scaling Self-Evolving Agents via Parametric Memory

arXiv cs.AI ↗ · 2026-06-04 Cached

Researchers from Alibaba/Qwen and Peking University introduce TMEM, a self-evolving parametric memory framework that uses online LoRA weight updates to let LLM agents genuinely learn from experience within a single episode, rather than relying solely on prompt-space memory. TMEM outperforms summary-based and retrieval-based baselines across multiple benchmarks including LoCoMo, LongMemEval-S, and CL-Bench.

OpenSkill: Open-World Self-Evolution for LLM Agents

Hugging Face Daily Papers ↗ · 2026-06-04 Cached

OpenSkill is a framework for LLM agents to self-evolve skills and verification signals from open-world resources without target-task supervision, achieving high performance across benchmarks.

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

arXiv cs.AI ↗ · 2026-05-25 Cached

EVE-Agent introduces a framework for self-evolving search agents that ensures evidence verifiability by generating questions, answers, and evidence spans, and by training on the marginal accuracy gain that the evidence provides. This improves grounded correctness without human annotations.

Rethinking Experience Utilization in Self-Evolving Language Model Agents

arXiv cs.CL ↗ · 2026-05-11 Cached

This paper introduces ExpWeaver, a framework that optimizes how self-evolving language model agents utilize past experiences during runtime decision-making. It demonstrates that selectively invoking experience based on reasoning uncertainty improves performance across various environments and models.
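
One way to picture uncertainty-gated experience use is the sketch below. It assumes uncertainty is measured as the normalized Shannon entropy of the agent's distribution over candidate actions, and that experience is consulted only above a threshold. The summary does not say how ExpWeaver measures uncertainty, so the functions `normalized_entropy` and `should_use_experience` and the `threshold` value are hypothetical.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of `probs`, scaled to [0, 1] by log(len(probs))."""
    if len(probs) < 2:
        return 0.0  # a single option carries no uncertainty
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def should_use_experience(probs: Sequence[float],
                          threshold: float = 0.6) -> bool:
    """Consult stored experience only when the agent is uncertain."""
    return normalized_entropy(probs) >= threshold


confident = should_use_experience([0.97, 0.01, 0.01, 0.01])
uncertain = should_use_experience([0.25, 0.25, 0.25, 0.25])
```

With these inputs the confident step skips retrieval and the uncertain step triggers it, which keeps past experience out of the context when the agent already has a clear choice.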

Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents

arXiv cs.AI ↗ · 2026-05-08 Cached

This paper introduces a method using knowledge-graph paths as intermediate supervision to improve self-evolving search agents. It addresses bottlenecks in Search Self-Play by grounding question construction in relational context and introducing a Waypoint Coverage Reward for graded partial credit.
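
Graded partial credit can be sketched as the fraction of waypoints on a knowledge-graph path that the agent's search trajectory actually mentions. This is an assumed, simplified form; the summary does not give the paper's exact definition of the Waypoint Coverage Reward, and the name `waypoint_coverage` and the substring matching rule are illustrative.

```python
def waypoint_coverage(waypoints: list[str], trajectory_text: str) -> float:
    """Fraction of waypoints mentioned in the trajectory (case-insensitive).

    Returns a value in [0, 1], so a trajectory that reaches some but not
    all intermediate entities still earns partial credit.
    """
    if not waypoints:
        return 0.0
    text = trajectory_text.casefold()
    hits = sum(1 for w in waypoints if w.casefold() in text)
    return hits / len(waypoints)


path = ["Marie Curie", "Sorbonne", "Paris"]
trace = "Searched for marie curie, who taught at the Sorbonne."
reward = waypoint_coverage(path, trace)  # two of three waypoints covered
```

Compared with an all-or-nothing answer check, a reward of this shape gives the learner a signal even on episodes where the final answer is wrong.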

SkillOS: Learning Skill Curation for Self-Evolving Agents

Hugging Face Daily Papers ↗ · 2026-05-07 Cached

This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.

On Safety Risks in Experience-Driven Self-Evolving Agents

arXiv cs.CL ↗ · 2026-04-21 Cached

Researchers from Harbin Institute of Technology and Singapore Management University investigate safety risks in experience-driven self-evolving LLM agents, finding that even benign task experience can compromise safety in high-risk scenarios due to agents' execution-oriented tendencies, and revealing a fundamental safety–utility trade-off.
