OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

Hugging Face Daily Papers 06/16/26, 12:00 AM Papers

Summary

OPD-Evolver proposes a self-evolving agent framework using slow-fast co-evolution and on-policy self-distillation to enhance memory management and policy learning, outperforming existing methods like ReasoningBank and Skill0 across multi-domain benchmarks.

Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate skills, but often lack the holistic competence to select useful experience, act on it, write reusable knowledge, and maintain a growing repository. We introduce OPD-Evolver, a slow-fast co-evolution framework that cultivates such an agent evolver through on-policy self-distillation. In the fast loop, OPD-Evolver interacts with a four-level memory hierarchy to read, use, write, and maintain experience for rapid test-time evolution. In the slow loop, outcome-calibrated memory attribution and privileged hindsight distill these four abilities into the deployable policy. Across multi-domain benchmarks, OPD-Evolver surpasses memory systems such as ReasoningBank by up to 11.5%, and training-based methods such as Skill0 by ~5.8%. Further analysis shows that OPD-Evolver internalizes high-value experience and memory management, enabling OPD-Evolver-9B to challenge giant counterparts such as Qwen3.5-397B-A17B and Step-3.5-Flash, pointing beyond memory-augmented agents toward genuinely qualified agent evolvers.
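The read, use, write, and maintain abilities of the fast loop can be pictured with a toy memory store. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the four memory levels, the retriever, or the pruning rule, so `Entry`, `MemoryStore`, the integer `level` field, the word-overlap retriever, and the success-rate threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str
    level: int      # hypothetical stand-in for one of the four memory levels (0..3)
    uses: int = 0   # how often this entry was placed in a prompt
    wins: int = 0   # how often the task then succeeded


@dataclass
class MemoryStore:
    entries: list[Entry] = field(default_factory=list)

    def read(self, query: str, k: int = 2) -> list[Entry]:
        """Select up to k entries by word overlap with the query (toy retriever)."""
        words = set(query.lower().split())
        scored = [(len(words & set(e.text.lower().split())), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]

    def use(self, query: str, selected: list[Entry]) -> str:
        """Build a prompt that puts the selected experience in front of the task."""
        for e in selected:
            e.uses += 1
        context = "\n".join(f"- {e.text}" for e in selected)
        return f"Relevant experience:\n{context}\n\nTask: {query}"

    def credit(self, selected: list[Entry], success: bool) -> None:
        """Record the task outcome against the entries that were used."""
        if success:
            for e in selected:
                e.wins += 1

    def write(self, text: str, level: int) -> None:
        """Store a new piece of reusable knowledge."""
        self.entries.append(Entry(text, level))

    def maintain(self, min_uses: int = 2, min_rate: float = 0.5) -> int:
        """Drop entries that were tried at least min_uses times and rarely helped."""
        before = len(self.entries)
        self.entries = [
            e for e in self.entries
            if e.uses < min_uses or e.wins / e.uses >= min_rate
        ]
        return before - len(self.entries)
```

In this sketch, an entry that is retrieved repeatedly but never followed by a success is removed by `maintain`, while unused entries are kept until there is enough evidence about them.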

Cached at: 06/17/26, 03:35 AM

Source: https://huggingface.co/papers/2606.17628

Abstract

OPD-Evolver is a self-evolving agent framework that combines slow-fast co-evolution with on-policy self-distillation to enhance memory management and policy learning across multiple domains.

Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate skills, but often lack the holistic competence to select useful experience, act on it, write reusable knowledge, and maintain a growing repository. We introduce OPD-Evolver, a slow-fast co-evolution framework that cultivates such an agent evolver through on-policy self-distillation. In the fast loop, OPD-Evolver interacts with a four-level memory hierarchy to read, use, write, and maintain experience for rapid test-time evolution. In the slow loop, outcome-calibrated memory attribution and privileged hindsight distill these four abilities into the deployable policy. Across multi-domain benchmarks, OPD-Evolver surpasses memory systems such as ReasoningBank by up to 11.5%, and training-based methods such as Skill0 by ~5.8%. Further analysis shows that OPD-Evolver internalizes high-value experience and memory management, enabling OPD-Evolver-9B to challenge giant counterparts such as Qwen3.5-397B-A17B and Step-3.5-Flash, pointing beyond memory-augmented agents toward genuinely qualified agent evolvers.
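For readers unfamiliar with on-policy distillation, the general idea behind the slow loop can be sketched as follows. In the common formulation, the student samples its own tokens and is penalized by the per-token reverse KL divergence between its distribution and a teacher's distribution at those same positions. The abstract does not give OPD-Evolver's exact objective, so this is an assumption-laden illustration: the function names are hypothetical, the teacher is assumed to be the same model given extra hindsight context, and the outcome-calibrated memory attribution described above is not modeled.

```python
import math


def reverse_kl(student: list[float], teacher: list[float], eps: float = 1e-12) -> float:
    """KL(student || teacher) over one next-token distribution."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(student, teacher)
        if p > 0
    )


def on_policy_distill_loss(
    student_steps: list[list[float]],
    teacher_steps: list[list[float]],
) -> float:
    """Average per-token reverse KL along a sequence sampled by the student.

    student_steps[t] and teacher_steps[t] are the two next-token distributions
    at position t of the student's own rollout. Assumption: the teacher is the
    same model conditioned on privileged hindsight the student does not see.
    """
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)
```

The loss is zero when the student already matches the hindsight-conditioned teacher at every position, and grows as the student puts probability on tokens the teacher considers unlikely.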

Models citing this paper: 1

#### greeky/OPDEvolver Updated about 1 hour ago

Similar Articles

OPRD: On-Policy Representation Distillation

DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

Draft-OPD: On-Policy Distillation for Speculative Draft Models

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
