Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory
Summary
This paper introduces SkeMex, a self-evolving framework that enhances medical agents by distilling interaction trajectories into structured skill memory, enabling better long-term clinical reasoning through context-dependent utility estimation and governance.
Source: https://huggingface.co/papers/2606.09365
Published on Jun 8 · Submitted by Manglu (https://huggingface.co/manglu3935) on Jun 9
Abstract
SkeMex is a self-evolving framework that enhances medical agents through structured skill memory, improving long-term clinical reasoning by distinguishing useful experiences and governing memory retention based on contextual utility.
Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, and difficult to govern. More importantly, they rarely distinguish which memories are truly useful for future reasoning. This limits their ability to accumulate compact and reliable experience for long-horizon clinical reasoning. To close this gap, we propose SkeMex, a post-deployment self-evolution framework that improves medical agents through a skill-based memory without updating model weights. SkeMex distills informative interaction trajectories into structured skills that encode reusable procedural knowledge, and organizes them into a multi-branch repository spanning general, task-specific, and action-level experience. To determine which memories should be reused and retained, SkeMex estimates context-dependent utility from environment feedback and uses it to guide value-aware retrieval and repository governance. A closed-loop "Read-Write-Assess-Govern" lifecycle further supports continual evolution by writing new skills, updating utilities, promoting useful memories, and removing harmful entries. Experiments across diverse clinical tasks show that SkeMex consistently outperforms representative memory-based agents in both offline and online settings. It also generalizes across model backbones and supports transferable skill memory. All data and code will be released publicly.
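The Read-Write-Assess-Govern lifecycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, which the abstract says is yet to be released. Every name here (Skill, SkillMemory, the branch labels, alpha, weight, prune_below) is hypothetical. Utility is assumed to be a per-context running average of a scalar feedback signal, retrieval is assumed to mix token-overlap relevance with that utility, and governance is reduced to pruning; promotion of useful memories is left out.

```python
from dataclasses import dataclass, field

BRANCHES = ("general", "task", "action")  # assumed labels for the three branches


@dataclass
class Skill:
    text: str    # distilled procedural knowledge
    branch: str  # one of BRANCHES
    utility: dict = field(default_factory=dict)  # context label -> running estimate


@dataclass
class SkillMemory:
    alpha: float = 0.5         # step size of the running utility average (assumed)
    weight: float = 0.5        # utility vs. relevance trade-off in retrieval (assumed)
    prune_below: float = -0.3  # governance threshold (assumed)
    skills: list = field(default_factory=list)

    @staticmethod
    def _relevance(query, text):
        # Jaccard token overlap, a stand-in for a real retriever.
        q, t = set(query.lower().split()), set(text.lower().split())
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    def read(self, query, context, k=1):
        # Value-aware retrieval: rank by relevance blended with utility in this context.
        def score(s):
            return ((1 - self.weight) * self._relevance(query, s.text)
                    + self.weight * s.utility.get(context, 0.0))
        return sorted(self.skills, key=score, reverse=True)[:k]

    def write(self, text, branch):
        # Store a newly distilled skill in one branch of the repository.
        if branch not in BRANCHES:
            raise ValueError(f"unknown branch: {branch}")
        skill = Skill(text, branch)
        self.skills.append(skill)
        return skill

    def assess(self, skill, context, reward):
        # Move the utility estimate for this context toward the observed feedback.
        old = skill.utility.get(context, 0.0)
        skill.utility[context] = old + self.alpha * (reward - old)

    def govern(self):
        # Drop skills whose mean utility over observed contexts is below the threshold.
        def harmful(s):
            return bool(s.utility) and (
                sum(s.utility.values()) / len(s.utility) < self.prune_below)
        removed = [s for s in self.skills if harmful(s)]
        self.skills = [s for s in self.skills if not harmful(s)]
        return removed


# Usage with made-up skills and made-up feedback values.
memory = SkillMemory()
helpful = memory.write("check renal function before dosing contrast agents", "task")
unhelpful = memory.write("check renal function before dosing any drug", "general")
memory.assess(helpful, "imaging", 1.0)
memory.assess(unhelpful, "imaging", -1.0)
top = memory.read("renal function contrast dosing", "imaging", k=1)
pruned = memory.govern()
```

Keying utility by context label is one simple way to make the estimate context-dependent: the same skill can score well in one setting and poorly in another, and retrieval only consults the entry for the current context.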
Similar Articles
MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills
This paper introduces MedSkillAudit, a domain-specific framework for auditing the safety and quality of medical research AI agent skills before deployment. The study demonstrates that the system achieves reliable assessment consistency comparable to or better than human expert review.
SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning
SEMA-RAG is a self-evolving multi-agent RAG framework for medical question answering that decouples interpretation, exploration, and adjudication into three specialist agents, achieving significant accuracy improvements over baselines across multiple benchmarks.
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
This paper introduces SkillMaster, a training framework that enables LLM agents to autonomously create, refine, and select skills through trajectory-informed review and counterfactual utility evaluation.
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
SkillOpt introduces a systematic text-space optimizer for agent skills that trains skills as external agent state with stable updates and zero deployment inference overhead, achieving superior performance across multiple benchmarks and execution environments.
Synthesis and Evaluation of Long-term History-aware Medical Dialogue
This paper introduces a framework for synthesizing long-term medical dialogue datasets using LLMs, and creates MediLongChat with three benchmark tasks to evaluate healthcare agents' memory and reasoning capabilities. Experiments show that even state-of-the-art LLMs struggle with these tasks.