Tag
SkillAudit introduces a framework for evolving LLM agent skills without ground-truth feedback by using paired trajectory auditing and contrastive evaluation. It achieves 73.9% average task reward across 89 tasks, outperforming baseline methods.
VisualClaw is a self-evolving multimodal agent that reduces deployment costs through hybrid encoding and skill evolution, while improving video-QA accuracy across multiple benchmarks.
SkillCAT is a training-free framework for LLM agent skill self-evolution. It targets three limitations of prior approaches (single-trace bias, unverified merging, and full-corpus loading) with three corresponding stages: Contrastive Causal Extraction, Assessment-Augmented Evolution, and Topology-Aware Task Execution. It reports improvements of up to 40.40% on benchmarks.
SkillChain automates the lifecycle of per-intent skill specifications for image-based e-commerce AI assistants, improving response quality and user engagement through iterative refinement and routing alignment.
Bayesian-Agent presents a framework that treats reusable skills and SOPs as hypotheses, using Bayesian inference to guide agent behavior and improve task performance through posterior-guided harness optimization. It achieves significant improvements on multiple benchmarks with deepseek-v4-flash.
Verilog-Evolve is a feedback-driven framework that iteratively refines Verilog code generated by large language models, using functional simulation, synthesis, and timing metrics to promote better candidates and evolve reusable repair skills across tasks.
The author shares insights from trying various Agent Memory implementations and concludes that only two approaches are somewhat useful: entry-based memory with a strict length limit (like Hermes) and skill evolution distilled from accumulated trajectories. Graph-based and card-based methods are found to be ineffective.
SkillsVote is a governance framework for long-horizon LLM agents that manages reusable skills through structured collection, recommendation, and evolution, improving performance on Terminal-Bench 2.0 and SWE-Bench Pro without model updates.
SkillFlow proposes a flow-driven recursive skill evolution framework for LLM-based agentic orchestration, using Tempered Trajectory Balance to prevent strategy collapse and provide transparent credit assignment. Experiments on 14 datasets show significant improvements over baselines in QA, math, code, and decision-making tasks.
SkillClaw introduces a framework for collective skill evolution in multi-user LLM agent systems, enabling autonomous updates and cross-user knowledge transfer by aggregating interactions and feedback to improve performance across the ecosystem.
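The idea in the Bayesian-Agent entry above, treating each reusable skill as a hypothesis whose credibility is updated from observed outcomes, can be illustrated with a minimal sketch. This is not the paper's method: it assumes the simplest possible model, a Beta-Bernoulli posterior over "this skill leads to task success", and every name in it (SkillBelief, pick_skill, the two skill names) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillBelief:
    """Beta(alpha, beta) belief that a skill leads to task success (assumed model)."""
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, success: bool) -> None:
        # Conjugate Beta-Bernoulli update: add one count to the observed outcome.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean probability of success.
        return self.alpha / (self.alpha + self.beta)


def pick_skill(beliefs: dict[str, SkillBelief]) -> str:
    # Greedy choice by posterior mean; a real system might sample instead.
    return max(beliefs, key=lambda name: beliefs[name].mean)


# Hypothetical skills and outcomes, for illustration only.
beliefs = {
    "retry_with_backoff": SkillBelief(),
    "ask_clarifying_question": SkillBelief(),
}
for outcome in (True, True, False):
    beliefs["retry_with_backoff"].update(outcome)
beliefs["ask_clarifying_question"].update(False)

best = pick_skill(beliefs)
```

With a uniform prior, two successes and one failure give a posterior mean of 3/5 for the first skill, against 1/3 for the second after a single failure, so the greedy rule picks the first. Choosing by posterior mean keeps the sketch deterministic; sampling from the posterior would add exploration.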