self-evolving

#self-evolving

@TowardMu: https://x.com/TowardMu/status/2069194694228431273

X AI KOLs Timeline · 3d ago Cached

Introducing Apodex, a self-evolving heavy-duty solver that uses a verification-centric agent-team architecture for in-depth research. It supports self-solving, evidence-chain verification, and more. It is currently in early access and completely free.

@Phoenixyin13: This is no exaggeration: the self-evolving Compounding Loop is the real long-term killer feature. According to this article, everyone should get used to packaging their entire workflow, including decomposition methods, verification rules, output formats, and personal preferences, into a reusable Skill. This is a capability for the future. The next time a similar task comes up, you call the Skill directly with almost zero configuration; the work goes much faster, and the quality is higher.

X AI KOLs Timeline · 2026-06-19 Cached

The tweet argues for packaging personal workflows (decomposition methods, verification rules, output formats, and so on) into reusable Skills, on the grounds that this self-evolving Compounding Loop aligns with cybernetics principles and is a key long-term capability.

@heyshrutimishra: Apodex 1.0 dropped and the architecture is genuinely different. It's post-trained on Qwen3.5 as a self-evolving system:…

X AI KOLs Following · 2026-06-17 Cached

Apodex 1.0 is a self-evolving AI system post-trained on Qwen3.5, achieving SOTA on BrowseComp, DeepSearchQA, and HLE-text. Its 4B mini model outperforms 30B-class models, with an AgentOS runtime for task orchestration. Open weights available.

@NFTCPS: HarnessX is pretty interesting: an agent architecture that can modify itself. Previously, architectural changes relied entirely on manual tuning. When a new model came out, Anthropic removed the planning steps from Claude Code, and Manus refactored its agents five times in six months, each time simplifying. What to change and when to change it — all decided by humans.

X AI KOLs Timeline · 2026-06-17 Cached

HarnessX introduces a framework for self-evolving AI agent harnesses that treats the runtime harness as a first-class object, enabling automatic adaptation via trace-driven reinforcement learning. It achieves average gains of +14.5% across five benchmarks, with larger improvements for weaker models.

@NFTCPS: Microsoft came up with something called SkillOpt, and its approach is pretty wild: treating an agent's skill documentation like a neural network for training, with epochs, batches, learning rates, and validation sets, but without touching a single model weight. What makes it great? Let me break it down into three points: Training only modifies one skill document, and any new changes must be validated on the...

X AI KOLs Timeline · 2026-06-17 Cached

Microsoft introduces SkillOpt, a method that trains an agent's skill documentation like a neural network, using epochs, batches, learning rates, and validation sets for optimization, without modifying model weights. It achieves top results across multiple benchmarks and can be transferred across models and tools.

TabClaw: An Interactive and Self-Evolving Agent for Spreadsheet Manipulation and Table Reasoning

arXiv cs.CL · 2026-06-10 Cached

TabClaw is an open-source interactive AI agent for spreadsheet manipulation and table reasoning that uses LLMs to automate data analysis, support multi-table reasoning, and adapt to user preferences through memory and skill extraction.

@Sumanth_077: Let Agents Design Agents! Memento-Skills is a self-evolving agent framework where agents learn from failures and rewrit…

X AI KOLs Timeline · 2026-06-09 Cached

Memento-Skills is a self-evolving agent framework where agents learn from failures and rewrite their own skills, improving over time through a Read-Execute-Reflect-Write loop. It was tested on HLE and GAIA benchmarks and supports open-source LLMs like Kimi, MiniMax, and GLM.
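A Read-Execute-Reflect-Write loop of the kind described in this summary can be sketched roughly as follows. This is only an illustration, not the Memento-Skills implementation: the function name `read_execute_reflect_write`, the dictionary-based skill store, and the toy `toy_execute` and `toy_reflect` callbacks are all assumptions made for the example.

```python
from typing import Callable, Dict


def read_execute_reflect_write(
    task: str,
    skills: Dict[str, str],
    execute: Callable[[str, str], bool],
    reflect: Callable[[str, str], str],
    max_rounds: int = 3,
) -> int:
    """Return the 1-based round on which the task succeeded, or 0 if it never did."""
    for round_no in range(1, max_rounds + 1):
        skill = skills.get(task, "")         # Read the current skill text
        if execute(task, skill):             # Execute the task with that skill
            return round_no
        skills[task] = reflect(task, skill)  # Reflect on the failure, then Write
    return 0


# Toy stand-ins (assumptions): the task only succeeds once the skill
# text mentions "check units"; reflection appends exactly that hint.
def toy_execute(task: str, skill: str) -> bool:
    return "check units" in skill


def toy_reflect(task: str, skill: str) -> str:
    return (skill + " check units").strip()


skill_store: Dict[str, str] = {}
succeeded_on = read_execute_reflect_write(
    "convert km to miles", skill_store, toy_execute, toy_reflect
)
```

In a real system the two callbacks would presumably be LLM calls; here the first round fails with an empty skill, the rewritten skill is stored, and the second round succeeds.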

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Hugging Face Daily Papers · 2026-06-08 Cached

This paper introduces SkeMex, a self-evolving framework that enhances medical agents by distilling interaction trajectories into structured skill memory, enabling better long-term clinical reasoning through context-dependent utility estimation and governance.

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Hugging Face Daily Papers · 2026-06-05 Cached

Skill-3D is a framework that enables AI agents to learn scene-aware skills through self-evolving memory and skill libraries, significantly improving tool utilization in 3D spatial reasoning tasks (e.g., from 39% to 78% on VSI-Bench).

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

arXiv cs.CL · 2026-06-04 Cached

SePO (Self-Evolving Prompt Optimization) proposes a self-referential prompt agent that optimizes both task agents' system prompts and its own system prompt through an evolutionary search, outperforming Manual-CoT, TextGrad, and MetaSPO across five benchmarks including AIME'25, ARC-AGI-1, and GPQA.

Parthenon Law: A Self-Evolving Legal-Agent Framework

arXiv cs.AI · 2026-06-04 Cached

Parthenon is a self-evolving legal-agent framework that structures LLM agents into six auditable layers and uses an anti-leakage learning loop to improve performance on end-to-end legal matters without modifying model weights. A large-scale empirical study on Harvey LAB with 12,510 agent trajectories shows current frontier agents still struggle with strict matter completion, and Parthenon substantially improves results over state-of-the-art baselines.

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Hugging Face Daily Papers · 2026-06-04

MLEvolve is a self-evolving LLM-based multi-agent framework for automated ML algorithm discovery that extends tree search to Progressive MCGS with graph-based cross-branch information flow and retrospective memory. It achieves state-of-the-art performance on MLE-Bench and outperforms AlphaEvolve on mathematical algorithm optimization tasks.

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

arXiv cs.AI · 2026-06-03 Cached

Introduces SkillDAG, a self-evolving typed directed graph for LLM skill selection at scale that models inter-skill relationships and allows agents to query and evolve the graph during execution, outperforming baselines on ALFWorld and SkillsBench.
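As a rough picture of what a typed skill graph that agents query and extend at run time might look like, here is a minimal sketch. It is not taken from the SkillDAG paper: the class `SkillGraph`, the edge type `"requires"`, the skill names, and the decision to reject cycle-forming edges (inferred only from the "DAG" in the name) are all assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class SkillGraph:
    """Hypothetical typed directed graph over skill names."""

    def __init__(self) -> None:
        # src skill -> list of (edge_type, dst skill)
        self.edges: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def _reachable(self, start: str, goal: str) -> bool:
        """Depth-first search: is `goal` reachable from `start`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(dst for _, dst in self.edges.get(node, []))
        return False

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        """Add a typed edge, refusing any edge that would close a cycle."""
        if self._reachable(dst, src):
            raise ValueError("edge would create a cycle")
        self.edges[src].append((edge_type, dst))

    def neighbors(self, skill: str, edge_type: str) -> List[str]:
        """Query: skills linked from `skill` by edges of the given type."""
        return [dst for t, dst in self.edges.get(skill, []) if t == edge_type]


graph = SkillGraph()
graph.add_edge("make_report", "requires", "load_csv")
graph.add_edge("load_csv", "requires", "open_file")  # edge added during execution
prerequisites = graph.neighbors("make_report", "requires")
```

Typing the edges lets a selector ask narrow questions (for example, only prerequisites) instead of scanning every skill, which is one plausible reading of why such a structure helps at scale.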

Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

arXiv cs.AI · 2026-06-03 Cached

This paper presents Traj-Evolve, a self-evolving multi-agent system that uses an experience pool and multi-agent reinforcement learning to model patient trajectories from longitudinal EHRs for lung cancer early detection, outperforming strong baselines.

@Xudong07452910: 'Harness Updating Is Not Harness Benefit' is well worth reading for anyone working on agent harnesses. It covers an easily overlooked problem: updating the harness does not mean you can use it well. Now many Ag…

X AI KOLs Timeline · 2026-06-03 Cached

This post discusses a paper arguing that, in self-evolving agent systems, updating the harness (writing useful updates) and benefiting from those updates (actually using them in subsequent tasks) are two different abilities. The latter is the key one, and weak models often fail to use the rules.

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Hugging Face Daily Papers · 2026-06-02 Cached

EvoDS is a self-evolving autonomous data science agent that improves via reinforcement learning-driven skill acquisition and adaptive context compression, outperforming open-source agents by 28.9% on benchmarks.

GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

arXiv cs.CL · 2026-05-29 Cached

This paper introduces GrowLoop, a self-evolving evaluation system for assessing human-likeness in open-ended conversations. It uses minimal human seed annotations to iteratively refine evaluation rubrics, addressing challenges of tacit knowledge, varying human agreement, and evolving model capabilities.

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

arXiv cs.AI · 2026-05-27 Cached

This paper introduces CUDAnalyst, a tool for analyzing how individual feedback signals influence planning decisions in self-evolving LLM agents for CUDA kernel generation, using trajectory freezing and selective feedback injection to enable controlled attribution.

@Xudong07452910: This SkillOpt paper is quite interesting—it actually addresses a very important point: AI agents in the future won't just rely on humans writing prompts; they can train their own 'job descriptions'. Currently, many skills/prompts are written one-off, and when real tasks pile up, various edge cases start to fail...

X AI KOLs Timeline · 2026-05-26 Cached

SkillOpt introduces a systematic, controllable text-space optimizer that lets AI agents train and improve their own skills (akin to 'work instructions') through iterative edits and validation, outperforming human-crafted and one-shot prompts across multiple benchmarks and models.

@omarsar0: New research from Microsoft Research. I see a lot of AI engineers handwriting agent skill docs and hoping they generalize.…

X AI KOLs Following · 2026-05-25 Cached

Microsoft Research introduces SkillOpt, a method that treats agent skill documents as trainable external state, using an optimizer model to make bounded edits validated by a held-out set. The approach achieves best or tied results across 52 evaluation cells and improves accuracy by over 23 points on GPT-5.5, with zero extra inference cost and transferable skills.
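The idea of bounded edits gated by a held-out set can be sketched as below. This is a toy illustration under stated assumptions, not the SkillOpt algorithm: `optimize_skill`, the keyword-coverage `toy_score`, the append-only `toy_propose`, and the character-count bound `max_edit_chars` are all hypothetical stand-ins for the optimizer model and evaluation described in the summary.

```python
from typing import Callable, List, Tuple


def optimize_skill(
    skill: str,
    propose_edits: Callable[[str, List[str]], List[str]],
    score: Callable[[str, List[str]], float],
    train_batches: List[List[str]],
    holdout: List[str],
    epochs: int = 1,
    max_edit_chars: int = 80,
) -> Tuple[str, float]:
    """Accept an edited skill document only if it is a small change
    and strictly improves the held-out score. No model weights involved."""
    best, best_score = skill, score(skill, holdout)
    for _ in range(epochs):
        for batch in train_batches:
            for candidate in propose_edits(best, batch):
                # Crude "bounded edit" check (assumption): limit size change.
                if abs(len(candidate) - len(best)) > max_edit_chars:
                    continue
                candidate_score = score(candidate, holdout)
                if candidate_score > best_score:  # validation gate
                    best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins: an "example" is a rule the document should mention.
def toy_score(doc: str, examples: List[str]) -> float:
    return sum(rule in doc for rule in examples) / len(examples)


def toy_propose(doc: str, batch: List[str]) -> List[str]:
    return [doc + "\n- " + rule for rule in batch if rule not in doc]


final_doc, final_score = optimize_skill(
    "Skill:",
    toy_propose,
    toy_score,
    train_batches=[["cite sources"], ["check units", "x" * 200]],
    holdout=["cite sources", "check units"],
)
```

Because only the text document changes, the optimized skill adds nothing at inference time beyond its own length, which matches the summary's point about extra inference cost; the oversized 200-character proposal above is rejected by the bound before it is ever scored.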
