@tom_doerr: Builds agents from 200,000 skills https://github.com/ynulihao/AgentSkillOS…
Summary
AgentSkillOS is an open-source framework that enables developers to build AI agents by retrieving and orchestrating pipelines from over 200,000 available skills.
Cached at: 05/10/26, 10:24 AM
ynulihao/AgentSkillOS
Source: https://github.com/ynulihao/AgentSkillOS
Build your agent from 200,000+ skills via skill retrieval & orchestration
News
- [2026/03] Our new project homepage is now live!
- [2026/03] Benchmark released — 30 multi-format creative tasks across 5 categories with pairwise Bradley-Terry evaluation.
- [2026/03] Modular Architecture released — pluggable retrieval/orchestration modules. See ARCHITECTURE.md for details.
- [2026/03] Batch CLI released — headless parallel execution with YAML configs, resume support, and Rich progress UI.
🌐 Overview
🔥 The agent skill ecosystem is exploding: over 200,000 skills are now publicly available.
But with so many options, how do you find the right skills for your task? And when one skill isn’t enough, how do you compose and orchestrate multiple skills into a working pipeline?
AgentSkillOS is the operating system for agent skills—helping you discover, compose, and run skill pipelines end-to-end.
WEB UI · Visual workflow overview in the browser
CLI · Headless execution with terminal progress and logs
🌟 Highlights
- 🔍 Skill Search & Discovery — Creatively discover task-relevant skills with a skill tree that organizes skills into a hierarchy based on their capabilities.
- 🔗 Skill Orchestration — Compose and orchestrate multiple skills into a single workflow with a directed acyclic graph, automatically managing execution order, dependencies, and data flow across steps.
- 🖥️ GUI (Human-in-the-Loop) — A built-in GUI enables human intervention at every step, making workflows controllable, auditable, and easy to steer.
- ⭐ High-Quality Skill Pool — A curated collection of high-quality skills, selected based on Claude’s implementation, GitHub stars, and download volume.
- 📊 Observability & Debugging — Trace each step with logs and metadata to debug faster and iterate on workflows with confidence.
- 🧩 Extensible Skill Registry: Easily plug in new skills or bring your own via a flexible registry.
- 📈 Benchmark — 30 multi-format creative tasks across 5 categories, evaluated with pairwise comparison and Bradley-Terry aggregation.
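The DAG-based orchestration described above (execution order, dependencies, data flow) can be illustrated with Python's standard-library `graphlib`. This is a minimal sketch, not AgentSkillOS's actual orchestrator: the skill names, the `run_dag` helper, and the convention that each skill receives only its direct dependencies' outputs are all hypothetical. Nodes released together by `get_ready()` form a stage whose members could run in parallel.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A skill is modeled as a function from {dependency name: output} to its own output.
SkillFn = Callable[[dict[str, str]], str]


def run_dag(skills: dict[str, SkillFn],
            deps: dict[str, set[str]]) -> tuple[list[list[str]], dict[str, str]]:
    """Run skills in dependency order; return the parallelizable stages and all outputs."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    stages: list[list[str]] = []
    outputs: dict[str, str] = {}
    while sorter.is_active():
        # Every node returned here has all dependencies finished, so they could run concurrently.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        for name in ready:
            # Data flow: each skill sees only the outputs of its direct dependencies.
            inputs = {dep: outputs[dep] for dep in deps.get(name, set())}
            outputs[name] = skills[name](inputs)
            sorter.done(name)
    return stages, outputs


# Hypothetical four-skill pipeline: research feeds two parallel branches that merge into slides.
skills: dict[str, SkillFn] = {
    "research": lambda inp: "notes",
    "outline": lambda inp: f"outline({inp['research']})",
    "charts": lambda inp: f"charts({inp['research']})",
    "slides": lambda inp: f"slides({inp['outline']}, {inp['charts']})",
}
deps = {"outline": {"research"}, "charts": {"research"}, "slides": {"outline", "charts"}}
stages, outputs = run_dag(skills, deps)
print(stages)             # [['research'], ['charts', 'outline'], ['slides']]
print(outputs["slides"])  # slides(outline(notes), charts(notes))
```

Rejecting cycles up front (via `prepare()`) is what makes the "acyclic" guarantee enforceable before any skill runs.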
💡 Examples
👉 View detailed workflows on Landing Page →
📊 Check out the comparison report: AgentSkillOS vs. without skills →

Qualitative comparison between the vanilla baseline and AgentSkillOS Quality-First outputs.
🏗️ Method
- Skill tree construction: Organizes 200,000+ skills into a capability tree, providing structured, coarse-to-fine access for efficient and creative skill discovery.
- Skill retrieval: Automatically selects a task-relevant subset of usable skills given a user’s request.
- Skill orchestration: Composes the selected skills into a coordinated plan (e.g., a DAG-based workflow) to solve tasks beyond the reach of any single skill. Note that we also support a freestyle mode (i.e., Claude Code).
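Coarse-to-fine retrieval over a capability tree can be sketched as a beam search that descends one level at a time and prunes irrelevant branches. In AgentSkillOS an LLM judges which branches to follow; here a keyword-overlap scorer stands in for that judgment, so this sketch cannot reproduce the "non-obvious" discoveries an LLM makes. The `Node` type, `relevance`, `retrieve`, the `beam` parameter, and the example tree are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    description: str
    children: list["Node"] = field(default_factory=list)  # empty list = leaf skill


def relevance(query: str, node: Node) -> int:
    # Stand-in for the LLM's judgment: number of words shared by query and description.
    return len(set(query.lower().split()) & set(node.description.lower().split()))


def retrieve(root: Node, query: str, beam: int = 2) -> list[str]:
    """Walk the tree coarse-to-fine, keeping at most `beam` relevant children per node."""
    frontier, selected = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:  # reached a concrete skill
                selected.append(node.name)
                continue
            scored = [(relevance(query, child), child) for child in node.children]
            scored = [pair for pair in scored if pair[0] > 0]  # prune irrelevant branches
            scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep tree order
            next_frontier.extend(child for _, child in scored[:beam])
        frontier = next_frontier
    return selected


tree = Node("root", "all skills", [
    Node("documents", "create pdf docx pptx documents reports slides", [
        Node("pptx-builder", "build pptx slides presentation"),
        Node("pdf-writer", "write pdf report"),
    ]),
    Node("media", "generate images video charts", [
        Node("image-gen", "generate images"),
        Node("chart-maker", "draw charts from data"),
    ]),
    Node("devops", "deploy servers", [Node("k8s-deploy", "deploy kubernetes clusters")]),
])
selected = retrieve(tree, "make slides with charts for a report")
print(selected)  # ['pptx-builder', 'pdf-writer', 'chart-maker']
```

Because whole subtrees (here `devops`) are pruned at the top level, the number of relevance judgments grows with tree depth and beam width rather than with the total number of skills.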

🌲 Why Skill Tree?

Left: Pure semantic retrieval prioritizes textual similarity, often missing skills that look unrelated in embedding space but are crucial for actually solving the task, which leads to narrow, myopic skill usage.
Right: Our LLM + Skill Tree navigates the capability hierarchy to surface non-obvious but functionally relevant skills, enabling broader, more creative, and more effective skill composition.
Figure panels: 200 Skills, 1,000 Skills, 10,000 Skills.
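For contrast, the pure semantic retrieval baseline described on the left reduces to ranking skills by vector similarity to the query. In this sketch, bag-of-words term-count vectors stand in for a real embedding model, and the skill names and descriptions are hypothetical. Note how `data-analysis`, which a sales-review deck plausibly needs, shares no words with the query and never makes the top-k cut.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, skills: dict[str, str], k: int = 2) -> list[str]:
    """Rank skills purely by similarity between the query and each skill description."""
    q = embed(query)
    ranked = sorted(skills, key=lambda name: cosine(q, embed(skills[name])), reverse=True)
    return ranked[:k]


skills = {
    "pptx-builder": "build slides for a presentation",
    "slide-theme": "apply a color theme to slides",
    "data-analysis": "compute statistics from csv tables",
}
picked = top_k("make slides for a quarterly sales review", skills)
print(picked)  # ['pptx-builder', 'slide-theme']
```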
📈 Benchmark
We propose a benchmark of 30 multi-format creative tasks spanning 5 categories, evaluated via pairwise comparison with Bradley-Terry aggregation.
Three key properties:
- Multi-format creative tasks — Tasks require end-user artifacts in formats such as PDF, PPTX, DOCX, HTML, video, and generated images.
- Pairwise evaluation — Outputs are compared in both orders to reduce position bias and capture reliable preference signals.
- Bradley-Terry scores — Pairwise preferences are aggregated into continuous ranking scores for fine-grained system comparisons.
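The aggregation step can be sketched as a standard Bradley-Terry fit, in which system i beats system j with probability p_i / (p_i + p_j), estimated here with the classic minorization-maximization (MM) update. Judging each pair in both presentation orders simply contributes two judgments per pair. The system names, the judgments, and the convention that a tie gives each side half a win are hypothetical; the paper's judging setup, tie handling, and score scale may differ.

```python
from collections import defaultdict


def tally(judgments: list[tuple[str, str, str]]) -> dict[tuple[str, str], float]:
    """Count wins from (shown_first, shown_second, winner) judgments; 'tie' splits the point."""
    wins: dict[tuple[str, str], float] = defaultdict(float)
    for first, second, winner in judgments:
        if winner == "tie":
            wins[(first, second)] += 0.5
            wins[(second, first)] += 0.5
        else:
            loser = second if winner == first else first
            wins[(winner, loser)] += 1.0
    return dict(wins)


def bradley_terry(wins: dict[tuple[str, str], float], iters: int = 500) -> dict[str, float]:
    """Fit Bradley-Terry strengths p_i, where P(i beats j) = p_i / (p_i + p_j), by MM iteration."""
    systems = sorted({s for pair in wins for s in pair})
    p = {s: 1.0 for s in systems}
    for _ in range(iters):
        new_p = {}
        for i in systems:
            won = sum(wins.get((i, j), 0.0) for j in systems if j != i)
            denom = sum((wins.get((i, j), 0.0) + wins.get((j, i), 0.0)) / (p[i] + p[j])
                        for j in systems if j != i)
            new_p[i] = won / denom if denom else p[i]
        total = sum(new_p.values())
        p = {s: v / total for s, v in new_p.items()}  # fix the scale: strengths sum to 1
    return p


# Each pair of outputs is judged twice, once in each presentation order.
judgments = [
    ("skillos", "baseline", "skillos"), ("baseline", "skillos", "skillos"),
    ("skillos", "full_pool", "skillos"), ("full_pool", "skillos", "full_pool"),
    ("full_pool", "baseline", "full_pool"), ("baseline", "full_pool", "tie"),
]
wins = tally(judgments)
scores = bradley_terry(wins)
print(sorted(scores, key=scores.get, reverse=True))  # ['skillos', 'full_pool', 'baseline']
```

The MM fit requires every system to win at least some fraction of a comparison; a system that never wins drifts toward a strength of zero.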
🧪 Experiments
Evaluated across 200 / 1K / 200K skill ecosystems, AgentSkillOS demonstrates consistent superiority over baselines, with ablation confirming that both retrieval and orchestration are indispensable, and strategy selection producing structurally distinct execution graphs.
Key findings:
- Substantial Gains over Baselines at Every Scale: All three AgentSkillOS variants achieve the highest Bradley-Terry scores across the 200 / 1K / 200K ecosystems. The w/ Full Pool baseline scores poorly because, as the pool grows, an increasing fraction of its skills becomes effectively invisible to the agent; structured retrieval and orchestration overcome this scalability bottleneck.
- Ablation: Both Retrieval and Orchestration Are Essential — Removing components reveals a clear degradation gradient: without DAG orchestration, retrieval alone is insufficient; without retrieval, even oracle skills cannot close the gap. Quality-First shows only a modest deficit versus the oracle upper bound, and the gap narrows as the ecosystem grows.
- Strategy Choice Shapes Execution Structure — Each orchestration strategy faithfully translates its design intent into a distinct DAG topology. Quality-First builds deep, multi-stage pipelines; Efficiency-First trades depth for width to maximize parallelism; Simplicity-First retains only essential steps.
Category Radar: per-category Bradley-Terry performance across ecosystem scales.
Ablation: separates retrieval and orchestration effects; confirms both are required.
DAG Structure Metrics: different orchestration strategies induce distinct topology profiles.
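Topology profiles like those in the DAG structure metrics can be computed directly from a plan's dependency graph. This sketch uses three simple measures (node count, depth as the longest dependency chain, width as the most nodes sharing one level); they are illustrative and not necessarily the paper's exact metric definitions, and the two plans are invented to mimic a deep Quality-First-style pipeline versus a wide Efficiency-First-style one.

```python
from graphlib import TopologicalSorter


def topology_metrics(deps: dict[str, set[str]]) -> dict[str, int]:
    """Node count, depth (longest dependency chain, in nodes) and width (largest level) of a DAG.

    `deps` maps each node to the nodes it depends on; the graph must be non-empty.
    """
    level: dict[str, int] = {}
    for node in TopologicalSorter(deps).static_order():  # dependencies come first
        # Roots sit at level 1; any other node sits one level below its deepest dependency.
        level[node] = 1 + max((level[d] for d in deps.get(node, set())), default=0)
    per_level: dict[int, int] = {}
    for lv in level.values():
        per_level[lv] = per_level.get(lv, 0) + 1
    return {"nodes": len(level), "depth": max(level.values()), "width": max(per_level.values())}


# Two invented five-node plans with opposite shapes.
deep_plan = {"draft": {"research"}, "review": {"draft"}, "revise": {"review"}, "render": {"revise"}}
wide_plan = {"render": {"charts", "text", "images", "layout"}}
print(topology_metrics(deep_plan))  # {'nodes': 5, 'depth': 5, 'width': 1}
print(topology_metrics(wide_plan))  # {'nodes': 5, 'depth': 2, 'width': 4}
```

Same node count, opposite shapes: depth rewards staged refinement, while width measures how much of the plan can run in parallel.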
🚀 How to Use
Installation & Configuration
Prerequisites
- Python 3.10+
- Claude Code (must be installed and available in PATH)
- Use cc-switch to switch to other LLM providers
Install & Run
```bash
git clone https://github.com/ynulihao/AgentSkillOS.git
cd AgentSkillOS
pip install -e .
cp .env.example .env   # Edit with your API keys
python run.py --port 8765
```
Download Pre-built Trees
| Tree | Skills | Description |
|---|---|---|
| 🌱 skill_seeds | ~50 | Curated skill set (default) |
| 📦 skill_200 | 200 | 200 skills |
| 🗃️ skill_1000 | ~1,000 | 1,000 skills |
| 🏗️ skill_10000 | ~10,000 | 10,000 active + layered dormant skills |
Configuration
```bash
# .env
LLM_MODEL=openai/anthropic/claude-opus-4.5
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-key
EMBEDDING_MODEL=openai/text-embedding-3-large
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=your-key
```
Custom Skill Groups
1. Create `data/my_skills/skill-name/SKILL.md`
2. Register in `src/config.py` → `SKILL_GROUPS`
3. Build: `python run.py build -g my_skills -v`
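Step 1 can be scripted. The sketch below assumes the widely used Agent Skills `SKILL.md` layout (YAML front matter with `name` and `description`, followed by Markdown instructions); whether AgentSkillOS requires exactly these fields is an assumption, so compare against an existing `SKILL.md` in the repository first. The `scaffold_skill` helper and the example skill are hypothetical; registering the group in `SKILL_GROUPS` and building still happen as listed above.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, group: str, name: str, description: str, body: str) -> Path:
    """Create <root>/data/<group>/<name>/SKILL.md with name/description front matter.

    Assumption: SKILL.md = YAML front matter + Markdown body.
    Avoid ': ' inside `description`, or it would need YAML quoting.
    """
    skill_dir = root / "data" / group / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{body.strip()}\n",
        encoding="utf-8",
    )
    return path


# Demo in a throwaway directory; in a real checkout, root would be the repository root.
with tempfile.TemporaryDirectory() as tmp:
    skill_file = scaffold_skill(Path(tmp), "my_skills", "csv-summary",
                                "Summarize a CSV file into key statistics.",
                                "# CSV Summary\n\nLoad the file, then report row count and column types.")
    rel_path = skill_file.relative_to(tmp).as_posix()
    content = skill_file.read_text(encoding="utf-8")

print(rel_path)  # data/my_skills/csv-summary/SKILL.md
```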
Batch Execution (Headless CLI)
Run a Batch
Run multiple tasks in parallel without the Web UI:
```bash
python run.py cli --task config/batch.yaml
```
See config/eval/ for ready-made batch configs covering different skill managers (tree, vector), orchestrators (dag, free-style), and skill pool sizes.
Batch Config (YAML)
```yaml
batch_id: my_batch
defaults:
  skill_mode: auto          # "auto" (discover) or "specified"
  skill_group: skill_200    # Which skill pool to use
  output_dir: ./runs
  continue_on_error: true
execution:
  parallel: 2               # Max concurrent tasks
  retry_failed: 0
tasks:
  - file: path/to/task1.json
  - file: path/to/task2.json
  - dir: path/to/tasks/     # Scan directory
    pattern: "*.json"
```
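The `tasks:` list mixes explicit files with directory scans. The sketch below shows one way to resolve it into concrete task files once the YAML has been loaded (for example with PyYAML's `yaml.safe_load`). The `expand_tasks` helper is hypothetical, and the non-recursive scan, the `"*.json"` default pattern, and duplicate removal are this sketch's choices; the real CLI may resolve entries differently.

```python
import tempfile
from pathlib import Path


def expand_tasks(entries: list[dict], base: Path) -> list[Path]:
    """Resolve `tasks:` entries (already parsed from YAML) into a list of task files."""
    files: list[Path] = []
    for entry in entries:
        if "file" in entry:
            files.append(base / entry["file"])
        elif "dir" in entry:
            # Assumptions: the scan is non-recursive and the pattern defaults to "*.json".
            pattern = entry.get("pattern", "*.json")
            files.extend(sorted((base / entry["dir"]).glob(pattern)))
        else:
            raise ValueError(f"task entry needs 'file' or 'dir': {entry!r}")
    # This sketch drops duplicates (e.g. a file listed and also matched by a dir scan).
    return list(dict.fromkeys(files))


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "tasks").mkdir()
    for name in ("a.json", "b.json", "notes.txt"):
        (base / "tasks" / name).write_text("{}", encoding="utf-8")
    entries = [{"file": "tasks/a.json"}, {"dir": "tasks/", "pattern": "*.json"}]
    found = [p.relative_to(base).as_posix() for p in expand_tasks(entries, base)]

print(found)  # ['tasks/a.json', 'tasks/b.json']
```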
CLI Flags
| Flag | Description |
|---|---|
| --task PATH, -T | Path to batch YAML config (required) |
| --parallel N, -p | Override parallel task count |
| --resume PATH, -R | Resume an interrupted batch run |
| --output-dir PATH, -o | Override output directory |
| --dry-run | Preview tasks without execution |
| --verbose, -v | Show detailed logs |
| --manager PLUGIN, -m | Override skill manager (e.g., tree, vector) |
| --orchestrator PLUGIN | Override orchestrator (e.g., dag, free-style) |
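The execution settings (`parallel`, `retry_failed`, `continue_on_error`, with `--parallel N` overriding the first) describe a bounded-concurrency runner. The sketch below is not AgentSkillOS's implementation: `run_batch` and `fake_task` are hypothetical, and treating `retry_failed` as "extra attempts per failed task" is an assumption about its semantics. Threads fit work that mostly waits on I/O, such as an agent subprocess.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_batch(tasks: list[str], run_one: Callable[[str], str], parallel: int = 2,
              retry_failed: int = 0, continue_on_error: bool = True) -> dict[str, str]:
    """Run tasks with at most `parallel` in flight; retry each failed task `retry_failed` times."""

    def attempt(task: str) -> str:
        for remaining in range(retry_failed, -1, -1):
            try:
                return run_one(task)
            except Exception:
                if remaining == 0:
                    raise  # out of retries: propagate the last error
        raise RuntimeError("retry_failed must be >= 0")

    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(attempt, task): task for task in tasks}
        for future in as_completed(futures):
            task = futures[future]
            try:
                results[task] = future.result()
            except Exception as exc:
                if not continue_on_error:
                    for other in futures:
                        other.cancel()  # tasks that have not started yet are skipped
                    raise
                results[task] = f"error: {exc}"
    return results


attempts: dict[str, int] = {}


def fake_task(name: str) -> str:
    # Hypothetical task: "flaky" fails once, "broken" always fails.
    attempts[name] = attempts.get(name, 0) + 1
    if name == "flaky" and attempts[name] == 1:
        raise RuntimeError("transient failure")
    if name == "broken":
        raise RuntimeError("always fails")
    return f"{name}: ok"


summary = run_batch(["t1", "flaky", "broken"], fake_task, parallel=2, retry_failed=1)
print(sorted(summary.items()))
# [('broken', 'error: always fails'), ('flaky', 'flaky: ok'), ('t1', 't1: ok')]
```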
Resume Interrupted Runs
```bash
python run.py cli -T config/batch.yaml --resume ./runs/my_batch_20260306_120000
```
Completed tasks are skipped; only remaining tasks are re-executed.
Output Structure
```text
./runs/{batch_id}/
├── batch_result.json          # Batch summary (metrics, costs, eval scores)
└── {task_id}__{run_id}/       # Per-task directory
    ├── meta.json
    ├── result.json
    ├── evaluation.json
    └── artifacts/             # Task outputs (PDF, HTML, video, etc.)
```
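The resume rule ("completed tasks are skipped") can be sketched against this layout. The completion test is an assumption: here a task counts as done when its `{task_id}__{run_id}/` directory contains `result.json`, and `run_id` is assumed not to contain `__`. The helper names are hypothetical, and the real `--resume` logic may rely on `batch_result.json` or other metadata instead.

```python
import tempfile
from pathlib import Path


def completed_task_ids(batch_dir: Path) -> set[str]:
    """IDs of tasks whose `{task_id}__{run_id}/` directory already holds result.json."""
    done: set[str] = set()
    for task_dir in batch_dir.iterdir():
        if task_dir.is_dir() and "__" in task_dir.name and (task_dir / "result.json").is_file():
            # Assumption: run_id itself contains no "__", so split on the last occurrence.
            task_id, _, _run_id = task_dir.name.rpartition("__")
            done.add(task_id)
    return done


def remaining_tasks(task_ids: list[str], batch_dir: Path) -> list[str]:
    """Keep the original task order, skipping tasks that already finished."""
    done = completed_task_ids(batch_dir)
    return [t for t in task_ids if t not in done]


with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp)
    (batch / "batch_result.json").write_text("{}", encoding="utf-8")
    (batch / "task_a__run1").mkdir()
    (batch / "task_a__run1" / "result.json").write_text("{}", encoding="utf-8")
    (batch / "task_b__run1").mkdir()  # interrupted: directory exists but no result.json
    todo = remaining_tasks(["task_a", "task_b", "task_c"], batch)

print(todo)  # ['task_b', 'task_c']
```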
🔮 Future Work
- Recipe Generation & Storage
- Interactive Agent Execution
- Plan Refinement
- Auto Skill Import
- Dependency Detection
- History Management
- Multi-CLI Support (Codex, Gemini CLI, Cursor)
Citation
If you find AgentSkillOS useful, please consider citing our paper:
@article{li2026organizing,
title={Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale},
author={Li, Hao and Mu, Chunjiang and Chen, Jianhao and Ren, Siyue and Cui, Zhiyao and Zhang, Yiqun and Bai, Lei and Hu, Shuyue},
journal={arXiv preprint arXiv:2603.02176},
year={2026}
}