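@veyhon: Compile a technical book into a Skill that Claude Code / Amp can load: extract the frameworks, principles, and anti-patterns first, then bring chapter files into context on demand. https://github.com/virgiliojr94/book-to-skill… book-to…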
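Summary
book-to-skill is an open-source tool that compiles technical books (PDF, EPUB, DOCX, and other formats) into a Skill that Claude Code or Amp can load. It automatically extracts frameworks, principles, and anti-patterns, and generates per-chapter context files.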
Cached: 2026/05/25 00:39
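book-to-skill first routes each document through an extraction chain chosen by document type: technical PDFs go through Docling, which preserves tables and code blocks; text-heavy PDFs go through pdftotext → PyPDF2 → pdfminer; EPUB, DOCX, HTML, RTF, and MOBI each have their own fallbacks. The extracted output is written to full_text.txt and metadata.json, and Claude then identifies the title, author, chapters, table of contents, and core themes.
The generated files land in the Amp or Claude Code skills directory: SKILL.md holds the core mental models and the topic index, each file under chapters/ is kept to 800–1,200 tokens, and glossary.md, patterns.md, and cheatsheet.md hold the terms, methods, and quick-reference tables respectively. For large books, grep / sed read the text one chapter slice at a time, so the whole book is not repeatedly pushed into context.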
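The chapter-by-chapter slicing can be pictured in Python as the equivalent of `grep -n` to find chapter headings followed by `sed -n 'a,bp'` to read one range. This is a minimal sketch, not the tool's actual code: the heading pattern and both function names are assumptions, and real books often need a looser pattern.

```python
import re

# Assumed heading shape ("Chapter 7", "CHAPTER 7: Title"); real books vary.
CHAPTER_RE = re.compile(r"^\s*chapter\s+(\d+)\b", re.IGNORECASE)


def chapter_spans(lines):
    """Map chapter number -> (start, end) half-open line index range."""
    starts = []
    for index, line in enumerate(lines):
        match = CHAPTER_RE.match(line)
        if match:
            starts.append((int(match.group(1)), index))
    spans = {}
    for pos, (number, start) in enumerate(starts):
        # A chapter ends where the next heading begins, or at end of text.
        end = starts[pos + 1][1] if pos + 1 < len(starts) else len(lines)
        spans[number] = (start, end)
    return spans


def read_chapter(lines, number):
    """Return only the lines of one chapter; nothing else is loaded."""
    start, end = chapter_spans(lines)[number]
    return lines[start:end]
```

Only the requested slice is returned, which is the point of the approach: the rest of full_text.txt stays on disk.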
virgiliojr94/book-to-skill
Source: https://github.com/virgiliojr94/book-to-skill
📚 book-to-skill
Turn any technical book or document into a Claude Code skill — ready to study, reference, and use while you work.
🤔 Why
You buy a great technical book. You read it once. Three months later you can’t remember chapter 7 existed.
The usual workarounds don’t help:
- 📄 “Let me just search the PDF” → you get a list of pages, not answers
- 🧠 “I’ll ask Claude about this book” → it either hallucinates or says it doesn’t have the content
- 📝 “I’ll take notes as I read” → you end up with a 200-line doc you never open again
book-to-skill solves this by turning the book into a structured skill Claude loads on demand.
Once installed, you just type /your-book-slug replication and Claude reads the right chapter and answers from the actual content. No hallucination. No digging through PDFs. The book becomes part of your workflow.
📦 What it generates
Running /book-to-skill your-book.pdf (or .epub) creates a full skill at ~/.claude/skills/<slug>/:
| File | Purpose | Size |
|---|---|---|
| SKILL.md | Core mental models + chapter index | ~4,000 tokens |
| chapters/ch01-*.md … | One file per chapter, loaded on-demand | ~1,000 tokens each |
| glossary.md | Every key term, alphabetically sorted with chapter refs | ~1,500 tokens |
| patterns.md | All techniques, algorithms, and design patterns | ~2,000 tokens |
| cheatsheet.md | Decision tables and quick-reference rules | ~1,000 tokens |
Chapter files are loaded on-demand — they don’t count against the skill budget until you ask about that topic.
🚀 Usage
/book-to-skill <path-to-document> [skill-name-slug]
Supported document formats: PDF, EPUB, DOCX, TXT, Markdown, reStructuredText, AsciiDoc, HTML, RTF, MOBI/AZW/AZW3.
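One plausible way to route a document to the right extractor is by file extension. The sketch below is illustrative only: the extension set, the family names, and `detect_format` itself are assumptions, not read from extract.py.

```python
from pathlib import Path

# Extensions for the formats listed above. The family names on the right
# are hypothetical labels chosen for this sketch.
FORMAT_BY_EXTENSION = {
    ".pdf": "pdf",
    ".epub": "epub",
    ".docx": "docx",
    ".txt": "text",
    ".md": "markdown",
    ".rst": "restructuredtext",
    ".adoc": "asciidoc",
    ".html": "html",
    ".htm": "html",
    ".rtf": "rtf",
    ".mobi": "kindle",
    ".azw": "kindle",
    ".azw3": "kindle",
}


def detect_format(path):
    """Return the format family for a document path, case-insensitively."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMAT_BY_EXTENSION:
        raise ValueError(f"unsupported document format: {suffix or '(none)'}")
    return FORMAT_BY_EXTENSION[suffix]
```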
Examples:
# PDF — derive skill name from filename
/book-to-skill ~/Downloads/designing-data-intensive-applications.pdf
# EPUB — specify a custom slug
/book-to-skill ~/books/clean-code.epub clean-code
# Full path with explicit name
/book-to-skill /tmp/ddd-evans.pdf domain-driven-design
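The examples above show two ways a skill name arises: derived from the filename, or passed explicitly. The exact derivation rule is not documented here, so the following is only a guess at its general shape. `skill_slug` is a hypothetical helper, and the real tool may shorten or normalize names differently.

```python
import re
from pathlib import Path


def skill_slug(document_path, explicit_slug=None):
    """Prefer an explicit slug; otherwise fall back to the file's stem."""
    raw = explicit_slug if explicit_slug else Path(document_path).stem
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")
    if not slug:
        raise ValueError("could not derive a skill name")
    return slug
```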
After the skill is created, use it like any other Claude Code skill:
/designing-data-intensive-apps # load core mental models
/designing-data-intensive-apps replication # find and explain a topic
/designing-data-intensive-apps ch05 # dive into chapter 5
/designing-data-intensive-apps "what chapters do you have?"
🔧 Requirements
The extractor tries tools in order per format and uses the first available. If nothing is installed, it tells you which command to run. Plain text, Markdown, reStructuredText and AsciiDoc need no extra deps.
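The "first available tool wins" behavior can be sketched with two standard-library probes: `shutil.which` for command-line tools and `importlib.util.find_spec` for Python packages. The function names and the tuple layout are assumptions made for illustration. The install hints are the ones listed in the tables below.

```python
import importlib.util
import shutil


def is_available(kind, name):
    """kind is 'cli' (executable on PATH) or 'module' (importable package)."""
    if kind == "cli":
        return shutil.which(name) is not None
    if kind == "module":
        return importlib.util.find_spec(name) is not None
    raise ValueError(f"unknown tool kind: {kind}")


def first_available(candidates, probe=is_available):
    """candidates is an ordered list of (kind, name, install_hint) tuples."""
    for kind, name, _hint in candidates:
        if probe(kind, name):
            return name
    # Nothing installed: tell the user which commands would fix that.
    hints = "; ".join(hint for _kind, _name, hint in candidates)
    raise RuntimeError(f"no extractor installed, try one of: {hints}")


TEXT_PDF_CHAIN = [
    ("cli", "pdftotext", "sudo apt install poppler-utils"),
    ("module", "PyPDF2", "pip3 install PyPDF2"),
    ("module", "pdfminer", "pip3 install pdfminer.six"),
]
```

Calling `first_available(TEXT_PDF_CHAIN)` returns whichever of the three is found first on the current machine.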
PDF — choose by book type:
| Book type | Tool | Install | Speed |
|---|---|---|---|
| Text-heavy (prose, few tables) | pdftotext (poppler) | sudo apt install poppler-utils | ⚡ instant |
| Text-heavy fallback | PyPDF2 | pip3 install PyPDF2 | ⚡ instant |
| Text-heavy fallback | pdfminer.six | pip3 install pdfminer.six | ⚡ instant |
| Technical (code, tables, formulas) | docling | pip3 install docling | ~1.5s/page |
Before extraction begins, the skill asks you whether the book is technical or text-heavy and picks the right tool automatically. Docling preserves markdown tables and code blocks; pdftotext is faster for prose-only books.
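A rough sketch of how the chosen mode could map to an extractor chain follows. The library calls are the public entry points of each tool, but `extract_pdf`, `CHAINS`, and the decision to give the technical mode no fallback are assumptions, not the project's confirmed behavior.

```python
import subprocess


def via_pdftotext(path):
    # "-" as the output file sends the extracted text to stdout.
    done = subprocess.run(["pdftotext", "-layout", path, "-"],
                          capture_output=True, text=True, check=True)
    return done.stdout


def via_pypdf2(path):
    from PyPDF2 import PdfReader
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


def via_pdfminer(path):
    from pdfminer.high_level import extract_text
    return extract_text(path)


def via_docling(path):
    from docling.document_converter import DocumentConverter
    return DocumentConverter().convert(path).document.export_to_markdown()


# Assumed mapping: the technical mode uses Docling only.
CHAINS = {
    "technical": [via_docling],
    "text": [via_pdftotext, via_pypdf2, via_pdfminer],
}


def extract_pdf(path, mode, chains=CHAINS):
    """Try each extractor for the mode in order; return the first real text."""
    errors = []
    for extractor in chains[mode]:
        try:
            text = extractor(path)
        except Exception as exc:  # missing tool, missing module, bad file
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text.strip():
            return text
        errors.append(f"{extractor.__name__}: produced no text")
    raise RuntimeError("all extractors failed: " + " | ".join(errors))
```

Imports sit inside each function so that a missing package only disables its own extractor instead of breaking the whole module.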
EPUB:
| Tool | Install | Quality |
|---|---|---|
| ebooklib + beautifulsoup4 | pip3 install ebooklib beautifulsoup4 | ⭐⭐⭐ Best |
| stdlib zipfile | built-in, no install needed | ⭐⭐ Always available |
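The stdlib fallback works because an EPUB is a ZIP archive of XHTML files. A minimal sketch of that idea follows, under stated assumptions: `epub_text` is a hypothetical name, and files are read in name order rather than in the reading order declared by the package's spine, which a fuller implementation would parse.

```python
import zipfile
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def epub_text(source):
    """source is a path or a binary file-like object holding an EPUB."""
    chunks = []
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextOnly()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                chunks.append(" ".join(parser.parts))
    return "\n\n".join(chunk for chunk in chunks if chunk)
```

This explains the lower quality rating: without a real EPUB library, structure such as reading order and heading hierarchy is only approximated.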
Other formats:
| Format | Tool | Install |
|---|---|---|
| DOCX | python-docx (fallback: stdlib ZIP/XML) | pip3 install python-docx |
| HTML | beautifulsoup4 (fallback: stdlib html.parser) | pip3 install beautifulsoup4 |
| RTF | striprtf (fallback: regex) | pip3 install striprtf |
| MOBI / AZW / AZW3 | Calibre ebook-convert (external app, not pip) | https://calibre-ebook.com/download |
| TXT / Markdown / reStructuredText / AsciiDoc | built-in | — |
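The DOCX "stdlib ZIP/XML" fallback in the table relies on the same trick: a .docx file is a ZIP archive whose main text lives in word/document.xml. The sketch below shows the idea only. `docx_text` is a hypothetical name, and it ignores tables, headers, and footnotes.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the <w:p> and <w:t> elements.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return one line per non-empty paragraph of a DOCX document."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        text = "".join(node.text or "" for node in para.iter(f"{W}t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)
```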
⚙️ How it works
PDF or EPUB
│
▼
Step 1.5 — "Technical or text-heavy book?"
│
├── technical → Docling (tables + code blocks as markdown, ~1.5s/page)
└── text → pdftotext → PyPDF2 → pdfminer (instant)
│
▼
scripts/extract.py --mode <technical|text>
EPUB → ebooklib → stdlib zipfile
│
├── /tmp/book_skill_work/full_text.txt
└── /tmp/book_skill_work/metadata.json
│
▼
Claude analyzes structure
(title, author, chapters, ToC)
│
▼
Generates per-chapter summaries (800–1,200 tokens each)
technical → includes Code Examples + Reference Tables sections
Generates glossary, patterns, cheatsheet
Generates master SKILL.md with core mental models
│
▼
~/.claude/skills/<slug>/ ✅ written
/tmp/book_skill_work/ 🗑️ cleaned up
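The hand-off between extraction and analysis in the diagram is two files in a scratch directory that is deleted at the end. A sketch of that step, with loud caveats: the metadata fields shown are hypothetical, since the real metadata.json schema is not described here, and both function names are invented for illustration.

```python
import json
import shutil
from pathlib import Path


def write_work_files(work_dir, text, source_path, mode):
    """Write full_text.txt and metadata.json; return the metadata dict."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    (work / "full_text.txt").write_text(text, encoding="utf-8")
    metadata = {
        # Hypothetical fields, chosen only to make the sketch concrete.
        "source": str(source_path),
        "mode": mode,
        "characters": len(text),
    }
    (work / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return metadata


def cleanup(work_dir):
    """Remove the scratch directory once the skill has been written."""
    shutil.rmtree(work_dir, ignore_errors=True)
```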
Extraction benchmark (103-page technical book, CPU only):
| Method | Time | Tokens | Tables | Code blocks |
|---|---|---|---|---|
| pdftotext | 0.1s | 27K | 0 | 0 |
| Docling | 164s | 27K (+1.2%) | 48 | 36 |
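The benchmark figures can be checked against the "~1.5s/page" estimate with a little arithmetic. The projection function assumes Docling's time scales linearly with page count, which the benchmark does not establish, and the token count uses the rounded "27K" from the table.

```python
PAGES = 103
DOCLING_SECONDS = 164
PDFTOTEXT_SECONDS = 0.1
TOKENS = 27_000  # the table rounds to "27K"

seconds_per_page = DOCLING_SECONDS / PAGES      # roughly 1.59 s/page
tokens_per_page = TOKENS / PAGES                # roughly 262 tokens/page
slowdown = DOCLING_SECONDS / PDFTOTEXT_SECONDS  # roughly 1640x slower


def docling_estimate_seconds(pages):
    """Linear extrapolation from the single benchmark above (an assumption)."""
    return pages * seconds_per_page
```

Under that assumption a 400-page technical book would take a little over ten minutes on CPU, which is the cost of keeping the 48 tables and 36 code blocks that pdftotext drops.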
Design principles
- Density over completeness — a 1,000-token summary beats a 10,000-token excerpt
- Practitioner voice — “Use X when Y”, not “The book explains X”
- Front-loaded SKILL.md — compaction keeps the first ~5,000 tokens; the most important content comes first
- On-demand chapters — the topic index tells Claude which file to read; chapters load only when needed
- Never raw text — always synthesize, summarize, extract signal from the source
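The on-demand principle can be modeled as a lookup from topic to chapter file. In practice Claude reads the topic index in SKILL.md and decides which file to open, so the code below is only an analogy. The index entries, file names, and `files_to_load` are all hypothetical.

```python
# Hypothetical topic index; a generated SKILL.md would carry its own entries.
TOPIC_INDEX = {
    "replication": "chapters/ch05-replication.md",
    "partitioning": "chapters/ch06-partitioning.md",
}


def files_to_load(query, index=TOPIC_INDEX):
    """Return the chapter files whose topic is mentioned in the query."""
    lowered = query.lower()
    return sorted({path for topic, path in index.items() if topic in lowered})
```

A query that matches no topic loads no chapter at all, so only the front-loaded SKILL.md is in context.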
❓ FAQ
“Can’t I just dump the PDF/EPUB into my Claude project context?”
You can — but every conversation will burn that token budget upfront. A 400-page book is ~200K tokens. With a skill, only the chapters relevant to your question load. The rest stays on disk until you need it.
More importantly: raw text injection is retrieval. A skill is reasoning. When you load a chapter file, Claude isn’t searching for keyword matches — it’s working with pre-extracted named frameworks, principles, and mental models structured for application, not for reading.
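The token argument in this answer can be made concrete using the sizes from the "What it generates" table. The sketch assumes that only SKILL.md and the requested chapters are in context. Whether glossary.md, patterns.md, or cheatsheet.md are also loaded is not stated, so they are left out.

```python
FULL_BOOK_TOKENS = 200_000  # "A 400-page book is ~200K tokens"
SKILL_MD_TOKENS = 4_000     # approximate size of SKILL.md
CHAPTER_TOKENS = 1_000      # approximate size of one chapter file


def skill_cost(chapters_loaded):
    """Tokens in context when a question touches this many chapters."""
    return SKILL_MD_TOKENS + chapters_loaded * CHAPTER_TOKENS


def savings_ratio(chapters_loaded):
    """How many times smaller the skill is than the raw book."""
    return FULL_BOOK_TOKENS / skill_cost(chapters_loaded)
```

For a question that touches two chapters, `skill_cost(2)` is 6,000 tokens, roughly 33 times less than loading the raw book.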
“Isn’t this just RAG?”
RAG works at query time: chunk the book → embed everything → find similar vectors → inject into prompt. It’s optimized for “find me the part that talks about X.”
book-to-skill works at compile time: one deep analysis run extracts the author’s actual frameworks, names them, describes when to use each, captures the anti-patterns. The output is structure the author spent years building — not a similarity search over their sentences.
RAG answers: “here are chunks close to your query.”
A skill answers: “here are the 12 frameworks this author built, ready to reason with.”
For searching across 50+ books, RAG wins. For going deep on one book and using its frameworks while you work, a skill wins.
“Popular books are already in Claude’s training data. Why bother?”
For widely-known books (Clean Code, DDIA, Pragmatic Programmer), Claude has general knowledge — but it’s compressed, averaged across the entire internet’s discussion of the book, and may hallucinate specific quotes or chapter locations.
book-to-skill works from your actual copy. Every framework name, every anti-pattern list, every chapter number is grounded in the text you provided. No training data drift, no hallucinated chapter titles.
It also shines for books Claude doesn’t know at all: niche technical references, internal company documentation, recent publications, translated works.
“NotebookLM handles multiple books better.”
Absolutely true — if your workflow is “I have 80 books and I want to search across all of them,” NotebookLM is the right tool.
book-to-skill is built for a different job: you want to go deep on one book and have its frameworks embedded in your coding or writing workflow, not in a separate browser tab. It’s less “library search” and more “the author is sitting next to you while you work.”
📥 Install
Copy this into your Claude Code session:
Install book-to-skill: https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/SKILL.md
Or manually:
mkdir -p ~/.claude/skills/book-to-skill/scripts
curl -o ~/.claude/skills/book-to-skill/SKILL.md \
https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/SKILL.md
curl -o ~/.claude/skills/book-to-skill/scripts/extract.py \
https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/scripts/extract.py
Then in any Claude Code session:
/book-to-skill ~/path/to/your-book.pdf
# or
/book-to-skill ~/path/to/your-book.epub
📁 Repository structure
book-to-skill/
├── SKILL.md # Skill definition + step-by-step instructions
├── scripts/
│ └── extract.py # PDF + EPUB extraction (pdftotext / PyPDF2 / pdfminer / ebooklib / zipfile)
└── README.md # This file
License
MIT